Here is an example from DEV-MUC3-0666 showing how to use RDFa to embed OWL/RDF triples in a document. Use Inspect Element in your browser to see the underlying markup, and copy the HTML into the RDFa Play tool to see the triples as a graph.
1. Assume we have an NLP process that recognizes proper names and noun phrases that describe entities. We can capture the results using HTML span elements to mark up the parsed text. Note that one span can be nested inside another. This is useful because descriptions often relate entities to one another. (How many people are mentioned? How many were arrested?)
In this example, span elements have property attributes that are specified by RDFa. This doesn't make them usable as RDF (yet), but does add some level of "semantics", and can be used to support CSS styling:
ON 19 OCTOBER, THE POLICE ARRESTED COLOMBIANS RODRIGO DIZUNZA OCHOA, NEPHEW OF DRUG TRAFFICKER FABIO OCHOA, AND ALFREDO AGUILAR CASTRO, NEPHEW OF COLOMBIAN DRUG TRAFFICKER RODRIGUEZ GACHA, WHO HAD 10 KG OF PURE COCAINE IN THEIR POSSESSION.
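The markup behind the rendered sentence is not reproduced in this text, so the following is only a sketch of how such nested spans might be generated. The helpers `text` and `span` are hypothetical, and using `name` and `description` as property values is an assumption based on the schema.org properties discussed further down.

```python
from html import escape


def text(s):
    """Escape plain text so it is safe to place inside HTML."""
    return escape(s)


def span(prop, *children):
    """Wrap already-escaped children in a span with an RDFa property attribute."""
    return f'<span property="{escape(prop)}">{"".join(children)}</span>'


# A name span nested inside the description span that mentions it.
fabio = span("name", text("FABIO OCHOA"))
nephew = span("description", text("NEPHEW OF DRUG TRAFFICKER "), fabio)
print(nephew)
```

Building the markup by composition keeps the nesting explicit: the inner span is created first and then passed as a child of the outer one.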
2. Next, we can mark up the actual entities, collecting together names and descriptions using yet more span elements. These have the RDFa typeof attribute denoting the entity type, and the surrounding div element has a vocab attribute to set the RDF schema to schema.org. Since this schema has an Action type, we can use that to mark up the verb "arrest" as well.
The resulting markup is now RDFa (visualize the results in RDFa Play).
ON 19 OCTOBER, THE POLICE ARRESTED COLOMBIANS RODRIGO DIZUNZA OCHOA, NEPHEW OF DRUG TRAFFICKER FABIO OCHOA, AND ALFREDO AGUILAR CASTRO, NEPHEW OF COLOMBIAN DRUG TRAFFICKER RODRIGUEZ GACHA, WHO HAD 10 KG OF PURE COCAINE IN THEIR POSSESSION.
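As a rough illustration of what this step produces, the sketch below holds a shortened, assumed version of the markup in a Python string and lists the typed entities it declares. The `MARKUP` fragment and the `EntityCollector` class are hypothetical, and the collector is not a conformant RDFa processor: it only pairs each typeof element with the name property nearest to it, and it assumes the fragment contains no void elements such as br.

```python
from html.parser import HTMLParser

# Assumed markup for part of the sentence; the original page's markup may differ.
MARKUP = """
<div vocab="http://schema.org/">
  THE POLICE
  <span typeof="Action"><span property="name">ARRESTED</span></span>
  COLOMBIANS
  <span typeof="Person">
    <span property="name">RODRIGO DIZUNZA OCHOA</span>,
    <span property="description">NEPHEW OF DRUG TRAFFICKER
      <span typeof="Person"><span property="name">FABIO OCHOA</span></span>
    </span>
  </span>
</div>
"""


class EntityCollector(HTMLParser):
    """Collect one {type, name} record per element that carries typeof."""

    def __init__(self):
        super().__init__()
        self.entities = []   # records in document order
        self._open = []      # one frame per open element: (entity or None, is_name)
        self._subjects = []  # entities whose element is currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        entity = None
        if "typeof" in attrs:
            entity = {"type": attrs["typeof"], "name": ""}
            self.entities.append(entity)
            self._subjects.append(entity)
        self._open.append((entity, attrs.get("property") == "name"))

    def handle_endtag(self, tag):
        entity, _ = self._open.pop()
        if entity is not None:
            self._subjects.pop()

    def handle_data(self, data):
        # Text directly inside a name span belongs to the innermost open entity.
        if self._open and self._open[-1][1] and self._subjects:
            self._subjects[-1]["name"] += data


collector = EntityCollector()
collector.feed(MARKUP)
for entity in collector.entities:
    print(entity["type"], "-", entity["name"])
```

A real RDFa processor, such as the one behind RDFa Play, would also emit the description literals and expand the type names against the vocab URI.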
3. The RDFa entity nodes so far are unlabelled "blank" nodes. If we want to refer to them, we need to assign them URIs. The URI can be anything that is unique [TO DO: some best practice needed here]. However, if the entity can be properly identified, then there may be a well-known, globally unique URI that can be used (such as a DBpedia resource URI).
ON 19 OCTOBER, THE POLICE ARRESTED COLOMBIANS RODRIGO DIZUNZA OCHOA, NEPHEW OF DRUG TRAFFICKER FABIO OCHOA, AND ALFREDO AGUILAR CASTRO, NEPHEW OF COLOMBIAN DRUG TRAFFICKER RODRIGUEZ GACHA, WHO HAD 10 KG OF PURE COCAINE IN THEIR POSSESSION.
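One way to attach a URI in RDFa is the resource attribute on the element that carries typeof. The sketch below shows the difference between a blank node and a named node; the helper `entity_span` and the example.org URI are hypothetical placeholders, not identifiers taken from the original markup.

```python
from html import escape


def entity_span(type_, name, uri=None):
    """Emit a typed entity span; add a resource attribute only when a URI is known."""
    resource = f' resource="{escape(uri)}"' if uri else ""
    return (
        f'<span typeof="{escape(type_)}"{resource}>'
        f'<span property="name">{escape(name)}</span></span>'
    )


# Without a URI the entity stays a blank node.
print(entity_span("Person", "ALFREDO AGUILAR CASTRO"))

# With a URI (a made-up placeholder here) the entity can be referred to elsewhere.
print(entity_span("Person", "RODRIGUEZ GACHA",
                  uri="http://example.org/entity/gonzalo-rodriguez-gacha"))
```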
4. URIs can be used to disambiguate entities, both within a document and between documents (using a global URI). For example, GONZALO RODRIGUEZ GACHA is mentioned earlier in DEV-MUC3-0666, with alternative names and descriptions that can be collected under the same URI:
MEXICO CITY (MEXICO), 10 NOV 89 (AFP) -- [TEXT] MEXICAN NARCOTICS POLICE HAVE ARRESTED COLOMBIAN CITIZEN JORGE HUMBERTO CHALARIA, KNOWN AS "EL NEGRO," THE REPRESENTATIVE IN MEXICO OF COLOMBIAN DRUG TRAFFICKER GONZALO RODRIGUEZ GACHA, OTHERWISE KNOWN AS "EL MEXICANO," THE TOP CHIEF OF THE MEDELLIN CARTEL THE ATTORNEY GENERAL'S OFFICE (PGR) MADE KNOWN TODAY.
...
ON 19 OCTOBER, THE POLICE ARRESTED COLOMBIANS RODRIGO DIZUNZA OCHOA, NEPHEW OF DRUG TRAFFICKER FABIO OCHOA, AND ALFREDO AGUILAR CASTRO, NEPHEW OF COLOMBIAN DRUG TRAFFICKER RODRIGUEZ GACHA, WHO HAD 10 KG OF PURE COCAINE IN THEIR POSSESSION.
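The effect of sharing a URI can be sketched as a simple merge: every mention tagged with the same URI contributes its property values to one record. The URI below is a hypothetical placeholder, the function `merge_by_uri` is illustrative, and the assignment of each phrase to name, alternateName or description is an assumption about how the mentions might be marked up.

```python
from collections import defaultdict

GACHA = "http://example.org/entity/gonzalo-rodriguez-gacha"  # hypothetical URI

# (uri, property, value) triples as they might come out of the two passages.
mentions = [
    (GACHA, "name", "GONZALO RODRIGUEZ GACHA"),
    (GACHA, "alternateName", "EL MEXICANO"),
    (GACHA, "description", "THE TOP CHIEF OF THE MEDELLIN CARTEL"),
    (GACHA, "alternateName", "RODRIGUEZ GACHA"),
]


def merge_by_uri(mentions):
    """Group property values by entity URI, keeping first-seen order without duplicates."""
    merged = defaultdict(lambda: defaultdict(list))
    for uri, prop, value in mentions:
        if value not in merged[uri][prop]:
            merged[uri][prop].append(value)
    return {uri: dict(props) for uri, props in merged.items()}


record = merge_by_uri(mentions)[GACHA]
for prop, values in record.items():
    print(prop, values)
```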
5. The RDFa rel and rev attributes can be used to record relationships between the entities we've picked out so far. For example, we can link relevant entities to the schema.org Action.
ON 19 OCTOBER, THE POLICE ARRESTED COLOMBIANS RODRIGO DIZUNZA OCHOA, NEPHEW OF DRUG TRAFFICKER FABIO OCHOA, AND ALFREDO AGUILAR CASTRO, NEPHEW OF COLOMBIAN DRUG TRAFFICKER RODRIGUEZ GACHA, WHO HAD 10 KG OF PURE COCAINE IN THEIR POSSESSION.
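The difference between rel and rev is only the direction of the resulting triple: rel points from the current subject to the linked resource, while rev points from the linked resource back to the current subject. The sketch below makes that explicit; the function `link` and the node identifiers are hypothetical, while `object` is a property of the schema.org Action type.

```python
def link(attr, current, predicate, target):
    """Return the (subject, predicate, object) triple produced by a rel or rev link."""
    if attr == "rel":
        return (current, predicate, target)
    if attr == "rev":
        return (target, predicate, current)
    raise ValueError(f"expected 'rel' or 'rev', got {attr!r}")


ARREST = "_:arrest"                              # hypothetical blank node for the Action
OCHOA = "http://example.org/entity/dizunza-ochoa"  # hypothetical URI

# Written on the Action element: the arrest has the person as its object.
from_action = link("rel", ARREST, "http://schema.org/object", OCHOA)

# Written on the Person element: the same statement, expressed in reverse.
from_person = link("rev", OCHOA, "http://schema.org/object", ARREST)

print(from_action == from_person)
```

Which attribute to use therefore depends mainly on which element is the more convenient place to put it in the text.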
There are modelling decisions to be made about the schema to use, and how far to go with markup. The base schema.org Thing type has name, description and alternateName properties. These may be enough to label and disambiguate named entities in text, leaving type resolution and relationship identification to further processing (perhaps "deeper" NLP that can capitalize on this markup). One example of such a choice is the mention "THE TOP CHIEF OF THE MEDELLIN CARTEL" at (4) above. As given, this is marked up as a Person "description" that happens to contain an Organization (which implies a relationship between the two entities anyway). It could easily be changed, by editing a single attribute, to replace the Person "description" with a "member" relationship to the Organization. This simple change would lose the description, but the underlying span is still there, and gives a "link reason" for the relation. If desired, it would be possible to introduce the "member" relationship and still preserve the "description", but this requires extra markup, and there comes a point where it all gets too complex...
Note that whether a global URI can be assigned to a marked-up entity indicates whether that entity is known (or is a "known unknown" if not). This allows the "deeper NLP" to be focussed where it is needed, and guided by "knowledge" derived from linked data.