Posts Tagged ‘Identity’

Observational Identity

We argued previously that there is a need for a system of identity for Semantic Web Agents, particularly in the process of making judgements of trust.

Examining the requirements of a system of identity, we recognise that such a system cannot count on universal uptake among Semantic Web agents, and therefore it cannot require each agent to state an identity for itself. Additionally, even if universal uptake could be relied upon, we cannot count on the honest and benevolent behaviour of every Semantic Web agent. Thus, as we briefly mentioned at the end of our previous post, a system of identity for the Semantic Web must be built primarily around observable characteristics as a measure of identity.

As an analogy: when surfing the Web you would not rely on a website’s claim that it is your bank’s online portal; you would rely on the factors you can observe (such as the domain name and the digital certificate) to inform your judgement. Digital certificates are especially important if you are connected to the Internet over an untrusted network connection.

Building on our earlier example of a rudimentary HTTP-based Semantic Web agent, suppose we request a URI from it, and receive some RDF in response. The data we collect about the identity of the agent may look something like the following:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix ex: <http://example.com/ont/>.

_:agent1
	rdf:type 	ex:HTTPAgent;
	ex:port  	80;
	ex:host  	"agent.example.com";
	ex:ip    	"10.0.0.1";
	ex:time  	"2010-04-14T14:37:37Z"^^xsd:dateTime.

Suppose at some later date we again communicate with the agent at the domain agent.example.com, and in the process observe that the DNS entry has changed, and the domain now refers to a new IP address. Do we then consider this to be the same agent which we have previous experience of? Further, is the information we have sufficient to make such a decision? Other attributes, such as software version numbers or digital certificates, may also influence the judgement of similarity if they significantly alter the behaviour of the agent.

Returning to our analogy, if your browser stored the credentials for your bank’s online banking portal, you would specify very strict criteria, very similar to what we described above, to dictate which websites are permitted to see this information.
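
The judgement described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the dictionary keys mirror the ex:host, ex:port and ex:ip properties of the example record, and the same_agent function and the two sets of criteria are hypothetical names, not part of any existing vocabulary or library.

```python
def same_agent(a, b, criteria):
    """Return True if both observations agree on every attribute in criteria."""
    return all(a[name] == b[name] for name in criteria)

# Two observations of the agent at agent.example.com, before and after
# the DNS entry changed (values taken from the example records).
first = {"host": "agent.example.com", "port": 80, "ip": "10.0.0.1"}
later = {"host": "agent.example.com", "port": 80, "ip": "10.0.0.2"}

# Lenient criteria: a changed IP address does not change the identity.
lenient = same_agent(first, later, ("host", "port"))

# Strict criteria: the IP address is treated as part of the identity.
strict = same_agent(first, later, ("host", "port", "ip"))

print(lenient, strict)  # True False
```

The answer to "is this the same agent?" depends entirely on which attributes we choose to treat as criteria, which is why the choice has to be made explicit.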

Below follows a second observation record, for an interaction with the same agent at a different IP address.

_:agent2
	rdf:type 	ex:HTTPAgent;
	ex:port  	80;
	ex:host  	"agent.example.com";
	ex:ip    	"10.0.0.2";
	ex:time  	"2010-04-18T10:24:12Z"^^xsd:dateTime.

It is possible to encode our criteria for equivalence using OWL (to some degree) such that a reasoner can identify that two agents are in fact the same entity. This involves declaring a class of all things which meet the criteria of being a particular agent such that those which meet the necessary and sufficient criteria may be considered the same.

Unfortunately, the equivalence afforded by OWL causes the effective merging of the identifiers, such that, as below, the metadata from the two different requests becomes inseparable.

@prefix owl: <http://www.w3.org/2002/07/owl#>.

_:agent1
	owl:sameAs   	_:agent2;
	rdf:type 	ex:HTTPAgent;
	ex:port  	80;
	ex:host  	"agent.example.com";
	ex:ip    	"10.0.0.1";
	ex:ip    	"10.0.0.2";
	ex:time  	"2010-04-18T10:24:12Z"^^xsd:dateTime;
	ex:time  	"2010-04-14T14:37:37Z"^^xsd:dateTime.
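
The loss of information can be reproduced outside of a reasoner. The following Python sketch is our own simplification, not a model of OWL semantics: it treats each identifier as a set of (property, value) pairs, with hypothetical property names taken from the example, and takes the union that results once the two identifiers denote the same resource.

```python
from itertools import product

# Statements attached to each identifier before they are declared equivalent.
agent1 = {("ip", "10.0.0.1"), ("time", "2010-04-14T14:37:37Z")}
agent2 = {("ip", "10.0.0.2"), ("time", "2010-04-18T10:24:12Z")}

# Once the identifiers denote one resource, only the union of statements remains.
merged = agent1 | agent2

ips = sorted(value for prop, value in merged if prop == "ip")
times = sorted(value for prop, value in merged if prop == "time")

# Every (ip, time) pairing is equally consistent with the merged statements,
# so the two genuine pairings cannot be told apart from the two spurious ones.
candidate_pairings = list(product(ips, times))
print(len(candidate_pairings))  # 4
```

Nothing in the merged set records which address was seen at which time.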

The problem with this approach is not the use of OWL classification (though it is somewhat ill-suited to this task); rather, it is the result of a simplistic ontology design. We acknowledge that this crude example ontology has many flaws (the assumption that an HTTP agent operates on a sole port and network address, for example). However, to fully satisfy our potential requirements we must adopt an event-based ontology design, as these observations are inherently temporal in nature.
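
To give a rough idea of what an event-based design buys us, the sketch below keeps each observation as its own event that merely refers to an agent identifier. It is a Python illustration under our own assumptions, not a proposed ontology: ObservationEvent, same_as and canonical are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObservationEvent:
    """One observation; the ip and time stay together on the event itself."""
    agent: str  # identifier coined for the observed agent (hypothetical)
    host: str
    port: int
    ip: str
    time: str   # ISO 8601 timestamp in UTC, so text order is time order

events = [
    ObservationEvent("agent1", "agent.example.com", 80, "10.0.0.1", "2010-04-14T14:37:37Z"),
    ObservationEvent("agent2", "agent.example.com", 80, "10.0.0.2", "2010-04-18T10:24:12Z"),
]

# Declaring the identifiers equivalent only affects the agent reference;
# the events themselves are left untouched.
same_as = {"agent2": "agent1"}

def canonical(agent):
    """Map an identifier to the one chosen to represent its equivalence group."""
    return same_as.get(agent, agent)

# The merged agent's history, with each address still tied to its timestamp.
history = sorted((e for e in events if canonical(e.agent) == "agent1"),
                 key=lambda e: e.time)
ip_at = {e.time: e.ip for e in history}
print(ip_at["2010-04-18T10:24:12Z"])  # 10.0.0.2
```

Because the address and the timestamp are properties of the event rather than of the agent, merging the agent identifiers no longer scrambles them.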

Posted: April 19th, 2010
Categories: Research, Semantic Web

Trust and identity on the Semantic Web

Open Data movements are gradually gaining traction; government transparency efforts in the US and the UK have begun to release data-sets, some of which are published in Linked Data form. As the range and variety of Semantic Web data publishers grow, it is increasingly important that we address the problem of trust.

Previously we discussed the challenges of a trust layer for the Semantic Web, and more recently, how we think these challenges should be faced. We are convinced that provenance and reputation information will be a crucial basis for Semantic Web trust decisions.

Reputation and provenance are by no means new subjects in the domain of Computer Science; both are grounded in substantial bodies of literature. Existing techniques will likely require some adaptation in order to match the challenges of the Web of Linked Data.

Hartig and Zhao’s provenance vocabulary for Linked Data does exactly this, taking existing provenance techniques in a Web-friendly direction and recognising the distinctions between data curation, publishing and access. Doing the same for reputation mechanisms will not be prohibitively difficult; however, there remains a missing piece of the technological puzzle: a system of identity.

A notion of identity is necessary for any judgement of trust in order to fully link together available information. The FOAF vocabulary gives us identifiers for people, and the FOAF+SSL proposals allow us to prove the ownership of (Web of Trust, or PKI style) digital certificates; however, there is as yet no accepted means of identifying a Semantic Web software agent (e.g. a Webserver) beyond the foaf:Agent type.

In order to properly describe the identity of a Semantic Web agent we require more information than a single URI. For example, in the case of an HTTP-based Semantic Web agent (a Webserver), metadata such as the hostname and network port are for some purposes integral to the identity of the agent. To avoid coining a new identity with every HTTP request we must have some criteria by which we judge that the other parties of different data exchanges are the same entity.

An important point to make here is that we cannot rely on declarative identities; that is, we cannot count on universal uptake among Semantic Web agents of a vocabulary in which to assert identity. Thus an appropriate identity mechanism must consider both observational identities (identities coined by another agent based on its observations) and declarative identities.

Posted: April 9th, 2010
Categories: Research, Semantic Web