Privacy, Confidentiality and Linked Data

Our first post on Linked Closed Data argued that closed Linked Data publishing is inevitable, but it missed two important factors: privacy and confidentiality, as the title of this post implies.

The need to provide access to sensitive data while maintaining confidentiality will be a major motivation for Closed Linked Data publishing. Rather than adopt a second format for publishing sensitive data, publishers will be keen to re-use existing Linked Data publishing infrastructure. The Linked Data community needs to converge on standards and develop implementations to support this as soon as possible.

Chris Gutteridge highlighted this in a post on the institutional data of the University of Southampton. At a university, a student might have a number of uses for their own personal data, which is confidential and so cannot be published openly.

Chris points out that in this domain there are also some complicated issues regarding student sponsors which might arise if certain assessment data were available electronically. These issues are of course not specific to Linked Data publishing, but it is good to know that people are giving them thought.

 

Linked Data as an Economic Good

To consider how revenue models for Linked Data might work, it helps to see where Linked Data sits in the classification of economic goods. We begin with the simplest case: Linked Open Data where the dataset has been declared public domain.

  • Non-rivalrous
    Information, and thus Linked Data, is a non-rival good: a good which can be enjoyed simultaneously by any number of consumers (ignoring technological limitations such as network bandwidth and processing power).
  • Durable
    Informational goods are also durable; one person's use of a piece of information does not expend that resource and subsequently prevent any others from using it.
  • Non-excludable
    A Linked Open Dataset which has no restriction on access is a non-excludable good; it is not possible to prevent people who have not paid for it from enjoying access to it.
  • Intangible
    Information goods are generally intangible goods: goods which are not themselves physical objects. Intangible goods are commonly also non-rival and non-excludable.

Goods which are both non-rivalrous and non-excludable are classed as 'public goods' in economic terms. Goods which are both rival and excludable, which are the more common sort of good, are known as 'private goods'.
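The two-axis classification above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not part of our paper: the `Good` type and `classify` function are hypothetical names of our choosing, and the labels for the two mixed cases ("club good" and "common-pool resource") are the usual textbook terms rather than ones defined in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Good:
    """A good described by the two properties discussed above (hypothetical type)."""
    name: str
    rivalrous: bool   # does one consumer's use diminish another's?
    excludable: bool  # can non-payers be prevented from accessing it?


def classify(good: Good) -> str:
    """Map the (rivalrous, excludable) pair to its economic class."""
    if not good.rivalrous and not good.excludable:
        return "public good"
    if good.rivalrous and good.excludable:
        return "private good"
    if not good.rivalrous and good.excludable:
        # Textbook label, not used in the post itself.
        return "club good"
    # Rivalrous but non-excludable; again a textbook label.
    return "common-pool resource"


if __name__ == "__main__":
    examples = [
        Good("Linked Open Dataset (public domain)", rivalrous=False, excludable=False),
        Good("Loaf of bread", rivalrous=True, excludable=True),
        Good("Dataset behind a paywall", rivalrous=False, excludable=True),
    ]
    for g in examples:
        print(f"{g.name}: {classify(g)}")
```

Under these assumptions, a public-domain Linked Open Dataset comes out as a public good, while restricting access to the same dataset changes only the excludability flag and so moves it into a different class.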

Public goods are understood to be difficult to charge for directly, as their non-excludability rules out payment-for-access revenue models. Indeed, economists generally hold that markets are neither a practical nor an efficient means of allocating pure public goods.

Naturally, producers and sellers of public goods have a vested interest in ensuring their continued income. Historically, technology and legislation have been the methods used to achieve this, by attempting to turn what was a public good into something which behaves more like a private good, making it rival and/or excludable. Digital Rights Management software and copyright law are examples of these technological and legal methods, respectively.

Alternatively, content holders may seek revenue through other means to offset the impact of freeloading. Advertising is perhaps the most common method, whereby paid adverts are placed alongside, or sometimes integrated with, the content. Sponsorship is another method, where costs are covered by investment from another party which does not seek advertising in return, for example government funding.

This post elaborates on our arguments on the economic nature of Linked Data, from our paper on Linked Closed Data which we recently posted about.

Linked Closed Data

The use of Linked Open Data is becoming increasingly widespread, boosted by recent moves to increase government transparency and efficiency by publishing non-sensitive datasets for free online. There is now a large 'cloud' of interlinked datasets, as evidenced by efforts to catalogue and visualise the Web of Linked Data.

Content owners such as governments and research institutions are in a unique position: they have the means to invest in the creation of datasets, yet face none of the financial pressures which require private companies to turn a profit from such investments. So far, all datasets published as Linked Data have been published for free, without access restrictions. However, as Linked Data technology moves beyond the Research and Development stage and is incorporated into commercial products and services, pressure to generate a return on investment will increase. In the face of that pressure it is inevitable that some will seek to monetize Linked Data.

In response to these pressures we can expect to see the rise of Linked Closed Data: datasets which are linked in adherence to Linked Data principles, but to which access, in whole or in part, is restricted to paying members. It may be possible to meet these financial pressures through other means, such as advertising, but we are sceptical of this (this will be the subject of a later post).

Linked Closed Data will not mean the end of the Web of Open Data. Closed datasets are unlikely to displace the free alternatives, as commercial datasets are sold on their quality and depth, something which free datasets do not generally assure. It will, however, enable a market for high-quality Semantic data, which may benefit both companies and consumers.

My colleagues and I recently submitted a paper discussing this subject to the Consuming Linked Data workshop (COLD2010); unfortunately, the paper was not accepted. This post explores the ideas about Linked Closed Data from that paper.