Why Nonprofits Should Care about Linked Data and the Semantic Web
Nonprofit organizations were noticeably absent from the MediaBistro Semantic Tech Event, held in Washington, DC, early this month. The steep admission cost may have had a lot to do with it. I suspect it was also because the meaning and implications of the semantic web are not well understood outside the small cluster of people who have embraced it since World Wide Web inventor Tim Berners-Lee presented his idea at TED in 2009 (he had outlined it years earlier in the journal Nature). He argued that it would be useful if the internet did more than hold unrelated documents--if it instead began to hold more data, and to link related data together in a logical framework. He called this "linked data," and articulated the idea of the Semantic Web, an internet made up of linked data.
So here we are, two years later, and without most internet users noticing, the amount of data available, and the amount of it that has been "linked" to other data, has exploded. The linked data cloud looks like a growing brain, with increasing capacity to "reason"--that is, to make inferences about the information contained within it to create new knowledge. This is a big part of what some refer to as Web 3.0, the next stage of evolution for the internet.
This is possible because a number of standardized languages and rules let us define relationships between pieces of data, regardless of where that data is held. Among them are: URIs (Uniform Resource Identifiers), which are unique identifiers for people, concepts, or things on the web (like a URL, but able to identify the individual pieces of information inside documents, not just the documents themselves); RDF (Resource Description Framework), a way to describe and link data so that machines can process it in a way humans can use; and our old familiar HTTP, the protocol that underpins the web. Given these resources, you can start linking data across platforms (sometimes called "mashing up" data), dismantling data silos and making information more meaningful.
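To make that a little more concrete, here is a minimal sketch in plain Python (no RDF library) of what "linking" means in practice. It assumes two made-up datasets that happen to use the same URI for the same organization; every example.org address, the grantReceived property, and the values are hypothetical and only there for illustration. The two FOAF properties are real, widely used RDF vocabulary terms. Each fact is a subject-predicate-object "triple," which is the basic shape of RDF.

```python
# All example.org URIs and values below are made up for illustration.
ORG = "http://example.org/id/helping-hands"
FOAF = "http://xmlns.com/foaf/0.1/"   # real FOAF vocabulary namespace
EX = "http://example.org/vocab/"      # hypothetical vocabulary

# Dataset A: facts a nonprofit might publish about itself.
site_triples = {
    (ORG, FOAF + "name", '"Helping Hands"'),
    (ORG, FOAF + "homepage", "http://example.org/"),
}

# Dataset B: facts a third party (say, an open-government portal) might publish.
gov_triples = {
    (ORG, EX + "grantReceived", '"25000"'),
}


def merge(*datasets):
    """Combine triple sets; facts about the same URI line up automatically."""
    merged = set()
    for dataset in datasets:
        merged |= dataset
    return merged


def describe(triples, subject):
    """Collect everything known about one URI, grouped by predicate."""
    facts = {}
    for s, p, o in triples:
        if s == subject:
            facts.setdefault(p, []).append(o)
    return facts


def to_ntriples(triples):
    """Serialize as N-Triples: URIs in angle brackets, quoted literals as-is."""
    lines = []
    for s, p, o in sorted(triples):
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


linked = merge(site_triples, gov_triples)
print(to_ntriples(linked))
```

The point of the sketch is that no special matching step is needed: because both sources identify the organization with the same URI, calling describe(linked, ORG) returns facts from both of them. Real projects would normally use an RDF library and a published vocabulary rather than hand-built tuples.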
OK, Great. But what does this all mean for me?
It means that extremely precise searches for specific pieces of information will become the norm over the next few years, but that's not all:
- Together with the Open Government Partnership, linked data is helping increase transparency and inform civic engagement with governments all over the world.
- Facebook's open graph is one big semantic application that maps relationships between you, your friends, and content you like on the internet.
- FlipBoard is an iPad app/magazine that uses semantic search to customize news stories to match your preferences.
- In "Hack Days" such as this New York Times event, developers meet to quickly code apps that mash up linked data to create tools that serve the public. Check out this app called HappyStance, which mashes up geographical data with "sentiment" data provided by users to report in real time how people are feeling about NYC subways.
- Using basic geocoding, news about the city street or block that you live on is collected from all over the internet and presented to you as it is published--see Everyblock.com.
- In the UK, linked data in health care will now allow health care professionals unparalleled access to "information about the journeys of patients through the care system and the outcomes of different treatments."
- The NoTube Project is applying linked terms to television in order to create apps that will enable users to interact more directly with television programs and with each other, part of the emerging "Social TV" movement. (Shameless plug alert: see how the GoodSpeaks Project plans to use SocialTV and linked data to link nonprofit video to news stories).
Wow, So What Should I Do to Take Part?
Just being aware of the possibilities linked data present is enough to inspire your organization to think of ways to use it. The first question to ask is, does your organization collect and publish data? If the answer to this question is yes, then finding a way to present your data on the internet as linked data is a good start. The next question to ask is, could your organization use open, linked data somehow within your publishing platform? If the answer is yes, there are a lot of ways to incorporate open, linked data into different Content Management Systems (CMS).
If you are a large health, media, or research institution, you are probably already investing in semantic applications and consultants (and if not, you should consider it).
If you don't have the need or budget for a team of ontologists, taxonomists, and developers who understand RDF, OWL, SPARQL, and other semantic languages, you might consider experimenting with the production and use of linked data by using some of these lower-end, but very useful, approaches:
Schema.org: A collaboration between Google, Microsoft, and Yahoo, Schema.org uses HTML5 microdata as a way to structure data within HTML documents so it can be better understood and used by major search engines. To use it, see Getting Started with Schema.org. If Drupal 7 is your content management system, you can use a "shortcut" to marking up your content with the Schema.org module. There is a plugin for WordPress, too, and it looks as though Joomla has one either released or in development. Because microdata is less robust than RDF, efforts are being made to make Schema.org more compatible with linked data schemes, and this is an ongoing, ever-evolving effort. In any case, by using microdata you are identifying and structuring data within your content, which will do a lot to optimize your content for search. Shiv Kumar Ganesh has written a great summary of other web tools for Schema.org implementation.
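If you are curious what that markup looks like before installing a module, here is a rough sketch in Python that builds a small microdata snippet by hand. It assumes the Schema.org Organization type with its name, url, and description properties; the function name and the sample organization are hypothetical, and a CMS module or plugin may well generate different (and richer) markup than this.

```python
from html import escape


def organization_microdata(name, url, description):
    """Return an HTML snippet describing an organization with microdata.

    Hypothetical helper for illustration: itemscope/itemtype declare the
    thing being described, and each itemprop labels one property of it.
    """
    return (
        '<div itemscope itemtype="http://schema.org/Organization">\n'
        f'  <a itemprop="url" href="{escape(url, quote=True)}">'
        f'<span itemprop="name">{escape(name)}</span></a>\n'
        f'  <p itemprop="description">{escape(description)}</p>\n'
        '</div>'
    )


# Made-up organization, for illustration only.
snippet = organization_microdata(
    "Helping & Healing",
    "http://example.org/",
    "A volunteer-run clinic.",
)
print(snippet)
```

Pasted into a page, a snippet like this tells a search engine that the linked text is the name of an organization and not just a string of words, which is the kind of structure the modules and plugins above add for you.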
Freebase: Owned by Google, Freebase is an open source linked database with almost 100 million named and linked "entities." Including your organization and any data you would like to share in this linked database means that anyone using the Freebase API to pull in linked information (such as BioVenturist, an open source biotechnology research website) will automatically have access to the information you provide. The best thing about Freebase is that you can design your own schema--that is, you decide for yourself how the information you are providing relates to everything else in the database. Here is a primer to get started adding topics in Freebase. As an example, take a look at SpaceforGood's Freebase entry (a work in progress). As for using Freebase linked data within your CMS: if you are using Drupal, you can start to enter some Freebase topics directly into text fields using this module, and in WordPress, this plugin will autodescribe tags using Freebase.
OpenCalais: Developed by Thomson Reuters, OpenCalais is an open-source application which automatically searches your content and creates rich semantic metadata that you can embed in your CMS. This can be used, in the words of Drupal expert Angie Byron, "for things like assisting with SEO, getting better search results, creating an 'Other articles like this' block, pulling in external data from other sources that speak RDF, or whatever else you can imagine doing with this kind of information." For Drupal, use this module. If you use WordPress, Tagaroo works well, and you can even use this OpenCalais Plugin to automatically search your text and pull related Freebase info, pictures, and video content into your site.
Kasabi: Kasabi is a warehouse for linked datasets, including the CIA World Factbook, BBC Programs, and NASA spacecraft dataset. It has lots and lots of data in an easy to consume format, and plans to build APIs to allow developers access to this data for a variety of purposes. Drupal has a module in development.
If your organization produces a lot of linked data in RDF format, consider adding what you have to Kasabi and to Sindice.org, an open-source linked data directory.
Semantics is a burgeoning, quickly evolving field, and even as I write this, new apps are being developed, new mash-ups created, new ontologies built, and new standards agreed upon. If there is something you'd like to share to help the nonprofit community keep up with these developments, please post it in the comments below! Nonprofit organizations have much to offer the world of open, linked data--most importantly the spirit of openness, fairness, and community that was, and hopefully will remain, the foundation of the World Wide Web.