In this third part of the series I am going to concentrate less on the science of working with the technology of Schema.org and more on what you might call the art of extension.
It builds on the previous two posts: The Bits and Pieces, which introduces you to the mechanics of working with the Schema.org repository in GitHub and your own local version; and Working Within the Vocabulary, which takes you through the anatomy of the major controlling files for the terms and their examples that you find in the repository.
Art may be an over-ambitious word for the process that I am going to try to describe. However, it is not about rules, required patterns, syntaxes, and file formats – the science; it is about general guidelines, emerging styles & practices, and what feels right. So art it is.
OK. You have read the previous posts in this series. You have said to yourself, "I only wish that I could describe [insert your favourite issue here] in Schema.org." You are now inspired to do something about it, or get together with a community of colleagues to address the usefulness of Schema.org for your area of interest. Then comes the inevitable question…
Where do I focus my efforts – the core vocabulary or a Hosted Extension or an External Extension?
Firstly a bit of background to help answer that question.
The core of the Schema.org vocabulary has evolved since its launch by Google, Bing, and Yahoo! (soon joined by Yandex) in June 2011. By the end of 2015 its term definitions had reached 642 types and 992 properties. They cover many sectors, commercial and otherwise, including sport, media, retail, libraries, local businesses, health, audio, video, TV, movies, reviews, ratings, products, services, offers and actions. Its generic nature has facilitated its adoption across well over 10 million sites. For more background I recommend the December 2015 article Schema.org: Evolution of Structured Data on the Web – Big data makes common schemas even more necessary, by Guha, Brickley and Macbeth.
That generic nature, however, does introduce issues for those in specific sectors wishing to focus in more detail on the entities and relationships specific to their domain whilst still being part of, or closely related to, Schema.org. In the spring of 2015 an Extension Mechanism, consisting of Hosted and External extensions, was introduced to address this.
Reviewed/Hosted Extensions are domain focused extensions hosted on the Schema.org site. They will have been reviewed and discussed by the broad Schema.org community as to style, compatibility with the core vocabulary, and potential adoption. An extension is allocated its own part of the schema.org namespace – auto.schema.org & bib.schema.org being the first two examples.
External Extensions are created and hosted separately from Schema.org in their own namespace. Although related to and building upon (extending) the Schema.org vocabulary, these extensions are not part of the vocabulary. I am editor of an early example of such an external extension, BiblioGraph.net, which predates the launch of the extension mechanism. Much more recently GS1 (The Global Language of Business) have published their External Extension, the GS1 Web Vocabulary, at http://gs1.org/voc/.
An example of how the GS1 Web Vocabulary extends Schema.org can be seen by inspecting the class gs1:WearableProduct, which is a subclass of gs1:Product, which in turn is defined as an exact match to schema:Product. Looking at an example property of gs1:Product, gs1:brand, we can see that it is defined as a subproperty of schema:brand. This demonstrates how Schema.org is foundational to the GS1 vocabulary.
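To make that chain of relationships concrete, here is a minimal Python sketch that records the three GS1 relationships described above as simple triples and follows them back to a Schema.org term. The predicate labels (subClassOf, exactMatch, subPropertyOf) and the function name schema_org_anchor are my own illustrative choices, not the identifiers used in the published GS1 vocabulary files.

```python
# Relationships described in the text, as (subject, predicate, object) triples.
# The predicate labels are illustrative assumptions, not GS1's actual terms.
TRIPLES = [
    ("gs1:WearableProduct", "subClassOf", "gs1:Product"),
    ("gs1:Product", "exactMatch", "schema:Product"),
    ("gs1:brand", "subPropertyOf", "schema:brand"),
]


def schema_org_anchor(term, triples=TRIPLES):
    """Follow links from `term` until a schema: term is reached.

    Returns the list of terms visited, or None if no Schema.org
    term can be reached from `term`.
    """
    path = [term]
    seen = {term}
    current = term
    while not current.startswith("schema:"):
        # Pick the first outgoing link to a term we have not yet visited.
        nxt = next(
            (o for s, _, o in triples if s == current and o not in seen),
            None,
        )
        if nxt is None:
            return None
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path


print(schema_org_anchor("gs1:WearableProduct"))
print(schema_org_anchor("gs1:brand"))
```

Both the class and the property resolve to a Schema.org term within a step or two, which is the sense in which the external extension builds upon the core.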
Choosing Where to Extend
This initially depends on what, and how much, you want to extend.
If all you are thinking of is adding the odd property to an existing type, adding another type to the domain and/or range of a property, or improving the description of a type or property, you probably do not need to create an extension. Raise an issue and, after some thought and discussion, go for it: create the relevant code and associated Pull Request for the Schema.org GitHub repository.
More substantial extensions require a bit of thought.
When proposing an extension to the Schema.org vocabulary, the above-described structure provides the extender/developer with three options: extend the core, propose a hosted extension, or develop an external extension. Potentially a proposal could result in a combination of all three.
For example a proposal could be for a new Type (class) to be added to the core, with few or no additional properties other than those inherited from its super type. In addition more domain focused properties, or subtypes, for that new type could be proposed as part of a hosted extension, and yet more very domain specific ones only being part of an external extension.
Although not an exact science, there are some basic principles behind such choices. These principles are based upon the broad context and use of Schema.org across the web; the consuming audience for the data that would be marked up; the domain specific knowledge of those who would do the marking up and review the proposal; and the domain specific need for the proposed terms.
A decision as to whether a proposed term should be in the core, a hosted extension or an external extension can be aided by the answers to some basic questions:
- Public or not public? Will the data that would be marked up using the term be normally shared on the web? Would you expect to find that information on a publicly accessible web page today? If the answer is not public, there is no point in proposing the term for the core or a hosted extension. It would be defined in an external extension.
- General or Specific? Is the level of information to be marked up, or the thing being described, of interest or relevant to non-domain specific consumers? If the answer is general, the term could be a candidate for a core term. For example Train could be considered as a potential new subtype of Vehicle to describe that mode of transport that is relevant for general travel discovery needs. Whereas SteamTrain and its associated specific properties about driving wheel configuration etc. would be more appropriate to a railway extension.
- Popularity? How many sites on the web would potentially be expected to make use of these term(s)? How many webmasters would find them useful? If the answer is lots, you probably have a candidate for the core. If it is only a few hundred, especially if they would all be in a particular focus of interest, it would more likely be a candidate for a hosted extension. If it is a small number, it might be more appropriate in an external extension.
- Detailed or Technical? Is the information, or the detailed nature of proposed properties, too technical for general consumption? If yes, the term should be proposed for a hosted or external extension. In the train example above, the fact that a steam train is being referenced could be contained in the text based description property of a Train type. Whereas the type of steam engine configuration could be a defined value for a property in an external extension.
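The four questions above can be sketched as a small decision helper. This is only one possible reading of the guidelines, not an official Schema.org procedure: the names TermProposal, Popularity, Home and suggest_home are hypothetical, and the answers supplied for the train examples are my own assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Popularity(Enum):
    LOTS = "lots"
    FEW_HUNDRED = "a few hundred"
    SMALL = "a small number"


class Home(Enum):
    CORE = "core"
    HOSTED = "hosted extension"
    EXTERNAL = "external extension"


@dataclass
class TermProposal:
    name: str
    public: bool          # normally shared on publicly accessible pages?
    general: bool         # of interest to non-domain specific consumers?
    popularity: Popularity
    too_technical: bool   # too detailed for general consumption?


def suggest_home(p: TermProposal) -> Home:
    """Suggest where a proposed term might live (a rough guide only)."""
    # Non-public data has no place in the core or a hosted extension.
    if not p.public:
        return Home.EXTERNAL
    # Very small audiences point towards an external extension.
    if p.popularity is Popularity.SMALL:
        return Home.EXTERNAL
    # General, widely useful, non-technical terms are core candidates.
    if p.general and not p.too_technical and p.popularity is Popularity.LOTS:
        return Home.CORE
    # Everything else: public, but domain focused or technical.
    return Home.HOSTED


# Assumed answers for the examples used in the text.
train = TermProposal("Train", True, True, Popularity.LOTS, False)
steam_train = TermProposal("SteamTrain", True, False, Popularity.FEW_HUNDRED, False)
wheel_config = TermProposal("wheelConfiguration", True, False, Popularity.SMALL, True)

for proposal in (train, steam_train, wheel_config):
    print(proposal.name, "->", suggest_home(proposal).value)
```

In practice the answers are rarely this clear cut, which is why the outcome should be treated as a starting point for discussion with the community rather than a verdict.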
When defining and then proposing enhancements to the core of Schema.org, or for hosted extensions, there is a temptation to take an area of concern, analyse it in detail and then produce a complete proposal. Experience has demonstrated that it is beneficial to gain feedback on the use and adoption of terms before building upon them to extend and add more detailed capability.
Based on that experience, Schema.org is best extended in stages, each step building upon the last. For example, start by introducing a new subtype with few, if any, new specific properties. Initial implementers can use textual description properties to qualify its values in this initial form. In later releases more specific properties can be proposed, their need being justified by the take-up, visibility, and use of the subtype on sites across the web.
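As a rough illustration of that staged approach, the following Python sketch builds the JSON-LD a site might publish at each stage. Both the Train type and the wheelConfiguration property are hypothetical, carried over from the train example earlier in this post; neither is claimed to exist in Schema.org, and the name and description values are invented for the example.

```python
import json

# Stage 1: only the (hypothetical) new subtype exists. The detail is
# carried by the existing, text based `description` property.
stage_one = {
    "@context": "https://schema.org",
    "@type": "Train",  # hypothetical type, for illustration only
    "name": "Example heritage service",
    "description": "Steam train hauled by a locomotive with a 4-6-2 wheel arrangement.",
}

# Stage 2: a later release (or an extension) adds a specific property,
# justified by the take-up of the subtype. Hypothetical property name.
stage_two = dict(stage_one, wheelConfiguration="4-6-2")

print(json.dumps(stage_one, indent=2))
print(json.dumps(stage_two, indent=2))
```

The point of the sketch is that stage one is already useful to consumers, and nothing published at stage one has to change when the more specific property arrives.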
Several screenfuls and a few weeks ago, this started out as a simple post in an attempt to cover off some of the questions I am often asked about how Schema.org is structured, and how it can be made more appropriate for this project or that domain. Hopefully you find the distillation of my experience and my personal approach, across these three resulting posts on Evolving Schema.org in Practice, enlightening and helpful, especially if you are considering proposing a change, enhancement or extension to Schema.org.
My association with Schema.org – applying the vocabulary; making personal proposals; chairing W3C Community groups (Schema Bib Extend, Schema Architypes, The Tourism Structured Web Data Community Group); participating in others (Schema Course extension Community Group, Sport Schema Community Group, Financial Industry Business Ontology Community Group, Schema.org Community Group); being editor of the BiblioGraph.net extension vocabulary; working with various organisations such as OCLC, Google’s Schema.org team, and the Financial Industry Business Ontology (FIBO); and preparing & presenting workshops & keynotes at general data and industry specific events – has taught me that there is much similarity between industries and sectors that are, on the surface, disparate when it comes to preparing structured data to be broadly shared and understood.
Often that similarity is hidden behind sector specific views, understanding, and issues in dealing with the open wide web of structured data where they are just another interested group, looking to benefit from shared recognition of common schemas by the major search engine organisations. But that is all part of the joy and challenge I relish when entering a new domain and meeting new interested and motivated people.
Of course enhancing, extending and evolving the Schema.org vocabulary is only one part of the story. Actually applying it for benefit to aid the discovery of your organisation, your resources and the web sites that reference them is the main goal for most.
I get the feeling that there may be another blog post series I should be considering!