Schema.org: A New Approach to Structured Data for SEO

Schema.org

Today we’ll have a look at something that can turn out to be extremely useful when talking about SEO (Search Engine Optimization).
We’ll take a brief look at Schema, the result of a partnership between Google, Yahoo! and Bing that aims to define a common language and build a vocabulary for creating what is generally called Rich Snippets in Google’s terms, or SearchMonkey in Yahoo!’s.
As you know, HTML tags tell the browser how to display the information inside the tag, but they say nothing about what that information means.
Take the word “blue”, for example. It might refer to someone feeling unhappy, or to the color itself. That’s exactly the kind of ambiguity that causes confusion when search engines try to display relevant content.

So how can Schema help me?

What Schema does is provide the right words at the right time and place: it shares tools (in this case vocabularies) that webmasters can use to mark up their pages so that search engines like Google, Bing and Yahoo! can understand a website’s content beyond simple HTML and text analysis. Schema.org has highly specialized types that tell search engines the content is actually about a “person,” “place,” “thing,” “movie,” “music” and so forth.
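
As a rough illustration of the difference, here is a minimal sketch of the same sentence with and without markup. The artist name is borrowed from the MP3 example later in this post, and the “pop singer” job title is an assumption made up for the example:

```html
<!-- Plain HTML: tells the browser how to display the text, not what it means -->
<p><b>Jenny J</b> is a pop singer.</p>

<!-- Same content with schema.org microdata: declares that this block describes a Person -->
<div itemscope itemtype="http://schema.org/Person">
  <p><b itemprop="name">Jenny J</b> is a <span itemprop="jobTitle">pop singer</span>.</p>
</div>
```

Both versions look the same in the browser; only the second one tells a search engine that “Jenny J” is the name of a person.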

The potential advantages here are huge:

  • searchers and publishers of quality content (“quality data” in this context) both benefit from schema.org: on one side, publishers can provide more detailed and complex material; on the other, search engines can give users much better answers to the most complex queries;
  • it becomes easier for search engines to identify what a site, or even a single paragraph, is about, and search engines like Google will start showing “rich snippets” (based on your markup) in their results;
  • better results lead to better click-through rates and increased traffic;
  • since search engines like Google are building search interfaces that rely on a particular type of content (Google Recipe search, for example), properly marked-up content will affect the searches made through those interfaces.

But the vocabulary itself doesn’t do much without some support. That support comes from the microdata format, which lets you embed machine-readable data in HTML documents. It sounds a bit abstract, but in practice it’s much simpler than it sounds.

But… where did Schema come from?

Jumping back in time, the first format in use was “Microformats”, whose downside was a fairly limited vocabulary. Then RDFa took over, a way of embedding RDF that came out of XHTML, but it required a complex, hard-to-write syntax.

And that’s where the Microdata format makes its appearance, offering both the simplicity of Microformats and the extensibility of RDFa.

If a website is still using Microformats or RDFa, there is no compatibility issue: the existing markup keeps working. All three search engines do, however, suggest adopting the new format, and not mixing formats on the same page.
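
To give a feel for the difference, here is a rough sketch of the same piece of data (a person’s name) written in each of the three formats. The hCard class names and the data-vocabulary.org namespace are the ones commonly used for rich snippets before schema.org; the snippets are alternatives for comparison, not something to combine on one page:

```html
<!-- Microformats (hCard): meaning is carried by fixed class names -->
<div class="vcard">
  <span class="fn">Jenny J</span>
</div>

<!-- RDFa: a namespace declaration plus typeof/property attributes -->
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
  <span property="v:name">Jenny J</span>
</div>

<!-- Microdata with schema.org: itemscope, itemtype and itemprop -->
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jenny J</span>
</div>
```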

Also important: the myth about the microdata format not being accepted by all search engines for Rich Snippets is not true. It’s actually the opposite: since the launch of schema.org, it is more than accepted!

Microdata was introduced with HTML5 and allows semantic information to be nested within existing HTML code. This means you can embed any of the over 100 schemas provided by schema.org within a webpage’s code. Semantic data and Schema.org may become useful tools for search engines analyzing web content, but at the moment (as of 2011) Schema.org isn’t widely implemented as a standard by the webmaster community.

To make things easier…

here’s a short video, made the same day the Schema project was released, that briefly describes what I’ve tried to explain above:

How can I implement it on my website?

First, let’s draw attention to one thing: if we plan to add Schema to a website that is fairly large in terms of content, the suggestion is to apply it only to the important stuff, the content that stands out the most. If your website can be split into three different categories, say MP3s, content and searches, you would definitely implement it on the MP3-related pages, since those are what will generate the most search results.

Let’s see for example an HTML structure for a website that wants to sell MP3s…

<!DOCTYPE HTML>
<html>
<head>
<title>Believe Me Mp3 (no microdata)</title>
</head>
<body>
<h1>Believe Me</h1>
<b>Artist:</b> Jenny J<br />
<b>Genre:</b> Pop<br />
Listen here:<br /><br />
<embed src="flashplayer.swf" /><br /><br />
<b>Lyrics:</b><br />
<i>One day you will believe me,<br />
but it will be too late,<br />
cause it's part of my life, it's part of your fate.<br /></i><br />
<a href="buy_and_download.htm"><b>Download this song</b></a>
</body>
</html>

One thing you need to know about Schema is that it isn’t something like meta tags. It’s quite different. Schema markup is added as attributes on the tags you already have, such as div, h1 and span. It is integrated in such a way that the rest of the HTML code is not affected.

Using the example above, right after the opening body tag you add a div tag that wraps the content, and on that div you define the exact type of content you’re offering on the page. This is done with the ‘itemscope’ attribute together with ‘itemtype’:

<div itemscope itemtype="http://schema.org/MusicRecording">

To define the title and the artist of a song, you use the itemprop attribute with the values “name” and “byArtist” respectively:

<h1 itemprop="name">Believe Me</h1>
<b>Artist:</b> <span itemprop="byArtist">Jenny J</span><br />

As you can see, it’s pretty clear, and the different sections that make up the code are easy to spot.

If lyrics are also part of the website content, we can mark them up through Schema as well, using the “description” property on a “span” tag like this:

<i><span itemprop="description">One day you will believe me,<br />
but it will be too late,<br />
cause it's part of my life, it's part of your fate.</span><br /></i><br />
<a href="buy_and_download.htm" itemprop="offers"><b>Download this song</b></a>

and of course

</div>
</body>
</html>

So the final structure, with the genre and a fallback audio link marked up as well, would end up looking like this:

<!DOCTYPE HTML>
<html>
<head>
<title>Believe Me</title>
</head>
<body>
<div itemscope itemtype="http://schema.org/MusicRecording">
<h1 itemprop="name">Believe Me</h1>
<b>Artist:</b> <span itemprop="byArtist">Jenny J</span><br />
<b>Genre:</b> <span itemprop="genre">Pop</span><br />
Listen here:<br /><br />
<embed src="flashplayer.swf" />
	<noscript>
		<a href="song1.mp3" itemprop="audio">Play this song</a>
	</noscript>
<br /><br />
<b>Lyrics:</b>
<br />
<i>
<span itemprop="description">One day you will believe me,<br />
but it will be too late,<br />
cause it's part of my life, it's part of your fate.</span><br /></i><br />
 
<a href="buy_and_download.htm" itemprop="offers"><b>Download this song</b></a>
</div>
</body>
</html>
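
One caveat about the download link: in the microdata model, an itemprop placed on an a tag takes the link’s URL as its value, while schema.org lists Offer as the expected type for “offers”. A possible refinement, sketched here with a made-up price and currency (both are assumptions, not part of the original page), is to give the offer its own nested item:

```html
<div itemscope itemtype="http://schema.org/MusicRecording">
  <h1 itemprop="name">Believe Me</h1>
  <!-- offers gets its own nested item instead of a bare link -->
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <!-- price and currency are hypothetical values for illustration -->
    <span itemprop="price">0.99</span>
    <meta itemprop="priceCurrency" content="USD" />
    <a href="buy_and_download.htm" itemprop="url"><b>Download this song</b></a>
  </div>
</div>
```

This is the same nesting idea described earlier: an item (the Offer) sits inside a property of another item (the MusicRecording).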

Let’s see another example, this time involving video as well as music. We’ll take a portion of HTML code and turn it into something that can be easily “crunched”. This is the original code snippet:

<h1>Foo Fighters</h1>
<h2>Video: Interview with the Foo Fighters</h2>
<object ...>
  <param ...>
  <embed type="application/x-shockwave-flash" ...>
</object>
Catch this exclusive interview with Dave Grohl and the Foo Fighters
 about their new album, Rope.

This time too, the structure is pretty simple. Now let’s see how it looks once the required changes are applied:

<div itemscope itemtype="http://schema.org/MusicGroup">
 
<h1 itemprop="name">Foo Fighters</h1>
 
 
<div itemprop="video" itemscope itemtype="http://schema.org/VideoObject">
  <h2>Video: <span itemprop="name">Interview with the Foo Fighters</span></h2>
  <meta itemprop="duration" content="PT1M33S" />
  <meta itemprop="thumbnail" content="foo-fighters-interview-thumb.jpg" />
  <object ...>
    <param ...>
    <embed type="application/x-shockwave-flash" ...>
  </object>
  <span itemprop="description">Catch this exclusive interview with
    Dave Grohl and the Foo Fighters about their new album, Rope.</span>
</div>

</div>

The types are different, but as you can see the steps are basically the same, plus some additions (duration, thumbnail, etc.).
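
Values that aren’t shown on the page in a machine-friendly form, like a duration, can be carried by a meta tag with a content attribute. Here is a minimal sketch applying the same idea to the earlier song, assuming a hypothetical running time of 3 minutes 45 seconds (ISO 8601 durations are written like PT3M45S):

```html
<div itemscope itemtype="http://schema.org/MusicRecording">
  <h1 itemprop="name">Believe Me</h1>
  <!-- shown to visitors -->
  Length: 3:45
  <!-- read by search engines; the 3:45 running time is a made-up value -->
  <meta itemprop="duration" content="PT3M45S" />
</div>
```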

Let’s say we have made the appropriate changes to our content and want to test whether everything follows the rules described in the MusicRecording section on Schema.org.

This can be done by submitting the link to our page to this tool: http://www.google.com/webmasters/tools/richsnippets

Let’s make a quick test by opening the link above and clicking on the “Recipes” link on the page.

One of the results you might get is the error “Insufficient data to generate the preview”, but this isn’t the important part of the test. What matters is the section under the “extracted rich snippet data from the page” note. In this test, Google correctly identified the type of content (recipe) and also obtained the correct values for the different item properties.
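
If you’d like a small page of your own to run through the tool, something along these lines could serve as a starting point. It is only a sketch: the recipe, its times and its ingredients are all invented, and the property names are taken from the Recipe type on schema.org:

```html
<!DOCTYPE html>
<html>
<head>
<title>Test page (hypothetical recipe)</title>
</head>
<body>
<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Banana Bread</h1>
  <!-- times use ISO 8601 durations; all values here are invented -->
  Prep time: <meta itemprop="prepTime" content="PT15M" />15 minutes<br />
  Cook time: <meta itemprop="cookTime" content="PT1H" />1 hour<br />
  Ingredients:
  <ul>
    <li itemprop="recipeIngredient">3 ripe bananas</li>
    <li itemprop="recipeIngredient">2 cups of flour</li>
  </ul>
</div>
</body>
</html>
```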

For details about Schema.org and learning about Microdata, you can refer to the official Schema.org website.

There are some interesting sections here that are worth a read, in particular:

Getting started with schema.org: a brief overview of how to get things started

Full Type Hierarchy: this is the whole hierarchy. It’s already expanded, so the entire structure is clearly visible

Data Model: if you want to know more about what’s behind the Schema.org project, have a look at this link. And this is the specific topic we used in our examples

Documentation: as the name says

And don’t forget to have a look at the E-Commerce course that Web Courses Bangkok offers.

  • http://www.webcoursesbangkok.com Carl Heaton

    Thomas thanks for this post, so what is your advise for the Web Courses site, how should we use Schemas?

  • http://www.schemafeed.com Kai Chan

    Great article, we’re currently trying out an experimental project called schemafeed.com, which may or may not increase adoption. Semantic web is a tough nut to crack, maybe schema.org will work.