microformats2 & HTML5 - The Evolution of Web Data

microformats2 & HTML5

The Next Evolutionary Step For Web Data

HTML5 microformats

CC-BY-3.0

etherpad.mozilla.org/html5microformats

evolution of data on the web

  1. microformats today
  2. challenges & lessons
  3. microformats2 & HTML5 today

1. microformats today

structured data 2012:
~70% microformats

Web Data Commons pie chart of domains with structured data

simplicity & openness

public domain / CC0

18 translations!

18 translations on microformats wiki home page

why microformats?

  1. less work - re-usable class names
  2. site features/UI - Download hCard to Address Book, Download hCalendar to Calendar
  3. cheap DRY API - compare to XML[1]/JSON
  4. search results - Google, Microsoft Bing, Yandex
    Rich Snippet search result of a restaurant with rating
  5. sites consuming microformats
    Readability, Spinn3r, Foursquare

which microformats?

  • hAtom - Google, Readability (hNews), numerous Spinn3r customers
  • hCard - Google, Microsoft Bing, Yandex, Readability, H2VX, Foursquare, Firefox Operator, Microformats for Chrome
  • hCalendar - Google, Microsoft Bing, Yandex, H2VX, Firefox Operator, Microformats for Chrome
  • hMedia - Google
  • hProduct - Google, Microsoft Bing, Yandex
  • hRecipe - Google, Microsoft Bing, Yandex, Microformats for Chrome
  • hResume - Guardian Jobs, Madgex Labs library clients
  • hReview - Google, Microsoft Bing, Yandex, Microformats for Chrome
  • hReview-aggregate - Google, Microsoft Bing, Microformats for Chrome
  • rel-me - Google, RelMeAuth/IndieAuth
  • rel-author - Google
  • rel-license - Google advanced search

2. challenges & lessons

8 years of alternative approaches

  • 2005-2009(?): StructuredBlogging
  • 2005-2011: Google Base schema
  • 2007-2011(?): Google Data API/Elements
  • 2009-2009(?): Yahoo et al CommonTag.org
  • 2009-2011(?): Google rdf.data-vocabulary.org
  • 2010-present: Facebook OGP meta tags
  • 2011-present: Google+MS(Y!) Schema.org
  • 2012-present: Twitter Cards meta tags
  • 2012-present: OpenMetadata.org

lessons learned

a. accessibility
humans first

accessibility - humans first

  • abbr title - must be human readable & listenable
    • <abbr title="2013-04-02">4/2</abbr>
  • alternative: value class pattern (VCP)
    • separate date & time (VCPDT)
      <span class="dtstart">
        <span class="value">2012-09-21</span> at
        <span class="value">15:25</span>
      </span>
    • or ...

accessibility - humans first

  • alternative: value class pattern (VCP)
    • empty span with value-title (VCPVT)
      <span class="latitude">38° 46' 9.692"
        <span class="value-title" title="38.769359"> </span>
      </span>
  • separate data should be the exception - DRY violation

b. class collisions
and losses

class collisions and losses

  • class="summary", "description"
  • site design updates remove & rewrite markup
  • answer: prefixed class names
    • h-*, p-*, u-*, dt-*, e-*
    • avoids collisions: "p-summary"
    • easier to recognize: "h-card"
    • enables generic parsing

c. too much markup

too much markup

<span class=vcard><span class=fn>Tantek Çelik</span></span>
<span class="vcard"><span class="fn n">
    <span class="given-name">Glenda</span>
    <span class="additional-name">Watson</span>
    <span class="family-name">Hyatt</span>
</span></span>
even bigger problem: microdata, RDFa (won't fit on a slide)
itemscope itemtype itemprop itemref itemid
vocab typeof property rel

microformats2: less markup

  • flat sets of properties. no subproperties.
  • common markup → common properties
    • <span class="h-card">Tantek Çelik</span>
      → name
    • <a class="h-card" href="http://tantek.com">Tantek Çelik</a>
      → url, name
    • <a class="h-card" href="http://tantek.com">
      <img src="IMG_0123.jpg" alt="Tantek Çelik"/></a>

      → url, photo, name

3. today: HTML5
& microformats2

a. HTML5 data tables

  • <table> <th id> <td headers>
  • Example: Unofficial XOXO Directory
  • One big <table> with 500+ <tr>s
  • One row of <th>s, the rest <td>s
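
A minimal sketch of the pattern, assuming hypothetical column names and a single made-up row (this is not the directory's actual markup). Each <td> names its column through the headers attribute, which points at the id of a <th>.

```html
<!-- Hypothetical two-column directory; ids and values are illustrative only -->
<table>
  <tr>
    <!-- One header row: each <th> gets an id -->
    <th id="name">Name</th>
    <th id="site">Website</th>
  </tr>
  <tr>
    <!-- Every data cell references its header cell by id -->
    <td headers="name">Example Person</td>
    <td headers="site"><a href="http://example.com">example.com</a></td>
  </tr>
  <!-- ...further <tr> rows follow the same shape... -->
</table>
```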

HTML5 data tables: result

Google search results for XOXO directory show a table

b. HTML5 <time> & <data>

HTML5 new element: <time>

<time class="dt-start" datetime="2013-04-02 11:40:00">
  April 2nd 2013 at 11:40am</time>
or combine with value class pattern:
<span class="dt-start">
  <time class="value">2013-04-02</time> at
  <time class="value">11:40</time>
</span>
Trade-off: DRY vs. locale-specific datetimes

HTML5: <time> recent enhancements

  • year: © <time>2013</time>
  • year-month: (email list, blog archives)
    <time datetime="2013-04">April 2013</time>
  • month-day
    Birthdate: <time datetime="--03-11">03-11</time>
  • duration: <time>205s</time>
  • album length: <time datetime="42m 59s">42:59</time>

Thanks: microformats contributions to WHATWG

HTML5 new element: <data>

<data class="latitude"
      value="38.769359">38° 46' 9.692"
</data>
  • replaces value class pattern value-title
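
A sketch of how <data> might combine with microformats2 class names, assuming an h-geo root. The latitude is the value from the slide above; the longitude is a made-up value for illustration. For a p-* property on a <data> element, microformats2 parsers take the value attribute rather than the visible text.

```html
<!-- h-geo root; the longitude is hypothetical -->
<span class="h-geo">
  <!-- Humans read the degrees/minutes/seconds; parsers read value="" -->
  <data class="p-latitude" value="38.769359">38° 46' 9.692" N</data>,
  <data class="p-longitude" value="-9.5">9° 30' W</data>
</span>
```

This keeps the readable and machine-readable forms in one visible element, with no empty span.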

c. microformats2

microformats2 summary

  1. prefixed class names (h- p- u- dt- e-)
  2. flat sets of properties
  3. single class markup for common uses
  4. live documentation:
    microformats.org/wiki/microformats2

microformats2 implementations

microformats2 as API example

WebFWD.org - Mozilla Incubator

microformats2 as JSON read API

JSON read API WebFWD.org

microformats support coming to Firefox OS

Screenshots of Firefox OS home screen and browser start page.

Q & A

Q & A FAQ

1. Which microformats should I use?

  1. a classic microformat on <body> for main page subject
  2. microformats2 for both main subject and nested data: JSON API
  3. optional site-specific link previews, consider:
    • OGP meta tags for Facebook
    • Twitter Cards meta tags for Twitter
    • Beware of duplicated invisible data drift!
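
A minimal sketch of option 3 next to visible microformats2 markup, assuming a hypothetical page (title and URL are made up). The title now lives in three places: two invisible meta tags and one visible element. Only the visible one gets noticed when it goes stale, which is where the drift comes from.

```html
<!-- Hypothetical page; all values are illustrative only -->
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Example Post Title</title>

  <!-- OGP meta tags for Facebook link previews (invisible copy #1) -->
  <meta property="og:title" content="Example Post Title">
  <meta property="og:url" content="http://example.com/2013/04/example-post">

  <!-- Twitter Cards meta tags (invisible copy #2) -->
  <meta name="twitter:card" content="summary">
  <meta name="twitter:title" content="Example Post Title">
</head>
<body>
  <!-- Visible data: the copy readers and authors actually see -->
  <article class="h-entry">
    <h1 class="p-name">Example Post Title</h1>
    <a class="u-url" href="http://example.com/2013/04/example-post">permalink</a>
  </article>
</body>
</html>
```

If the meta tags are needed, generating them from the same source as the visible markup is one way to keep the copies from diverging.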

2. How do I validate my microformats?

3. How do I get involved?

Thanks

Red panda (Firefox) Photo by Yortw

Tantek Çelik
tantek.com
@t
tantek.com/presentations/2013/04/microformats2