Greg FITZPATRICK <firstname.lastname@example.org>
This paper describes the work and progress of a coordinated effort by members of the Swedish educational and cultural communities in implementing metadata for publishing event information on Internet sites. It is intended to shed light on the problems and rewards involved in developing metadata strategies and hopefully encourage others to follow suit. The impetus for SKI comes from the Swedish foundation, MediaNet, which began work with Dublin Core implementation as part of its efforts within the European Union (EU) educational projects -- EUC and EUN -- in 1997.
While attempting to apply Dublin Core markup to an Internet cultural almanac, MediaNet became aware of an abundance of sites, all with the admirable ambition of becoming a centralized, all-encompassing, and trustworthy database for cultural events. Most of these sites were counting on a large amount of voluntary submissions from producers and organizers to keep the calendars up-to-date.
In the absence of one definitive dominating database, an unlikely development on a free-market Web in a free-market society, the very proliferation of calendar sites was not only detrimental to each site's individual success but also confusing and inefficient for the consumer.
MediaNet reasoned that with the use of suitable metadata tags, the publishers of event information would never need to go further than their own home page to make their postings. Then, data harvesters primed with the same metadata could be easily built to scan these sites, parse out event information, and deliver it to consumers. We decided to attempt to create a consensus amongst the cultural and entertainment community for the adoption of an event metadata standard to be called SKI. Our original intention was to use Dublin Core for this task, but it soon became apparent that iCalendar (The Internet Calendaring and Scheduling Core Object Specification) being developed within the Internet Engineering Task Force was a much more suitable vehicle.
We created a classification scheme for the desired information with the help of the 6W interrogatives: What: event name, description, etc.; Who: organizers, publishers, etc., of the event; When: the time; Where: the place; Why: promotional free text; and How: attendance requirements, status. With very little adjustment and the help of optional X-fields, iCalendar gave us an ideal metadata framework for all this information.
Since Why is free text, How is itemized free text, and When, Where, and Who are already well provided for by standards, our main attention turned to the problem of What. We looked enviously at the museum sector and its SPECTRUM standard, but we found no existing thesaurus for the categorization of our events. We had the choice of creating our own thesaurus, which we knew would be a tremendously time-consuming and wearying struggle, or coming up with an alternative.
The alternative was to create a living register of the naming conventions used by each SKI compliant site, open to all. This causes a bit of confusion for our target groups: The distinction between being a centralized database of all events and merely a registry of naming conventions takes some time to sink in.
After an initial period of intensive lobbying, SKI picked up support from the Swedish National Education Agency and our efforts are proceeding.
They could have been zany. The idea that, in 1895, there was a shred of a possibility that the known recorded history, empirical knowledge, and general babble of mankind could be gathered and stored physically in a warehouse in Brussels, sorted out, itemized, and made accessible to inquiry?
But this is what Paul Otlet and Henri Marie LaFontaine believed and convinced others to believe, to the extent that they were given premises, staff, and financial resources to realize a universal library of knowledge, later to be known as the Institut International de Documentation.
I dream of meeting them. I would coyly, in the manner of people who know a good secret, say that I had a line on the future and that perhaps there was a glint of a chance that they weren't really completely nuts after all.
In this future, I tell them, mankind has devised a way of connecting hundreds of millions of terminals so that human knowledge and discourse can flow freely, almost effortlessly twixt all of them.
"Is this through wireless or television or the telephone?" asks Henri, who lived long enough to witness these inventions.
"No," I answer, "it is a new medium, and it has proved so effective that its methods are being adopted by the very devices you asked about. Shortly, every point and axis of communication over distance will be joined in a sphere of universal connection."
"And are people using this wonderful invention for the advancement of civilization and the glory of scientific research?" continues Henri.
I hesitate. I have a list in my desk of the 100 words most commonly used by knowledge seekers on this "wonderful invention," words that are not exactly on a level he might have found inspiring. "Well," I hedge, "some of the time."
"And the archiving and indexing of the world's knowledge?" asks Paul, sweeping his gaze across the mountains of books, papers, and images heaped about him.
"Oh, that," I smile, "it will all eventually fit into a space the size of a matchbox!" I see that with this I am losing my credibility. "Er -- the archives of your Mundaneum will do just fine."
"And is it manageable?" he asks curiously.
"Not yet," say I, "but we are working on it."
I don't know if Messrs. Otlet and LaFontaine considered themselves binarists. Binarists believe that all matter and processes, ideas, arguments, and concepts can be broken down and dissected into one of two possible states, true or false, one or nothing.
But, just as we cannot with the naked eye see the individual atoms that form the simplest object on the table before us, neither can we fathom the vast Milky Way of binary combinations constituting the most elementary form or most singular idea.
I am watching a game of soccer on the television, while at the same time my 8-year-old son is playing a game of soccer on his Nintendo64. The roar of spectators can be heard from both rooms, though his fans sound more realistic or at least more excited than mine do. The sophistication of his game is startling and for Twinkie, the pet rabbit hopping between our two rooms, the nearly infinite distinctions between my game and Jaime's are indistinguishable.
Jaime's game is of course quite binary, if we disregard the real-life activity that inspired it. It can be broken down, just as it was built up, into millions and millions of yeses and noes, each one the decision of young men consuming pizzas and Dr. Peppers during late-night sojourns in front of computer screens. My game, if we disregard the television process bringing it into the living room, has a binary structure zillions of times more complicated than my son's, but our perceptions are very similar, as are the emotional responses and behavior they incite.
The future development of automated content negotiation and information retrieval lies somewhere in-between the two.
Science, like much other academic endeavor, is concerned with the codification of knowledge: reducing entities and findings to, if not a pure binarian state, at least a level of complexity that can be quantified, coded, and transformed into formulae that permit meaningful equations. This organization of knowledge is the groundwork for the concurrence of knowledge in our information-rich society.
The progression of human knowledge is a dialectic process that implies the exchange of observation and thought across both time and distance. Standardized codification is a prerequisite of this process.
We are now approaching a point in human development, at which a wide-scale, might we even say universal, dialectic process is becoming a reality, with overwhelming speed, range, and effectiveness.
It took weeks for Galileo to answer a letter from Johannes Kepler asking for a better telescope to help confirm his heliocentric theories.
Confucius and Aristotle, whose lifetimes were separated by less than a century, never knew of each other's existence.
Charles Darwin and Alfred Wallace, both coming to similar conclusions as to the origin of our species, were separated in their correspondence by the month-long voyages of ships.
Today Kepler would be able to look through Galileo's telescope without leaving Prague, Confucius and Aristotle would be on the same mailing list, and Darwin and Wallace would write requests for comments together.
We are all aware of, and sometimes complain about, the fact that the body of human knowledge we can access almost instantaneously has grown to such an overwhelming mass that it has become necessary to enlist the help of machines to process it.
And though I have no doubt that one day machines will be able to preen, glean, wean, and parse human language with baffling precision, we are not quite there yet. Machines are more apt to wreck nice beaches than to actually understand what we humans are talking about. They try, but they need lots of help in the form of predetermined structure, context, and code.
If you ever get to earth, by the way, try not to miss the "Death of Distance [DOD]" exhibition. It is a pretty amazing show they are putting on down there, but catch it while you can. I doubt if it will be running much longer.
You see, on earth they had pretty much come around to doing everything in a system of markets, buying and selling, trading, and so forth. These markets bring efficiency and transparency to the value systems of supply and demand, raw materials, and added value.
These value systems evolved in the earthlings' struggle with gravity, friction, scarcity, and frustratingly short life spans. A while back some jokers pulled the rug out from under all this by effectively eliminating the factor of distance. Since the cost of distance is one of the mainstays of value determination, this has caused quite a bit of commotion.
Up until the DOD, the earth was pretty much a geobased affair. The societal superstructure was based on proximity. Effective communication and governance were matters of proximity. Even nature tends to behave similarly in proximate geographical areas.
The cost of distance has not been eliminated, but its temporary demise has disrupted the market's efficiency in placing a value on certain goods and services. There are two specific causes for this:
No matter how wild and wonderful the DOD might be for many of us, I would not suggest that you build a business based on it, because the DOD will surely die. The market will eat it up.
So a good question would be -- once things return to normal, what will distance be worth? Well, I for one hope that it will be very cheap, not least because the project I work with, and will eventually get around to telling you about, is rather dependent upon cheap distance.
Let's suppose that you had a product for sale in Melbourne. Your market area would be determined by cost ratios proportional to the distance extending from your point of distribution, obviously becoming pricier the longer you had to travel to deliver your product. This ratio could be distorted by artificialities, such as unified postage costs for all of Australia or subventions for wilderness settlers or whatever, but without distortion you would find more efficiency in proximity, in concentrated population areas, as most businesses do. Most of what we buy and do and read about is local. Nearness rules.
Normally you would concentrate your marketing efforts on those areas that you assessed as economically viable for the distribution of your product.
Out of the blue, you are told that the market in which you are promoting your product is no longer Melbourne and its immediate suburbs, but the world. You might laugh, especially if your product was, say, fresh pizza.
But if you were told that your market area was not determined by you, but by each individual consumer --
You might be intrigued, if you made good pizza.
In a worldwide network situation with near-instant communication across the globe, most of the shopping, dining, entertainment, and communication needs of citizens would still be proximity based. Nearness rules.
So what is the point of creating a globally accessible communications network that would be used primarily for local needs? The point is expressed in the three Es of business administration: economy, efficiency, and effectiveness, the same terms used in the pizza hypothesis above.
I read a year ago about a metadata initiative for classified ads. It might seem very nonsensical to think that someone with a sewing machine for sale in Anchorage should be in the same marketplace as a lonely soul seeking companionship in Djakarta. What is the point? Is this effective, efficient, and economical?
The point must be that there is, after all, an incredibly small possibility that these two people should be aware of each other's propositions. If even the slightest potential exists, then we can calculate the cost of being in the same marketplace in relation to the potential of a transaction. If the potential, for example, could be valued at 0.00002 cents and the cost at 0.00001 cents, then we have an economically sound model.
The fresh pizza analogy was chosen for its extreme geobased characteristics. An ever-increasing share of market goods is in the form of immaterial intellectual resources, many of which have no logical geographical home, such as a chemical formula, a piece of music, or a tract about astronomy.
The interconnection of networks and the interoperability of machines are not deficit endeavors. These are long-term money savers.
Even if distance were taxed, even reasonably unfairly taxed, it would not be the significant element in the cost equation for a global marketplace. The cost that matters above all is the cost of fumbling around -- people fumbling around.
We all know what this means through our own personal experiences. We have all fumbled around. We have all waited as our computers fumbled around. We have all watched our search engines fumble around. Who could desire a seamless global marketplace with all this fumbling going on?
Pick up your telephone and dial any digit other than 0 or 1, and you will have effortlessly eliminated almost a billion undesired connections. Isn't that precision?
Of course telephone numbers are peculiarly binarian structures.
How about postal addresses? A carefully addressed letter will usually get to the right person, just about anywhere in the world. That is pretty precise.
What about Geos? How binary can you get? With longitudes/latitudes we can not only pinpoint my apartment in Stockholm, but we can zero in on Twinkie's cage.
In the beginning many, many years ago, there were cavemen and cavewomen who spoke a very crude form of English. When asking questions they had only one utterance at their disposal. They asked Whæ?
Whæ dis? Whæ dat guy? Whæ is da wife? And so forth. Eventually the whæ word was found insufficient and, driven by man's inquisitive needs, it reproduced itself and reared a family: What, Where, When, Why, Who, Which, and Whow. Whow eventually dropped the W for unknown reasons.
The whæ words are not the only way to ask a question in English, but the division of questioning achieved through the whæs is remarkably well suited to the division of answers -- in other words, a very interesting starting-off place for the structuring of information negotiation.
Think about where. Think about the amazing fact that every little nook and cranny on this earth and a whole bunch of places in space have a name or an address most of us would agree on.
Think about when. Of course there are ambiguities to the keeping of time, but a very industrious group of people within the Internet Engineering Task Force (IETF) have just rounded off a considerable effort to bring precision to when. Working with the ISO (International Organization for Standardization) 8601 standard for the representation of time and dates they have developed the iCalendar, the superstructure upon which The Swedish Calendar Initiative, SKI, is carried.
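iCalendar carries when as ISO 8601 values written in the compact "basic" format, with a trailing Z marking UTC. The Python sketch that follows handles only the DATE and UTC DATE-TIME forms (the function name is our own illustration, not part of SKI or of any library), and it shows how little machinery that precision requires:

```python
from datetime import datetime, timezone

def parse_ical_value(value: str):
    """Parse an iCalendar DATE (YYYYMMDD) or UTC DATE-TIME (YYYYMMDDTHHMMSSZ).

    Only these two ISO 8601 "basic format" shapes are handled in this sketch;
    floating local times and TZID-qualified times are not.
    """
    if len(value) == 8:
        # DATE value, e.g. 19990715
        return datetime.strptime(value, "%Y%m%d").date()
    if len(value) == 16 and value.endswith("Z"):
        # UTC DATE-TIME value, e.g. 19990715T200000Z
        naive = datetime.strptime(value, "%Y%m%dT%H%M%SZ")
        return naive.replace(tzinfo=timezone.utc)
    raise ValueError(f"unsupported iCalendar date/time: {value!r}")

print(parse_ical_value("19990715"))          # 1999-07-15
print(parse_ical_value("19990715T200000Z"))  # 1999-07-15 20:00:00+00:00
```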
SKI evolved at a government-financed educational foundation, MediaNet. As part of our assignment to promote information technology (IT)-based cultural resources, we constructed a Web-based calendar for musically related events.
It was a typical Tom Sawyer application. We set up a Web page coupled to a common gateway interface (CGI), in turn coupled to a database, and every arranger of anything musical from a rock concert to a study course in Tibetan larynx chanting was expected to come to us and paint our fence with their events.
This seemed like a killer app at its inception, but as we began to look around we realized that we were sharing this bright idea with about 24 other Tom Sawyer calendars all requiring the same attention of the arrangers.
There was the arranger's own calendar, the county's calendar, the tourist calendar, a folk music calendar, a ticket vendor calendar, a summer festival calendar, a cultural calendar, and Time-out calendars, many with the admirable ambition of becoming a centralized, all-encompassing consumer service. The results were a lot of work for arrangers who still had to continue with their traditional publication channels -- newspapers, radio, and TV -- as well as paste up the town with posters.
In the absence of one definitively dominating database, an unlikely development on a free-market Web in a free-market society, the very proliferation of calendars was not only detrimental to each site's individual success but also confusing and inefficient for all concerned, especially the consumer.
In the beginning of 1997 MediaNet joined a European Union Information Society Project Office working group known as the EUC, the prime task of which was to promote implementation of Dublin Core metadata on educational Web resources. For those who are not familiar with metadata or Dublin Core I have included references at the end of this paper; briefly, these are technologies for structuring documents in order to facilitate automated search and retrieval.
We began this work by marking up our Web-based music calendar with Dublin Core tags, with slight reluctance, I might add, since no search engine at that time paid any attention to Dublin Core. Furthermore, it was evident that Dublin Core lacked the essential tools for the flexible representation of when, such as repeating events and time zones. Then one day we discovered vCards and vCalendars.
vCalendars, which from here on I will refer to as iCalendars, were much more suitable for marking up our music calendar than the still undeveloped Dublin Core. iCal gave us a structure for where and when and at least a place to put what. iCal was meant not for public events but for private engagements, so we needed to add elements such as the sort of event, the rules for attending it, ticketing procedures, and so forth. iCal also gave us some other new possibilities, the most important being the iCal object, a neat little package that could be sent off and dropped onto all sorts of calendars.
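To make that "neat little package" concrete, here is a minimal Python sketch of an iCalendar object carrying one event plus extension properties. BEGIN, VERSION, PRODID, UID, DTSTAMP, DTSTART, SUMMARY, and LOCATION come from the iCalendar specification (RFC 2445), which also permits private X- properties; the X-SKI-* names, the PRODID string, and the UID are hypothetical placeholders, not SKI's actual field names. Folding of lines longer than 75 octets is omitted.

```python
def escape_text(value: str) -> str:
    """Escape a TEXT value as RFC 2445 requires: backslash, semicolon, comma, newline."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))

def build_event(uid, dtstamp, dtstart, summary, location, extras=None):
    """Return a VCALENDAR containing one VEVENT as CRLF-separated text.

    `extras` holds X- extension properties; the X-SKI-* names used in the
    example call are hypothetical, not taken from the SKI specification.
    """
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example//SKI sketch//EN",  # hypothetical product identifier
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{dtstamp}",
        f"DTSTART:{dtstart}",
        f"SUMMARY:{escape_text(summary)}",
        f"LOCATION:{escape_text(location)}",
    ]
    for name, value in (extras or {}).items():
        if not name.startswith("X-"):
            raise ValueError("extension properties must start with X-")
        lines.append(f"{name}:{escape_text(value)}")
    lines += ["END:VEVENT", "END:VCALENDAR"]
    return "\r\n".join(lines) + "\r\n"

print(build_event(
    uid="19990715-001@example.org",
    dtstamp="19990601T120000Z",
    dtstart="19990715T180000Z",
    summary="Summer concert, open air",
    location="City park, Stockholm",
    extras={"X-SKI-ATTENDANCE": "Tickets at the door", "X-SKI-ACCESS": "Wheelchair access"},
))
```

Keeping the SKI additions in X- properties means an ordinary iCalendar client can still read the event and simply ignore what it does not understand.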
I suppose you are supposed to remember the actual day the lights went on, but I don't. I just remember the feeling, when we eventually stumbled upon the amazingly obvious conclusion, that if we could convince "everybody" (meaning, in this early stage, the arrangers of cultural events in Sweden) to tag their own sites with the SKI version of iCal, they would never need to go farther than their own home page to make these events public. Once the items were posted, data harvesters primed with the same SKI format could crawl the sites, parse out event information, and deliver it to end consumers.
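The harvesting side can be sketched just as briefly. Assuming the harvester has already fetched a page's iCalendar text, the Python functions that follow undo iCalendar line folding and collect the properties of each VEVENT. The function names are illustrative, and the parsing is deliberately simplified: parameters are discarded, repeated properties overwrite one another, and colons inside quoted parameter values are not handled.

```python
def unfold(text: str) -> list[str]:
    """Undo RFC 2445 line folding: a line starting with a space or tab continues the previous one."""
    lines = []
    for raw in text.replace("\r\n", "\n").split("\n"):
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]  # drop the single leading whitespace character
        elif raw:
            lines.append(raw)
    return lines

def harvest_events(text: str) -> list[dict]:
    """Collect the properties of every VEVENT in an iCalendar document."""
    events, current = [], None
    for line in unfold(text):
        name, _, value = line.partition(":")
        name = name.split(";", 1)[0].upper()  # strip parameters such as ;VALUE=DATE
        if name == "BEGIN" and value.upper() == "VEVENT":
            current = {}
        elif name == "END" and value.upper() == "VEVENT":
            if current is not None:
                events.append(current)
            current = None
        elif current is not None:
            current[name] = value
    return events

sample = ("BEGIN:VCALENDAR\r\nBEGIN:VEVENT\r\nSUMMARY:Jazz at\r\n  the harbour\r\n"
          "DTSTART;VALUE=DATE:19990801\r\nEND:VEVENT\r\nEND:VCALENDAR\r\n")
print(harvest_events(sample))  # [{'SUMMARY': 'Jazz at the harbour', 'DTSTART': '19990801'}]
```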
SKI, the Swedish calendar initiative, was born.
We tried to orient ourselves to focus on the needs of the knowledge seeker. Most metadata being created in the academic world seems to be oriented toward the needs of the knowledge server. It is an easy trap to fall into: Blinded by our own environment, we take for granted the generality of what, for others, amounts to very exotic fish. We lure ourselves into thinking that the knowledge seeker shares our contextual reference points, when of course the knowledge seeker doesn't have a clue.
The academic world's immunity to the whims of the marketplace is probably the culprit. The knowledge server must begin with a well-considered understanding of his or her customers, the knowledge seekers.
This creates the optimal meeting point of seeker and server for the intended negotiation of content.
SKI's model of information structure is built upon the natural inquiries of an information consumer:
What is happening?
The event name and categories, such as theatre, sports, dining, shopping, etc.
Who is making it happen?
The organizers, publishers, and other agents of an event and, very important, who owns it.
When is it happening?
The time and date of singular events, repeating events, event clusters, and opening times.
Where is it happening?
Politically and culturally named places as well as postal addresses and Geos.
Why is it happening?
Promotional free text, descriptions, testimonials, reviews.
How is it happening?
Attendance requirements and rules, practical details, handicap access and facilities, ticketing.
Which choices are happening?
Defines specific orientations, such as events for Catholics, children, Republicans, doctors, etc.
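One way a harvester might sort incoming properties under these seven questions is a simple lookup table. The standard property names are from the iCalendar specification (RFC 2445); the assignment of properties to questions and the X-SKI-* names for How and Which are our own illustrative assumptions, not the SKI definitions.

```python
# Hypothetical mapping of the seven whæ questions to iCalendar properties.
# Standard names are from RFC 2445; X-SKI-* names are invented placeholders.
WHAE_PROPERTIES = {
    "what": ["SUMMARY", "CATEGORIES"],
    "who": ["ORGANIZER", "CONTACT"],
    "when": ["DTSTART", "DTEND", "RRULE"],
    "where": ["LOCATION", "GEO"],
    "why": ["DESCRIPTION"],
    "how": ["X-SKI-ATTENDANCE"],
    "which": ["X-SKI-AUDIENCE"],
}

def sort_by_whae(properties: dict) -> dict:
    """Group one event's properties under the whæ questions they answer.

    Questions with no matching property are left out; unmapped properties are ignored.
    """
    answers = {}
    for question, names in WHAE_PROPERTIES.items():
        found = {name: properties[name] for name in names if name in properties}
        if found:
            answers[question] = found
    return answers

event = {
    "SUMMARY": "Football: local derby",
    "CATEGORIES": "SPORTS",
    "DTSTART": "19990821T130000Z",
    "GEO": "59.33;18.06",  # approximate latitude;longitude of central Stockholm
    "X-SKI-AUDIENCE": "Families",
}
print(sort_by_whae(event))
```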
In keeping with our philosophy of consumer orientation, the first implementation of these elements, even before we had a database of events, was in the creation of a data harvester we called the WhaMachine.
The basic difference between a data harvester and a search engine is that the harvester is finely tuned to the data it is accessing. Today's search engines make inquiries in the manner of yesterday's cavemen. They ask whæ. A data harvester can afford to be much more precise.
This does not mean that a harvester is more sophisticated than a search engine. Search engines have developed very ingenious techniques to compensate for the clumsiness of whæ. Data harvesting simply implies a predefined structural arrangement, agreed upon by both the seeker and the server of knowledge. A what for a what, a when for a when, and so on.
Our WhaMachine allows questions to be asked with the seven whæ words and functions to its fullest extent only when the information server has sorted its information correspondingly.
Each whæ question is equipped with two parameters, weight and range.
Weight represents the importance of the outcome for each individual whæ element: for example, in searching for a what:sporting event in a where:town on a when:date. Weight lets us choose the importance of each criterion for the results of the search. How important is the location? Must the sport be football? Does it have to be on Saturday afternoon at 3:00 p.m.?
Range represents the scope of the search for each individual element.
For when, the range can run from a split second to an afternoon, a weekend, a week, a month, a year, and so on. For where, it begins at a specific address and branches out in an ever-widening geographical circle.
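A minimal sketch of how weight and range could combine into a single relevance score, assuming a linear fall-off: an element scores 1.0 on an exact match and drops to 0.0 at the edge of its range, and the weights then form a weighted average. The scoring rule, the data layout, and the function names are hypothetical illustrations, not the WhaMachine's actual algorithm. Where uses the haversine great-circle distance between latitude/longitude pairs, which is how Geos make "branching out" measurable.

```python
import math
from datetime import datetime, timezone

def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine formula, mean Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(a))

def closeness(distance, range_):
    """1.0 for an exact match, falling linearly to 0.0 at the edge of the range."""
    if range_ <= 0:
        return 1.0 if distance == 0 else 0.0
    return max(0.0, 1.0 - distance / range_)

def wha_score(event, query):
    """Weighted average of per-element closeness, using the query's weights and ranges."""
    parts = []  # (weight, closeness) pairs, one per whæ element in the query
    if "what" in query:
        weight, wanted = query["what"]
        parts.append((weight, 1.0 if event["what"] == wanted else 0.0))
    if "where" in query:
        weight, (lat, lon), range_km = query["where"]
        parts.append((weight, closeness(km_between(lat, lon, *event["where"]), range_km)))
    if "when" in query:
        weight, moment, range_hours = query["when"]
        hours = abs((event["when"] - moment).total_seconds()) / 3600
        parts.append((weight, closeness(hours, range_hours)))
    total = sum(w for w, _ in parts)
    return sum(w * s for w, s in parts) / total if total else 0.0

saturday_3pm = datetime(1999, 8, 21, 15, tzinfo=timezone.utc)
query = {
    "what": (1.0, "football"),            # must it be football?
    "where": (0.5, (59.33, 18.06), 50),   # near central Stockholm, 50 km range
    "when": (2.0, saturday_3pm, 48),      # Saturday 3 p.m., two-day range
}
event = {"what": "football", "where": (59.33, 18.06),
         "when": datetime(1999, 8, 22, 15, tzinfo=timezone.utc)}  # Sunday instead
print(round(wha_score(event, query), 3))  # 0.714
```

Here the heavy weight on when means a one-day slip costs more than the location would, which is the kind of trade-off the weight parameter is meant to express.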
Once the WhaMachine was built, we were sorely in need of data to try it out on. How were we going to convince tens of thousands of event publishers to go back to their HTML (hypertext markup language) pages and redo them to suit us?
Our goal was a distributed solution, where each individual organizer of an event would publish and own the event information on his or her own homepage, but we did not have the resources to accomplish this.
It is no accident that Sweden is the birthplace of SKI. Not only does Sweden have very good pipes and more computers and cellular phones per person than almost any place in the world, but it also has a very highly organized societal infrastructure. Every form of human endeavor has its clubhouse, in the form of national and regional organizations. Rather than attempt to approach individual event publicists, we concentrated our efforts on their national associations.
By February 1999, we had gathered together the National Tourist Board, the National Sports Agency, the National Cultural Endowment agency, and similar agencies for museums and libraries, dance and theater, municipal governments, education, broadcasting, and the handicapped, all of which were very conveniently located in Stockholm.
These organizations also gave us one solution to the problem of authenticating events, since their members in almost all cases had received some form of certification as bona fide arrangers.
Virtually all of these associations had, at the time we contacted them, launched IT strategies for the benefit of their members, but almost without exception none of them had considered doing it with anyone else.
In truth, few of these associations initially welcomed our overtures. All had some sort of Tom Sawyer application in the works, and SKI sounded like trouble, or like an implied criticism that they were going about their business the wrong way.
The solution was to work with each organization's existing framework, trying to make sure that as many SKI criteria as practically possible would be implemented, in the fashion best suited to each group's needs. SKI objects could in turn be generated from each organization's centralized database and then harvested by the WhaMachine or any other harvester using SKI.
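Generating SKI objects from an association's central database can be as simple as a query and a template. In this sketch the `events` table and its columns are hypothetical, an in-memory SQLite database stands in for the real one, and text escaping, DTSTAMP, and the surrounding VCALENDAR wrapper are left out for brevity.

```python
import sqlite3

def export_events(conn: sqlite3.Connection) -> list[str]:
    """Render each row of a hypothetical `events` table as a minimal VEVENT block.

    A complete object would also carry DTSTAMP, escape TEXT values, and sit
    inside a VCALENDAR wrapper; those steps are omitted here.
    """
    blocks = []
    rows = conn.execute(
        "SELECT uid, title, starts_utc, venue FROM events ORDER BY starts_utc")
    for uid, title, starts_utc, venue in rows:
        blocks.append("\r\n".join([
            "BEGIN:VEVENT",
            f"UID:{uid}",
            f"DTSTART:{starts_utc}",
            f"SUMMARY:{title}",
            f"LOCATION:{venue}",
            "END:VEVENT",
        ]))
    return blocks

# Stand-in for an association's central database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (uid TEXT, title TEXT, starts_utc TEXT, venue TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    ("e2@example.org", "Folk dance evening", "19990812T170000Z", "Town hall"),
    ("e1@example.org", "Organ recital", "19990805T160000Z", "Cathedral"),
])
for block in export_events(conn):
    print(block)
```

Because the same rows can feed both the association's own calendar and the exported objects, members keep working in the system they already know.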
Once the SKI objects are created, they can also be sent back to the original publishers to keep at their sites, acquainting them with the SKI methodology. For independent publishers without a central database, we are creating a Tom Sawyer calendar together with Sunet, Sweden's largest online catalogue of Net resources. Here, event submission will automatically produce SKI objects.
Most of you reading this will have already surmised that if SKI were to actually succeed, there are a host of players further down the publication chain who might see it as a threat to their business strategies if it arrived in any form other than their own proprietary solution.
True, but what all of the associations named above have in common is the responsibility to make their members' events visible, without restrictions or financial gain, to the largest possible public.
The commercial payoff comes when the consumers actually attend the event.
Event reporting will increasingly be handled by actors who are just beginning their operations in the field. These include the cellular phone and paging companies, digital TV and radio broadcasters, text TV, and Net-connected printing stations at hotels and transportation facilities, as well as a maze of digiphernalia yet unknown. And we have begun meetings with newspaper and magazine publishers, arguing that SKI will liberate them from the costly and tedious work of compiling event calendars on an individual basis, freeing resources for the added-value applications of reviews, recommendations, criticism, and consumer reports.
Look back at the whæ elements and you will see that six of them are quite common to almost any event. Of course there are intricacies. Broadcasters and Netcasters have a rather complicated where status. Some objects are not events at all, but rather permanent installations that would rather express when in terms of opening times. Many activities do not need all seven elements in order to accurately publish, but when it comes to what, this is where SKI's members part company.
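Opening times and other repeating events are where iCalendar's recurrence rules earn their keep. The Python sketch that follows expands only the simplest RRULE shape (FREQ=DAILY or WEEKLY with an optional INTERVAL and a COUNT); BYDAY, UNTIL, exception dates, and time zones are deliberately ignored, so treat it as an illustration rather than a conforming implementation. The function name is our own.

```python
from datetime import datetime, timedelta

STEP = {"DAILY": timedelta(days=1), "WEEKLY": timedelta(weeks=1)}

def expand_rrule(dtstart: datetime, rrule: str) -> list[datetime]:
    """Expand a simple RRULE (FREQ=DAILY or WEEKLY, optional INTERVAL, required COUNT).

    BYDAY, UNTIL, exceptions and time zones are deliberately left out of this sketch.
    """
    parts = dict(item.split("=", 1) for item in rrule.split(";"))
    step = STEP[parts["FREQ"]] * int(parts.get("INTERVAL", "1"))
    count = int(parts["COUNT"])
    return [dtstart + i * step for i in range(count)]

# A summer exhibition opening every Saturday at 11:00, eight weeks running.
for opening in expand_rrule(datetime(1999, 6, 5, 11, 0), "FREQ=WEEKLY;COUNT=8"):
    print(opening.strftime("%a %d %b %Y %H:%M"))
```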
Our strategy has been to gather together whatever was out there in the way of naming conventions from each group as well as from the traditional sources, such as the yellow pages and the largest Web catalogues. We are also particularly on the lookout for any other international activity in this direction, such as the SPECTRUM thesaurus, a resource used by the museum community. Tips are very welcome.
It is our intention to create an event thesaurus that is concurrent with other languages, not least the languages spoken within the European Union, since we are members of that Union, which, by the way, sponsors our project. By creating a relational database for naming conventions, we will be of service to both publishers and data harvesters and eventually address the problems of synonyms, disparaging names, misspellings, and phonetic searches.
It has probably occurred to many of us that the English language itself may be the metadata of the future. It seems a bit unfair for me to suggest this, since I speak English and most of the people of the earth don't. Why should I be so lucky? English may not, for that matter, be the most suitable language. But it is undeniably the lingua franca of the information society. For that reason we will use English as our pivot language, rather than try to develop a complete set of multilingual naming conventions for every language.
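A naming-convention register with an English pivot can be pictured as a mapping from every known variant to one pivot term, with fuzzy matching to absorb misspellings. The entries in this sketch are invented for illustration, and Python's standard `difflib` similarity matching stands in for whatever synonym and phonetic machinery a production register in a relational database would use.

```python
import difflib

# A toy naming-convention register: every known variant points at one
# English pivot term. Entries are invented for illustration.
REGISTER = {
    "concert": "concert", "konsert": "concert", "gig": "concert",
    "theatre": "theatre", "theater": "theatre", "teater": "theatre",
    "football": "football", "soccer": "football", "fotboll": "football",
}

def to_pivot(term: str):
    """Return the English pivot term for `term`, or None if nothing is close.

    Exact (case-insensitive) matches win; otherwise difflib's similarity
    ratio, with a fairly strict cutoff, absorbs simple misspellings.
    """
    key = term.strip().lower()
    if key in REGISTER:
        return REGISTER[key]
    close = difflib.get_close_matches(key, list(REGISTER), n=1, cutoff=0.8)
    return REGISTER[close[0]] if close else None

for word in ["Teater", "fotbol", "gig", "opera"]:
    print(word, "->", to_pivot(word))
```

Publishers and harvesters both resolve to the same pivot term, so a Swedish "teater" and an American "theater" meet in one category without either side translating the other.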
Somewhere in the middle of 1998, the working groups that SKI follows or participates in shifted their XML/RDF (extensible markup language/resource description framework) positions from "wait and see" to "look out -- it's coming." For us, XML/RDF represents a golden opportunity: as publishers begin to adopt this new standard, it will be more natural for them to use SKI, and we have kept a SKI DTD (document type definition) in development in parallel with our attribute/object SKI model.
As for RDF, it is a godsend. The original concept for a SKI event was that it could be constructed by gluing together a group of objects that we called rCards. Each card contained a logical resource, such as a venue, an act, a producer, or a promoter. There could be cards for handicap access descriptions, food and souvenir concessions, or any other auxiliary resources that might be used more than once. This would greatly simplify the task of the publisher, especially in the posting of multiple or recurring events such as tours, festivals, or tournaments.
Eventually the administration and overview of these rCards could have become quite unwieldy, especially in regard to authentication and intellectual property rights (IPR) management. RDF gives us an international platform for the management of an extensive interrelated network of resources pertaining to each event and its publication.
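The rCard idea maps naturally onto RDF's subject-predicate-object statements. In this sketch, plain Python tuples stand in for RDF triples, and every identifier and predicate name is hypothetical: a venue card and its access card are written once, and each tour date merely points at them.

```python
# Hypothetical rCards as RDF-style (subject, predicate, object) triples.
# Identifiers and predicates are illustrative; real RDF would use URIs
# from published vocabularies.
triples = {
    ("venue:example-hall", "name", "Example Hall"),
    ("venue:example-hall", "access", "card:example-hall-access"),
    ("card:example-hall-access", "text", "Step-free entrance, hearing loop"),
    ("act:touring-band", "name", "Touring Band"),
}

def add_tour_date(event_id, act, venue, start):
    """Register one tour date by linking to existing cards, not copying them."""
    triples.update({
        (event_id, "act", act),
        (event_id, "venue", venue),
        (event_id, "dtstart", start),
    })

def objects(subject, predicate):
    """All objects for a subject/predicate pair, sorted for stable output."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

def describe(event_id):
    """Follow links from an event to its act, its venue, and the venue's access card."""
    act = objects(event_id, "act")[0]
    venue = objects(event_id, "venue")[0]
    access = objects(venue, "access")[0]
    return {
        "act": objects(act, "name")[0],
        "venue": objects(venue, "name")[0],
        "access": objects(access, "text")[0],
        "start": objects(event_id, "dtstart")[0],
    }

# Two tour dates share the same venue and access cards.
add_tour_date("event:tour-1", "act:touring-band", "venue:example-hall", "19991001T180000Z")
add_tour_date("event:tour-2", "act:touring-band", "venue:example-hall", "19991002T180000Z")
print(describe("event:tour-2"))
```

Correcting the access card once updates every event that links to it, which is the administrative saving the rCard design was after.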
Finally, I would like to mention our work with the handicapped sector, which gave us the slogan "What's good for the handicapped is good for everybody else." The metadata tags of SKI make possible automated voice reporting over the telephone on demand, as well as specially formatted text for dyslexic readers.
SKI is open to all. On our Web site (http://ski.finns.nu/) you can find application forms for the mailing list, which is in English as well as Swedish; prototypes of a SKI event generator; the WhaMachine; several naming conventions; and links to relevant sites.
We are planning working groups both within the IETF and CEN/ISSS (European Committee for Standardization/Information Society Standardization System), which interested parties are encouraged to join.
I started this paper with reference to two great humanitarians: one of them was awarded the Nobel Peace Prize, and the other is credited by many as the inventor of the hyperlink. Both believed in the eventual concurrence of the human knowledge base. I did not bring up their names just to tell a good story or merely to pay them the homage they deserve. I did it because SKI, with its humble ambition of getting people to soccer matches, finding a good restaurant, or seeing a movie, shares their vision in its own mundane and perhaps zany way.
You have to start somewhere. SKI is dedicated to Paul Otlet and Henri Marie LaFontaine.
I would like to thank Lisa Lippert of Microsoft who was the first person outside Sweden to react when the SKI concept was presented to the iCalendar mailing list. Her spontaneous reaction of joy and encouragement was more of an inspiration to us than she can imagine.
Also, sincere thanks to Frank Dawson of Lotus, who spent a great deal of time and words editing our proposals, pointing out our weaknesses and strengths, and letting us share in the work of his XML DTD draft of iCal. Both Lisa and Frank have generously offered to help in the authoring of a SKI draft for the IETF.
Thanks to Patrik Fältström of the IETF for his advice and encouragement to start a working group and to Johan Hjelm, visiting engineer at the W3C, for his advice and comments.
And thanks to the SKIteam: Patrik Jonasson, Niklas Hjelm, Johan Groth, Benny Regner, Jörgen Otterstål, Anders Widström, and Anders Arpteg.