The FARNET States Inventory Project: Using Distributed Maintenance to Create and Maintain a Comprehensive Public Database of Subnational Information Infrastructure Planning and Development
Casey LIDE <firstname.lastname@example.org>
Much of the early discussion surrounding the creation of a US National Information Infrastructure (NII) occurred at a national level, with a focus on national issues and national participants. By the mid-1990s, however, it became increasingly clear that much of the work in building and regulating the NII would occur at a local and state level, with local and state actors. No forum existed to facilitate the efficient exchange of information among these subnational builders of the NII, and state policy makers had no reasonable opportunity to learn from and cooperate with their counterparts in other states.
Begun in 1995, the FARNET States Inventory Project is a three-year National Science Foundation (NSF)-funded initiative to provide a comprehensive, publicly accessible online resource for tracking information infrastructure development and strategies in each of the fifty states. With over 120 categories for each state, the scope of the Project (located at http://www.states.org) presented several opportunities for innovation, the most important of which was the implementation of a novel methodology for creating and maintaining a cost-effective, Web-oriented database: "distributed maintenance." A primary value of the Project is its attempt to test that concept; in that sense the Project is an experiment in information science.
With the "distributed maintenance" methodology, the States Inventory Project is an attempt to create a very valuable resource as well as a database architecture with great potential for cross-application (the most obvious being its usefulness for other nations developing information infrastructures).
This paper is a "lessons learned" report from the States Inventory Project Team. It has three areas of focus:
The FARNET/NSF States Inventory Project is based on three premises:
The States Inventory Project (located at http://www.states.org) fosters the development of the NII by providing a single, standardized clearinghouse for tracking state information infrastructure planning and activities. Through a publicly accessible, Web-oriented database, the States Inventory Project provides an opportunity for state policymakers, academics, and information technology professionals to (1) efficiently conduct comparative analyses across all of the states in any of the 100+ discrete categories in the Project database; (2) efficiently organize their own state's activity, and retrieve all of it at one time for a helpful "snapshot" of state progress and initiatives at any given time; and (3) control their portion of the database (their state) locally, through the States Inventory Project administrative interface (enabling ongoing "distributed maintenance" of the database by a large number of persons across the country).
This paper will first discuss FARNET's motivation to launch the States Inventory Project in 1995. It will move on to set forth exactly what the Project has accomplished, and how it has met the original needs for which it was created, through the implementation of a matrix design for the Web-oriented database. The "distributed maintenance" theory behind the Project will then be discussed in some detail (including lessons learned and a somewhat preliminary evaluation of the Project's viability). The paper will conclude with some of the broader lessons learned and suggestions for how certain aspects of the Project might be adopted in other contexts.
In the early 1990s, much of the debate concerning the development of information infrastructure in the United States occurred at a national level, with national actors and a national focus. By 1995, it became clear that while the national focus was certainly appropriate in many instances, the actual building of the National Information Infrastructure (NII) would occur largely at a state or local level (and, of course, the users of the NII are primarily state and local actors). Not only would the states themselves play a large role in promoting connectivity through major state-funded networking efforts, in many instances state regulatory bodies would be the primary regulatory force acting upon private-sector commercial networking entities.
In spite of this realization, it is no exaggeration to say that the states were proceeding to build their portions of the NII largely in a vacuum. Continuing with the usual, pre-Information Age model of insularity, the states had little opportunity or incentive to learn from the activities of other states, which were grappling with many of the same issues created by the explosion of information technology and networking.
The primary obstacle to learning from other states was the lack of a clearinghouse - or even a starting place - for finding state-level information. The cost to any single state for a comparative analysis of other states' information infrastructure environment for a given topic (distance learning, for example) would simply outweigh the benefits (which in any event would likely be restricted to the narrow purpose for which the study was undertaken, if the analysis had to be done from scratch each time). With the growth of the World Wide Web, this problem is mitigated only partially; while the information is increasingly available on-line, it tends to be presented in dissimilar formats, using inconsistent terminology and very different organization schemes (something of which the Project Team is well aware). This lack of standardization creates difficulty in both locating and assimilating the data. Clearly, the existence of the Web alone does not make comparative analysis economically feasible.
FARNET approached the National Science Foundation at the end of 1995 with a proposal for the creation of a "meta-resource" that might solve some of these problems, while simultaneously providing a mechanism for national and international actors to more easily follow the development of the NII in the United States. To help execute the Project, FARNET engaged two other organizations, which together with FARNET compose the States Inventory Project Team: ECLIPS (Electronic Commerce Law and Information Policy Strategies) is a policy organization housed at the Ohio Supercomputer Center. ECLIPS was responsible for the substantive research duties of the Project. As Research Director for the Project, Keith Harmon has directed ECLIPS' participation. The database architecture and graphical user interface were the responsibility of the Arizona State University (ASU) Web Development Team, specifically Izydor Gryko and Rob Kubasko.
While a number of persons have played a role in the development of this Project on behalf of FARNET, ECLIPS, and ASU, Keith, Izzy, Rob and the author comprise the core group, and have worked closely together (mostly via email) to bring the Project to fruition.
At the time of this writing, the States Inventory Project has roughly one year of development left. Even so, it is possible for us to draw a number of fairly solid conclusions about what has been, and what will be, accomplished by the effort.
From the start, the States Inventory Project Team has sought to design the resource in a manner that maximizes the value of the data contained within it. An underlying theme of the Project, and a major value of the effort, was determining how best to provide a means for efficiently extracting knowledge from the mass of disparate data available on the Web. In 1997 the National Science Foundation embarked on a large-scale program it called "Knowledge and Distributed Intelligence" (KDI), the goal of which is to promote the development of new information technologies and applications that facilitate the discovery of knowledge from distributed datasets such as the Web. The States Inventory Project is, essentially, an early experiment in KDI.
As mentioned above, the Project has relied on a "matrix" model for its Web-oriented database, which adds another dimension to the knowledge that may be produced from a single set of data. Although the concept is simple, the Project Team is aware of no other database design like it on the World Wide Web. The matrix design is easily conceptualized (although it is not presented this way on the site): across the top of the grid are the states, each heading a column below. Along the side are the 100+ Project categories, organized into a hierarchy of 11 top-level categories (discussed further below). Each state has the same categories, and each category has a cell in each of the states. This allows data retrieval:
It is worth noting that much of the World Wide Web (and certainly the many collections-of-links) could be characterized as providing either vertical or horizontal accessibility, under the matrix concept. Vertically accessible databases would contain information encompassing several categories, but within one jurisdiction. State governments' presence on the World Wide Web is one general example. (As noted earlier, this approach is generally effective for monitoring activities in that jurisdiction, but totally inefficient for any sort of comparative analysis.) Horizontally accessible databases, on the other hand, would include information from several jurisdictions but would cover only one discrete category or topic (such as "telemedicine projects in the United States"). This approach is perfect for comparative analysis; it standardizes the information (by categorizing it as "telemedicine projects") and puts it all in one place.
The States Inventory tracks a number of categories, across a number of jurisdictions; it enables access both horizontally and vertically. Accordingly, it can be used (1) as an excellent tool for the monitoring of information infrastructure activities within a single state/jurisdiction and (2) as a tool for efficient comparative analysis across all of the states in any of the 100+ categories.
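The two access paths can be illustrated with a small in-memory model. This is a minimal sketch, not the Project's actual database code: the class and method names are hypothetical, and the entries are plain strings for brevity. Every (category, state) cell exists from the start, so a vertical query returns a full snapshot of one state and a horizontal query returns one category across every state.

```python
class InventoryMatrix:
    """Hypothetical sketch of the matrix design: states head the columns,
    categories run down the side, and every (category, state) cell exists."""

    def __init__(self, states, categories):
        self.states = list(states)
        self.categories = list(categories)
        # Pre-create every cell so that each state has the same categories,
        # even before any content has been contributed.
        self.cells = {(c, s): [] for c in self.categories for s in self.states}

    def add(self, state, category, entry):
        """Place an entry in one cell; unknown states or categories are rejected."""
        if (category, state) not in self.cells:
            raise KeyError(f"unknown cell: {category!r} / {state!r}")
        self.cells[(category, state)].append(entry)

    def vertical(self, state):
        """All categories for one state: a snapshot of that state's activity."""
        return {c: list(self.cells[(c, state)]) for c in self.categories}

    def horizontal(self, category):
        """One category across all states: the basis for comparative analysis."""
        return {s: list(self.cells[(category, s)]) for s in self.states}


matrix = InventoryMatrix(["Arizona", "Ohio"], ["Distance Learning", "Telemedicine"])
matrix.add("Ohio", "Telemedicine", "Statewide telemedicine plan")
print(matrix.horizontal("Telemedicine"))
print(matrix.vertical("Ohio"))
```

Because both queries read from the same set of cells, one round of data entry serves both monitoring a single state and comparing all of them.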
The Project tracks information infrastructure activities and strategic planning at the state level. The database is divided into over 100 categories, grouped under the following top-level categories (a complete listing of Project categories is available in the Appendix):
An entry into the database consists of (1) a hypertext "headline" linked to a document on the World Wide Web, (2) a short abstract describing that document, including keywords associated with it (enabling preliminary review of the contents, and providing grist for our search engine), (3) a "last modified" date stamp (ensuring timeliness of content -- the contributor will automatically receive an e-mail when the entry reaches a certain age), and (4) the name of the contributor of the entry, linked to an on-site directory of registered contributors. A key feature of the States Inventory Project is the fact that all of this may be entered through an intuitive interface -- no knowledge of HTML coding is necessary -- by any person who registers as a contributor. This is in furtherance of the Project's role as an experiment in the theory of "distributed maintenance".
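The four parts of an entry, together with the reminder sent when an entry ages, can be sketched as a simple record type. This is an illustration under stated assumptions rather than the Project's schema. The field names are hypothetical. The paper says only that the reminder is triggered at "a certain age", so the 180-day threshold below is an assumed value. The function returns the (contributor, headline) pairs that would be e-mailed and does not send any mail.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Entry:
    headline: str        # hypertext "headline" text
    url: str             # the Web document the headline links to
    abstract: str        # short description, including keywords for the search engine
    last_modified: date  # date stamp used to judge timeliness
    contributor: str     # registered contributor, linked to the on-site directory


# Assumed threshold: the paper does not state the actual age that triggers a reminder.
MAX_AGE = timedelta(days=180)


def entries_needing_review(entries, today, max_age=MAX_AGE):
    """Return (contributor, headline) pairs for entries that have reached max_age.

    A real system would e-mail each contributor; this sketch only lists them.
    """
    return [
        (e.contributor, e.headline)
        for e in entries
        if today - e.last_modified >= max_age
    ]
```

Keeping the date stamp on every entry means that timeliness can be checked mechanically, without any central review of the content itself.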
"Distributed maintenance" is a novel method for creating and maintaining a Web-based database. From the beginning of this Project, the Project Team realized that an attempt to create a comprehensive database of this sort would require a substantial effort, and that it would be very difficult to create and maintain with a centralized approach using centralized Project labor. At the outset, the Project Team hypothesized that the active participation of a substantial number of contributors could enable the creation and maintenance of a large, yet detailed, database with a minimum of centralized labor. This Project is an experiment in that concept, and in that sense is an experiment in information science.
After creating a Web presence and database structure in the matrix format explained above, the Project Team designed an administrative interface for use (potentially) by a large number of contributors across the country. Through this forms-based interface an entry can be made directly to the database, with little or no moderation by the Project Team. No knowledge of HTML is required (although it can be used for links and formatting within the entry itself).
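The following sketch shows how a form submission might be written straight into the database while each contributor controls only their own state, as described earlier in this paper. It is hypothetical throughout. The contributor registry, the function name, and the permission check are assumptions for illustration; the paper does not specify how its own safeguards work. The submitted fields are stored as plain text, and the date stamp and contributor name are filled in automatically.

```python
from datetime import date

# Hypothetical registry: contributor name -> the state whose portion of the database they maintain.
REGISTERED = {"J. Smith": "Ohio"}


def submit_entry(database, contributor, state, category, headline, url, abstract, today=None):
    """Write one form submission directly into the database, without moderation.

    Assumption: a contributor may only add entries to the state they registered for.
    """
    if REGISTERED.get(contributor) != state:
        raise PermissionError(f"{contributor!r} is not registered to maintain {state!r}")
    # Every form field is required; whitespace-only input is rejected.
    for name, value in (("headline", headline), ("url", url), ("abstract", abstract)):
        if not value.strip():
            raise ValueError(f"the {name} field is required")
    entry = {
        "headline": headline.strip(),
        "url": url.strip(),
        "abstract": abstract.strip(),
        "last_modified": today or date.today(),  # stamped automatically
        "contributor": contributor,              # taken from the registration
    }
    database.setdefault((category, state), []).append(entry)
    return entry
```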
This open methodology for adding to the database posed a number of issues relating to security and database integrity. We introduced several mechanisms designed to mitigate potential problems:
While not fail-safe, these mechanisms certainly have been successful thus far in maintaining the integrity of the database and the quality of its content.
Distributed maintenance, to be successful, requires the active participation of a substantial number of persons. The Project Team spent considerable time devising strategies and amassing contact information for the best potential contributors around the country. Over the course of the Project, we compiled a database of about 500 prospective contributors, selected from relevant state agencies, academics, other information technology professionals, and association members. In October 1997, we sent invitations (via e-mail) to approximately 100 carefully selected persons describing the Project, asking them to participate, and inviting them to help beta test our new contributor interface. After October, we periodically sent out similar mailings, and as of this writing we have completed the "affirmative contact" portion of our effort to solicit participation.
The response has been positive. We took special care to avoid any implication that we were "spamming," and received very few negative responses (mostly from persons who expressed concerns about the amount of time they had available to spend on the Project). Of the original 100, approximately 25 registered as contributors. Since October, our list of registered contributors has grown to over 75.
While we have had a good response in terms of registered contributors, the amount of content actually provided by these persons has been less than expected. The majority have not added any information at all. Some have added only one or two items. A few have done very well, providing a substantial amount of content for their state over the course of a few months. We were somewhat surprised by the disparity between the fairly large number of registered contributors and the relatively low degree of active participation (to this point).
While several factors may contribute to the current lack of active participation, there are a couple that we believe are dominant, based on communications with registered contributors. Fundamentally, there is a lack of incentive for contributors to spend any amount of time volunteering to provide content, despite the fact that it takes no more than five minutes to write and submit any one entry. In addition to the lack of an obvious incentive, contributors simply may not have understood what sort of content the Project hoped to include. The database was essentially an empty, untested architecture for a resource; it was not a useful resource yet.
In an attempt to counter both problems, the Project Team began a concentrated, centralized effort at the beginning of 1998 to provide "starter" content for the Project database. To a degree, this is an effort to test the hypothesis that the Project will be useful. We believe that gradually adding more information will demonstrate the resource's utility, allowing it to draw visitors based on usefulness rather than just curiosity. We expect at some point that the content will reach a critical mass of high utility, high traffic, and substantial word-of-mouth. Once that is realized, we believe that potential contributors will recognize the value in providing data for the resource, that they will then play a more active role in providing content and maintaining the database, and that the Project will, to some degree, "snowball" into a self-perpetuating, self-maintaining resource. If that occurs, then the distributed maintenance theory will have been proven to be a viable and very powerful tool for creating and maintaining a large and detailed database in a cost-effective manner.
A number of lessons have been learned during the course of the Project, some of which appear to be quite fundamental and should prove to be very valuable as similar knowledge extraction (or KDI) efforts are undertaken in the future. The experience has led us to a few broad conclusions. First, the database architecture and basic category structure of the Project could very easily be adapted for use in other jurisdictions (particularly for nations that have not yet developed their NII to the degree that the United States has). Second, the matrix format is powerful. Using a database architecture very similar to the Project, effective meta-resources could be created in any number of other contexts (whether or not the developers choose to rely on distributed maintenance).
In a sense, this Project was started too late. The United States had a head start in developing information infrastructure before this Project was launched, and we have been playing catch-up. For other nations that are just beginning (or have yet) to develop an NII, the Project could prove even more valuable.
Again, a key component of the Project is that it tracks at a "sub-national" level. The matrix design and category structure could be easily adapted to track information infrastructure in any jurisdiction that can be subdivided. Instead of states at the top of the matrix, it could be countries in sub-Saharan Africa. It could be prefectures within Japan, or it could even be cities within a European nation. Developing nations, perhaps more than any others, can capitalize on the effort expended on the design and implementation of this Project in the U.S.
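The point that nothing in the matrix depends on the columns being U.S. states can be shown with a sketch that builds the grid from any list of jurisdictions and a two-level category hierarchy. The function names and the top-level category names below are hypothetical; only "Distance Learning" and "Telemedicine" are topics mentioned in this paper. The empty-cell listing suggests one way a team might see where starter content is still missing.

```python
def build_matrix(jurisdictions, category_tree):
    """Create an empty cell for every (top-level, subcategory, jurisdiction) triple.

    category_tree maps each top-level category to its list of subcategories.
    The jurisdictions can be states, countries, prefectures, or cities.
    """
    return {
        (top, sub, j): []
        for top, subs in category_tree.items()
        for sub in subs
        for j in jurisdictions
    }


def empty_cells(matrix):
    """List the cells that do not yet hold any entries, in sorted order."""
    return sorted(key for key, entries in matrix.items() if not entries)


# Hypothetical hierarchy reused unchanged; only the columns differ.
tree = {"Education": ["Distance Learning"], "Health": ["Telemedicine"]}
prefectures = build_matrix(["Hokkaido", "Okinawa"], tree)
prefectures[("Health", "Telemedicine", "Okinawa")].append("Island telemedicine project")
print(len(prefectures), len(empty_cells(prefectures)))
```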
It is also worth noting that distributed maintenance may be more effective in other environments. For the Project in the U.S., we approached persons with no prior contact and simply relied on volunteer time to help fill out an untested resource. The Project's application in other environments (1) may have a more available and willing core of distributed participants, perhaps not all volunteers; (2) will have the U.S. example to point to; and (3) will be able to draw on the specific lessons learned in the U.S. effort (regarding the time investment required of contributors, for example).
The design of the Project lends itself to application in other substantive areas where a similar meta-resource might be appropriate. Again, the top of the matrix could include any number of jurisdictions. Along the side can be any categories, not just those relating to information infrastructure. Possible applications could include:
While the format lends itself to a centralized effort at locating and cataloging information, distributed maintenance conceivably could be used in other substantive areas as well (perhaps even more effectively than in the States Inventory Project).
The Project Team envisions the further refinement of the Project, and has taken steps to gradually implement such technologies as Web robots, artificial intelligence, translators, and dedicated search engines in furtherance of the Project goal (comprehensiveness, with efficient and effective retrieval). We also envision the Project Team playing a role in the adaptation of the Project to other jurisdictions and to other substantive areas, as the need arises.