Pedro CUENCA <firstname.lastname@example.org>
Vicente SOSA <email@example.com>
Julián ROMERO <firstname.lastname@example.org>
Israel HERNANZ <email@example.com>
Grupo Anaya, SA
This paper presents our experiences in the development and deployment of a standards-based uniform resource name (URN) resolver service. Although there is not yet an approved standard mechanism that client applications can use to resolve URNs, we have nevertheless been able to implement and deploy a resolver that has been tested inside the controlled context of an intranet environment. This approach has enabled our company to enjoy the anticipated benefits of a URN infrastructure without having to wait until the standardization process freezes and commercial solutions are made available. Based on this experience, this paper argues that the potential for URNs is tremendous, even if their use must presently be confined to the limits of corporate intranets.
The extensive use of URNs to represent complex document trees and Web-based applications has allowed us to identify a number of features and services that we feel are missing in the current set of proposed URN standards. Therefore, another important contribution of the paper is the discussion of some of these requirements, such as the need for a standard set of administrative services or the necessity of introducing levels of indirection in the definition of URNs. We hope that standardization bodies and interested parties can benefit from the insights gained by our experience and that some of these ideas can be taken into account in future initiatives.
From the very first stages of Web development, work on mechanisms to refer to Web resources has recognized the importance of a flexible, general, and uniform addressing model [1,2,5]. Request for Comments (RFC) 1630, for example, dated June 1994, proposes "a unifying syntax for the expression of names and addresses of objects" and introduces the notion of universal (now "uniform") resource identifiers or URIs as the basic addressing abstraction. Because the use of several types of URIs was anticipated, RFC 1630 tries to define a common syntax for all of them, and goes as far as to distinguish both URLs (uniform resource locators) and URNs (uniform resource names) as honorable members of the URI family.
Recent revisions of that early work and of the design principles on which the Web model rests continue to accord the same importance to addressing. As a matter of fact, the recognition that uniform resource characteristics (URCs) are yet another type of URI has helped raise the consideration of identifiers to one of the two fundamental abstractions of the so-called Web model. The other basic abstraction, the resource, is itself usually defined in terms of identifiers: A resource is basically anything a Web identifier can refer to, a definition that imposes no further restrictions on any aspect of resources.
Nowadays, URIs are considered to comprise three different types of identifiers: URLs, URNs, and URCs:
Even though work on URN standardization and the support of the urn: scheme reportedly started by 1993, URLs are still the only addressing mechanism available on the Web. But referencing a resource by the physical location where it happens to be stored raises several questions about robustness and efficiency. Of course, the immediate mapping of files stored in servers to URLs was central to the rapid diffusion of the Web, as it eased the way for the hassle-free incorporation of lots of services and pieces of information. But a symbolic, persistent, higher-level addressing mechanism is still a key feature that has to be realized for the Web to become a reliable medium.
The availability of a persistent naming scheme is one of the most important reasons that justify the effort spent on URNs. Before proceeding any further, it is worth noting that persistence cannot be guaranteed by technology alone; it also requires a social commitment from the organizations involved. Some organizations can, indeed, guarantee the persistence of certain URLs they maintain, such as the address of their own home pages. Even if that is the case, it is also important to realize that technological support for location- and even organization-independent names is a convenient goal to pursue, for it can pave the way for the blossoming of identifiers that show greater longevity than machine names and addresses.
Another important issue that must be taken into account goes beyond the technological provision of persistence to its perception by end-users. The explicit support of the urn: scheme will undoubtedly help users determine that those identifiers have received some type of endorsement about persistence and reliability.
From a more practical point of view, the indirect mapping that translates names to locations to resources brings about a number of valuable consequences that will improve the reliability and robustness of the Web. Because a single URN can map to multiple URLs, the decision as to which URL to use can be deferred until access to the resource is required. This way, if one of the URLs is not available at a certain time, another one can be chosen with no user intervention, which makes high-availability systems straightforward to achieve. A URN resolver implementation can also apply a round-robin algorithm for the selection of equivalent URLs, which translates into a simple and scalable solution to load balancing.
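The deferred choice among equivalent URLs can be illustrated with a minimal sketch. It is not the selection logic of any particular resolver: the function name makeUrlSelector, the caller-supplied isAvailable check, and the mirror host names are all assumptions made for the example.

```javascript
// Sketch only: round-robin selection with failover among equivalent URLs.
// `isAvailable` is a hypothetical, caller-supplied availability check.
function makeUrlSelector(urls) {
  let next = 0; // position where the next selection starts
  return function select(isAvailable) {
    for (let i = 0; i < urls.length; i++) {
      const index = (next + i) % urls.length;
      if (isAvailable(urls[index])) {
        next = (index + 1) % urls.length; // rotate past the chosen URL
        return urls[index];
      }
    }
    return null; // none of the equivalent URLs is reachable
  };
}

// Usage with made-up mirror hosts.
const select = makeUrlSelector([
  "http://mirror1.example.org/doc.html",
  "http://mirror2.example.org/doc.html",
]);
const first = select(() => true);   // starts at the first mirror
const second = select(() => true);  // rotates to the second mirror
const third = select((url) => !url.includes("mirror1")); // first mirror is down
```

Starting each search just after the most recently chosen URL spreads the load across replicas, while the loop over all candidates provides the failover behavior.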
Because of their commitment to persistence and lower volatility, URNs are better suited than URLs to become integrated with URCs and metadata systems. The I2C (URI to URC) service that some URN resolvers are expected to provide will allow clients to retrieve information about a resource before it is accessed. This situation contrasts with the current use of URLs as the sole identifying mechanism, which requires the physical retrieval of the resource in order to find out such simple data as encoding format, length, or content. From a Web computing perspective, URLs provide no guarantee of service, as the only way to know whether a resource is available is to try to fetch it.
Work on URN standardization is mostly being carried out by the IETF (Internet Engineering Task Force). The main goal of the URN working group is the definition of a framework for the assignment and resolution of URNs.
The need for URNs and even the basic functional requirements they must meet were identified very early in the process of Web standardization [1,7]. Therefore, a general agreement has been achieved on basic technical areas such as URN syntax or the set of resolution services that can be performed on URNs. Work on resolution has advanced following the lines of using DNS for resolver discovery [8,10] and HTTP (hypertext transfer protocol) as a communications protocol that is expected to be supported by most resolvers. Resolver discovery using DNS has recently been submitted to the Standards Track.
Although the technical guidelines have been set up in a reasonable way, the assignment procedures are still under discussion. Documents that specify URN namespace definition mechanisms or assignment procedures for the resolution of URIs using DNS are in draft status and evolving rapidly.
As pointed out in the preceding paragraphs, the basic technical requirements for URN resolution have already been agreed upon to a sufficient level of detail [9,11,12]. Although the use of DNS for resolver discovery is still preliminary, at this writing (February 1999) there is not yet a general-purpose standalone resolver implementation. The URN working group does provide a set of scripts that illustrate the implementation of resolution services, but they are an ad-hoc solution tailored to the needs of an experimental namespace for IETF documents.
A number of proposals try to reuse the existing infrastructure to provide URLs with a commitment to persistence. WIRE (W3 identifier resolution extensions) and PURL (persistent uniform resource locator) use the standard HTTP redirection mechanism to introduce levels of indirection in the URLs. The Handle system [17,18] employs a proprietary hdl: URI scheme that must be resolved by a browser plug-in or a special HTTP proxy server. Although WIRE is still an experimental specification, both PURL and the Handle system are supported by their respective organizations and can be used to create URLs that map to other URLs. The idea of persistence stems from the fact that there is an institutional commitment to maintain the newly created URLs, although the mapping is allowed to change. As valuable as these services are, they are nonetheless limited in scope. They are successful in isolating users from changes in the names of the machines where their documents are stored, but they don't support the urn: scheme and don't try to provide resolution services other than a simple indirection.
The rationale that led us to build a corporate URN resolver lies in the nature of our organization. Grupo Anaya is a publishing company that produces both printed media and interactive online applications, with a special focus on educational products. Text, illustrations, photographs, maps, and all sorts of material make up a huge database of content that must be revised very frequently to be adapted to the requirements of education regulations and to the specific features of the products that are designed. These resources, however, are not maintained in a central repository database; instead, they are stored, in an unstructured format, in small departmental servers controlled by the different editorial groups in the company. Instead of trying to enforce a single solution for the organization and storage of data, our company is always looking at new ways to improve the traditional workgroup approach of editors to their work. From this point of view, it was recognized that the Web model is the only existing infrastructure that, despite some inconveniences, is working now as a bona-fide extensible, heterogeneous, distributed repository. The current efforts on addressing issues and the integration of metadata were the results of ideas that could also be applied to our heterogeneous working groups. In this context, the use of URNs was seen as a first step that would allow our company to achieve globally unique identifiers for all of the resources we maintain. Later on, a metadata model would provide the means to classify, relate, and browse through resources, thus improving reuse and helping build a consistent corpus of material.
The use of URNs was also important from the point of view of interactive online applications. All too often, we had encountered the problem of designing and maintaining complex Web-based applications that integrate data from different sources, to the point that even the most insignificant typographical change had to undergo a complete software engineering life-cycle to make its way to the user interface pages. URNs would help minimize the impact of technology by making applications independent of the physical locations where the diverse data and services reside. By using URNs to hide protocols and addresses, the implementation of Web-based applications could be more easily broken up into small independent pieces. Every one of these pieces is assigned a public URN identifier that allows access from other components in the system. This gave rise to the design and implementation of a Web-based computing architecture that has improved the efficiency and reliability of our applications, although its description is beyond the scope of this paper.
To sum up, these general requirements called for a name resolver that had to be:
To achieve the high levels of interoperability, extensibility, and flexibility required, the implementation is based on Internet standards and protocols. HTTP is used as the communication protocol, which makes it easy for many types of clients to access the resolution services. The resolver itself is implemented as a Java servlet, and thus can run on most Web servers and computing architectures.
The URN to URL mapping is stored in a lightweight directory access protocol (LDAP) repository, which provides for maximum scalability and excellent performance, especially for read access. If LDAP is not available, the file system can also be used as a repository. And because the resolver sports a modular architecture, other repositories can easily be added as well.
The resolver is not restricted to a single namespace; on the contrary, it can be used to resolve several namespaces, provided that certain conditions are met.
The syntax of the URN namespace used in our intranet conforms to the guidelines given in RFC 2141, with the additional constraint that the namespace specific string (NSS) must begin with a slash. To date, anaya: has been used as the namespace identifier, but it will have to be changed to x-anaya to conform to the latest recommendations on experimental namespaces.
Because some of our URNs must be used to identify application code, there must be a way to encode arguments so that they can be adequately received and processed by the software components. The generic URI syntax specifies that the question mark character can be used to signal the beginning of a scheme-specific query string. The URN syntax recommendation, however, states that the question mark is a reserved character, because discussion is needed before query strings can be standardized for URNs. To work around this situation, question marks and ampersand signs are allowed in the anaya: namespace, but they must appear in their escaped form (%3F and %26, respectively).
Apart from these technical details, the most important point is that query strings are recognized in the anaya: namespace. The name resolver must be able to identify the delimiters and look for the mapping of the base string (i.e., without the query string). Any arguments that appear in the original query are appended to the URL(s) on the fly. If, for example, a mapping exists from urn:anaya:/apps/search to http://www3.anaya.es/search, an I2L operation on the string urn:anaya:/apps/search%3Fkey=value will return the following URL: http://www3.anaya.es/search?key=value.
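The handling of escaped query strings described above can be sketched as follows. This is an illustration under stated assumptions, not the servlet's actual code: the function name i2l and the in-memory Map standing in for the LDAP repository are hypothetical, and the sketch assumes that the mapped URL carries no query string of its own.

```javascript
// Sketch of an I2L lookup that honors escaped query strings (%3F and %26).
// `mappings` is a hypothetical in-memory stand-in for the LDAP repository.
function i2l(urn, mappings) {
  const cut = urn.search(/%3f/i);                 // first escaped question mark
  const base = cut === -1 ? urn : urn.slice(0, cut);
  const url = mappings.get(base);                 // look up the base string only
  if (url === undefined) return null;             // no mapping for the base name
  if (cut === -1) return url;
  const args = urn.slice(cut + 3).replace(/%26/g, "&"); // unescape separators
  return url + "?" + args;                        // append arguments on the fly
}

const mappings = new Map([
  ["urn:anaya:/apps/search", "http://www3.anaya.es/search"],
]);
const resolved = i2l("urn:anaya:/apps/search%3Fkey=value", mappings);
console.log(resolved);
```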
Alternatives were contemplated before support for query strings was implemented in the name resolver. The most immediate solution is to have clients implement the corresponding logic, and use query strings only when working with URLs. Although this was cumbersome, the most important reason not to follow that approach was that URN aliases are supported by the resolver, which allows for the definition of very general services that can be referenced more conveniently. Consider, for example, the following URN definitions:
urn:anaya:/apps/search - http://www3.anaya.es/search
urn:anaya:/apps/searchByAuthor - urn:anaya:/apps/search%3Fitem=author
urn:anaya:/WorksOfCervantes - urn:anaya:/apps/searchByAuthor%3Fauthor=cervantes
When a client requests urn:anaya:/WorksOfCervantes, it resolves to the following URL:
However, there's no need for the user to know that urn:anaya:/WorksOfCervantes is related in any way to urn:anaya:/apps/search; in fact, the mapping of the former URN can be changed at any time in a completely transparent way. But for this feature to be possible, query strings must be supported by the resolver.
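A minimal sketch of how aliases and query strings might combine is given here, using the three definitions listed above. The function name resolveWithAliases, the depth limit, and in particular the order in which collected arguments are joined are assumptions of this sketch; the text does not specify them.

```javascript
// Sketch: follow URN aliases while accumulating escaped query arguments.
const definitions = new Map([
  ["urn:anaya:/apps/search", "http://www3.anaya.es/search"],
  ["urn:anaya:/apps/searchByAuthor", "urn:anaya:/apps/search%3Fitem=author"],
  ["urn:anaya:/WorksOfCervantes", "urn:anaya:/apps/searchByAuthor%3Fauthor=cervantes"],
]);

function resolveWithAliases(urn, mappings, maxDepth = 16) {
  let args = [];        // arguments gathered while following the chain
  let current = urn;
  for (let depth = 0; depth < maxDepth; depth++) {
    const cut = current.search(/%3f/i);
    const base = cut === -1 ? current : current.slice(0, cut);
    if (cut !== -1) {
      // Assumed ordering: arguments found deeper in the chain come first.
      args = current.slice(cut + 3).split(/%26/i).concat(args);
    }
    const target = mappings.get(base);
    if (target === undefined) return null;        // unknown URN
    if (!/^urn:/i.test(target)) {
      // Reached a location; assumes the URL has no query string of its own.
      return args.length ? target + "?" + args.join("&") : target;
    }
    current = target;                             // target is an alias, keep going
  }
  throw new Error("alias chain too long (possible cycle)");
}

const worksUrl = resolveWithAliases("urn:anaya:/WorksOfCervantes", definitions);
console.log(worksUrl);
```

The depth limit is a simple guard against alias cycles; a production resolver would more likely detect them when the mappings are created.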
Although the examples in this paper show legible URNs that closely resemble the structure of hierarchical URLs, caution must be observed when those names are used. As a general rule, encoding any type of user-related information inside the URN string is discouraged because it limits its longevity. There are situations, however, when a legible encoding is indeed helpful.
Our approach is to use machine-generated names for those applications where user interface tools are available. That is the case for resources created by our editorial teams. On the other hand, we currently use human-readable URNs for application components, simply because lots of URN references appear inside the code and it would be otherwise impossible to follow. As the engineering team is much smaller than the editorial one, it is easier to identify general components that can be assigned legible names with a reasonable expectation for durability.
The most frequently used resolution services specified in RFC 2483 have been implemented:
At this moment, unimplemented resolution services are:
Currently, the I2C service is used only for a very specific purpose: the retrieval of dependencies among software components.
When a client requests the loading of a resource, its MIME (multipurpose Internet mail extension) type is checked to find out if it is code to be dynamically executed. If so, the I2C service is invoked, and that operation returns a list of dependencies that must be loaded before execution of the module. Although quantitative analyses have not been performed, this is an innovative use of URNs and URCs to solve some of the problems of distributed computing.
URCs are intended to be used for the description of resources from several points of view. This requires the specification of a suitable metadata model and architecture that is being currently worked out.
RFC 2169 provides some basic ideas about the formatting of resolver data when HTTP is the communications protocol in use. Thus, the new media type text/uri-list is given as an example of URI encoding. Likewise, an HTML layout for representing URIs is also suggested. These formats, however, are very simple and, at the same time, difficult to parse automatically because of the lack of standard tools.
XML was chosen because of its extensibility and the wealth of tools that can recognize and parse this format. Extensibility will presumably become more and more important as the standardization on metadata and URCs progresses. To date, though, the coding is completely straightforward. This is an example of how the results of an I2C operation are returned in XML*:
<I2C URN="urn:anaya:/js/collection/Collection">
  <LOCATION>
    <URI>http://anduin.anaya.es/machina/js/collection/Collection.js</URI>
  </LOCATION>
  <NEEDS>
    <URI>urn:anaya:/js/support/support</URI>
    <URI>urn:anaya:/js/support/RangeMap</URI>
    <URI>urn:anaya:/js/collection/CollectionInterface</URI>
    <URI>urn:anaya:/js/collection/CollectionDelegate</URI>
    <URI>urn:anaya:/js/collection/RemoteCollectionDelegate</URI>
  </NEEDS>
</I2C>
* Our implementation also returns the locations when the I2C operation is performed. The rationale behind that is that locations are indeed characteristics of the resource.
// The initializers are missing in the source; empty object and array literals are assumed.
var theJSObject = {};
theJSObject.location = {};
theJSObject.location['uri'] = [];
theJSObject.location['uri'].push('http://anduin.anaya.es/machina/js/collection/Collection.js');
theJSObject.needs = {};
theJSObject.needs['uri'] = [];
// Dependencies are pushed in reverse order and then reversed in place.
theJSObject.needs['uri'].push('urn:anaya:/js/collection/RemoteCollectionDelegate');
theJSObject.needs['uri'].push('urn:anaya:/js/collection/CollectionDelegate');
theJSObject.needs['uri'].push('urn:anaya:/js/collection/CollectionInterface');
theJSObject.needs['uri'].push('urn:anaya:/js/support/RangeMap');
theJSObject.needs['uri'].push('urn:anaya:/js/support/support');
theJSObject.needs['uri'].reverse();
theJSObject; // the script evaluates to the populated object
As noted earlier, the implementation of a corporate resolver that had to fulfill the aforementioned requirements helped us identify a number of areas of ambiguity in the current set of proposed standards. These limitations had to be overcome by the introduction of several proprietary extensions. Although the driving force was the need to solve the practical problems encountered, the generality and extensibility of the solutions adopted were very much considered. Discussion is encouraged on these extensions so that their relevance can be assessed objectively.
Most clients will need access only to the basic resolution services. However, those organizations that are responsible for the maintenance of namespaces will also need to perform administrative operations, like advanced searches or the creation of URNs. For name resolvers to be truly interoperable, these operations must also be identified, especially when taking into account that URNs can be adopted internally by organizations for their own benefit. The standardization of administrative services will foster the design of off-the-shelf resolvers that can be easily deployed in a variety of environments.
Our name resolver implements the following administrative services:
Discussion on whether and how query strings will be supported in URNs is still necessary. The approach taken by our implementation, described earlier, is similar to the treatment of query strings for URLs. From a conceptual point of view, the use of query strings defines a family of parametric URNs that share the same prefix, which doesn't prevent every "instantiation" from being considered a distinct identifier.
URN aliases are nothing more than additional levels of indirection in the name to location mapping. They represent an improvement in flexibility over the general advantages of using names instead of locations. As demonstrated in a previous example, support for aliases and query strings can simplify the access to complex resources without losing expressive power.
URN concatenation is a proprietary technique similar to URN aliasing that can greatly simplify some administrative tasks. Consider, for example, a situation where URNs for several thousand image files must be created. All of the files are initially located at the same server, and it is anticipated that they will be replicated at a new location some time in the future. URN concatenation makes it possible to specify the location of each URN as the concatenation of a base URN plus a suffix string. For example:
urn:anaya:/images/base - http://media.anaya.es/images/ urn:anaya:/images/00001 - concat:urn:anaya:/images/base+img1.jpg urn:anaya:/images/00002 - concat:urn:anaya:/images/base+img2.png ...
This way, when a new server is added that contains a copy of all the images, the only mapping to update is that of the base URN, that is, urn:anaya:/images/base.
This technique is extremely useful to move entire trees of documents to new locations. That is often the case for Web-based applications whenever the underlying hardware or software needs to be updated.
Note that this feature requires the name resolver to interpret an internal concat: pseudo-scheme.
Unlike complex rule-based resolution, URN concatenation is very easy to implement and simplifies several common administrative tasks. Every URN, moreover, continues to be individually addressable.
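URN concatenation can be sketched as follows, using the image mappings from the example above. The function name i2ls, the in-memory table, and the rule that the suffix is whatever follows the last plus sign are assumptions of this sketch, and the mirror host is made up.

```javascript
// Sketch of an I2Ls lookup that expands the internal concat: pseudo-scheme.
// The table is a hypothetical in-memory stand-in for the resolver repository.
function i2ls(urn, table) {
  const urls = [];
  for (const entry of table.get(urn) || []) {
    if (!entry.startsWith("concat:")) {
      urls.push(entry);                       // plain location
      continue;
    }
    // Assumption: the suffix is whatever follows the last '+' in the entry.
    const plus = entry.lastIndexOf("+");
    if (plus === -1) continue;                // malformed entry, skipped here
    const baseUrn = entry.slice("concat:".length, plus);
    const suffix = entry.slice(plus + 1);
    for (const baseUrl of i2ls(baseUrn, table)) {
      urls.push(baseUrl + suffix);            // one result per base location
    }
  }
  return urls;
}

const table = new Map([
  ["urn:anaya:/images/base", ["http://media.anaya.es/images/"]],
  ["urn:anaya:/images/00001", ["concat:urn:anaya:/images/base+img1.jpg"]],
  ["urn:anaya:/images/00002", ["concat:urn:anaya:/images/base+img2.png"]],
]);
const before = i2ls("urn:anaya:/images/00001", table);

// Adding a replica (made-up host) touches only the base URN.
table.get("urn:anaya:/images/base").push("http://mirror.example.org/images/");
const after = i2ls("urn:anaya:/images/00002", table);
```

Because the base URN is resolved at lookup time, every image URN picks up the new replica without its own mapping being edited. The sketch does not guard against a base URN that refers back to itself.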
For most clients, the standard resolution services (I2L, I2Ls) will be enough, as they provide the final URLs where resources can be effectively found. But the introduction of several levels of indirection calls for the specification of a new set of resolution services that can provide information about the existence of aliases or the use of concatenation.
To satisfy this requirement, our resolver implements a number of additional resolution services that simply return the exact mapping defined for URNs, instead of following aliases, performing concatenation, or looking at dependencies. These services are I2LR, I2LsR, and I2CR. They can be considered "restricted" versions of their counterparts because they perform resolution once and return the results as-is.
I2CR is worth looking at. As mentioned, I2C returns a list of dependencies of the resource. If A depends on B depends on C, I2C will return C and B. I2CR, on the other hand, will simply return B.
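The difference between the two services can be made concrete with a small sketch of the A, B, C example. The function names i2c and i2cr, the in-memory dependency table, and the choice to list each dependency after the dependencies it needs are assumptions of this illustration.

```javascript
// Sketch: transitive (I2C) versus direct (I2CR) dependency lookup.
// `needs` is a hypothetical table: resource name -> names it depends on.
const needs = new Map([
  ["A", ["B"]],
  ["B", ["C"]],
  ["C", []],
]);

// Restricted form: return the stored dependencies as-is.
function i2cr(name, table) {
  return [...(table.get(name) || [])];
}

// Full form: follow dependencies recursively, listing each one
// after everything it needs (a depth-first post-order walk).
function i2c(name, table) {
  const ordered = [];
  const seen = new Set([name]);   // also protects against cycles
  function visit(current) {
    for (const dep of table.get(current) || []) {
      if (seen.has(dep)) continue;
      seen.add(dep);
      visit(dep);
      ordered.push(dep);
    }
  }
  visit(name);
  return ordered;
}

const full = i2c("A", needs);        // C first, then B
const restricted = i2cr("A", needs); // only B
```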
Once again, it must be emphasized that this family of services will have to be standardized if URN aliases or any other type of indirection level is supported.
Once the basic resolution system was available, specific mechanisms had to be devised to guarantee access to the resolution services from all the user communities of interest.
By the simple setting up of an off-the-shelf proxy server (Netscape Proxy Server 3.5), Web browsers can be configured to access URNs without the need to employ special plug-ins or any other type of client software. When the user requests a urn: address, the browser simply redirects the request to the proxy server. This system, in turn, performs an I2R operation on the URN, and thus the resource is sent to the client. With this simple approach, editors and other end-users are able to browse through the URN space. Some of the benefits of URN-enabled browsing will be outlined in the section on the preliminary results.
Because online services and applications are built taking advantage of the underlying URN infrastructure, a method of access from the outside was necessary for all public services. This is achieved by a filtering server-side software layer that parses HTML documents (either static or application-generated) and replaces all URN occurrences by their equivalent URLs. Thus, external clients can safely look at images and click on hyperlinks. This function is disabled for requests originating inside the intranet. Whenever support for URNs is generalized in the Internet, this filtering layer can simply be removed.
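The substitution performed by such a filtering layer might look like the following sketch. It is a simplification, not the actual server-side software: the function name rewriteUrns is hypothetical, URNs are assumed to appear only as double-quoted attribute values, and the resolve callback stands in for a real I2L call.

```javascript
// Sketch: replace URN attribute values in an HTML page by their URLs.
// Assumption: URNs appear as double-quoted values, e.g. href="urn:...".
function rewriteUrns(html, resolve) {
  return html.replace(/"(urn:[^"]+)"/gi, (match, urn) => {
    const url = resolve(urn);                   // stand-in for an I2L call
    return url === null ? match : '"' + url + '"'; // leave unknown URNs untouched
  });
}

// Hypothetical lookup table used in place of the resolver.
const known = new Map([
  ["urn:anaya:/apps/search", "http://www3.anaya.es/search"],
]);
const page = '<a href="urn:anaya:/apps/search">Search</a><img src="urn:anaya:/unknown">';
const rewritten = rewriteUrns(page, (urn) => known.get(urn) ?? null);
console.log(rewritten);
```

A regular expression is enough for an illustration; a filter meant for arbitrary HTML would more safely work on a parsed document.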
The introduction of URNs to different internal workgroups is following a phased approach that is not over yet.
After beta testing by a small group of individuals, the complete software engineering department plus several other technically oriented people were informed about the benefits of URNs and the availability of a distributed architecture specially suited to Web development. The concept of indirection and its benefits were grasped almost immediately. After a few weeks of testing and validation, the new model was selected for the pilot development of an electronic shop by a team of six people, consisting of one manager, three programmers, and two designers. The same people had participated previously in a similar effort with state-of-the-art technology but no support for URNs. Unfortunately, no quantitative data were gathered at that point, because the purpose was simply to solve problems, identify new features, and get the team used to the model. Qualitative impressions, though, were extremely satisfactory. One of the most important advantages that can be directly attributed to the use of URNs is that the team was self-organizing and parallelism was greatly improved. After an initial agreement on the set of URNs to use, all of them were created and implemented using simple static dummies and mockups. After that point, though, programmers could work on specific data access features while designers worked on page layout. Once a particular feature was finished and tested, the programmer simply removed the old fake mapping, pointing the URN to the new live implementation. From that moment on, the rest of the team members observed that real data were now available, but the way to access the resource was exactly the same as before. This contrasted with the earlier scenario where all details had to be working before results could be seen. With the new model, therefore, designers and management felt more involved in the project and programmers were more focused. Thus, productivity and employee satisfaction increased.
Shortly after that, work began on the development of a set of end-user tools intended for editorial teams. Once the first prototypes were ready, they were made available to a small group of expert knowledge engineers. The engineers are now providing feedback on the features that they think will be useful to support the building of an extensible metadata architecture, to be used in the future by regular editors. This group has also received URNs with enthusiasm, but the necessity of building a graphical tool that allows users to browse through catalogs of resources and create new ones has been made apparent.
At the same time, some of the corporate proxy servers were configured to resolve URNs, and some of the intranet documentation was moved to a URN scheme. Users can browse through pages exactly the same as before. Because the original urn: address is retained, pages can be bookmarked and used as reference. The difference, however, is that the exploitation department can move resources whenever they need to. Before the use of URNs, this was impossible without formally announcing the new locations and warning users of the possibility of encountering invalid references. Reliability and efficiency have also improved because most pages are replicated at several locations.
As indicated, all the benefits promised by URN technology have been observed as soon as resolution services were made available. Therefore, the effort of writing a URN resolver has paid off more than adequately, even if its use must be confined to controlled corporate contexts. This fact indicates that interoperability is a very important goal to achieve, because general-purpose name resolvers are useful pieces of software that could be easily adopted by lots of organizations. To encourage discussion on specific resolution features, our company will continue its commitment to public announcement of results, and is considering the licensing of the resolver code to the Open Software movement.
URNs can be integrated with metadata systems to leverage their capabilities and empower users with the ability to create, modify, classify, and establish relationships using the same tools. Grupo Anaya has started development of a metadata framework for publishers, code named DAWN, whose design goals will also be published.