Last update at http://inet.nttam.com : Tue May 9 16:26:24 1995

Supporting a URI Infrastructure by Message Broadcasting

Miguel Rio, Joaquim Macedo, Antonio Costa, Vasco Freitas

Abstract

In this paper we address the use of standard message broadcasting techniques, namely those found in Usenet News and e-mail distribution lists, as the supporting mechanism of a reliable URN2URC mapping service. As both the availability and the response time of a name service often rely upon the existence of secondary servers, we suggest that two classes of such URN2URC servers will be needed: organizational and thematic secondary servers. While a DNS-like approach to updating secondary servers seems to suit organizational URN servers, a message broadcasting technique would be more appropriate for thematic servers. Because a URC may provide several URLs for the same resource, URL selection is at issue in both cases. This selection functionality may be placed in the URN2URC secondary servers, in the clients/proxy servers, or in both. The selection criteria must take several parameters into consideration, such as domain location, access protocol, resource format and quality of service (QoS).

1 Introduction and Overview

In order to achieve interoperability and integration of network services, a universal naming convention is needed. A first step towards this goal is the URL (Uniform Resource Locator) [1] [2], widely used nowadays, which is a compact representation of the location and access method of Internet resources. The expectation, however, is that URNs (Uniform Resource Names) [3] [4] and URCs (Uniform Resource Characteristics) [5] will be used instead, so that location independence is achieved. A requirement for the migration from URLs to URNs and URCs is the existence of reliable URN2URC or URN2URL mapping services [6]. Such name servers would return, respectively, the URC or a URL for a URN submitted to them.
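To make the two mapping services concrete, here is a minimal in-memory sketch, assuming a URC can be reduced to a URN, a title and a list of URLs (the real URC template carries more meta-information). The class and method names (`URC`, `MappingServer`, `urn2urc`, `urn2url`) are hypothetical and do not correspond to any protocol defined in the drafts cited.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class URC:
    """Hypothetical, reduced URC: a URN, one meta-information field and its URLs."""
    urn: str
    title: str
    urls: List[str] = field(default_factory=list)


class MappingServer:
    """Toy URN2URC / URN2URL mapping service kept in memory (illustrative only)."""

    def __init__(self) -> None:
        self._db: Dict[str, URC] = {}

    def register(self, urc: URC) -> None:
        # Store (or replace) the URC under its URN.
        self._db[urc.urn] = urc

    def urn2urc(self, urn: str) -> Optional[URC]:
        # URN2URC: return the whole characteristic record, or None if unknown.
        return self._db.get(urn)

    def urn2url(self, urn: str) -> Optional[str]:
        # URN2URL: return a single location; this sketch naively takes the first URL.
        urc = self._db.get(urn)
        return urc.urls[0] if urc is not None and urc.urls else None


if __name__ == "__main__":
    server = MappingServer()
    server.register(URC("uminho.pt:2", "Supporting a URI Infrastructure ...",
                        ["ftp://ftp.uminho.pt/usr/rio", "ftp://cgip.fccn.pt/pub"]))
    print(server.urn2url("uminho.pt:2"))  # ftp://ftp.uminho.pt/usr/rio
```

Returning the first URL is the naive choice that the URL selection criteria of Section 4 are meant to improve upon.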
High availability and short response times of a server are often achieved by replicating it in secondary servers, as is the case with the DNS. The meta-information envisaged for a resource's URC suggests that a classification attribute, with values from an agreed thesaurus, might also be included. Servers could then be organized not only following the naming structure convention for URNs, but also under thematic criteria [7]. We refer to the latter as thematic secondary servers. These servers are of a different nature: a DNS-like approach to updating them is not suitable, because the URNs they hold may be mastered by potentially any server on the Internet. We suggest a management strategy for thematic servers based upon message broadcasting. In practice, this technique is already used to announce URIs on the Internet.

An important issue in the management of URN2URC mapping services is setting reasonable limits on the amount of information stored. The concern is not only to save storage space at the server, but rather that keeping a large number of URLs for the same information resource is useless, as some of them will almost certainly never be used. Hence, rules must be defined to select which URLs are kept in the URC at the server. Once such rules are implemented, clients and proxy servers are presented with a limited, but ideally near-optimal, choice of URLs from which they try to select the best one to actually access a copy of the desired resource. URL selection rules must be based upon certain criteria. However, as will be shown, the criteria used in clients and in URN2URC servers must be different.

We begin with a presentation of the proposed architecture and article format for the broadcasting of URIs. The thematic URN2URC secondary server concept is then introduced. Finally, the problem of URL selection and its related parameters is discussed.
2 Message Broadcasting

Whatever approach is taken for the migration to URNs, information replication on the Internet (e.g., by mirroring) is so widespread that it is unreasonable to expect, at least during a transition period, that all URLs of a given URN will be correctly registered. A solution for the transition was proposed in [7], based upon standard message broadcasting mechanisms, such as those provided by Usenet News [8] [9] and Open Distribution List (ODL) services, to announce URIs (Uniform Resource Identifiers): URCs of originals and URLs of copies. In fact, these mechanisms are already used to post articles exchanging URLs, which are pointers to the locations of information resources. If the articles used a standard format, it would be possible, and simple, to process them automatically. Such a format would have to include a filled-in URC template comprising URLs and other meta-information. Hence, URLs might still be collected in the usual way, as they appear explicitly in the URC, but a new set of tools would have to be developed to automatically retrieve these announcements, publish them within an organization and, if desired, replicate the resource locally. A set of such tools is described in [7].

Figure 1 depicts the architecture of the proposed system. The main functional entities are: the originals register and publisher; the urn2urc server, which contains a broadcasted copies detector and a domain register; a broadcasted resources detector; and a copies register and provider.

Figure 1: An architecture for URI broadcasting

We will refer to an Internet information resource as a document. The originals register and publisher entity sends some meta-information on a document to the domain register entity. Some of the meta-information field values are provided by the author and others are computed automatically, e.g., document size, document date, etc.
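The automatically computed fields can be sketched as follows, assuming the document is a local file. The field names `size` and `date` are illustrative assumptions rather than names taken from the URC template, and `automatic_metadata` is a hypothetical helper. MD5 is used because it is the digest algorithm named in the URL announcement format; hex encoding of the digest is likewise an assumption, since the encoding used in the announcements is not specified.

```python
import hashlib
import os
from datetime import datetime, timezone
from typing import Dict


def automatic_metadata(path: str) -> Dict[str, str]:
    """Compute meta-information fields that need no input from the author.

    Field names are hypothetical; the digest is hex-encoded MD5 (an assumption).
    """
    info = os.stat(path)
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in blocks so large documents need not fit in memory.
        for block in iter(lambda: f.read(64 * 1024), b""):
            md5.update(block)
    return {
        "size": str(info.st_size),
        # Last modification time of the file, as a UTC calendar date.
        "date": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).strftime("%Y-%m-%d"),
        "digest": md5.hexdigest(),
        "digestAlg": "md5",
    }
```

Author-supplied fields such as the title and abstract would then be merged with this dictionary before the meta-information is sent to the domain register.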
The urn2urc server assigns the document a URN, saves the resulting URC in its database and hands it back to the originals register and publisher. The allocation of URNs is a procedure external to the server. The originals register and publisher is then able to post the new document's URC to the news network or to some distribution list. The broadcasted resources detector entity retrieves the announcement and publishes it within its organization (as a WWW page, for instance).

When someone wishes to keep a copy of a document, their copies register and provider must register it with the owner of the (original) document. This may be done either by sending the owner a message, if the URN is known, or by broadcasting the new URL together with the document's digest. The urn2urc server runs a broadcasted copies detector that searches the news network for announcements of unregistered copies. Each time this verification process succeeds, the server updates its database and possibly sends an e-mail to the copies register and provider. The server also supports direct copy registration.

For each new document, the publisher broadcasts an announcement in the format shown in Table 1, while the copies register and provider announces a new copy with the article format shown in Table 2. In both cases the announcement body is delimited by a pair of HTML tags.

Table 1: Format of a URC announcement

    urn: uminho.pt:2
    url: ftp://ftp.uminho.pt/usr/rio
    Title: Supporting a URI Infrast ...
    version: 2.1
    format: PostScript
    abstract: In this paper we address ...
    signature: qardy5sdw9ayfi5urh97rawy5
    signatureAlg: md5WithRSAEncryption
    e-mail: urn2urc@uminho.pt

Table 2: Format of a URL announcement

    url: ftp://cgip.fccn.pt/pub
    digest: isdysjk6ru69ytskgy56walg13h4
    digestAlg: md5
    e-mail: copyregister@fccn.pt

3 Thematic URN Servers

For improved availability and response time of URN2URC servers, it is reasonable to expect the deployment of secondary servers, as is done for the DNS. Secondary servers would simply mirror primaries, keeping the same links to resources and answering URN2URC queries even though they are not the original name maintainers. We refer to this class of servers as organizational secondary servers.

Considering secondary servers raises the possibility of secondaries that replicate primaries under a different criterion, such as a thematic one. Thematic servers would be most interesting and useful, since searches very often take a yellow-pages approach. For example, there would be servers on Artificial Intelligence, Telecommunications, Operating Systems and so on. They would require resources to possess some classification attribute, and therefore the meta-information in URCs to include a classification field taking values from an agreed classification vocabulary or thesaurus. Although universal classification is a non-trivial problem in itself [10], it seems reasonable, as a first approach, to adopt an established classification scheme such as the ad-hoc classification of Usenet newsgroups or of Open Distribution Lists.

The advantages of thematic servers are at least threefold:

* they can easily collect URCs published by author organizations in newsgroups and Open Distribution Lists [7].
  They only need to subscribe to the relevant newsgroups or distribution lists;
* they are likely to hold most of the URCs under the theme, including the most relevant ones;
* they allow yellow-pages browsing following a classification derived from the News hierarchy.

A disadvantage of a thematic server is that it replicates URCs from many (potentially all) servers, so standard secondary updating approaches may not make sense in this context. Finally, can thematic servers be considered secondary servers? Yes, in the sense that they provide URN to URC mapping and replicate URCs on a non-authoritative basis.

3.1 Update strategies

For organizational secondary servers, a DNS-like incremental update approach could be used, irrespective of the urn2urc mapping service protocol (http, whois++). Secondary servers must poll primaries for URC updates, and the URC template will need administrative fields (e.g., last-modified-time). Such fields may be dropped from responses to normal user queries.

For thematic secondary servers, a polling strategy is not suitable, since any server may hold URCs for a given theme or subject. We propose relying on Usenet newsgroups and ODLs to support the updating of these servers' databases.

Update latency is obviously expected to be quite different in the two cases, given the different updating strategies used. A DNS-like strategy can minimize inconsistency between primary and secondary servers: administrative parameters can reduce the interval between update connections to a value considered satisfactory, and a change is expected to be completely propagated to the secondary servers within a few hours. With the proposed broadcasting mechanism, the same behaviour cannot reasonably be expected. Thematic servers will receive announcements of new URLs/URCs that are days old, depending on the broadcast method. Lower latency is to be expected with ODLs than with Usenet News, but this requires confirmation in the field.
Nevertheless, this degree of latency is not really a problem, since there is no need for great accuracy in a thematic URC database. A certain amount of inconsistency is to be expected and is not relevant, considering the typical usage of those servers.

In summary, thematic servers for a given subject will subscribe to the same set of newsgroups and ODLs. The author of a new original document has to classify it and, besides registering the document with their organization's server, must announce it in the newsgroups and ODLs of that classification using the standard message format.

4 URL Selection

Some resources may have dozens of URLs. Should a URN2URC server maintain all of them? This may be neither advisable nor needed, because a small number of references is enough to answer the most frequent queries. Servers may be configured with a maximum number of URLs per resource that they are allowed to handle.

Another question is whether all URN2URC servers should maintain the same set of links (URLs) to a given resource. A set of parameters can influence this decision. One of the most obvious is location: a server will try to keep references that are either near to it or within a given zone. A given URL may be important to some servers but not to others, and the decision should be left to each server. Again, a broadcasting mechanism is needed for the announcement of new links. Therefore, when a resource is replicated, the event should be announced either in Usenet News or in a specific distribution list using the standard article format. An additional type of announcement is needed when a resource is no longer replicated at a given site, so that all URLs pointing to that copy are discarded.

When a URN2URC server is presented with a new URL, it adds the new instance to the corresponding URC. If the maximum number of allowed URLs is reached, the server must run a URL selection algorithm in order to discard one of them.
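The add-and-discard behaviour just described can be sketched as below. The sketch assumes that the server keeps an index from document digest to URN, so that a URL announcement in the Table 2 format, which carries no URN, can be matched to its URC. Since the format of the withdrawal announcement is not specified, it also assumes that withdrawing a copy simply names its URL. `URCStore`, `parse_announcement` and the `discard` callback are hypothetical names; the callback stands for whatever URL selection algorithm the server uses.

```python
from typing import Callable, Dict, List, Optional


def parse_announcement(body: str) -> Dict[str, str]:
    """Parse the 'field: value' lines of an announcement body into a dict."""
    fields: Dict[str, str] = {}
    for line in body.splitlines():
        # Split on the first colon only, so URLs and URNs keep their own colons.
        name, sep, value = line.partition(":")
        if sep and name.strip():
            fields[name.strip().lower()] = value.strip()
    return fields


class URCStore:
    """Hypothetical URN2URC database holding at most max_urls URLs per URN."""

    def __init__(self, max_urls: int, discard: Callable[[List[str]], str]) -> None:
        self.max_urls = max_urls
        self.discard = discard                # URL selection rule: picks a URL to drop
        self.urls: Dict[str, List[str]] = {}  # URN -> URLs kept in its URC
        self.by_digest: Dict[str, str] = {}   # document digest -> URN (assumed index)

    def add_original(self, urn: str, url: str, digest: str) -> None:
        self.urls[urn] = [url]
        self.by_digest[digest] = urn

    def on_copy_announcement(self, body: str) -> Optional[str]:
        """Handle a new-copy announcement; return the matching URN, or None."""
        fields = parse_announcement(body)
        urn = self.by_digest.get(fields.get("digest", ""))
        url = fields.get("url")
        if urn is None or not url:
            return None  # unknown document or malformed article: ignore it
        kept = self.urls[urn]
        if url not in kept:
            kept.append(url)
            if len(kept) > self.max_urls:
                # Over the limit: let the selection rule choose which URL goes.
                kept.remove(self.discard(kept))
        return urn

    def on_withdrawal(self, url: str) -> None:
        """Drop a URL whose copy is no longer available."""
        for kept in self.urls.values():
            if url in kept:
                kept.remove(url)
```

For example, `URCStore(max_urls=2, discard=lambda urls: urls[1])` always keeps the original's URL and, when a third URL arrives, drops the oldest copy; a real server would plug in a rule built from the criteria discussed below.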
The URL selection problem must also be solved by other entities, such as clients and proxy servers, when trying to choose the best URL for a given URN. The scenarios that call for a choice of URL are the following:

* There are too many URLs for the same URC. The urn2urc mapping server must select a subset to be held in the URC, discarding the remaining URLs.
* In urn2urc resolution, clients/proxy servers (or the user) need to choose a single URL to access the document.
* The urn2urc server returns to a client only a subset of the URLs (based on some criteria), together with some additional information to allow further choice by the client/user.

Some of the criteria for the URL selection algorithm are now discussed.

4.1 Domain Location

The location of a given instance of a resource is clearly an important criterion. Location information can often be derived from the domain name. More precise location information can be found in GPOS DNS Resource Records [11], which, unfortunately, are seldom used. This problem can be overcome if a matrix of domain geographic positions is published. Matrix values would include distances among sites based upon an unloaded network metric (a function of a selection of static values such as bandwidth, hop count, link lengths, etc.). Domain locations may be grouped on an organization, country and continental basis. To limit the size of such a matrix, top-level domain granularity should be used, although under certain circumstances the expansion of some top-level domains may be desirable.

The location parameter is used differently by the URN2URC server and by the client/proxy server in their selection algorithms. A URN2URC server wants to point to instances (copies) of the resource that are as widely spread across the network as possible, at locations with easy network access. Clients/proxy servers, on the other hand, want to consider the URLs nearest to them.
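The two uses of location can be contrasted in a short sketch at top-level domain granularity. The values in `DISTANCE` are invented placeholders, not measurements, and `UNKNOWN` is an assumed default for pairs missing from the matrix. The greedy max-min rule in `server_subset` is only one possible reading of "most widely spread" and ignores the "easy network access" requirement. All names are hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit

# Invented unloaded-network distances between top-level domains (smaller = closer).
DISTANCE: Dict[Tuple[str, str], int] = {
    ("pt", "pt"): 1, ("pt", "uk"): 4, ("pt", "edu"): 9,
    ("uk", "uk"): 1, ("uk", "edu"): 7, ("edu", "edu"): 1,
}
UNKNOWN = 10  # assumed distance for pairs missing from the matrix


def tld(url: str) -> str:
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1]


def distance(a: str, b: str) -> int:
    # The matrix is symmetric, so only one orientation of each pair is stored.
    return DISTANCE.get((a, b), DISTANCE.get((b, a), UNKNOWN))


def client_choice(urls: List[str], client_tld: str) -> str:
    """Client/proxy view: take the URL nearest to the client."""
    return min(urls, key=lambda u: distance(tld(u), client_tld))


def server_subset(urls: List[str], k: int) -> List[str]:
    """Server view: greedily keep up to k URLs whose domains are far apart.

    Starts from urls[0] (assumed to be the original) and repeatedly adds the URL
    whose nearest already-chosen URL is farthest away (max-min dispersion).
    """
    chosen = urls[:1] if k > 0 else []
    while len(chosen) < k:
        rest = [u for u in urls if u not in chosen]
        if not rest:
            break
        chosen.append(max(rest, key=lambda u: min(distance(tld(u), tld(c)) for c in chosen)))
    return chosen
```

With these placeholder values, a client in the `uk` domain would pick a `.uk` copy, while a server keeping two URLs would retain the Portuguese original and the most distant copy.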
4.2 Access protocol and document format

The URN2URC server is probably interested in providing URLs of copies accessible by multiple protocols (http, gopher, ftp, etc.) and in multiple formats (ps, html, txt, etc.). Clients/proxy servers, however, use this information according to client capabilities and configuration options. The access protocol and document format can be derived from the URL itself.

4.3 Signed Information

Both the URN2URC service provider and the client may prefer locations that carry signed information [6]. This issue can only be taken into account if the information is available in the URC.

4.4 Quality of Service

Another important group of parameters are those related to the quality of service (QoS) of a server, which is usually a function of server availability and response time. The approach followed is based upon the passive probing concept [12]. While in [12] the author is mainly concerned with providing QoS information to users, we consider using such a criterion in the URL selection rules.

An effective way to measure server availability is to collect statistics of "host unreachable" or "too many users, try later" answers when trying to connect to it. Response time is the sum of the connection and data transfer times, and includes the server's processing time. In this context both components are significant, and therefore the total elapsed time is taken as the response time. As this parameter is most likely a highly non-stationary process, we use an adaptive single exponential smoothing forecasting algorithm [13] to predict it.

Should QoS data be organized by host or by (host, protocol) pair? Apparently, two servers running on the same machine are equally affected. This point is not relevant, however, if one considers the existence of aliases: two alias names for the same machine will have one entry each in the QoS table. For instance, www.uminho.pt and gopher.uminho.pt may stand for the same host.
In such cases the hostname already gives protocol information. A practical way to obtain QoS values is to include a QoS module in the proxy server, as depicted in Figure 2. This module would maintain a QoS data record for each host, whose values are not affected by cached information. This information can also be fed to local URN2URC servers.

Figure 2: QoS data module and usage

While the selection of a URL by a client will use both QoS parameters, the URN2URC server should only consider host availability, since it may be queried by clients from arbitrary locations.

4.5 URL selection rules

The URL selection algorithms may use one or a combination of the above criteria. For clients, there may be a configuration file with the preferred options. For proxy servers, a similar configuration file can be generated based on the existing client population. In this file the various instances of a URC field may be placed in order of preference. The selection should be randomized when no parameter data is available to apply a suitable criterion [6], or when the data values are all very similar. In clients/proxy servers there must be trigger conditions indicating when the selection has to be made by the user. This is particularly important for charged access to resources.

This simple set of rules and criteria was used to design an algorithm that automatically selects the best URL or discards the worst ones. The algorithm is currently under evaluation in a field situation. Another valid option is not to use the result of the algorithm for automatic URL selection, but to advise users [12] and server managers instead.

5 Conclusions and Further Work

Broadcasting mechanisms are needed not only during the transition period when migrating from URLs to URNs and URCs, but also if thematic URN servers are introduced.
Thematic servers may play an important role in Yellow Pages service provision as a user interface to the whois++ directory mesh [14] [15] [16].

Tested automatic URL selection algorithms are needed. Often, end users are not sufficiently skilled to choose the best URL of the resource they want to access. Such algorithms are also indispensable if URLs are to be correctly discarded in URN2URC mapping servers.

QoS parameters play an important role in URL selection. More experimental results are needed for their correct understanding and usage.

References

[1] Kunze, J., "Functional Recommendations for Internet Resource Locators", RFC 1736, February 1995.
[2] Berners-Lee, T., Masinter, L., McCahill, M., eds., "Uniform Resource Locators", RFC 1738, December 1994.
[3] Sollins, K., Masinter, L., "Functional Requirements for Uniform Resource Names", RFC 1737, December 1994.
[4] Internet Draft, "Uniform Resource Names", work in progress, Internet URI Working Group.
[5] Internet Draft, "Specification of Uniform Resource Characteristics", work in progress, Internet URI Working Group.
[6] Internet Draft, "URN to URC resolution scenario", work in progress, Internet URI Working Group.
[7] Rio, M., Costa, A., Macedo, J., Freitas, V., "A Framework for Broadcasting and Management of URIs", Proc. 6th Joint European Networking Conference (JENC6), pp. 4311-4316, Tel Aviv, Israel, May 15-18, 1995.
[8] Kantor, B., Lapsley, P., "Network News Transfer Protocol: A Proposed Standard for the Stream-Based Transmission of News", RFC 977.
[9] Horton, M.R., "Standard for Interchange of USENET Messages", RFC 850.
[10] Neves, F.L., Oliveira, J.N., "Classifying Internet Objects", National Portuguese WWW Conference: Multimedia Information in the Internet, Universidade do Minho, Braga, Portugal, July 6-8, 1995 (submitted extended abstract).
[11] Farrell, C., Schulze, M., Pleitner, S., Baldoni, D., "DNS Encoding of Geographical Location", RFC 1712, November 1994.
[12] Baker, P., "Providing X.500 DUI with QoS information", Computer Communication Review, ACM SIGCOMM, pp. 28-37, 1994.
[13] Makridakis, S., Wheelwright, S., McGee, V., "Forecasting: Methods and Applications", John Wiley & Sons, pp. 84-111, 1983.
[14] Internet Draft, "Architecture of the whois++ service", work in progress, Internet Network Working Group.
[15] Internet Draft, "Architecture of the whois++ index service", work in progress, Internet Network Working Group.
[16] Internet Draft, "How to interact with whois++ directory mesh", work in progress, Internet Network Working Group.

Author Information

Miguel Rio is a student in Systems and Informatics Engineering at the University of Minho, Portugal, and is currently working on his final-year project with the Computer Communications group of the Department of Informatics.

Joaquim Macedo is a Lecturer in Computer Communications at the University of Minho, Braga, Portugal. He graduated in Electronics and Telecommunications Engineering in 1983 from the University of Agostinho Neto, Angola, and received his Master's degree from the University of Minho in 1993. From 1990 until 1994 he participated in several technical working groups for the establishment of the Portuguese National R&D Network, the RCCN, and was particularly active in the development of X.500 Directory Services. He is currently pursuing a doctoral degree; his research interests concern networked information and directory services and protocols.

Antonio Costa is an Assistant Lecturer in Computer Communications at the University of Minho, Braga, Portugal. He graduated in Systems and Informatics Engineering in 1992 from this University and is currently pursuing post-graduate studies in network information services and protocols.
He has been involved in several R&D projects in this area under contract with the Computer Communications group.

Vasco Freitas is an Associate Professor of Computer Communications at the University of Minho, Braga, Portugal. He graduated in electronic and telecommunications engineering in 1972 from the University of Lourenco Marques and received his M.Sc. and Ph.D. degrees from the University of Manchester (UK) in 1977 and 1980 respectively. From 1989 until 1994, on partial leave from his University, he was a member of the Board of Directors of the Portuguese National Foundation for Scientific Computing, where he had the opportunity to foster the establishment of the Portuguese R&D Network (RCCN). He represented the national networking initiatives for R&D in the RARE Association, DANTE and Ebone from their beginnings until recently. Currently, his interests concern networked information services and protocols and the specification, modelling and prototyping of communication protocols.