Manfred Bogen <firstname.lastname@example.org>
Guido Hansen <email@example.com>
Michael Lenz <firstname.lastname@example.org>
German National Research Center for Information Technology
W3Gate is an e-mail-based access method to the World Wide Web. It was developed by the German National Research Center for Information Technology for people with restricted access to the Internet, whether because of poor connectivity or because of security measures that keep them from accessing the Internet freely from an Intranet. W3Gate also offers everybody a means of asynchronous communication and cooperation with the Web.
A regular W3Gate service has been in production since the end of 1995, offered on a voluntary and free-of-charge basis. Since that time its usage by end users has grown rapidly. We have suffered from occasional attempts by networked maniacs to deliberately abuse this service, or from misuse by end users who did not understand our documentation and help files. All this finally led to a redesign of the software and to enhanced quality management tools.
This paper surveys our experiences running W3Gate and current developments to improve security and the quality of service. It presents our latest results together with our newest ideas about the future of W3Gate.
Keywords: multimedia, multimedia network interface, electronic mail, MIME, WWW, HTML, FTP, WAIS, GOPHER, NetNews, security, authentication, administration, value-added services, quality of communication services.
The World Wide Web (WWW) is a steadily growing worldwide library and repository full of multimedia objects. Most people have direct access to WWW through such interfaces as Mosaic, Netscape Navigator, or Microsoft Internet Explorer, which enable them to fetch multimedia objects such as video or audio clips in a comfortable and user-friendly way. At times, Web users share a problem with people who have no Web access: many documents are not retrievable during normal working hours because connectivity is bad, their transmission would impair ongoing work too long, or users are not allowed to get documents through HTTP because of security considerations and the installation of a company firewall protecting the Intranet from the global Internet (Figure 1).
Figure 1: W3Gate embedding
E-mail access to the Web solves a lot of problems. Apart from asynchrony, electronic mail has more advantages. It is one of the most popular value-added services in networking today, with a yearly growth of 35% and about 45 million users worldwide. Its high acceptance is somewhere between telephone and fax machines. Compared with other communication and cooperation facilities its speed and full coverage are quite impressive.
This paper is structured as follows: Section 2 presents our early ideas about W3Gate and our research and design goals at that time. Section 3 relates W3Gate to other, similar implementations in the field. The next two sections describe the W3Gate implementation and its usage by end users. Section 6 outlines the abuse of W3Gate and our attempts to restrict this to a minimal percentage. The last section summarizes our latest ideas about the next steps and the near future of W3Gate, even though we expect more and more people to have well-performing access to the Internet.
In early 1995, the German National Research Center for Information Technology (GMD) decided to build a gateway between electronic mail and the Web that would offer asynchronous access to multimedia on the Web. By sending an e-mail to this gateway--we finally called it W3Gate (email@example.com)--with requests for the desired information objects, one should be able to access the information without waiting too long.
W3Gate started as vocational training for our staff to learn more about networking in the Internet, WWW, HTTP and HTML (Hypertext Markup Language), and programming languages such as Perl, C, and Unix shell. In our first concept, we had the following ideas and goals:
In spite of other implementations in the field, which are described in the next section, we definitely wanted to make our own implementation in order to use our experiences in establishing and running value-added services and in order to learn about the WWW servers, cache servers, proxies, and communication protocols needed.
When W3Gate was developed, similar services were set up, too. The most important one is Agora, developed at CERN by Arthur Secret. It was designed to fetch HTML documents via e-mail on behalf of a user from the specified server. A document and all references can be fetched with a deep command. To protect the users and the network in general, the size of what a user can get is limited to 5,000 lines per message.
Agora limits the number of requests processed in each message to 10. This is necessary to prevent the gateway from being attacked by messages containing hundreds of requests or users being unaware of what happens if they send a few messages with lots of requests.
Because most HTML files contain links to inline images or 8-bit files such as archives or executables, an e-mail-WWW gateway must encode these files before transport. Agora uses uuencode for this purpose. Agora is freeware, so other sites have set up Agora servers too (e.g., firstname.lastname@example.org).
As the Internet provides more information services besides WWW, such as NetNews, FTP (File Transfer Protocol), Gopher, WAIS, and search engines, some Agora servers also give access to newsgroups (email@example.com). Others provide only a gateway to HTTP and have no support for other services.
Webmail (firstname.lastname@example.org), another similar implementation, was started at the suggestion of attendees at the Developing Countries Workshop at Stanford before INET '93 in San Francisco. It has some restrictions. It handles only one request per message and does not send binaries such as images or executable files. In addition, files larger than 64 Kb cannot be returned.
WebMail (email@example.com) is a newer gateway that supports the protocols HTTP, FTP, Gopher, and news. There is also a freeware package available for a Personal Digital Assistant (PDA) that acts as a front end. Currently it does not support binary files or files larger than 100 kb. WebMail does not support retrieval of linked documents. It is the only gateway with accounting. Unregistered users are limited to a certain number of retrievals per month; registered users are granted unlimited access. The fees are as little as $1 per month.
Electronic messages containing requests for help or for Web documents (Table 1) may contain at most 10 commands in the message body. The subject field of the message is currently unused. The first word of each command line must be get or help. The commands are case-insensitive. Without any option, the get command returns the document denoted by the URL unchanged to the requester. The subject field of a returned message contains the command the requester has used.
get [ -t | -u | -a [-c columns]] [-ps] [-z] [-uu] [-s size] [-img] [-l] URL

| Option | Meaning |
|---|---|
| -t | strip all tags |
| -u | preserve links to other documents as relative URLs if possible |
| -a | preserve links to other documents as absolute URLs |
| -c columns | wrap lines after columns columns |
| -ps | convert ASCII text into PostScript format |
| -uu | uuencode before mailing |
| -s size | set size of document in email to size [kbytes] |
| -img | get all inline images |
| -l | get all documents from links |
The options -t, -u, and -a are mutually exclusive. If one of these options is present the requested document is formatted according to the HTML tags included, if any. If one of the -u or -a options is specified, all URLs to linked documents are preserved in the text either as relative or as absolute URLs.
The -c option is allowed only in conjunction with one of the options -t, -u, or -a. If specified, the document is formatted with the given number of columns. If the value remains under 40 or exceeds 255 it is set to the respective limit. The number of columns defaults to 80, which is still the standard column length of messages in the EARN/BITNET world.
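The option rules above can be illustrated with a small parser. This is a minimal sketch in Python, not W3Gate's actual code: the function name `parse_get`, the dictionary layout, and the error texts are assumptions made for illustration, and options are matched case-sensitively here because the text only states that the commands themselves are case-insensitive.

```python
FORMAT_OPTIONS = {"-t", "-u", "-a"}              # mutually exclusive
SIMPLE_FLAGS = {"-ps", "-z", "-uu", "-img", "-l"}
DEFAULT_COLUMNS, MIN_COLUMNS, MAX_COLUMNS = 80, 40, 255


def parse_get(line):
    """Parse one 'get [options] URL' line into a dictionary (hypothetical layout)."""
    words = line.split()
    if len(words) < 2 or words[0].lower() != "get":
        raise ValueError("expected: get [options] URL")
    *options, url = words[1:]                    # the URL is the last word
    if url.startswith("-"):
        raise ValueError("missing URL")
    request = {"url": url, "format": None, "columns": None,
               "size_kb": None, "flags": set()}
    i = 0
    while i < len(options):
        opt = options[i]
        if opt in FORMAT_OPTIONS:
            if request["format"] is not None:
                raise ValueError("-t, -u and -a are mutually exclusive")
            request["format"] = opt
        elif opt in ("-c", "-s"):
            if i + 1 >= len(options) or not options[i + 1].isdigit():
                raise ValueError(f"{opt} needs a numeric argument")
            value = int(options[i + 1])
            if opt == "-c":
                # Values under 40 or over 255 are set to the respective limit.
                request["columns"] = max(MIN_COLUMNS, min(MAX_COLUMNS, value))
            else:
                request["size_kb"] = value
            i += 1                               # skip the consumed argument
        elif opt in SIMPLE_FLAGS:
            request["flags"].add(opt)
        else:
            raise ValueError(f"unknown option: {opt}")
        i += 1
    if request["columns"] is not None and request["format"] is None:
        raise ValueError("-c requires one of -t, -u or -a")
    if request["format"] is not None and request["columns"] is None:
        request["columns"] = DEFAULT_COLUMNS     # 80 columns by default
    return request
```

With this sketch, `parse_get("get -t -c 20 http://example.org/")` yields a column width of 40, and a line combining -t with -u is rejected.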
The -ps option causes any ASCII document to be converted into a PostScript document. The document is displayed in portrait mode. Large documents can be compressed (-z) and uuencoded (-uu) before transport.
Users can specify the maximum message size that their electronic mail system can handle using the -s option. By specifying -l, users make W3Gate fetch not only the document denoted by the URL itself but also all other documents referred to by its hyperlinks. Analogously, the -img option tries to fetch all images included in a document. Users thus get a visual impression of the Web site.
When a user specifies the -l or -img option, W3Gate generates an index list at the beginning of the returned message that includes the URLs of the documents that are attached to the message.
As there are still many people with a text-only interface to WWW, W3Gate puts the alternative text that can be added to hyperlinks of inline images into the message by using the -u or -a option. Thus a user gets an idea of the layout of the Web page fetched.
When processing files larger than the size given by the -s option, or 100 Kb by default, W3Gate sends users several messages, each containing a part of that size. From the subject field users can see in which order the parts have to be reassembled into the whole file. They just have to strip off the header lines from each message and join the parts together according to the sequence number.
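The splitting and reassembly can be sketched as follows. The subject label format ("part n/total"), the function names, and the reading of one Kb as 1,024 bytes are assumptions for illustration; the text does not specify how W3Gate writes the sequence number.

```python
def split_into_parts(data: bytes, size_kb: int = 100):
    """Split data into chunks of at most size_kb kilobytes.

    Returns (subject_label, chunk) pairs; the label carries the
    sequence number needed for reassembly (hypothetical format).
    """
    part_size = size_kb * 1024
    chunks = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    if not chunks:                      # an empty file still yields one message
        chunks = [b""]
    total = len(chunks)
    return [(f"part {n}/{total}", chunk) for n, chunk in enumerate(chunks, 1)]


def reassemble(messages):
    """Join the chunks in sequence-number order, whatever order they arrived in."""
    def sequence_number(message):
        label, _ = message
        return int(label.split()[1].split("/")[0])
    return b"".join(chunk for _, chunk in sorted(messages, key=sequence_number))
```

A receiver who has stripped the mail headers would pass the remaining bodies, together with their subject labels, to `reassemble` in any order.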
W3Gate has even more features and service elements to offer. The requester can split a very long URL into several lines using the backslash character at the end of each line. These URLs are common in queries to search engines working with databases. The user can also access documents via FTP, Gopher, and WAIS. If the requester uses the command option -l, the reply from W3Gate contains a list of all included files at the beginning of the message.
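The backslash continuation of long URLs might be handled as in the following sketch. It assumes that the fragments are concatenated without intervening whitespace and that surrounding whitespace on each line can be discarded; the function name is hypothetical.

```python
def join_continued_lines(lines):
    """Merge lines ending in a backslash with the line that follows them."""
    joined, buffer = [], ""
    for line in lines:
        stripped = line.strip()
        if stripped.endswith("\\"):
            buffer += stripped[:-1]      # drop the backslash, keep collecting
        else:
            joined.append(buffer + stripped)
            buffer = ""
    if buffer:                           # trailing backslash on the last line
        joined.append(buffer)
    return joined
```

A search-engine query spread over three lines thus becomes a single get command before the options are parsed.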
W3Gate makes dynamic retries to fetch documents that are currently unavailable. In the meantime a user gets a message with a note about how many times W3Gate will try to fetch the file again before it gives up. It also provides access to password-protected files because a user-id and password field can be added to a URL. Moreover, a user can download software up to 5 Mb per get request.
W3Gate (Figure 2) handles not only plain HTML files but also binary files. Files are accessed via HTTP (96%), via FTP (3%), and via Gopher and WAIS (1%).
Figure 2: File formats requested from W3Gate
Users from more than 70 countries are using W3Gate regularly (Figure 3). They are from universities and schools and also from profit organizations. Last month, 21% of the requests were from Germany, 8% from Italy, and 5% from Poland. About 19% of the requests were from commercial organizations throughout the world, and about 10% were from developing countries.
Figure 3: Countries using W3Gate
Since W3Gate started its work in May 1995 the number of transmitted files has grown from a few to an average of 12,000 a day, especially in spring 1996 after the announcement of W3Gate in several newsgroups and mailing lists. At that point the traffic grew to 7.5 Gb per month (Figure 4). People liked W3Gate's functionality (a wide range of command options), performance (short response times), and throughput. By now, W3Gate is a well-established service.
Figure 4: W3Gate traffic
Some declines in the usage statistics are caused by operational problems and misuse. In June 1996 we had to take W3Gate out of service for nearly two weeks because somebody misused W3Gate and put heavy loads on GMD's central mail server and on another site on the Web (see next section). In September we fixed another bug that caused the small decline at that time. During those times we received many complaints about W3Gate being down and we tried hard to establish it again as soon as possible. Despite that trouble the number of users grows continuously and enthusiastic messages show us that the system is well appreciated. The decline in December is caused by the winter holidays.
We did not want W3Gate to be just another prototype. Instead, we wanted to use our operational experience with other value-added services to meet quality of service expectations for W3Gate.
In reaction to different incidents W3Gate was taken out of service several times for periods ranging from some days up to two weeks in order to fix existing problems and to clean up our machines. All requests during this time were lost, sometimes without informing customers. This is obviously unacceptable even for a free service. Users expect a 24-hour service 365 days of the year, whether they pay for it or not.
Apart from availability, high reliability is very important for a professional service. For each received request a corresponding reply has to be sent, be it the requested document, an error report, or a help file explaining the commands and their usage. Even if the service machine crashes, none of the already received requests should be lost, nor should replies be sent multiple times.
In order to achieve these goals we changed W3Gate's design. Now all requests and their current state are persistently stored in the file system. Hence, after a restart W3Gate can reconstruct the system status from these files.
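One way to keep request state in the file system is sketched below. The one-JSON-file-per-request layout, the status values, and the function names are assumptions for illustration only; the text does not describe W3Gate's actual file format.

```python
import json
import os
import tempfile
from pathlib import Path


def save_request(spool: Path, request_id: str, state: dict) -> None:
    """Atomically write the current state of one request to the spool directory."""
    spool.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=spool, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())        # make sure the data reached the disk
    # The rename is atomic, so a crash never leaves a half-written state file.
    os.replace(tmp_name, spool / f"{request_id}.json")


def load_pending(spool: Path) -> dict:
    """After a restart, rebuild the set of requests that still need a reply."""
    pending = {}
    for path in sorted(spool.glob("*.json")):
        state = json.loads(path.read_text())
        if state.get("status") != "replied":
            pending[path.stem] = state
    return pending
```

Recording the "replied" status only after the reply has been handed to the mail system keeps a request from being lost, while checking it on restart helps avoid sending the same reply twice.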
If a requested document cannot be retrieved from the original server immediately, the request is repeated several times. The delays between successive retries for the same request grow progressively so that the retries are equally distributed over the hours of a day and the days of a week. If the maximum number of retries is exhausted, an error message is returned. Analogously, the electronic mail system tries to resend the reply messages several times before they finally bounce.
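A progressively growing retry schedule could look like the following sketch. The text gives no concrete delays, so the first delay of one hour, the doubling factor, and the retry limit are assumed values chosen purely for illustration.

```python
from datetime import datetime, timedelta


def retry_times(first_attempt, max_retries=6, first_delay_hours=1.0, factor=2.0):
    """Return the times at which a failed request would be retried.

    Each delay is `factor` times the previous one, so later retries
    fall at different hours of the day and, eventually, on other days.
    """
    times = []
    delay = timedelta(hours=first_delay_hours)
    attempt = first_attempt
    for _ in range(max_retries):
        attempt = attempt + delay
        times.append(attempt)
        delay = delay * factor
    return times
```

Starting from a failed attempt at 09:00, four retries under these assumed parameters would fall at 10:00, 12:00, 16:00, and midnight; once the list is exhausted an error message would be returned.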
For acceptance of a service or product, user friendliness and ease of use play a central role. W3Gate's commands are simple and straightforward. Nevertheless, if W3Gate receives an e-mail with incorrect commands it generates a message that cites the wrong commands and explains the nature of the error. Therefore, users are able to learn about the usage by themselves and do not have to ask the service provider.
W3Gate also ignores signatures preceded by "-- " (minus+minus+blank) in a separate line. Because of this, users do not have to reconfigure their mail user agents each time they want to use W3Gate. They need only add this line at the beginning of the signature once.
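The handling of a message body, including the signature rule, can be sketched as below. The function name and the treatment of commands beyond the tenth (reported as errors here) are assumptions; the text only states that at most 10 commands are processed and that wrong commands are cited back to the sender.

```python
MAX_COMMANDS = 10


def extract_commands(body: str):
    """Return (commands, rejected_lines) for one message body."""
    commands, rejected = [], []
    for line in body.splitlines():
        if line == "-- ":                # signature separator: ignore the rest
            break
        if not line.strip():
            continue
        first_word = line.split()[0].lower()   # commands are case-insensitive
        if first_word in ("get", "help") and len(commands) < MAX_COMMANDS:
            commands.append(line.strip())
        else:
            rejected.append(line.strip())      # cited in the error reply
    return commands, rejected
```

Because the first word of a line must be get or help, a quoted line such as "> get ..." is not accepted as a command in this sketch.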
If those users who like to work with WWW asynchronously or those with poor Internet connectivity read their mail with a WWW browser, they can click on these links and the browser displays the document immediately. Users thus feel as if they had fetched the document directly with the browser.
To support our customers, we established a W3Gate administration e-mail address (firstname.lastname@example.org). All comments related to the service and requests for help can be sent to this address. They will be handled immediately by our administrators.
Running a publicly accessible service on the Internet means not only attracting harmless customers but also, probably sooner rather than later, malicious characters. So we had to learn our lessons too, especially at the beginning of operation. Primarily, we were confronted with denial of service attacks of various kinds, striking W3Gate itself and also information providers as well as Internet service providers and Internet users. While this normally poses no real risk to the affected machines themselves, it is annoying. Servers or services that might urgently be needed become unavailable, system administrators are kept busy cleaning up and bringing the system into normal operation again, and last but not least bandwidth is wasted. So, these incidents gave us a hard time, but by examining our log files and investigating the attacks we found out to our surprise that they were, in general, not caused intentionally but inadvertently by innocent users.
W3Gate can not only be used to fetch static HTML documents or image files, but also to call scripts, possibly resulting in a heavy load on the affected Web server. By sending the same request on a regular basis to W3Gate, users can retrieve recent versions of the data they are interested in. But if the requests are too frequent they may cause serious trouble to the server hosting the information.
One victim of such an automated process was the Deja News server, one of the world's largest publicly searchable Usenet news archives. One of W3Gate's users was interested in all articles in some particular news groups. But he used a very poor solution: he triggered search requests for a couple of common words that were likely to be included in almost every posting. In addition, he soon started to receive only error messages because he had based his requests on some temporary information that was automatically deleted after a short while. He continued sending his requests even more frequently (simply because he didn't receive the intended article list), finally bringing the Deja News server to its knees.
In consequence, W3Gate was temporarily excluded from using Deja News. Only after excluding the user from our service and after some debates with system administrators was W3Gate let in again. The use of an exclusion list for W3Gate is not an optimal solution, but we think it the only practicable one, and others such as the popular Listserv e-mail distribution list service use it too. A clear disadvantage is that users can only be excluded after an incident, not in advance. But other approaches, such as limiting the number of requests per user per day, would have led to poor overall availability of the service and were hence rejected. Another approach would be establishing a closed user group (see "Next steps").
The Deja News incident gave GMD a hard time too. Because of missing MX records in the domain name system (DNS), W3Gate's replies could not be delivered properly to requesting users and hence steadily filled up GMD's central mail server. Finally even local mail deliveries took a couple of hours instead of a few seconds.
Now it was up to us to take W3Gate out of service in order to clean up and to move to a separate mail server dedicated to W3Gate. In addition, we tried to get in contact with the responsible IP service provider to get the DNS entries fixed. But because we didn't know a contact person and had no proper e-mail address, it was not that easy. Finally we received apologies from the IP service provider, and even came into contact with the user who had caused all this trouble. He affirmed that he just wanted to receive news articles for off-line reading (asynchronous work!) and that he had no idea of the trouble and no intention to shut down Deja News and W3Gate at all.
Another potential threat is flooding W3Gate with requests so that the service would effectively become unavailable. By splitting W3Gate into multiple components, we tried to give it a fail-safe design so that it would not use all system resources and end up in a system halt, even when confronted with a huge number of concurrent requests. Nevertheless, response times would increase in such situations.
Again, providing unlimited service only to a closed user group might be at least a partial solution to these problems, as one better knows whom and what to blame.
During the operation of W3Gate so far we have encountered at least one case where a user unintentionally put himself under attack by sending an inappropriate request to W3Gate. Sometimes the default size of 100 Kb for fragments of a larger document might be inappropriate for the electronic mail system so that it becomes necessary to restrict this limit using the -s option. In our first implementation users could specify an appropriate maximum message size in bytes. But it quickly turned out that this was not fail-safe enough. One day a PC user confused bytes and kilobytes, resulting in a large archive file being sent to him in slices of 50 bytes instead of 50 kilobytes. Consequently, his mailbox was flooded by thousands of messages. After we noticed, we were able to delete another few thousand messages on our side before delivery. We changed the meaning of the -s option; now users specify the maximum message size in kilobytes to be on the safe side.
In another incident the Internet service provider of the requesting user was hit because of a strange correlation of events. The user requested a large image file, which was properly split into several pieces of 100 Kb and sent back. But as the IP provider only guaranteed the proper delivery of messages up to but not including that size, for each message received from W3Gate a notification was returned asking for smaller messages in future. Unfortunately, these replies also contained a small piece of the message complained about, which happened to include the original get request. For ease of use W3Gate had a very lenient command parser. It took the citation for a new request and again sent the whole picture, properly split into multiple pieces, to the requesting site, resulting in a mail explosion. So we had to make W3Gate's command parser a little stricter to prevent such mail floods from happening again.
In our first implementation W3Gate supported an additional option allowing users to redirect answers to an e-mail address other than the requesting one. From the beginning this option was under discussion. On the one hand, it gave W3Gate's users the ability to send interesting Web documents to their home address from any computer in the world. On the other hand, this option could be abused for mail bombing innocent Internet users. During most of the incidents our first suspicion was in this direction, but the intent always proved to be innocent in the end.
However, we finally decided that the potential benefit of this option was far outweighed by its risk and that it was probably only a matter of time until this potential was abused. So we decided to be better safe than sorry and removed the option.
We plan to enhance the W3Gate functionality and to organize W3Gate software distribution, eventually licensing it.
The incidents mentioned in the previous section led to further discussions within our group. We are thinking about user registration with an e-mail-based three-way-handshake protocol: a potential user will have to register by sending a message to W3Gate's administrator, who in turn will send back an acceptance message to the requesting address. Only if a confirmation message for this acceptance letter arrives at W3Gate will a user be registered.
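The planned three-way handshake might be modeled as in the following sketch. The use of a random token to tie the confirmation to the acceptance message, as well as the class and method names, are assumptions for illustration; the text describes only the three message steps.

```python
import secrets


class Registration:
    """Tracks users through request, acceptance, and confirmation."""

    def __init__(self):
        self.pending = {}        # address -> token sent in the acceptance message
        self.registered = set()

    def request(self, address):
        """Step 1 and 2: a registration request arrives; return the token
        that the acceptance message to `address` would carry."""
        token = secrets.token_hex(8)
        self.pending[address] = token
        return token

    def confirm(self, address, token):
        """Step 3: register the user only if the confirmation echoes the token."""
        expected = self.pending.get(address)
        if expected is not None and secrets.compare_digest(expected, token):
            del self.pending[address]
            self.registered.add(address)
            return True
        return False
```

Since the acceptance message goes to the requesting address, only someone who can read mail at that address is able to complete the confirmation, which is the property that makes forged registrations harder.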
Additionally, unregistered users will have no or limited access to the service, i.e., only a limited number of requests would be processed or only replies up to a limited size would be sent. We haven't decided definitely on these measures because the balance between security and restrictive burdens on users is hard to find.
An integration of new communication protocols is already scheduled, as W3Gate now supports only the protocols HTTP, FTP, Gopher, and WAIS. More services based on W3Gate, such as subscription mode or integration into Web clients, will be implemented soon. As far as document handling is concerned, external references, HTML 3.2, full MIME support, and news support are on our task list.
German publishers' recent studies about WWW usage have shown that only in some cases (shopping, entertainment) is a real interactive component needed. The rest of the usage is information or data retrieval, which can be perfectly covered by asynchronous electronic mail requests (Figure 5). This gives us hope that W3Gate will be needed for some years, and led to our decision to make a real package out of the W3Gate software.
Figure 5: Common usage of WWW
In February 1997, we will have version 2.0 in alpha testing, and we have found a German cooperation partner, GreenPeace in Hamburg, to act as a beta testing site for us. With this experience, we will improve the software, the documentation, and the version handling. Additionally, we are curious whether our license agreement will be acceptable to potential cooperation partners.
Our usage numbers show that a service like W3Gate is useful, even in times when everybody is supposed to have fast Internet and WWW access. To offer such a service is not easy: free usage and responsibility for resources contradict each other. Nevertheless, we envisage continuing this service because we see it as one of our major tasks to live with those unpredictable challenges.
What will happen to W3Gate if everybody really has this almost unlimited access to the Internet and the World Wide Web, which may happen in the next three years? W3Gate will still be attractive to people working asynchronously and for people in Intranets with no open access to the Internet. We assume those people to be in the commercial and industrial world, and we plan to work closely with them in the enhancement of the W3Gate service.
Manfred Bogen has been active in group communication, X.400 development, and X.400 standardization since 1983. In 1987 he became head of the VaS group responsible for the provision of value-added services. He studied computer science at the University of Bonn and is co-author of two books about X.400 and distributed group communication. At present, he is the convener of the TERENA working group on quality management for networking (WG-QMN) and a member of the TERENA Technical Committee and the Internet Society.
Guido Hansen completed his vocational training as a mathematical-technical assistant (MTA) at GMD in 1995. During practical work as part of his training he wrote the first version of W3Gate. He is now working in the VaS research group on W3Gate, WWW, and Web-based projects with external cooperation partners from the media industry.
Michael Lenz received his master's degree in informatics (computer science) from the University of Bonn. He has worked at GMD in the Department for Network Engineering since 1993. His major topics are value-added services, security, and information systems. Since late 1993 he has been involved with establishing and maintaining information services such as WWW at GMD and within various external projects. He was especially involved in W3Gate's security and QoS design.
The authors can be reached at GMD, German National Research Center for Information Technology, D-53754 Sankt Augustin.