Privacy on the Internet

David M. Goldschlag <goldschlag@itd.nrl.navy.mil>
Michael G. Reed <reed@itd.nrl.navy.mil>
Paul F. Syverson <syverson@itd.nrl.navy.mil>
Naval Research Laboratory
USA

Abstract

The World Wide Web is rapidly becoming an important tool for communication and electronic commerce. But electronic messages sent over the Internet can be easily snooped on and tracked, revealing who is talking to whom and what they are talking about. Is privacy important, and how can it be guaranteed? This paper describes how a freely available system, onion routing, can be used to provide privacy for a wide variety of Internet services, including virtual private networks, Web browsing, e-mail, remote login, and electronic cash.

Keywords: anonymity, Internet, mixes, privacy, security, traffic analysis.

Introduction
The problem
Onion routing
Network configurations
- A basic configuration
- The customer-ISP model
Applications
Conclusion
References

Introduction

The World Wide Web is rapidly becoming an important tool for communication and electronic commerce. Is Internet communication private? Most security concerns focus on preventing eavesdropping [7], that is, on keeping outsiders from listening in on electronic conversations. But encrypted messages can still be tracked, revealing who is talking to whom. This tracking is called traffic analysis and may reveal sensitive information. For example, the existence of intercompany collaboration may be confidential. Similarly, e-mail users may not wish to reveal to the rest of the world who they are communicating with. In certain cases anonymity may also be desirable; anonymous e-cash is not very anonymous if delivered with a return address. Web-based shopping or browsing of public databases should not require revealing one's identity.

This paper describes how a freely available system, onion routing, can be used to protect a variety of Internet services against eavesdropping and traffic analysis attacks from both the network and observers. The focus here is on configurations of onion routing networks and applications of onion routing, including virtual private networks (VPNs), Web browsing, e-mail, remote login, and electronic cash. For the purposes of this paper, onion routing is treated as a black box that provides anonymous connections. Anonymous connections are bidirectional and real-time communication channels that do not implicitly convey identifying information about the connected parties. Any identifying information must be carried in the data stream over the anonymous connection. The goal of onion routing is anonymous connections, not anonymous communication. Section 3 provides a brief overview of onion routing; other papers describe onion routing in greater detail [9, 8, 5].

This paper is organized in the following way: Section 2 defines the threats of eavesdropping and traffic analysis. Section 3 provides a brief overview of the onion routing system. Section 4 describes how onion routing networks may be configured, and how varying the configuration changes the privacy characteristics of the network. Section 5 describes how onion routing may be used in a wide variety of Internet applications. Section 6 describes related work and presents concluding remarks.

The problem

Letters sent through the post office are usually in an envelope marked with the sender's and recipient's addresses. We trust that the post office does not peek inside the envelope, because we consider the contents private. We also trust that the post office does not monitor who sends mail to whom, because that information is also considered private.

These two types of sensitive information, the contents of an envelope and its address, apply equally well to electronic communication over the Internet and the Web. As the Web becomes increasingly important, so does protecting the privacy of electronic messages. Just like mail, electronic messages travel in electronic envelopes. Protecting the privacy of electronic messages requires both safeguarding the contents of the envelopes and hiding the addresses on them. Although communicating parties usually identify themselves to one another, there is no reason that the use of a public network like the Internet ought to reveal to others who is talking to whom and what they are talking about. The first concern is traffic analysis, the latter is eavesdropping.

By making both eavesdropping and traffic analysis hard, the privacy of communication is protected. But what about anonymity? Can two parties communicate if one or both do not want to be identified to the other? If a Web surfer wants to buy something using the electronic equivalent of (untraceable) cash [10], how could that e-cash be moved through the Web without identifying the purchaser?

If an electronic envelope keeps its contents private, and the address on the envelope is also hidden, then any identifying information can only be inside the envelope! So for anonymous communication, we also remove identifying information from the contents of an envelope. This may be called anonymizing a private envelope.

These goals may appear to be insoluble: Can the contents of an envelope really be kept private? How can a letter reach its destination if its address is hidden? Can two parties communicate without revealing their identities to one another? Can all this be done without trusting third parties (the post office, for example) not to remember addresses or to open envelopes?

The next section briefly describes the onion routing system, how the anonymous connections that it provides are secure against both eavesdropping and traffic analysis, and how they may be used for anonymous communication too.

Onion routing

Traffic analysis can be used to infer who is talking to whom over a public network. For example, in a packet-switched network like the Internet, packets have a header used for routing and a payload that carries the data. The header, which must be visible to the network (and to observers of the network), reveals the source and destination of the packet. Even if the header were obscured in some way, the packet could still be tracked as it moves through the network. Encrypting the payload is similarly ineffective, because the goal of traffic analysis is to identify who is talking to whom and not (to identify directly) the content of that conversation.

Onion routing protects against traffic analysis attacks from both the network and observers. Onion routing works in the following way: The initiating application, instead of making a connection directly to a responding server, makes a connection to an application-specific "onion routing proxy" on some remote machine. That proxy builds an anonymous connection through several other onion routers to the destination. Each onion router can only identify adjacent onion routers along the route. When the connection is broken, even this limited information about the connection is cleared at each router. Data passed along the anonymous connection appears different at and to each router, so data cannot be tracked en route and compromised onion routers cannot cooperate. An onion routing network can exist in several configurations that permit efficient use by both large institutions and individuals.

The onion routing proxy defines a route through the onion routing network by constructing a layered data structure called an onion and sending that onion through the onion routing network. Each layer of the onion defines the next hop in a route. An onion router that receives an onion peels off its layer, reads from that layer the name of the next hop and the cryptographic information associated with that hop in the anonymous connection, pads the embedded onion to some constant size, and sends it to the next onion router.
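The layered structure described above can be illustrated with a minimal sketch. It is not the prototype's format: the function names (build_onion, peel), the JSON layer encoding, and the toy XOR keystream standing in for real cryptography are all assumptions made for illustration. Padding to a constant size and the per-hop cryptographic information are omitted.

```python
import hashlib
import json


def toy_crypt(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for real encryption: XOR with a SHA-256 counter keystream.

    NOT secure. Applying it twice with the same key restores the input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def build_onion(route, keys):
    """Wrap one layer per router, innermost layer first (last router in the route)."""
    onion = b""
    next_hop = None  # the last router has no next hop
    for name in reversed(route):
        layer = json.dumps({"next": next_hop, "inner": onion.hex()}).encode()
        onion = toy_crypt(keys[name], layer)
        next_hop = name
    return onion


def peel(onion: bytes, key: bytes):
    """Remove one layer; return the next hop's name and the embedded onion."""
    layer = json.loads(toy_crypt(key, onion))
    return layer["next"], bytes.fromhex(layer["inner"])
```

Each router learns only the name of the next hop from its own layer; the embedded onion remains opaque to it.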

Before sending data over an anonymous connection, the initiator's onion routing proxy adds a layer of encryption for each onion router in the route. As data moves through the anonymous connection, each router removes one layer of encryption, so it finally arrives as plaintext. This layering occurs in the reverse order for data moving back to the initiator: each router adds a layer of encryption, so data that has passed backward through the anonymous connection must be repeatedly decrypted by the initiator's proxy to obtain the plaintext. The last onion router forwards data to another type of proxy on the same machine, called the responder's proxy, whose job is to pass data between the onion network and the responding server.
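The layering of data in both directions can be sketched as follows. This is an illustration only: the function names are hypothetical, one shared key per router is assumed, and a toy XOR keystream stands in for the real ciphers (so adding and removing a layer are the same operation here).

```python
import hashlib


def toy_crypt(key: bytes, data: bytes) -> bytes:
    """Toy cipher (NOT secure): XOR with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def proxy_send(route_keys, plaintext: bytes) -> bytes:
    """Initiator's proxy: add one layer per router, last router's layer innermost."""
    data = plaintext
    for key in reversed(route_keys):
        data = toy_crypt(key, data)
    return data


def router_forward(key: bytes, data: bytes) -> bytes:
    """A router removes its layer from data moving toward the responder."""
    return toy_crypt(key, data)


def router_backward(key: bytes, data: bytes) -> bytes:
    """A router adds its layer to data moving back toward the initiator."""
    return toy_crypt(key, data)


def proxy_receive(route_keys, data: bytes) -> bytes:
    """Initiator's proxy: remove every router's layer to recover the plaintext."""
    for key in route_keys:
        data = toy_crypt(key, data)
    return data
```

In the forward direction the data is plaintext only after the last router; in the backward direction it is plaintext only after the initiator's proxy has removed all layers.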

For instructions on how to use our onion routing prototype, please visit the onion routing Web site.

Network configurations

A basic configuration

In one basic onion routing network configuration, an onion router might sit on the firewall of a protected site. This router serves as an interface between machines behind the firewall and the rest of the network. To complicate tracking of traffic originating or terminating within the protected site, this onion router should also route data between other onion routers. There are three important features of this configuration:

- Connections between machines behind onion routers are protected against both eavesdropping and traffic analysis. Because the data stream never appears in the clear on the public network, this data may carry identifying information, but communication is still private. (This feature is used in section 5.1.)
- The onion router at the originating protected site knows both the source and destination of a connection.
- As the connection between the responder's proxy and the responding server is unencrypted, communication to machines not inside a protected site must be anonymous. That is, the data stream must not identify the initiator. (We call this anonymizing the anonymous connection.) Otherwise, an attacker could listen in on the final segment of the connection and identify the initiator.

The customer-ISP model

In the basic configuration, the first onion router (or the first onion routing proxy) is the most trusted one. It may be desirable to move that trust closer to the user. For example, an Internet service provider (ISP) may run an onion router that accepts connections from onion routing proxies running on subscribers' machines. In this configuration, users generate onions specifying a path through the ISP's onion router to the destination. Although the ISP knows who initiates the connection, the ISP would not know with whom the customer is communicating. So the customer need not trust the ISP to maintain his privacy. Furthermore, the ISP becomes a common carrier, who carries data for its customers. This may relieve the ISP of responsibility both for whom users are communicating with and the content of those conversations.

Applications

We first describe how to use anonymous connections in VPNs, anonymous chatting services, and anonymous cash. We then describe onion routing proxies for three Internet services: Web browsing, e-mail, and remote logins. These three proxies have been implemented. Anonymizing versions that remove the identifying information that may be present in the headers of these services' data streams have been implemented as well.

Virtual private networks

If two sites want to collaborate, they could establish a long-term tunnel that would multiplex many socket connections over a single anonymous connection. This would effectively hide who is collaborating with whom and what they are working on, without requiring the relatively expensive construction of many individual anonymous connections. Such connections between enclaves provide the analog of a leased line over a public network.
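One way to multiplex several socket connections over a single tunnel is to prefix each chunk of data with a stream identifier and a length. The sketch below assumes such a framing; the 4-byte header layout and the names frame and deframe are hypothetical and are not taken from the onion routing prototype.

```python
import struct

# Assumed header: 16-bit stream id, 16-bit payload length, network byte order.
HEADER = struct.Struct("!HH")


def frame(stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload (at most 65535 bytes) with its stream id and length."""
    return HEADER.pack(stream_id, len(payload)) + payload


def deframe(buffer: bytes):
    """Split a buffer into complete (stream_id, payload) frames.

    Returns the list of frames and any trailing bytes of an incomplete frame.
    """
    frames = []
    offset = 0
    while offset + HEADER.size <= len(buffer):
        stream_id, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append((stream_id, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]
```

The receiving site would route each payload to the local socket associated with its stream identifier, so that only one anonymous connection needs to be built for the whole collaboration.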

Anonymous chatting

Anonymous connections can be used in a service similar to Internet Relay Chat (IRC), where many parties meet to chat at some central server. The chat server may mate several anonymous connections carrying matching tokens. Each party defines the part of the connection leading back to itself, so no party has to trust the other to maintain its privacy. If the communicating parties layer end-to-end encryption over the mated anonymous connections, they also prevent the central server from listening in on the conversation.
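The mating step can be sketched as a small lookup table kept by the chat server. This is an illustrative assumption, not the prototype's design: the class name MatingServer is hypothetical, connections are represented by opaque objects, and only pairs (rather than larger groups) are mated.

```python
class MatingServer:
    """Pairs up anonymous connections that present the same token."""

    def __init__(self):
        # token -> connection that arrived first and is waiting for a partner
        self.waiting = {}

    def arrive(self, token, connection):
        """Register a connection; return the mated pair once both have arrived."""
        partner = self.waiting.pop(token, None)
        if partner is None:
            self.waiting[token] = connection
            return None
        return (partner, connection)
```

The server sees only tokens and the last hop of each anonymous connection, never the parties' own addresses.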

Anonymous cash

Certain forms of e-cash are designed to be anonymous and untraceable, unless they are double spent or otherwise misused. However, if a customer cannot contact a vendor without identifying himself, the anonymity of e-cash is undermined. For transactions where both payment and product can be conveyed electronically, anonymous connections can be used to hide the identities of the parties from one another.

How can the customer be prevented from taking his purchase without paying for it (e.g., by closing the connection early) or the vendor be prevented from taking the customer's e-cash without completing the transaction? This is a hard problem. In the case of a well-known vendor, a practical solution is to require customers to pay first. The vendor is unlikely to deliberately cheat its customers because it may be caught in an audit.

Remote logins

We proxy remote login requests by taking advantage of the optional -l username argument to rlogin. The usual rlogin command is of the form:

rlogin -l username server

To use rlogin through an onion routing proxy, one would type

rlogin -l username@server proxy

where proxy refers to the onion routing proxy to be used and both username and server are the same as specified above. At the protocol level, a normal rlogin request is transmitted from a privileged port on the client to the well-known port for rlogin (513) on the server as:

\0 username on client \0 username on server \0 terminal type \0

where username on client is the username of the individual invoking the command on the client machine, username on server is either the -l field (if specified) or the username of the individual invoking the command on the client machine (if no -l is specified), and the terminal type is a standard termcap/linespeed specification. The server responds with a single zero byte if it will accept the connection or breaks the socket connection if an error has occurred or the connection is rejected. Our normal rlogin proxy therefore receives the initial request:

\0 username on client \0 username@server \0 terminal type \0

The proxy creates an anonymous connection to the rlogin port (513) on the server machine and proceeds to send a massaged request of the form:

\0 username on client \0 username \0 terminal type \0

Once this request is transmitted to the server, the proxy blindly forwards data in both directions between the client and server until the socket is broken by either side.

For the anonymizing proxy for rlogin, the proxy proceeds as outlined above with the following simple modification. The massaged request is now of the form:

\0 username \0 username \0 terminal type \0

thus blinding who initiated the rlogin request. At present, we see no reason to sanitize the terminal type field because it reveals no identity information.
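The two request rewrites described above can be sketched in a few lines. The function name rewrite_rlogin_request and its anonymize flag are hypothetical; the field layout follows the request format given in the text.

```python
def rewrite_rlogin_request(request: bytes, anonymize: bool = False):
    """Massage an rlogin request received by the proxy.

    Input:  \\0 client-user \\0 user@server \\0 terminal-type \\0
    Output: (server name, request to send over the anonymous connection)
    """
    if len(request) < 2 or not (request.startswith(b"\0") and request.endswith(b"\0")):
        raise ValueError("malformed rlogin request")
    fields = request[1:-1].split(b"\0")
    if len(fields) != 3:
        raise ValueError("expected three fields")
    client_user, target, terminal = fields

    # Split username@server at the last '@'.
    user, at, server = target.rpartition(b"@")
    if not (at and user and server):
        raise ValueError("expected username@server")

    if anonymize:
        # Anonymizing proxy: hide the client-side username by repeating
        # the server-side username in its place.
        client_user = user

    massaged = b"\0" + b"\0".join([client_user, user, terminal]) + b"\0"
    return server.decode("ascii"), massaged
```

The terminal type field is passed through unchanged in both modes, as in the text.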

Web browsing

Proxying HTTP (Hypertext Transfer Protocol) requests follows the IETF HTTP V1.1 Draft Specification [4]. An HTTP request from a client through an HTTP proxy is of the form:

GET http://foo.bar.com/baz.html HTTP/1.0

followed by optional fields. Notice that an HTTP request sent from a client directly to a server is of the form:

GET /baz.html HTTP/1.0

also followed by optional fields. The server name and protocol type are missing, because the connection is made directly to the server.

As an example, a complete request from Netscape Navigator to an onion router HTTP proxy may look like:

GET http://www.foobar.com/file.html HTTP/1.0
Referer: http://www.foobar.com/index.html
Proxy-Connection: Keep-Alive
User-Agent: Mozilla/3.0 (X11; I; SunOS 5.4 sun4m)
Host: www.foobar.com
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg

The proxy must create an anonymous connection to www.foobar.com and issue a request as if it were a client. Therefore, the request must be massaged to remove the server name and protocol type and transmitted to www.foobar.com over the anonymous connection. Once this request is transmitted to the server, the proxy blindly forwards data in both directions between the client and server until the socket is broken by either side.
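The request-line rewrite described above can be sketched with Python's standard URL parser. The function name massage_request_line is hypothetical, and defaulting to port 80 when the URL names no port is an assumption of this sketch.

```python
from urllib.parse import urlsplit


def massage_request_line(line: str):
    """Turn a proxy-style request line into the form a server expects.

    Returns (host, port, request line with server name and protocol removed).
    """
    method, url, version = line.split()
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("expected an absolute http:// URL")

    # Keep only the path (and query, if any); an empty path becomes "/".
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    port = parts.port or 80
    return parts.hostname, port, f"{method} {path} {version}"
```

The returned host and port tell the proxy where to build the anonymous connection; the rewritten line is what it sends over that connection.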

For the anonymizing proxy of HTTP, the proxy proceeds as outlined above with one change. It is now necessary to sanitize the optional fields that follow the GET command because they may contain identity information. Furthermore, the data stream during a connection must be monitored, to sanitize additional headers that might occur during the connection.
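A minimal sketch of the header sanitization step follows. Which fields count as identifying is an assumption here: the set below (Referer, User-Agent, From, Cookie, Authorization) is illustrative and is not the list used by the implemented proxy.

```python
# Assumed set of identifying header fields (lowercase); illustrative only.
IDENTIFYING_FIELDS = {"referer", "user-agent", "from", "cookie", "authorization"}


def sanitize_headers(header_lines):
    """Drop header lines whose field name is in the identifying set."""
    kept = []
    for line in header_lines:
        name, colon, _value = line.partition(":")
        if colon and name.strip().lower() in IDENTIFYING_FIELDS:
            continue  # remove the identifying field
        kept.append(line)
    return kept
```

Applied to the Netscape Navigator example above, this would remove the Referer and User-Agent lines and pass the remaining fields through.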

The Anonymizer [1] also provides anonymous Web browsing. Users connect to servers through the Anonymizer, which strips off identifying headers. This is essentially what our anonymizing HTTP onion routing proxy does. With the Anonymizer alone, however, packets can still be tracked and monitored. The Anonymizer could be used as a front end to the onion routing network to provide effective protection against traffic analysis.

Electronic mail

Electronic mail is proxied by utilizing the user%host@proxy form of e-mail address instead of the normal user@host form. This form should work with most current and older mail systems. Under this form, the client contacts the proxy server's well-known SMTP port (25). Instead of the normal mail daemon listening to that port, the proxy listens and interprets what it receives following a strict state machine: wait for a valid HELO command, wait for a valid MAIL command, and then wait for a valid RCPT command. Each command argument is temporarily buffered. Once the RCPT command has been received, the proxy proceeds to create an anonymous connection to the destination server and relays the HELO and MAIL commands exactly as received. The RCPT command is massaged and forwarded. Once this request is transmitted to the server, the proxy blindly forwards data in both directions between the client and server until the socket is broken by either side. An example of e-mail from joe@bar.com on the machine foo.bar.com to mary@baz.com via the onions.bar.com onion router is given below. First the communication from the client on foo.bar.com to the onion router SMTP proxy on onions.bar.com is given, followed by the communication from the responder's proxy to baz.com:


220 onions.bar.com OnionProxy ready at Wed, 28 Aug 96 15:13:34 EDT
HELO foo.bar.com
250 onions.bar.com Hello foo.bar.com [127.0.0.1], pleased to meet you
MAIL From: joe@bar.com
250 joe@bar.com... Sender ok
RCPT To: mary%baz.com@onions.bar.com

The proxy massages the RCPT line to make the address mary@baz.com and makes an anonymous connection to baz.com. It then replays the massaged protocol to baz.com:


220 baz.com Sendmail 4.1/SMI-4.1 ready at Wed, 28 Aug 96 15:15:00 EDT
HELO foo.bar.com
250 baz.com Hello foo.bar.com [127.0.0.254], pleased to meet you
MAIL From: joe@bar.com
250 joe@bar.com... Sender ok
RCPT To: mary@baz.com

At this point, the proxy no longer plays any role in the protocol, other than forwarding data in both directions:


250 mary@baz.com... Recipient ok
DATA
354 Enter mail, end with "." on a line by itself
This is a note
.
250 Mail accepted
QUIT
221 baz.com delivering mail
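The massaging of the RCPT command in the exchange above, from the user%host@proxy form to the normal user@host form, can be sketched as follows. The function name massage_rcpt is hypothetical, and the check that the address names this proxy is an assumption of the sketch.

```python
def massage_rcpt(line: str, proxy_host: str):
    """Rewrite 'RCPT To: user%host@proxy' as 'RCPT To: user@host'.

    Returns (destination host, massaged RCPT command).
    """
    verb, colon, address = line.partition(":")
    if not colon or verb.strip().upper() != "RCPT TO":
        raise ValueError("expected an RCPT command")
    address = address.strip().strip("<>")

    # The part after the last '@' must name this proxy.
    local, at, proxy = address.rpartition("@")
    if not at or proxy.lower() != proxy_host.lower():
        raise ValueError("address is not routed through this proxy")

    # The part before it has the form user%host.
    user, percent, host = local.rpartition("%")
    if not (percent and user and host):
        raise ValueError("expected user%host@proxy")

    return host, f"RCPT To: {user}@{host}"
```

The returned host is where the proxy builds the anonymous connection before replaying the buffered HELO and MAIL commands and the massaged RCPT command.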

For the anonymizing proxy of electronic mail, the proxy proceeds as outlined above with a few changes. It is now necessary to sanitize both the MAIL command and the header portion of the actual message body. Sanitization of the MAIL command is trivial, with a simple substitution of anonymous for joe@bar.com. For the header sanitization, we have taken the conservative approach of deleting all headers, but in the future this may be modified to remove only identifying information and leave the remaining header information intact.
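Both sanitization steps can be sketched briefly. The function names are hypothetical, and the rule used to recognize header lines (a field name followed by a colon, plus indented continuation lines) is an assumption of this sketch rather than the implemented proxy's logic.

```python
import re

# A header line starts with a field name (printable characters other than
# space and colon) followed by a colon.
HEADER_LINE = re.compile(r"^[!-9;-~]+:")


def sanitize_mail_command(line: str) -> str:
    """Replace the sender address in a MAIL command with 'anonymous'."""
    verb, colon, _sender = line.partition(":")
    if not colon or verb.strip().upper() != "MAIL FROM":
        raise ValueError("expected a MAIL command")
    return f"{verb}: anonymous"


def strip_headers(lines):
    """Conservatively delete all leading header lines from a message."""
    i = 0
    while i < len(lines):
        line = lines[i]
        is_continuation = i > 0 and line[:1] in (" ", "\t")
        if HEADER_LINE.match(line) or is_continuation:
            i += 1
        else:
            break
    # Also drop the blank line that separates headers from the body.
    if 0 < i < len(lines) and lines[i] == "":
        i += 1
    return lines[i:]
```

A message without headers, such as the one-line note in the example above, passes through unchanged.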

Conclusion

A new primitive, the anonymous connection, has been introduced [5, 8, 9]. Anonymous connections are strongly resistant to both eavesdropping and traffic analysis. This paper demonstrates the versatility of anonymous connections by exploring their use in a variety of Internet applications. These applications include such standard Internet services as Web browsing, remote login, and electronic mail. Anonymous connections can also be used to support virtual private networks with connections that are resistant to traffic analysis.

The onion routing network supporting anonymous connections can be configured in several ways, including a customer-ISP model that moves privacy to the user's computer and may relieve the carrier of responsibility for the user's connections.

Onion routers are based upon Chaum mixes [2]. Other Internet uses of Chaum mixes have been very application specific, focusing on one-way store-and-forward applications such as anonymous remailers [3, 6]. Onion routing moves the mixes below the application level, providing bidirectional and real-time communication channels that can be easily used by a variety of applications and services. Because the efficacy of mixes depends upon sufficient network traffic, allowing different applications to share the same communications infrastructure increases the ability of the network to resist traffic analysis.

References

1: The Anonymizer. http://www.anonymizer.com
2: D. Chaum. Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms, Communications of the ACM, v. 24, n. 2, Feb. 1981, pages 84-88.
3: L. Cottrell. Mixmaster and Remailer Attacks, http://obscura.obscura.com/loki/remailer/remailer-essay.html
4: R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners-Lee. Hypertext Transfer Protocol - HTTP/1.1, http://ds2.internic.net/internet-drafts/draft-ietf-http-v11-spec-07.txt
5: D. Goldschlag, M. Reed, and P. Syverson. Hiding Routing Information, Workshop on Information Hiding, Isaac Newton Institute, Cambridge, UK, May 1996.
6: C. Gülcü and G. Tsudik. Mixing Email with Babel, 1996 Symposium on Network and Distributed System Security, San Diego, February 1996.
7: Internet Engineering Task Force. http://www.ietf.org/
8: M. Reed, P. Syverson, and D. Goldschlag. Proxies for Anonymous Routing, to appear in the Proceedings of the 12th Annual Computer Security Applications Conference, December 1996.
9: P. Syverson, D. Goldschlag, and M. Reed. Anonymous Connections and Onion Routing, to appear in the Proceedings of the Symposium on Security and Privacy, Oakland, California, May 1997.
10: Peter Wayner. Digital Cash: Commerce on the Net, AP Professional, Chestnut Hill, Massachusetts, 1996.
