Is the Internet Heading for a Cache Crunch?
By Russell Baird Tewksbury
With each passing day, the Internet grows larger and more congested. By the end of December 1996, the Internet was estimated to consist of more than 100,000 networks, connecting more than 13 million computers to tens of millions of users worldwide. Today both the number of networks and the number of computers connected to the Internet have more than doubled since last year. Consequently, Internet traffic jams and bottlenecks, or what are known as flash points and hot spots, have become daily occurrences, and network administrators face the difficult challenge of providing more efficient bandwidth and server utilization for their customers. To meet that challenge, many are turning to proxy caching as a solution, but the negative by-products of a proxy cache approach to managing network congestion could prove detrimental to the advancement of the network itself.
Client-side caching occurs as Internet users surf the Web. Each Web document a user visits is cached, or copied and stored, on the user's local machine. When that same user wishes to retrieve a previously visited Web document, the user's computer first looks for the cached copy before going back to the original source. On the network side, a similar duplication process, called proxy caching, is performed automatically by server software that instructs the network computer to copy and save Web pages visited by users on that network. Proxy caching results in countless digital duplicates, or clones, of original Web pages being saved and then served on many different networks throughout the Internet, and network users receive Web documents saved on the local server instead of the source server. In short, an original Web page is redistributed many times over, by someone other than the original content provider or copyright holder.
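The lookup-before-fetch behavior described above can be sketched in a few lines. This is a minimal illustration, not the logic of any real proxy product; `fetch_from_origin` is a hypothetical stand-in for a network request to the source server.

```python
# Minimal sketch of a proxy cache's lookup logic.
# fetch_from_origin is a hypothetical stand-in for a real HTTP request.
cache = {}  # URL -> stored copy of the page

def fetch_from_origin(url):
    # Stand-in for contacting the source server.
    return f"<html>content of {url}</html>"

def proxy_get(url):
    """Serve a cached copy if one exists; otherwise fetch and store one."""
    if url in cache:
        return cache[url], "HIT"    # served from the local duplicate
    page = fetch_from_origin(url)
    cache[url] = page               # the proxy now holds a clone
    return page, "MISS"

page1, status1 = proxy_get("http://example.org/index.html")
page2, status2 = proxy_get("http://example.org/index.html")
```

The second request never reaches the origin server, which is exactly why the content provider loses its access count, as discussed below.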
Today the development of an international hierarchical cache system, or global mesh, is well under way. These cache systems are being designed to be interconnected to the national backbone networks and to the regional, local, and LAN-based caches, thereby creating a worldwide caching infrastructure. Some of the many Web cache projects under way are NLANR, or National Laboratory for Applied Network Research (United States); CHOICE Project (Europe); HENSA (United Kingdom); Academic National Web Cache (New Zealand); W3 CACHE (Poland); SingNet (Singapore); CINECA (Italy); and JC (Japan). As the Internet evolves into this international global mesh of caches, its decentralized architecture will become centralized around these cache clearinghouses, providing far fewer points of access to the network.
Both proxy caching and the impending implementation of an international hierarchical cache system set the stage for abuses such as individual monitoring and surveillance, tampering, identity theft, censorship, intellectual property or copyright infringement, invasion of privacy, and taxation by government. The integrity of information will decline, and data security risks are sure to escalate, given that there is nothing to stop cache owners from altering the source code of a Web document and then passing the counterfeit version on as if it were the original--also called cache poisoning or identity theft; reducing the quality of service of access to Internet resources that do not support proxy caching; profiting from the exploitation and/or sale of confidential or proprietary information obtained through their cache; charging money for access to their cache content; or refusing to accept content or allow access to content, also known as censorship.
Win, Lose, or Draw?
Proxy caching is not as common a practice in the United States as it is in other places such as Japan and throughout Europe, where bandwidth is more expensive and restrictions on it are more common. Among many network administrators, the prevailing attitude is that proxy caching is a good thing. For example, Cache Now!--hosted by Vancouver Webpages--is an Internet-based campaign designed to "increase the awareness and use of proxy cache on the Web. Web cache offers a win/win situation for both content providers and users."
In the United States, the NLANR is working on a cache project entitled A Distributed Testbed for National Information Provisioning. According to NLANR, the goal of the project is "to facilitate the evolution of an efficient national architecture for handling highly popular information." According to information provided on the organization's Web site, NLANR submitted to the National Science Foundation a proposal entitled A Distributed Architecture for Global WWW Cache Integration. The NLANR document states, "What is needed is an information provisioning architecture to support efficient information distribution through the widespread use of network-adaptive caching."
NLANR's report identifies certain "trade-offs" of network caching: "The use of a network cache does not come without a price. However, we believe the benefits most often outweigh the disadvantages." The NLANR cites the advantages of network caching: "Improves browsing response time by migrating objects close to clients. Reduces wide-area network bandwidth. Reduces load on remote servers. Provides some robustness in case of external network failures."
NLANR's report states the disadvantages of network caching: "Information providers lose access counts on their servers. Occasionally non-transparent to end users. Often requires manual configuration. There is always a finite chance of receiving stale data. Requires additional resources (hardware, personnel). Can be a single point of failure. And depending on your point of view, caching has these features which generally find favor among administrators and managers, but are unwelcomed by users: Hides true client address. Provides an opportunity to analyze and monitor user's activities. Can be used to block access to certain sites."
Both Cache Now! and the National Laboratory for Applied Network Research declined to be interviewed for this article. However, NLANR has indirectly addressed some of the same issues on its Web site. Buried deep within NLANR's Web site on a page entitled "Tutorial: Insight into Current Web Caching Issues: Privacy and Security," one can find NLANR's statement regarding statistics collection--"What happens to the Web access statistics collected by centralized caches?"--which was published 16 October 1997, and which includes the following: "Of even greater concern may be the possible misuse of cached statistical data. Information privacy policies of originating web sites mean nothing if intermediate caches have sole control of access data, which may include sensitive information about sites and users viewing those pages and selecting particular links. Is the caching organization free to do whatever it wants with that information? Sell it for marketing lists? Provide it for investigative purposes as they see fit? Sell it to commercial databases? An origin (content) web site may have strict policies regarding information collected regarding sites or users visiting their web server. But a cache may have a completely different policy, or none at all."
The potential for problems with numerous networks' having countless copies of other people's Web pages is the basis for a content provider's worst nightmare: loss of control. To the chagrin of content providers, there is no independent mechanism or standard in place for knowing who has cached a copy of their Web page, whether it is a current version, or whether it's been altered in any way. Every traffic measurement program--used for justifying advertising rates charged to Web page sponsors--is essentially rendered ineffective: the more popular a particular Web page is, the more times it is cached throughout the Internet. For content providers whose business is based on generating advertising revenue, proxy caching results in lowering the number of reported page views to a given Web site, and thus, advertising revenue is more difficult to substantiate.
There is also the problem of how to maintain the consistency, or freshness, of a page that has been cached. As an original Web page changes or is updated, every cached copy--on every network--must be updated or it becomes outdated. In addition, the proliferation of dynamically created Web pages makes caching even harder to implement. The current solution is for the cache owner--not the content provider--to decide how fresh a document must be.
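The freshness decision a cache owner makes typically boils down to an age check: a copy is served as-is until it exceeds some lifetime, and only then is the origin consulted. A minimal sketch of that rule, with `is_fresh` as an illustrative helper (HTTP's own mechanism uses headers such as Expires and, in HTTP/1.1, Cache-Control max-age):

```python
import time

def is_fresh(cached_at, max_age, now=None):
    """Return True if a cached copy is still within its freshness lifetime.

    cached_at -- time (seconds) the copy was stored
    max_age   -- lifetime (seconds) the cache owner chose for the copy
    """
    now = time.time() if now is None else now
    return (now - cached_at) < max_age

# A copy stored at t=1000 with a 60-second lifetime:
# still fresh 30 seconds later, stale 90 seconds later.
```

Note that `max_age` here belongs to the cache owner, not the publisher, which is precisely the loss of control the article describes.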
Another problem associated with proxy caching occurs when a Web page is cached on a remote network and then that cached copy is indexed by one or more of the search engine bots. The search sites have become the navigational hubs of the Internet; they are where nearly all users go to find what they're looking for. When Web surfers use a search site that has indexed cached Web pages, these digital clones can appear right next to, in front of, or instead of the original, true sites.
As users follow a hyperlink to a cached page, chances are they will not find the information they're looking for. Most proxy-server software programs deny access to cached Web pages unless the request for information originates from within that particular network. Anyone outside the network will most likely be turned away. Three search sites have been identified that contain cached Web pages. They are Excite, Infoseek, and Digital Equipment Corp.'s Alta Vista, some of the most popular and frequently visited Web sites on the Internet.
Cached documents are controlled by someone other than the original publisher. By altering the HTML of a cached Web document and then passing the counterfeit version on as if it were the original--called cache poisoning--an organization's e-mail addresses or requests for credit card information can all be redirected elsewhere without the knowledge or consent of the original publisher. From a corporate identity perspective, cache poisoning would be like instant, negative plastic surgery: a business's information can be instantly and electronically cloned by anyone on the Internet, including people outside the geographic borders--and reasonable reach--of the United States.
When he was asked what's to stop tampering and cache poisoning, Vint Cerf, senior vice president of Internet Architecture and Engineering at MCI, said, "Nothing other than judicious use of digital signatures, the effectiveness of which is strongly dependent on public key directories and the difficulty of forging a digital signature." Win Treese, director of security at Open Market, said, "If caches are not well behaved, they will not be trusted by producers or consumers of information. As with many things on the Internet, only a few bad cases of cache misbehavior, problems, or abuse may poison the entire cache business."
According to Nathaniel Borenstein, chief scientist at First Virtual Holdings and current board member of Computer Professionals for Social Responsibility, "There is absolutely no technical remedy to this problem. Legal remedies might conceivably help if jurisdictional problems can be overcome. Some may claim that digital watermarking or other cryptographic authentication technologies are useful in this regard. The fact is that they can be very useful to an expert in proving such tampering after the fact, but their proper use is sufficiently esoteric that it is basically the province of experts. I think it will always be very easy to fool most Internet users into thinking they are looking at a valid page when they really aren't."
Similar views were expressed by Peter G. Neumann, principal scientist at SRI International, fellow of both the ACM and the IEEE, and moderator of the RISKS Forum: "The problem of ensuring data integrity is enormous. Worse yet, there will be all sorts of pointers to less than the most recent updated, corrected versions and unverifiable bogus copies, which will enable misinformation to propagate. Cryptographic APIs, digital signatures, authenticity and integrity seals, and trusted third parties will not help."
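The kind of integrity check Cerf alludes to rests on comparing a document against a fingerprint the publisher distributes separately. The sketch below uses a plain SHA-256 digest rather than a full digital signature (a real scheme would sign the digest with the publisher's private key); the pages and addresses are hypothetical examples.

```python
import hashlib

def digest(document: bytes) -> str:
    """Fingerprint a document; any alteration changes the digest."""
    return hashlib.sha256(document).hexdigest()

original = b"<html><a href='mailto:sales@example.org'>Contact</a></html>"
poisoned = b"<html><a href='mailto:crook@example.net'>Contact</a></html>"

published_digest = digest(original)   # distributed by the publisher

intact_verifies = digest(original) == published_digest    # True
tampered_caught = digest(poisoned) == published_digest    # False
```

As Borenstein notes, the hard part is not the mathematics but deployment: most users will never perform such a check, so a verifiable page and a convincing fake look identical in practice.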
E-Commerce, Security, and Proxy Cache
When it comes to information security, especially on the Internet, the majority of individuals and organizations in both the public and private sectors are ill prepared for what lies ahead. Today we are faced with the year 2000 date change problem. In addition, more than 14,000 computer viruses are known to exist, and at least 100 new viruses are created each and every month. Furthermore, computer hackers continue to gain anonymous notoriety for committing such acts as the defacing of the U.S. Department of Justice and Central Intelligence Agency Web sites.
According to the fifth annual Information Week and Ernst & Young Information Security Survey of 8 September 1997, "Among the survey's main findings: Security breaches are on the rise; intranets bring vulnerability; viruses are still a threat; and industrial espionage is real. . . . Of all respondents worldwide, 70% have Internet connections. The United States is more aggressive: Among U.S. IT managers, 82% link their corporate networks to the Internet. This year 22% of U.S. respondents say they're moving vital data on the Internet. Worldwide, only 10% made the same claim."
Additional findings from the security survey indicate, "Security breaches have made IT security professionals wary. Indeed, more than 75% of the 627 IT managers and professionals surveyed in the United States believe authorized users and employees pose a threat to the security of their systems. Also, nearly 70% of the U.S. respondents see computer terrorists as a threat. Another 42% also see a security threat from competitors. Even service providers, consultants, and auditors are suspect, according to 61% of the U.S. respondents."
According to an article in the April 1997 issue of Internet magazine, it is estimated that by the end of 1997, "more than 68 million people will be on the World Wide Web. Ninety percent of the largest companies will have a commercial Web site. Transaction volume will increase by 400 percent as Web-based commerce becomes a major hot spot." Forrester Research claims that in 1996 electronic commerce represented a $518-million industry, and it is estimated that it will reach $6.6 billion by the year 2000. Active Media has stated that in 1996 $436 million was the figure, with predictions of $46 billion by the end of 1998.
Considering the opportunity for profit, the threat of cache abuse is significant enough to warrant attention to this issue. Based on the premise that trust is a prerequisite for any financial payment system or network to be accepted, issues such as proxy caching must be addressed and resolved now; otherwise, the inadequacies of today's security technologies pose significant risks to the future of electronic commerce on the Internet.
The Net Effect
When asked, "Will these clearinghouses of cached Web pages become primary targets for tampering, censorship, and abuse?" Neumann replied, "Absolutely yes. You might also expect that the FBI [U.S. Federal Bureau of Investigation] would want guaranteed surreptitious access to all caches--for example, for setting up stings and monitoring all accesses--much as they are seeking key-recovery mechanisms for crypto."
According to Neumann, "Any privacy-sensitive country would want to restrict the potential for data aggregation flowing outside its boundaries. Some countries would want to ensure that information relating to crime, kiddie porn, drugs, arms procurement, national security threats, etc., would not flow in. . . . As is the case with all innovative technologies, there is a conflict between what is possible and what is sensible. The desire to lead a sound life in the presence of technology that is not secure, reliable, or safe is often at odds with the opportunities to take commercial advantage of weaknesses in the implementation of the technology and with desires of governments to control or monitor their citizens. There is also often a strong disconnect between national interests and international interests. The situation you are exploring is just one more instance."
When asked about the effects of replication on censorship and privacy protection, Borenstein replied, "This is one of the biggest threats. The most salient feature of this technology is that it makes it easier to try to censor the Internet. A small, repressive country could establish a choke point on its Internet traffic and effectively censor the entire Internet as seen from inside that country. Sophisticated users will be able to work around these restrictions, but governments may well be able to detect and prosecute such workarounds. Insofar as the caches are used even for access-controlled Web pages, proxy caching and replication could have a very serious effect on privacy by exposing the access-controlled information to anyone who gets access to the cache."
Borenstein summarized his views concerning proxy caching and implementation of an international hierarchical cache system, or global mesh, by saying, "It's a bad, bad idea." As for possible alternatives: "There aren't any problems proxy caching and replication solve that can't be solved by deploying more bandwidth. The answer is simple: more bandwidth. Provide more bandwidth, and the whole problem goes away," he said.
According to John Klensin, senior data architect at MCI, "These issues are among those that cause many content providers/owners to prefer mirrored sites, whose content, like that of the original sites, is controlled by them either directly or via traditional business arrangements. MCI believes that as better methods of referencing Web materials than one-site/one-domain-name URLs are standardized and deployed, mirroring approaches will come to dominate and global caching meshes will become an evolutionary dead end."
Privacy expert Ann Cavoukian, information and privacy commissioner of Ontario, Canada, offered the following insight: "In general, there are several things that can be done. One is to educate Net users so they are more knowledgeable about the privacy aspects of the Internet. Another is to lobby those in the industry to convince them that caching is an issue that they must address or they'll risk losing their users. Since caching has benefits especially as usage overtakes capacity, the argument that must be made to those in the industry is that they must provide alternatives that allow users options to choose from. Finally, by far the solution most unlikely to be achieved but that is most desirable is to have universal data protection legislation covering the private sector. Unfortunately, this is highly unlikely given the current situation in the U.S." Cavoukian, along with Don Tapscott, is coauthor of the book Who Knows: Safeguarding Your Privacy in a Networked World.
History has shown that privacy and security do not occur by happenstance but by design. In the case of proxy caching, the apparent need on the part of network administrators for immediate gratification regarding congestion control should not take precedence over good judgment in responsible development of the Internet. We must consider the value and cost of designing and implementing network security mechanisms that take into account network performance expectations. In terms of future development of the Internet, it is this author's opinion that we must be willing to avoid shortsighted solutions and make the investments necessary to have the kind of communications network we will want to use both tomorrow and in the years ahead.
Join the Internet Society today: http://www.isoc.org