Yoichi Shinoda <email@example.com>
Japan Advanced Institute of Science and Technology (JAIST)
Tomomitsu Baba <firstname.lastname@example.org>
Nara Institute of Science and Technology (NAIST)
Nobuhiko Tada <email@example.com>
Matsushita Electric Co.
Akira Kato <firstname.lastname@example.org>
University of Tokyo
Jun Murai <email@example.com>
Keio University, Shonan Fujisawa Campus
Recent growth of the Internet indicates that it is emerging as a common communication infrastructure among large masses of people. Furthermore, it can, by its distributed nature, be considered a robust system. These characteristics collectively suggest that the Internet is capable of supporting information exchange during disaster support and recovery. However, the Internet and other computer networks did not function as we expected during the support and recovery phases of the Great Earthquake of Hanshin-Awaji (also known as the Great Earthquake of Kobe), which struck the area in the early morning of 17 January 1995.
The WIDE Internet project launched a LIFELINE task force shortly after the quake, to work on network information systems that provide real computer support for recovery from such disasters. After nine months of discussion, we conducted the first Internet disaster drill, which lasted two days, starting at 5:46 a.m., 17 January 1996, a date and time we cannot forget. The drill included two independent subdrills: operation of a disaster-support information database and restoration of a damaged backbone via satellite channels. More than 6,000 people participated in the database drill.
This report describes the purpose and description of the drill, and the experiences we gathered from the two-day operation.
In the support and recovery of a large-scale urban disaster such as the Great Earthquake of Hanshin-Awaji, which struck the area on 17 January 1995, communications of various types play an important role. Information exchanged will be diverse, intense, and dynamic. The form of the communication is also diverse and likely to include point-to-point, point-to-multipoint, and broadcast communications. It is also expected that information exchanged will range from personal information to public announcements. Conventional communication media such as telephone, television, radio, and newspaper are not capable of handling these diversities in an integrated fashion.
The communication activity that took place on the Internet and closed computer bulletin board systems (BBSs) during the support and recovery phase of the great quake was not satisfactory, but suggested to us that computer communication is capable of handling the diversity in the required communications. The Internet was especially noted for its inherent characteristics:
In the area struck, we observed various problems in communication, such as garbling and omission of information during transmission, as well as communication delays. We believe that a properly designed information system built on network technology can solve these problems. The system should be capable of operating under extreme conditions and of centralizing information using distributed services.
The Internet currently is not a part of our real lifeline like water, food, and electricity because of its limited popularity. However, as the Internet continues to expand, and as more information is exchanged on it, we expect that the Internet will gain the role of a lifeline. For example, the following kinds of information are actually lifelines:
Without this information, people will not be able to reach the actual physical lifeline materials. In conventional society, this information is said to be distributed naturally by communication mechanisms inherent to the affected community. It is also said that in a large-scale urban disaster such as the great quake, distribution of vital information should be done systematically by the mass media. However, in the great earthquake, the vital information was not distributed as required. This implies limitations of the conventional information distribution systems based on telephones, facsimile machines, and mass media such as television, radio, and newspapers.
We expected that the mass communication media would work as distribution mechanisms for vital livelihood information, but they fell short of our expectations. On the other hand, small regional newspapers were founded and played an important role in complementing the large mass media. The Internet and BBSs are well suited for distributing livelihood information, which adheres closely to regional structure. Regions vary in size from a neighborhood of a few blocks to school districts, cities, counties, prefectures, and the whole stricken area; each has different types of information to distribute, and a hierarchy of regional computer networks fits these needs.
Furthermore, a method for communication among rescue and support personnel, including volunteers, is required at various phases of disaster recovery. This communication channel is used for carrying information such as status reports, support requests, and proposals. This information, if shared among relevant personnel in an integrated fashion, can support rescue and recovery actions to a great degree. Computer networks are potentially well suited for this type of communication channel. By adding robustness and redundancy to the computer networks, we can provide a network infrastructure capable of exchanging the information required during disaster recovery.
The WIDE Internet Project launched a LIFELINE task force intended for research and development of networked information systems capable of supporting disaster support and recovery in various ways. A number of lengthy meetings were held during its nine-month period of activity, which accumulated requirements for the intended system, but no system was actually built and no actual modifications were made to the network operated by the WIDE project.
One lesson we learned from the great quake was that systems which are not in daily use or continuous review cannot be used in a crisis situation. The task force decided to follow this lesson, and conducted the first Internet disaster drill.
The drill included two independent subdrills, code-named IAA and WISHBONE. The IAA drill was the operation of a disaster-support information database, and the WISHBONE drill was the restoration of a damaged backbone via satellite channels. More than 6,000 people participated in the database drill. In the following, descriptions of each drill are presented, along with the lessons learned from them.
IAA is an acronym for "I Am Alive." The IAA database system was designed to accumulate and serve queries for information about people's safety. As discussed above, a generic disaster-support information database, capable of accumulation and distribution of a diverse pool of information, together with a robust information network system can support various stages of disaster support and recovery. Such a database system will be a quite complex piece of software, and it could even take years before system requirements are fixed. Instead, we decided to build a simpler system that can handle a very limited type of information as a first approximation of the final system.
We chose to handle people's safety information in our drill for several reasons. When a disaster strikes an area, telephone lines almost always become congested with calls from outside the area seeking safety information about individuals. In the Great Earthquake of Hanshin-Awaji, the worst congestion lasted for more than three days. During this period, calls within the struck area, and calls originating from it to the outside, were also affected, blocking important calls.
A database system located outside the affected area, accumulating and distributing the personal safety information, can relieve the phone lines from the stress of point-to-point calls. By designing, building, and operating such a database system, we can accumulate experience with database systems that have unusual requirements, and we will also have at least one system ready for the next disaster.
Major technical design goals for the first IAA database system in this drill were as follows:
Figure 1 illustrates the overall arrangement of the IAA database system we set up for the first drill. The entire system is composed of a set of so-called IAA clusters and a load distribution mechanism. An IAA cluster is a set of computers, each providing an identical set of functions which implements a complete IAA service. We used four IAA clusters distributed geographically apart from each other: at WNOC Nara (NAIST), WNOC Kyoto, WNOC Fujisawa (KEIO-SFC), and JAIST.
Figure 1. The IAA system for the first drill
This replication and distribution of IAA clusters is designed to accomplish the goals of scalability and robustness for the entire system. Requests for service can be distributed to replicated clusters for processing, and since all clusters provide identical functionality, the capacity of the system can easily be increased. The replication also provides tolerance against cluster-level failure. The geographical distribution serves two purposes: to reduce the chance of the entire IAA system being hit by a main quake or a series of aftershocks, and to provide service continuously even if an interconnecting network fails for some reason. In the actual implementation, load distribution was handled by round-robin issue of A and MX records by the DNS server. Figures 2 and 3 illustrate the logical structure of each IAA cluster.
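The round-robin distribution can be sketched as follows. The rotation logic and the example addresses are illustrative assumptions; the paper specifies only that the DNS server issued A and MX records round-robin.

```python
# Hypothetical cluster addresses; the actual drill used four clusters at
# WNOC Nara, WNOC Kyoto, WNOC Fujisawa, and JAIST.
CLUSTER_ADDRS = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]

def rotated_answers(addrs, query_count):
    """Return the address list rotated so that a different cluster is
    listed first on each successive response, as a round-robin DNS
    server would do for A (and MX) records."""
    i = query_count % len(addrs)
    return addrs[i:] + addrs[:i]

# Successive queries see the clusters in rotated order, so clients that
# take the first answer are spread evenly across clusters.
for q in range(4):
    print(rotated_answers(CLUSTER_ADDRS, q)[0])
```

Because each cluster implements the complete IAA service, a client can use whichever address the rotation hands it first.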
Figure 2. The IAA cluster (registration side)
Figure 3. The IAA cluster (inquiry side)
This subsystem processes raw inputs arriving by e-mail and those generated by an HTTP server. The raw information received is turned into canonical form by an input parser. The parser handles character code translation, timezone translation, and deletion of redundant information. It also looks at the semantics of the received information and tries to standardize the language expressions used into some canonical form. When the parser fails to process a piece of information, it is reported to an operator via e-mail. The operator corrects the information and sends it back for reprocessing. When input processing is complete, the information is passed to a transport subsystem for input to a storage subsystem.
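A minimal sketch of such a canonicalizing parser follows. The field names, the use of NFKC normalization for character-code cleanup, and the operator hand-off queue are our own illustrative assumptions, not details from the original system.

```python
import unicodedata
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Japan Standard Time

def canonicalize(record):
    """Normalize a raw registration record: character-code normalization,
    timezone translation to JST, and removal of redundant whitespace."""
    name = unicodedata.normalize("NFKC", record["name"]).strip()
    name = " ".join(name.split())              # collapse internal whitespace
    ts = record["timestamp"].astimezone(JST)   # translate to one timezone
    return {"name": name, "timestamp": ts.isoformat()}

def process(record, operator_queue):
    """Parse a record; on failure, hand it to a human operator for
    correction and resubmission (the drill reported failures by e-mail)."""
    try:
        return canonicalize(record)
    except (KeyError, AttributeError, TypeError):
        operator_queue.append(record)
        return None
```

NFKC normalization folds, for example, full-width Latin characters (common in Japanese input) into their ASCII equivalents, so that "Ｔａｒｏ" and "Taro" index identically.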
Each IAA cluster processes registration and query of IAA information independently from other clusters. Therefore, some form of database synchronization must be provided. The transport subsystem is responsible for distribution of information received by one IAA cluster to other IAA clusters, even during the failure of the underlying network infrastructure.
We chose the Usenet NetNews system running on-demand feed of articles by the Network News Transfer Protocol (NNTP) as the basis for this subsystem. When the underlying network is functioning, news distribution daemons can interact with each other synchronously as new information arrives at some cluster. When the underlying network is not functioning, information can be stored in a spool directory for future transmission. It also provides us with the unique feature of multiple news feeds. By cross-feeding multiple clusters, we can easily implement a system that tolerates partial network failures, including routing anomalies. In our implementation, we added a set of program modules so that information is encrypted when entering the transport subsystem for transmission, and decrypted when exiting the subsystem.
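The store-and-forward behavior with encryption at the subsystem boundary can be sketched as below. The XOR "cipher" is only a placeholder standing in for the real encryption modules (the paper does not specify the algorithm used), and the spool class is our simplification of the NetNews spool directory.

```python
import base64
import queue

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # Placeholder XOR scheme standing in for the real encryption module.
    x = bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return base64.b64encode(x)

def toy_decrypt(blob: bytes, key: bytes) -> bytes:
    x = base64.b64decode(blob)
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(x))

class Spool:
    """Store-and-forward spool: articles queue locally while the network
    is down and are flushed to peer clusters when it comes back, as the
    NetNews spool directory did in the drill."""
    def __init__(self):
        self.pending = queue.Queue()

    def enqueue(self, article: bytes, key: bytes):
        self.pending.put(toy_encrypt(article, key))     # encrypt on entry

    def flush(self, deliver, key: bytes):
        while not self.pending.empty():
            deliver(toy_decrypt(self.pending.get(), key))  # decrypt on exit
```

Cross-feeding in the real system means each cluster runs such a spool toward every other cluster, so an article reaches all clusters even when one inter-cluster path is down.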
The storage subsystem is the actual database system, complete with the database itself, a search engine, an index generator, and input/output interface modules. On the registration side, it creates an index from the information delivered by the transport subsystem and stores it in the database. On the query side, it answers search queries from the retrieval subsystem.
In the drill, we tried to load all four clusters with different implementations of the database modules. Three were designed, and two were actually implemented and used. This is a widely known programming practice called n-version programming. In the IAA system, all clusters will eventually store the same information. If all clusters are loaded with a single implementation, it is very likely that if one cluster fails to process a particular piece of information, the entire system will fail to process it.
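The cross-check value of independent implementations can be sketched as below. The voting-style comparison in code is our illustration of the principle; in the drill the implementations ran on separate clusters rather than being compared programmatically.

```python
def n_version_query(engines, query):
    """Run the same query through independently implemented engines and
    flag any disagreement, instead of silently trusting one version.
    A disagreement localizes a bug to the outlier implementation."""
    results = [engine(query) for engine in engines]
    if all(r == results[0] for r in results[1:]):
        return results[0]
    raise RuntimeError(
        "engine disagreement: take the outlier cluster down and debug it")
```

In the drill this property was exercised for real: when the commercial-database cluster mishandled some inputs, the homebred-engine clusters kept serving while it was debugged.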
The retrieval subsystem processes queries submitted by e-mail and Web browsers. This subsystem includes a parser similar to that of the input subsystem. After the query is canonicalized, it connects to the search engine, submits the search query, and receives a search result. The search result is formatted into an e-mail message and returned to the original client.
This subsystem is responsible for the implementation of the privacy protection policy. In our implementation, we decided to reject exhaustive queries. For example, queries such as "Find records for Smith" or "Find people with home address in city of Kobe" are rejected.
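A sketch of such a rejection rule follows. The field names, and the specific policy of requiring both a family name and a given name, are hypothetical; the paper gives only examples of queries that were rejected.

```python
# Hypothetical policy: a query must name a specific person, so both of
# these fields must be present before the search engine is consulted.
REQUIRED_FIELDS = {"family_name", "given_name"}

def is_exhaustive(query: dict) -> bool:
    """Return True for queries too broad to target one person, which the
    privacy policy rejects: e.g. a family name alone ("Find records for
    Smith") or an address-only search ("people with home address in
    Kobe")."""
    filled = {field for field, value in query.items() if value}
    return not REQUIRED_FIELDS <= filled

is_exhaustive({"family_name": "Smith"})                          # rejected
is_exhaustive({"family_name": "Smith", "given_name": "John"})    # accepted
```

Checking the query before it reaches the search engine also keeps deliberately broad harvesting attempts from loading the storage subsystem.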
For the user interface to the IAA system, we provided e-mail and the Web. We intentionally biased our interface toward e-mail, to handle users without a direct connection to the Internet. We also made a design decision to use e-mail for returning query results. This was to prevent an expected user behavior on a Web interface: rapidly repeating the same query with a simple mouse click upon seeing a negative search result, which would effectively prevent new queries from being accepted by the retrieval subsystem.
This is essentially a database design problem of determining record fields that are essential and required for spotting someone in the database. In our implementation, we chose the following items as fields in the IAA database record:
In addition to the above fields, the IAA system will add information such as originating host, return address, date and time the information arrived at each module, and so on. Some of these system-added fields are for analysis purposes and some are essential.
The actual drill ran for almost two days, from 5:46 a.m. on 17 January to 12:00 a.m. on 19 January 1996. We invited users of the Japanese Internet to be hypothetical victims, and more than 6,000 people participated in the drill.
The operation of the IAA system was a successful failure. It was a success in that the system actually accumulated over 6,000 IAA information items and answered queries for twice that figure, but it was a failure in the sense that it failed to provide the service in the way we had expected: smoothly and promptly. There were numerous deficiencies in our design, but the worst turned out to be the decision to use e-mail as the primary user interface. The use of e-mail as a means of simple and handy information transport failed miserably. The load average of the IAA clusters, especially the machine that held the input/output front end (the input and retrieval subsystems), skyrocketed during peak hours.
The largest reason for this behavior of the IAA system was probably that sendmail (or TCP) waits too long before establishing a connection, or before aborting an established connection, when the remote host is not responding. In the first few weeks of January 1996, the largest Japanese BBS with Internet e-mail connections was experiencing heavy congestion in e-mail delivery because everybody with an account on that system decided to send new year's greetings to everybody else. This situation was still occurring on the day of the drill. It turned out that the BBS system was sending out messages but not receiving them well. When the IAA system processed registrations and queries from this system, sendmail blocked in the final output phase, and the output queue, which also contained well-responding sites, simply built up. In the task force meeting, we had talked about the possibility of the e-mail system failing during a disaster, but we did not expect this to happen in the drill. We definitely have to come up with better user interface mechanisms. A lighter e-mail delivery process with a negative connection cache would be a candidate if we are going to keep e-mail as the primary interface.
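The negative connection cache idea might look like the minimal sketch below; the class shape and the back-off period are our assumptions, not a description of any existing mailer.

```python
import time

class NegativeConnectionCache:
    """Remember hosts that recently refused or timed out a connection and
    skip them for a back-off period, instead of letting every delivery
    attempt block on the same unresponsive peer (as sendmail did for the
    congested BBS during the drill)."""

    def __init__(self, ttl=600.0, clock=time.monotonic):
        self.ttl = ttl          # seconds to avoid a failed host
        self.clock = clock      # injectable for testing
        self.failures = {}      # host -> time of last failure

    def record_failure(self, host):
        self.failures[host] = self.clock()

    def should_skip(self, host):
        t = self.failures.get(host)
        return t is not None and self.clock() - t < self.ttl
```

With such a cache in front of the delivery queue, messages for responsive sites drain normally, and the dead peer is retried only after its entry expires.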
The design of our control flow also contributed to the high load average. A pipeline of processes with sendmail at the end simply multiplies the number of processes hanging around in memory. In our next implementation, we would take a queue-based approach.
On the brighter side, the IAA transport subsystem worked without a single glitch. Information that exited the transport subsystem matched among the four clusters from head to tail throughout the two-day operation. The n-version programming of the database engines also functioned successfully. We started with three copies of a homebred database engine and one database based on a commercial database system. Shortly after the drill began, the latter was found not to be processing particular input on some fields. We could shut down the cluster with the problematic database while debugging was in progress, leaving the rest of the clusters up and running.
For the disaster-support distributed database to move information into and out of the area of the disaster properly, continuous connectivity is required. Surface-based communication links can also be damaged, and it may take a long time before they are operational again.
Satellite communication channels are capable of continuous service regardless of surface damage, provided that the ground equipment is not damaged. The WIDE project operates the WISH (WIDE Internet with Satellite Harmonization) task force, which seeks utilization of satellite channels on the Internet. The action plan for the WISH task force included experimental use of satellite communication links as backup links for our backbone network, and we found the drill a good chance to conduct this experiment.
The WIDE project operates a somewhat linear backbone with two major concentrations of stub sites: Tokyo and Kyoto. Because the Tokyo NOC had too many connections, we could not find enough resources, such as satellite channels and dishes, to back them all up, and it was simply too large for a first experiment. So it was natural that the Kyoto NOC was selected for our drill scenario.
Figure 4 shows a simplified configuration of the affected part of the network operated by the WIDE project, the links that were disabled, and how the satellite links were used to reconnect the isolated networks.
Figure 4. WISHBONE configuration
Satellite facilities used at each site were as follows:
Hiroshima NOC (Hiroshima City University) did not have the 1.8 m antenna at the time of the drill, so a 75 cm (30 inch) diameter portable VSAT antenna was brought in and deployed by a team of WISH task force members on the day prior to the drill.
Although the network operated by the WIDE project is primarily intended for experimental purposes, it is our primary means of connection to the Internet, so each link was tested separately, including basic operation of gated in the satellite router, prior to the drill.
In the drill, all four of the connections at the Kyoto NOC, including two backbone links, were manually disabled for almost three hours, starting at 5:46 a.m. on 17 January, and the isolated networks were linked back together by three different satellite communication links. Each satellite channel was to be activated manually shortly (2 to 12 minutes, depending on the number rolled on dice at each site) after the links were confirmed down. Alternative route selection was managed by preconfigured OSPF-speaking routers on and off the actual backbone.
JAIST was not connected to the OSPF backbone area, so the virtual link capability of OSPF was used to extend the OSPF backbone area to the WISHBONE router at this site.
The satellite links came up, datalink-wise, a few seconds after they were enabled. It then took a couple of minutes for the OSPF databases to synchronize and for new routes to be established. This would be unusual for surface-based links, but since the OSPF Database Description (DD) synchronization process allows a window of only one outstanding packet, it takes a long time when run over a satellite link, which has an inherent RTT of about 500 milliseconds.
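The effect of the one-packet DD window can be estimated with simple arithmetic: one RTT per DD packet. The link-state database size and headers-per-packet figures below are assumed for illustration only.

```python
def dd_sync_time(db_entries, entries_per_packet, rtt_seconds):
    """Rough lower bound on OSPF Database Description exchange time when
    only one DD packet may be outstanding: each packet costs one RTT."""
    packets = -(-db_entries // entries_per_packet)  # ceiling division
    return packets * rtt_seconds

# Assumed numbers: a 3,000-entry link-state database, 20 LSA headers per
# DD packet, and a 0.5 s satellite RTT -> 150 packets, 75 seconds, before
# any link-state request/update traffic even begins.
print(dd_sync_time(3000, 20, 0.5))
```

The same exchange over a terrestrial link with a few milliseconds of RTT completes in under a second, which is why the minutes-long convergence surprised us only briefly.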
At 6:00 a.m., we thought the WIDE network had regained full connectivity, but there were unexpected glitches (as we had expected). It took another 45 minutes to troubleshoot these minor glitches, and around 7:00 a.m., the WIDE network began its stable operation with full connectivity within the network and to the rest of the world.
The drill taught us a number of lessons, both technical and nontechnical:
In a nonbackbone area with one router with a damaged serial link and another router with a WISHBONE link, a backbone virtual link should be configured between these two routers. Otherwise, the former router cannot accept any routes introduced to that area from the WISHBONE router (Figure 5).
Figure 5. Backbone virtual link was not long enough
The router at Kyushu University with the damaged link was also a BGP speaker, with its peer address on the serial interface driving the disabled link. When the link was disabled, BGP peers lost their connections to that router, causing some sites within Kyushu University, and sites connected via Kyushu University, to lose connectivity beyond the network operated by the WIDE project (Figure 6).
Figure 6. Loss of BGP peer address
This suggests that the BGP peer address should be the one with the most stable operational characteristics. For example, an address on a serial link is less preferable than an address on an Ethernet. The most preferable choice is an address on the software loopback interface found in some router boxes. Prior to the drill, our belief was that assigning addresses to this software loopback was a waste of address space; now we are aware of the importance of an interface that never goes down if we are to build a network that continues to provide stable connectivity during link failures.
We found that one of the WISHBONE routers was not running the VLSM-capable version of gated. This caused many routes to variably subnetted networks to be dropped, and consequently routes failed to reach many parts of the WIDE network. The tests run on the separate links prior to the drill did not uncover this.
Although there were some minor troubles, we evaluated the WISHBONE drill as a success. As we used no new techniques and no new pieces of hardware or software, we expected the drill to succeed. However, the problems listed above collectively suggest that we must review our network's software and hardware configurations and test them thoroughly in a real environment (as we did in the drill) to build a network ready to provide continuously stable connectivity during a wide-area disaster.
The result of the first Internet disaster drill was a mixture of failures and successes, but we believe that the biggest success was that the drill was actually conducted. As mentioned earlier, the largest lesson we learned from the Great Earthquake of Hanshin-Awaji is that a system of any sort cannot be utilized in a disaster situation if it is not in daily use or under constant review. Many systems, from support and recovery plans on paper to walkie-talkies with exhausted batteries, failed to function. We now have an incomplete but working model of a network-based application for disaster support and recovery.
We have to continue our research on this type of application as a system. By system, we mean not only the database system, but a complete system including devices and methods for handling data in extreme conditions. Cooperation with various public systems and various media are likely to be a necessary part of the system, and these arrangements should go through the drill as well.
During the IAA drill, we accumulated much data that can be used to rerun the drill to test our new implementations. In addition, we gathered a collection of comments from drill participants, some of which will be used to improve the usability of our next implementation.
The authors would like to thank the members of the WIDE project for the heated discussions at the kick-off meeting of the LIFELINE task force. Members of the LIFELINE task force, especially those who participated actively in the development and operation of the IAA system, deserve our sincere gratitude. We also thank the members of the WISHBONE task force, who deployed and operated the entire system during the drill, as well as Japan Satellite Systems Co., which donated the satellite channels for the drill. Users of the WIDE-operated network who experienced inconvenience during our WISHBONE troubleshooting also deserve our gratitude for their patience. Finally, we express our sincere gratitude to the more than 6,000 people who volunteered to participate in the IAA drill.