Study of an Efficient Server Selection Method for Widely Distributed Web Server Networks

Tsukasa OGINO <ogino@fastnet.co.jp>
Masahiko KOSAKA <kosaka@fastnet.co.jp>
Yoshiyuki HARIYAMA <hariyama@fastnet.co.jp>
Kazuhiro MATSUDA <kmatsu@fastnet.co.jp>
Kazuaki SUDO <donasuke@fastnet.co.jp>
FastNet, Inc.
Japan

Abstract

In order to disperse the load on a Web server, generally the server cluster is configured to distribute access requests, or mirror servers are distributed geographically or situated on different networks. However, although there are several proposals for leading clients to the most efficient server according to the constantly changing server and network condition, as yet no definitive solution has been proposed.

In this document, we propose a measurement method for dynamically changing server and network environment, a new selecting method to find the most efficient server based on the measurement method, and an efficient server selection system for leading the access clients to the most efficient server among the distributed Web server network.

Under this method, we use AS (Autonomous System) path routing information of BGP (Border Gateway Protocol) as the factors for evaluating the logical distance of the network. We also try to determine the most efficient Web server by using various server/network information measurement tools. Experimental Web server sites have been set up in both Japan and the United States, and a prototype system was implemented on the Internet and its performance was evaluated; results of the experiments and evaluation are included in this report.

Contents

1. Introduction

Along with the rapid expansion of the Internet, the information delivery system using the World Wide Web (hereinafter the "Web") has also expanded rapidly as a simple tool for delivering and accessing information. Web server sites that can be accessed from all over the world, such as the official Web server for Nagano Olympics, have sprung up, and such sites are growing in importance. On the other hand, in this type of Web server site that delivers information worldwide, access requests come in from all over the world, and so the site must be able to handle a large number of visitors. Load dispersion of the Web server is thus an important subject for study.

Several proposals have been made for dispersing the load on a Web server in order for the client to access the most efficient server. For example, some of the common methods are to create a server cluster composed of several Web servers on the site to distribute many access requests across several Web servers, and to set up mirror servers with similar contents on other sites that are either at a different geographic location or situated on different networks in order to disperse accesses to the same region or network.

However, as described later, it is extremely difficult to disperse a concentrated access load properly using a general method, because the regions or networks to or from which accesses are made to the Web server are not uniform. Access time is also variable. For instance, access to the Web server does not come from a uniform region or network, and there is considerable bias in the access distribution. For the Web servers that have accesses from all over the world, access distribution is also greatly affected by the time of day in the region. In terms of access time, access distribution varies greatly changes between business time and midnight.

In this document, we propose a measurement method for dynamically changing server and network environment, a new selecting method to find the most efficient server based on the measurement method, and an efficient server selection system for leading the access clients to the most efficient server among the distributed Web server network.

Under this method, we use AS (Autonomous System) path routing information of BGP (Border Gateway Protocol) as the factors for evaluating the logical distance of the network. We also try to determine the most efficient Web server by using various server/network information measurement tools. Web server sites for demonstration and experiment were constructed in both Japan and the United States, and a prototype system was implemented on the Internet and its performance was evaluated. This report includes the results of the experiments and evaluation.

In this document, the former load dispersion method is outlined in Chapter 2. In Chapter 3, the efficient server selection system proposed in this study is described, and in Chapter 4, we look at the performance evaluation of the parameters used and test results from the proposed prototype system. Chapter 5 examines the proposed method and Chapter 6 includes our future tasks and conclusions.

2. Existing load dispersion method

Currently several methods have been proposed. The two typical methods are outlined in Figure 1 and Figure 2.

2.1 Virtual host method

Under this method, the virtual host shown in Figure 1 is used. This method is designed to disperse concentrated access load evenly over several servers. As an example, let's look at a case where a client accesses the uniform resource locator (URL; access address), www.foo.com. When an attempt is made to access www.foo.com, access is initially made to a virtual host. Then, the virtual host attempts to disperse the access adequately according to the load status of each subordinate Web server. In fact, the particular information is delivered from www1, www2, or www3 when an access is requested by a client. The access is received by the virtual host first, in order to ensure appropriate access dispersal.

However, the case shown in Figure 1 using a virtual host has the following problems:

  1. A separate device that acts as a special virtual host is necessary.
  2. As the special virtual host receives all access requests, if there are a lot of them, the processing ability of the virtual host becomes a limiting factor. This means that the virtual host's actual processing ability determines the processing ability of this Web server site.
  3. Under the virtual host method, the server accepts access only to its own site. Therefore each dispersed Web server site must have its own respective virtual host in order to standardize access to its own Web server sites. This means that it is necessary to set a respective fixed individual URL. This makes it necessary to select a dispersed Web server site from the access side voluntarily.
  4. As the access side selects the Web server site, it is not possible to standardize access to each Web server site.

2.2 TCP connection hop method

Next, we look at a transmission method using transmission control protocol (TCP) connection hop in Figure 2. When an access request is made by the client, at first it is passed to www1. At www1, a scheduler is started. Here, the scheduler selects www3 as the most efficient Web server, and a response is made to the access request from the client. This means that www1 responds to access demands at all times and the most efficient Web server (www1, www2, www3) then responds to the access in turn. Under this method, access is dispersed using a scheduling function.

However, TCP connection hop transmission method shown in Figure 2 also has these problems:

  1. As all access requests are concentrated on the Web server where the scheduler resides, if this Web server fails, the entire server site stops functioning.
  2. It is not possible to control access to a distributed Web server site efficiently. This method aims to standardize access to the Web server site (in its own network) that is controlled by a scheduler, so it is necessary to set a fixed different URL for each Web server site just as it was for the virtual site method described previously. This makes it necessary to select a dispersed Web server site from the access side voluntarily.
  3. Just as in the virtual site method described previously, the access side selects the Web server site, so it is not possible to standardize access to each Web server site.

Fig1 and Fig2
Figure 1. Virtual host method / Figure. 2 TCP connection hop method

3. Efficient server selection system

3.1 Outline of the efficient server selection method

In order to select the efficient Web server for clients accessing from distributed Web server groups, we defined the following goals for our proposed method:

  1. To disperse the load equally in the same site.
  2. To disperse the load equally even in several dispersed sites.
  3. Not to place excess load on the Web server.
  4. To process responses from a client rapidly.

In order to achieve these goals, we collected and analyzed the following information, and based on the results configured a system to select the efficient server using this method. The characteristics of the system are as follows.

  1. The network distance between the access client and Web server is obtained from the routing information. Specifically, the logical network distance information is collected from AS Path information of BGP and the distance from each Web server is compared in order to find the Web server that has the shortest path.
  2. The status of the network between each Web server and the client is measured. Specifically, RTT, packet loss, throughput, and the number of router hops are measured.
  3. In-site network information measurement: transmitting and receiving network traffic in the site, number of packet errors, number of collisions occurred, gateway (GW) router traffic, and number of GW router packets rejected are measured.
  4. The status of each Web server is measured. The load on the server including the number of TCP connections established, the disk load, central processor unit (CPU) idle status, and load averages are measured.

In this method, the efficient Web server is selected based on the above-mentioned measurement results.

3.2 Prototype system configuration

The system is composed of the following four systems (Figure 3):

Fig3
Figure 3. System configuration

3.3 System functions

(1) Network Status Probe

The Network Status Probe (NS-P) has the following three measurement functions:

Under the network information measurement function, RTT (Round Trip Time) between client and server, packet loss, throughput, and number of router hops are measured. Under the in-site network information measurement function, transmitting/receiving network traffic in the site, number of packet errors, number of collisions occurred, GW router traffic, and number of GW router packets rejected are measured, and the operating condition of the server in the site is monitored. Under the server status measurement function, the load on the server including the number of TCP connections established, the disk load, CPU idle status, and load average are measured.

The same functions as UNIX commands are directly incorporated for implementation of the measurement function. The relationship between each measurement item and commands for the measurement function are shown in Tables 1 and 2.

Table 1: Network information selection
Measurement Item Measurement Unit Equivalent Command
RTT ms ping
Packet loss % ping
Throughput bit/s (Independent)
Number of router hops No. of layers traceroute
AS path length No. of layers (RS)
Table 2: In-site network information measurement function list
Measurement Item Measurement Unit Equivalent Command
Transmitting/receiving traffic packets netstat n
Number of packet errors packets netstat n
Number of collisions occurred No. of times netstat n
Router traffic bps snmp
Number of router packets rejected packets snmp
Table 3: Server state measurement function list
Measurement Item Measurement Unit Equivalent Command
Number of TCP connections No. of times netstat
Disk load No. of transmissions iostat n
CPU idle value % iostat n
Load average No. of processes uptime

Regarding throughput

In the case of communication such as Telnet where data is generated progressively, the influence of bandwidth is small. Therefore, latency is important, so we decided that throughput could be disregarded. On the other hand, in the case of bulk data such as on the Web, as the influence of bandwidth is large, throughput is an important issue. We used ttcp[1] and netperf[2] commands as tools for measuring network throughput. However, in order to use these tools, it is necessary to start a process on both the transmitting and receiving bases of the network being measured, so it is impossible to carry out real-time measurement of the distance between an unspecified client and a site such as in this case. Therefore we independently created and incorporated a measuring function for throughput. We created the throughput measurement function in the prototype system under the following conditions:

An implementation example is as follows. After issuing a 64-byte Internet control message protocol (ICMP) echo request twice consecutively for a client, a 1024-byte ICMP echo request is issued once. From the difference between the result of the second 64-byte (RTT1) measurement and the result of measuring 1024-byte data issued at the end (RTT2), throughput is calculated. The calculation formula is as follows:

Throughput = (1024 - 32) * 2 * 8 / (RTT2 - RTT1)

The first 64-byte data is measured in order to load (overhead) the client program; therefore it is ignored for data purposes. Throughput as described in this document means that incorporated in the prototype, and in the following the measurement result of this throughput is described.

NS-P is implemented on all servers in the site, one of which acts as a master, and the remaining NS-Ps act as slaves. A slave NS-P continuously measures the load on its own server. The master NS-P measures between the client and site as a network information measurement function and GW router traffic as an in-site network information measurement function. The master also collects the load result of each server as measured by the slave from all slaves in the site and collectively controls the load of the entire site (Figure 4). An NS-P consists of approximately 7,300 lines of C language and is multi-threaded. The operating system is FreeBSD at present.


Figure 4. NS-P configuration

(2) Network Status Server

The Network Status Server (hereinafter "NS server") is the core part of the entire system and has the following five functions:

In the network information acquisition function, network information selected by the NS-P is collected. In the efficient site selection function, the efficient site is determined from the measurement data obtained from the NS-P through RS or the network information acquisition function. In the efficient server selection function, the most efficient server at present for receiving access from a client is selected and determined by the NS-P master. The most efficient server here means the one with the lightest load. In the data aggregation function, the information on clients is aggregated and controlled. The information controlled here is the IP address of the client and network selection result between client and server. The site control function monitors the load status of each site. The NS server consists of approximately 5,600 lines of C language and is multi-threaded. The operating system is FreeBSD at present.

(3) NS-Agent

NS-Agent is implemented on Apache of the Web server as "Module (mod_nss)". The module[3] allows the functions of Apache to be freely changed. The developed NS-Agent consists of approximately 1,400 lines of C language. NS-Agent starts when Apache receives a hypertext transfer protocol (HTTP) request from a client (Web browser). This means that whenever Apache receives an HTTP request, NS-Agent is called within Apache and the necessary processing for the received request is executed. The NS-Agent that is called executes the following two processes:

Specifically, when an HTTP request arrives at Apache from a client (Web browser), Apache calls NS-Agent. NS-Agent obtains the IP address of a client within Apache and issues an efficient server information command to the NS server. At this time, the IP address of the client is set as a parameter. The NS server that receives the efficient server information command from NS-Agent determines the nearest site to the client from the dispersed sites and returns the IP address of that server in the site to NS-Agent. After receiving the IP address of the server, NS-Agent returns an HTTP response to the a client. At this time, the IP address of the server that specifies the nearest site is set in the response, and 302 is set in the HTTP response code. Response code 302 means "Moved Temporarily,"[4] so the Web browser of the client receiving this code redirects to the IP address of the specified server.

(4) Route Server

We modified the Zebra[5] Route Server for our system. Zebra supports some routing protocols; however, this time we only changed the BGP protocol processing part and added configurations and commands. Specifically, we added a configuration so that the GW router address of each site can be registered, and the additional commands enabled us to receive the client IP address. When this command is received, Zebra calculates the number of AS hops between GW routers of the client and each site from the IP address of the client set in the command, and responds with a list of GW routers of each site and number of AS hops to the source of the command.

3.4 Algorithms

In the prototype, the efficient server is determined by two-step algorithm. In the first step, the most efficient site is determined from the collected network information. Next, the server that has the smallest load at the efficient site is selected and determined as the efficient server. The efficient site selection algorithm is implemented on the NS server, and the algorithm for selecting the efficient server is implemented on the NS-P master. After the efficient site has been determined, the NS server sends an inquiry to the NS-P master at the efficient site for the most efficient server. Details of the algorithms for selecting the efficient site and efficient server are described next.

3.4.1 Efficient site selection

There are two efficient site selection methods depending on whether a client is accessing for the first time or not:

(1) Selection by the number of AS hops

In order to acquire the number of AS hops, the NS server issues to the RS the number of AS hops acquisition command (ASInfoCOM) in which the IP address of the client is set, and obtains a list of AS hop numbers as a response (ASInfoRSP). The efficient site is selected from the list obtained here. The content of the response is a list of router addresses of the site and numbers of AS hops to the client. The site with the minimum number of AS hops is selected from this list as the efficient site. For instance, in the case of:

    RTList: address 202.228.128.217 AS Length 1 <- SiteA
    RTList: address 216.98.110.62 AS Length 3 <- SiteB

SiteA with the smaller AS Length is chosen as the efficient site.

(2) Selection by network information

Here, the efficient site is calculated using the network status value and site status value. First, the network status value is calculated from the network information (RTT, number of router hops, etc.) secured in the aggregation table. Next, the site status value is calculated from the in-site network information (transmitting/receiving network traffic, number of collisions occurred, etc.). The sum of the calculated network status value and site status value is the efficient site judgment value for the site. Then, the efficient site judgment values of each site are compared and the efficient site is identified. The flow of efficient site selection is shown in Figure 5.


Figure 5. Flow of selecting efficient site

The detailed calculation is described next. The network status value is calculated by adding the weight of data for each parameter of the present network status. The calculation formula is as follows:

Network status value = ASL*A + RTT*B + RN*C + PL*D + TP*E (1)

The codes here indicate the following measurement data and coefficients:

In the prototype system, each parameter value of the network status is calculated. However, we have validated the number of AS hops and number of router hops as parameters to be used for calculation as a start. Each weight coefficient is 0.5. We decided to investigate the coefficient of other parameters from the results obtained through preliminary experiments conducted to test the validity of the network information.

The in-site network status value is calculated from the parameters of in-site network status by adding the weight of data. The calculation formula is as follows:

In-site network status value = CS*F + PS*G + ES*H + RTR*I + RTE*J (2)

The codes here indicate the following measurement data and coefficients:

In the prototype system, we validated the number of collisions, number of packets, and GW router traffic as parameters used for calculation. We calculated the collision occurrence rate from the number of collisions and number of packets and we calculated the bandwidth occupation rate from the GW router traffic, and to these two calculated figures we added a weight coefficient of 0.5.

3.4.2 Efficient server selection

The server load is calculated in real time, and the server with the smallest load is selected as the efficient server. The server evaluation value is calculated from the measured load and the values of each server are compared. The server evaluation value is calculated from each parameter of server load by adding the weight of data. The calculation formula is as follows:

Server evaluation value = LINK*K + IO*L + IDLE*M + CPU*N (3)

Each code indicates the following measurement data and coefficients:

In the prototype system, we have validated the number of TCP connections established, CPU idle status, and load average. The weight coefficient is 0.2 for the number of TCP connections established, 0.3 for CPU idle status, and 0.5 for load average.

4. Performance evaluation

In this section, the results of preliminary experiments performed using the prototype system and the results of system tests are discussed. Preliminary experiments were performed in order to check the effectiveness of the network information measured under this system.

4.1 Evaluation of network information

(1) Purpose and outline of preliminary experiments

Experiments were conducted to determine the effectiveness (hit ratio) of [Network status value] for [Efficient site selection], using tests on an actual site on the Internet. In the experiment, data transmission time is considered as network distance; this means that the shorter the data transmission time, the shorter the network. Data transmission time here is the time required to transmit data from a Web server to a client when the client accesses the Web, and is calculated by measuring the connection time between client and Web server (site). The status of the network between the client and Web server at this time is measured and the correlation of the data transfer time and the status of the network is examined.

(2) Measurement method

The data transmission time between client and sever sites is measured using the 'tcpdump' command, and the status of the network is measured using the created NS server and NS-P.


Figure 6. Images of same size are dispersed.

The actual procedure (Figure 6) is as follows:

  1. Embed two linked image files at the top of the Web page. Both images are 2,525 bytes.
  2. Place one of the actual images on a U.S. site and the other on a Japan site (they are linked to from the top page).
  3. When a client accesses the top page, because the image files are linked, the browser of the client goes to the U.S. site and Japan site to obtain the image files.
  4. Here, the connection time between the client and the two server sites (Figure 7) is measured. As a result, the one with the shorter connection time has the shorter data transmission time and is nearer in terms of the network[6].


Figure 7. Measurement of connection duration

This time, the network status measured by NSS, NS-P (Figure 8) includes RTT, throughput, number of router hops, and AS pass length all through to the client.


Figure 8. Network information detection

The tcpdump command is executed on each Web server at the Japan site and the U.S. site, and tcpdump logs related to Web access are collected in order to measure the connection time.

The points considered for the preliminary experiment were:

(3) Data evaluation conditions

Each item of network information data obtained at NSS and connection times are compared in order to decide whether the judgment of NSS is correct. To judge this, the network information measurement results for Japan and the United States and connection duration are compared for each accessing client. Specifically, if network information is Japan < US, when connection duration is Japan < US, the judgment of NSS is correct (hit), but if connection duration is Japan >= US, the judgment is incorrect. However, regarding TP (throughput), the larger the value, the shorter the transmission time, so the criteria should be reversed. The details are shown below. The network status measurement value (NET-STAT) and data transmission time (DATA-T) between the client and site (US, Japan) are compared, and are determined as follows:

    Condition 1) NET-STAT (US) > NET-STAT (Japan)
        DATA-T (US) > DATA-T (Japan) ..Hit
        DATA-T (US) <= DATA-T (Japan) ..Incorrect

    Condition 2) NET-STAT (US) < NET-STAT (Japan)
        DATA-T (US) < DATA-T (Japan) ..Hit
        DATA-T (US) >= DATA-T (Japan) ..Incorrect

    Condition 3) NET-STAT (US) = NET-STAT (Japan)
        DATA-T (US) = DATA-T (Japan) ..Hit
        DATA-T (US) != DATA-T (Japan) ..Incorrect

    Network measurement value: AS, RT, RTT, TP (judgment for TP is the reverse.)

We incorporated the following conditions for valid data:

The httpd.conf file of Apache was set as follows:

    KeepAlive Off

The third place of the decimal point is rounded down and experimental data is compared.

We performed the experiment twice. In the first experiment, we located the Japan site in Yokohama, and in the second experiment, we located it in Ohtemachi. The machine specifications of the Web server are shown in Table 4.

Table 4: Machine specifications of Web server
  1st Time 2nd Time
Japan USA JPN, USA
OS FreeBSD 2.2.7 FreeBSD 2.2.7 FreeBSD 3.2
CPU Pentium Celeron 333MHz Pentium II 333MHz Pentium II 450MHz
Memory 160 MB 256 MB 256 MB
Hard drive 4 GB 4 GB 12 GB

(4) Results

The results are shown in Table 5.

Table 5: Correct answer ratio of network information
  1st Time
(5 Days, Total Data 21,008 Cases)
2nd Time
(8 Days, Total Data 39,474 Cases)
Average Correct Answer Ratio No. of Samples Average Correct Answer Ratio No. of Samples
AS 53.4% 14,605 52.2% 27,365
RT 48.9% 5,727 48.9% 10,250
RTT 71.1% 6,743 71.5% 12,666
TP 67.8% 6,028 66.2% 9,958

For the shortest route according to routing information (number of AS path length), the efficient route was selected at a ratio of approximately 50%. On the other hand, for RTT, we selected the efficient route at a ratio of approximately 70%. This result suggests that if we use both data effectively, we can quickly select the efficient route. We also confirmed that the network changed dynamically throughout any given day or week.

4.2 System tests

In the prototype we created this time, we validated only the number of AS path length and number of router hops as parameters used to calculate the network status value and site status value. We were able to confirm the redirect action to the efficient server that was obtained by calculating the current parameters.

5. Verification

5.1 Parameters

In the preliminary experiment, we verified the effectiveness of the four parameters of the network information (RTT to a client, throughput, number of router hops, AS path length). We compared the average correct answer ratio for the first and second experiments and they were almost the same (within 2%), so we consider the data to be highly reliable. The number of router hops produced the lowest correct answer ratio. It is generally considered that when there are many router hops, router overhead occurs, which increases network distances. However, our results showed that delays due to network congestion (the line) itself are greater than the overhead in the router on the present Internet (wide area network, WAN), so we can assume that excessive router overhead will not occur.

The correct answer ratio for throughput investigated in the preliminary experiment was around 67%, which is lower than the result of RTT. We had assumed that the result of throughput would be higher than RTT as we thought that throughput would tend to be more easily affected by the bandwidth than RTT. However, the correct answer ratio was not as high because the packet size used for measurement was small or the timing of issuance was not good. We need to investigate throughput measurement methods in the future.

5.2 Algorithms

We used two-step algorithms in the prototype to determine the efficient server. In the first step, the efficient site was selected and in the second step, the efficient server was selected. We also selected the efficient site in two ways: in the case of the client's first access, and in the case of subsequent accesses. As a result, we were able to confirm that the action was as expected, and a quick response was made to the accessing client. We need to examine the measurement parameters in more detail, and modify the system to achieve more accurate selection in the future.

5.3 The system

We achieved distribution to the most efficient server by HTTP redirection. HTTP redirects functions effectively in a system with small load. However, in the case of a system with a heavy load, overhead might constitute a bottleneck because of large overhead.

6. Future tasks and conclusions

In this document, we investigated a system for selecting the most efficient server from among distributed Web servers. We configured a site on the actual Internet (set up a Web server site for measurement in Japan and the United States), measured the data transmission time between the server and the client and each network status value, and examined the methods for efficient Web server selection according to these correlations.

We used a newly developed prototype system (NSS, NS-P) for measuring the status of the network. We also verified appropriate servers using several parameters and checked the redirect action to the selected efficient server in the system.

Future tasks include:

We intend to study the remaining parameters and improve practicability by increasing the accuracy of parameters.

Acknowledgments

We thank Mr. Kunihiro Ishiguro of Digital Magic Laboratory for his assistance for modifying RS (zebra) for the prototype system development.

References

  1. ttcp - Enger, R., Reynolds, J., FYI on a Network Management Tool Catalog, RFC 1470 (1993)
  2. netperf - http://www.netperf.org/netperf/NetperfPage.html
  3. Apache modules - http://modules.apache.org/
  4. Berners-Lee, T., Fielding, R. and Frystyk, H., Hypertext Transfer Protocol -- HTTP/1.0, RFC 1945 (1996).
  5. GNU Zebra - http://www.zebra.org/
  6. Yutaka Nakamura, Ken-ichi Chinen, Suguru Yamaguchi, Hideki Sunahara: "Proposal of WWW Server Behavior Observation Method by Packet Monitoring." In Proceedings of the Internet Conference 1998, December 1998.
  7. W. Richard Stevens, TCP/IP Illustrated,Volume1:The Protocols. Addison-Wesley Publishing Company 1995.
  8. W. Richard Stevens, UNIX Network Programming, Prentice Hall, 1990.
  9. Bassam Halabi, Internet Routing Architectures, Cisco Press/New Riders, 1997.