To disperse the load on a Web server, a server cluster is generally configured to distribute access requests, or mirror servers are placed at different geographic locations or on different networks. However, although several methods have been proposed for directing clients to the most efficient server under constantly changing server and network conditions, no definitive solution has yet emerged.
In this document, we propose a method for measuring the dynamically changing server and network environment, a new selection method that finds the most efficient server based on those measurements, and an efficient server selection system that directs clients to the most efficient server among a distributed Web server network.
Under this method, we use the AS (Autonomous System) path routing information of BGP (Border Gateway Protocol) as a factor for evaluating the logical distance of the network. We also determine the most efficient Web server by using various server/network information measurement tools. Experimental Web server sites were set up in both Japan and the United States, a prototype system was implemented on the Internet, and its performance was evaluated; the results of the experiments and evaluation are included in this report.
Along with the rapid expansion of the Internet, the information delivery system using the World Wide Web (hereinafter the "Web") has also expanded rapidly as a simple tool for delivering and accessing information. Web server sites that can be accessed from all over the world, such as the official Web server for the Nagano Olympics, have sprung up, and such sites are growing in importance. On the other hand, a Web server site that delivers information worldwide receives access requests from all over the world and must be able to handle a large number of visitors. Load dispersion of the Web server is thus an important subject for study.
Several proposals have been made for dispersing the load on a Web server in order for the client to access the most efficient server. For example, some of the common methods are to create a server cluster composed of several Web servers on the site to distribute many access requests across several Web servers, and to set up mirror servers with similar contents on other sites that are either at a different geographic location or situated on different networks in order to disperse accesses to the same region or network.
However, as described later, it is extremely difficult to disperse a concentrated access load properly using these general methods, because accesses to a Web server are not distributed uniformly across regions or networks, and they also vary over time. Access to a Web server shows considerable bias in its distribution, and for servers accessed from all over the world, the distribution is greatly affected by the local time of day in each region: access patterns change greatly between business hours and midnight.
In this document, we propose a method for measuring the dynamically changing server and network environment, a new selection method that finds the most efficient server based on those measurements, and an efficient server selection system that directs clients to the most efficient server among a distributed Web server network.
Under this method, we use the AS (Autonomous System) path routing information of BGP (Border Gateway Protocol) as a factor for evaluating the logical distance of the network. We also determine the most efficient Web server by using various server/network information measurement tools. Web server sites for demonstration and experiment were constructed in both Japan and the United States, a prototype system was implemented on the Internet, and its performance was evaluated. This report includes the results of the experiments and evaluation.
In this document, existing load dispersion methods are outlined in Chapter 2. In Chapter 3, the efficient server selection system proposed in this study is described, and in Chapter 4, we look at the performance evaluation of the parameters used and the test results from the proposed prototype system. Chapter 5 discusses the proposed method, and Chapter 6 presents future work and conclusions.
Several load dispersion methods have been proposed to date. Two typical methods are outlined in Figure 1 and Figure 2.
Under this method, the virtual host shown in Figure 1 is used. This method is designed to spread a concentrated access load evenly over several servers. As an example, consider a client accessing the uniform resource locator (URL; access address) www.foo.com. The access to www.foo.com first reaches a virtual host, which then distributes it according to the load status of each subordinate Web server. The requested information is actually delivered from www1, www2, or www3; the virtual host receives every access first in order to ensure appropriate dispersal.
However, the case shown in Figure 1 using a virtual host has the following problems:
Next, we look at the transmission method using transmission control protocol (TCP) connection hops, shown in Figure 2. An access request from the client is first passed to www1, where a scheduler is started. In this example, the scheduler selects www3 as the most efficient Web server, which then responds to the client's access request. Thus www1 receives all access requests, and the most efficient Web server (www1, www2, or www3) handles each request in turn; access is dispersed by this scheduling function.
However, the TCP connection hop method shown in Figure 2 also has the following problems:
Figure 1. Virtual host method
Figure 2. TCP connection hop method
In order to select the efficient Web server for clients accessing from distributed Web server groups, we defined the following goals for our proposed method:
In order to achieve these goals, we collected and analyzed the following information, and based on the results configured a system to select the efficient server using this method. The characteristics of the system are as follows.
In this method, the efficient Web server is selected based on the above-mentioned measurement results.
The system is composed of the following four systems (Figure 3):
Figure 3. System configuration
The Network Status Probe (NS-P) has the following three measurement functions:
The network information measurement function measures the RTT (round-trip time) between client and server, packet loss, throughput, and number of router hops. The in-site network information measurement function measures the site's transmitted/received network traffic, number of packet errors, number of collisions, GW (gateway) router traffic, and number of packets rejected by the GW router, and also monitors the operating condition of the servers in the site. The server status measurement function measures the load on each server, including the number of established TCP connections, disk load, CPU idle status, and load average.
The measurement functions are implemented by directly incorporating the equivalents of standard UNIX commands. The relationship between each measurement item and its equivalent command is shown in Tables 1 to 3.
Table 1. Network information measurement items

Measurement Item | Measurement Unit | Equivalent Command |
---|---|---|
RTT | ms | ping |
Packet loss | % | ping |
Throughput | bit/s | (Independent) |
Number of router hops | No. of hops | traceroute |
AS path length | No. of hops | (RS) |
Table 2. In-site network information measurement items

Measurement Item | Measurement Unit | Equivalent Command |
---|---|---|
Transmitting/receiving traffic | packets | netstat n |
Number of packet errors | packets | netstat n |
Number of collisions | No. of times | netstat n |
Router traffic | bit/s | snmp |
Number of router packets rejected | packets | snmp |
Table 3. Server status measurement items

Measurement Item | Measurement Unit | Equivalent Command |
---|---|---|
Number of TCP connections | No. of connections | netstat |
Disk load | No. of transmissions | iostat n |
CPU idle value | % | iostat n |
Load average | No. of processes | uptime |
In communication such as Telnet, where data is generated progressively, the influence of bandwidth is small; latency is what matters, so we decided that throughput could be disregarded. For bulk data such as Web transfers, on the other hand, the influence of bandwidth is large and throughput is important. We examined the ttcp[1] and netperf[2] commands as tools for measuring network throughput. However, these tools require a process to be started at both ends of the network path being measured, so they cannot measure the distance between an unspecified client and a site in real time, as required in our case. We therefore created and incorporated our own throughput measurement function. The throughput measurement function in the prototype system was created under the following conditions:
An implementation example is as follows. A 64-byte Internet control message protocol (ICMP) echo request is issued to the client twice consecutively, and then a 1024-byte ICMP echo request is issued once. Throughput is calculated from the difference between the RTT of the second 64-byte request (RTT1) and the RTT of the final 1024-byte request (RTT2). The calculation formula is as follows:
Throughput = (1024 - 32) * 2 * 8 / (RTT2 - RTT1)
The first 64-byte request serves only to absorb the startup overhead of the client program, so its result is ignored. "Throughput" in the remainder of this document means the value measured by this function incorporated in the prototype, and the measurement results described below are for this throughput.
NS-P is implemented on all servers in the site; one acts as the master and the remaining NS-Ps act as slaves. A slave NS-P continuously measures the load on its own server. The master NS-P measures the network between the client and the site (network information measurement function) and the GW router traffic (in-site network information measurement function). The master also collects the server load results measured by every slave in the site and collectively manages the load of the entire site (Figure 4). An NS-P consists of approximately 7,300 lines of C code and is multi-threaded. The operating system is currently FreeBSD.
Figure 4. NS-P configuration
The Network Status Server (hereinafter "NS server") is the core part of the entire system and has the following five functions:
The network information acquisition function collects the network information measured by the NS-P. The efficient site selection function determines the efficient site from the measurement data obtained from the RS or through the network information acquisition function. The efficient server selection function asks the NS-P master to determine the server currently most efficient for receiving access from a client; the most efficient server here means the one with the lightest load. The data aggregation function aggregates and manages client information, namely the client's IP address and the network selection result between client and server. The site control function monitors the load status of each site. The NS server consists of approximately 5,600 lines of C code and is multi-threaded. The operating system is currently FreeBSD.
NS-Agent is implemented as an Apache module ("mod_nss") on the Web server. Apache's module mechanism[3] allows its functions to be freely extended. The developed NS-Agent consists of approximately 1,400 lines of C code. NS-Agent starts when Apache receives a hypertext transfer protocol (HTTP) request from a client (Web browser): whenever Apache receives an HTTP request, NS-Agent is called within Apache and executes the processing necessary for the request. The called NS-Agent executes the following two processes:
Specifically, when an HTTP request arrives at Apache from a client (Web browser), Apache calls NS-Agent. NS-Agent obtains the client's IP address within Apache and issues an efficient server information command to the NS server, with the client's IP address set as a parameter. On receiving this command, the NS server determines the site nearest to the client among the dispersed sites and returns the IP address of a server in that site to NS-Agent. NS-Agent then returns an HTTP response to the client, with the IP address of the server at the nearest site set in the response and 302 set as the HTTP response code. Response code 302 means "Moved Temporarily,"[4] so the client's Web browser redirects to the IP address of the specified server.
We modified the Zebra[5] route server for our system. Zebra supports several routing protocols; this time we changed only the BGP protocol processing part and added configuration options and commands. Specifically, we added a configuration option so that the GW router address of each site can be registered, and added a command for receiving the client's IP address. When this command is received, Zebra calculates the number of AS hops between the client's GW router and each site's GW router from the client IP address set in the command, and responds to the command source with a list of each site's GW router and its number of AS hops.
In the prototype, the efficient server is determined by a two-step algorithm. In the first step, the most efficient site is determined from the collected network information; in the second, the server with the smallest load at that site is selected as the efficient server. The efficient site selection algorithm is implemented on the NS server, and the efficient server selection algorithm on the NS-P master. After the efficient site has been determined, the NS server queries the NS-P master at that site for the most efficient server. The algorithms for selecting the efficient site and the efficient server are detailed next.
There are two efficient site selection methods depending on whether a client is accessing for the first time or not:
To acquire the number of AS hops, the NS server issues the AS hop acquisition command (ASInfoCOM) to the RS with the client's IP address set as a parameter, and obtains a list of AS hop counts in response (ASInfoRSP). The response contains a list of the sites' router addresses and the number of AS hops from each to the client; the site with the minimum number of AS hops is selected from this list as the efficient site. For instance, in the case of:
RTList: address 202.228.128.217 AS Length 1 <- SiteA
RTList: address 216.98.110.62   AS Length 3 <- SiteB
SiteA with the smaller AS Length is chosen as the efficient site.
Here, the efficient site is calculated using the network status value and the site status value. First, the network status value is calculated from the network information (RTT, number of router hops, etc.) stored in the aggregation table. Next, the site status value is calculated from the in-site network information (transmitted/received network traffic, number of collisions, etc.). The sum of these two values is the site's efficient site judgment value. The judgment values of all sites are then compared and the efficient site is identified. The flow of efficient site selection is shown in Figure 5.
Figure 5. Flow of selecting efficient site
The detailed calculation is described next. The network status value is calculated by adding the weight of data for each parameter of the present network status. The calculation formula is as follows:
Network status value = ASL*A + RTT*B + RN*C + PL*D + TP*E (1)
The symbols indicate the following measurement data and coefficients: ASL is the AS path length, RTT the round-trip time, RN the number of router hops, PL the packet loss, and TP the throughput, with A through E the corresponding weight coefficients.
In the prototype system, every parameter of the network status is measured, but as a first step only the number of AS hops and the number of router hops are used in the calculation, each with a weight coefficient of 0.5. The coefficients of the other parameters will be determined from the results of the preliminary experiments conducted to test the validity of the network information.
The in-site network status value is calculated from the parameters of in-site network status by adding the weight of data. The calculation formula is as follows:
In-site network status value = CS*F + PS*G + ES*H + RTR*I + RTE*J (2)
The symbols indicate the following measurement data and coefficients: CS is the number of collisions, PS the number of packets transmitted/received, ES the number of packet errors, RTR the GW router traffic, and RTE the number of packets rejected by the GW router, with F through J the corresponding weight coefficients.
In the prototype system, the number of collisions, the number of packets, and the GW router traffic are used in the calculation. The collision occurrence rate is calculated from the number of collisions and the number of packets, the bandwidth occupation rate from the GW router traffic, and each of these two values is given a weight coefficient of 0.5.
The server load is calculated in real time, and the server with the smallest load is selected as the efficient server. The server evaluation value is calculated from the measured load and the values of each server are compared. The server evaluation value is calculated from each parameter of server load by adding the weight of data. The calculation formula is as follows:
Server evaluation value = LINK*K + IO*L + IDLE*M + CPU*N (3)
The symbols indicate the following measurement data and coefficients: LINK is the number of established TCP connections, IO the disk load, IDLE the CPU idle value, and CPU the load average, with K through N the corresponding weight coefficients.
In the prototype system, we have validated the number of TCP connections established, CPU idle status, and load average. The weight coefficient is 0.2 for the number of TCP connections established, 0.3 for CPU idle status, and 0.5 for load average.
In this section, the results of preliminary experiments performed using the prototype system and the results of system tests are discussed. Preliminary experiments were performed in order to check the effectiveness of the network information measured under this system.
Experiments were conducted on an actual site on the Internet to determine the effectiveness (hit ratio) of the network status value for efficient site selection. In the experiment, data transmission time is treated as network distance: the shorter the data transmission time, the shorter the network distance. Data transmission time here is the time required to transmit data from a Web server to a client when the client accesses the Web, calculated by measuring the connection time between the client and the Web server (site). The status of the network between the client and the Web server is measured at the same time, and the correlation between data transfer time and network status is examined.
The data transmission time between the client and the server sites is measured using the tcpdump command, and the status of the network is measured using the NS server and NS-P we created.
Figure 6. Images of the same size are dispersed.
The actual procedure (Figure 6) is as follows:
Figure 7. Measurement of connection duration
This time, the network status measured by the NSS and NS-P (Figure 8) includes the RTT, throughput, number of router hops, and AS path length, all measured through to the client.
Figure 8. Network information detection
The tcpdump command is executed on each Web server at the Japan site and the U.S. site, and tcpdump logs related to Web access are collected in order to measure the connection time.
The points considered for the preliminary experiment were:
Each item of network information obtained at the NSS is compared with the connection times to decide whether the NSS judgment is correct. For each accessing client, the network information measurement results for Japan and the United States are compared with the corresponding connection durations. Specifically, if the network information shows Japan < US, the NSS judgment is correct (a hit) when the connection duration is also Japan < US, and incorrect when the connection duration is Japan >= US. For TP (throughput), however, a larger value means a shorter transmission time, so the criterion is reversed. The details are shown below. The network status measurement value (NET-STAT) and the data transmission time (DATA-T) between the client and each site (US, Japan) are compared and judged as follows:
Condition 1) NET-STAT (US) > NET-STAT (Japan)
    DATA-T (US) > DATA-T (Japan)  .. Hit
    DATA-T (US) <= DATA-T (Japan) .. Incorrect
Condition 2) NET-STAT (US) < NET-STAT (Japan)
    DATA-T (US) < DATA-T (Japan)  .. Hit
    DATA-T (US) >= DATA-T (Japan) .. Incorrect
Condition 3) NET-STAT (US) = NET-STAT (Japan)
    DATA-T (US) = DATA-T (Japan)  .. Hit
    DATA-T (US) != DATA-T (Japan) .. Incorrect

Network measurement values: AS, RT, RTT, TP (the judgment for TP is reversed).
We incorporated the following conditions for valid data:
The httpd.conf file of Apache was set as follows:
KeepAlive Off
Values are rounded down at the third decimal place before the experimental data are compared.
We performed the experiment twice. In the first experiment, we located the Japan site in Yokohama, and in the second experiment, we located it in Ohtemachi. The machine specifications of the Web server are shown in Table 4.
Table 4. Machine specifications of the Web servers

| | 1st Time (Japan) | 1st Time (USA) | 2nd Time (Japan, USA) |
|---|---|---|---|
| OS | FreeBSD 2.2.7 | FreeBSD 2.2.7 | FreeBSD 3.2 |
| CPU | Pentium Celeron 333 MHz | Pentium II 333 MHz | Pentium II 450 MHz |
| Memory | 160 MB | 256 MB | 256 MB |
| Hard drive | 4 GB | 4 GB | 12 GB |
The results are shown in Table 5.
Table 5. Average correct answer ratio (1st time: 5 days, 21,008 cases in total; 2nd time: 8 days, 39,474 cases in total)

| | 1st Time: Avg. Correct Ratio | 1st Time: No. of Samples | 2nd Time: Avg. Correct Ratio | 2nd Time: No. of Samples |
|---|---|---|---|---|
| AS | 53.4% | 14,605 | 52.2% | 27,365 |
| RT | 48.9% | 5,727 | 48.9% | 10,250 |
| RTT | 71.1% | 6,743 | 71.5% | 12,666 |
| TP | 67.8% | 6,028 | 66.2% | 9,958 |
For the shortest route according to routing information (AS path length), the efficient route was selected at a ratio of approximately 50%. For RTT, on the other hand, the efficient route was selected at a ratio of approximately 70%. This result suggests that by using both kinds of data effectively, we can quickly select the efficient route. We also confirmed that the network changes dynamically throughout any given day or week.
In the prototype we created this time, we validated only the AS path length and the number of router hops as parameters used to calculate the network status value and the site status value. We confirmed the redirect action to the efficient server obtained by calculating these parameters.
In the preliminary experiment, we verified the effectiveness of the four parameters of the network information (RTT to a client, throughput, number of router hops, AS path length). The average correct answer ratios for the first and second experiments were almost the same (within 2%), so we consider the data to be highly reliable. The number of router hops produced the lowest correct answer ratio. It is generally considered that many router hops cause router overhead, which increases the network distance. However, our results showed that on the present Internet (wide area network, WAN), delays due to congestion on the lines themselves are greater than the overhead in the routers, so we can assume that excessive router overhead does not occur.
The correct answer ratio for throughput in the preliminary experiment was around 67%, lower than that of RTT. We had assumed that throughput would score higher than RTT, since throughput should be more sensitive to bandwidth than RTT. The ratio was not as high, possibly because the packet size used for measurement was too small or the issuance timing was poor. We need to investigate throughput measurement methods further in the future.
We used a two-step algorithm in the prototype to determine the efficient server: in the first step the efficient site is selected, and in the second step the efficient server within it. The efficient site is selected in two ways, depending on whether the client is accessing for the first time or has accessed before. We confirmed that the system behaved as expected and responded quickly to accessing clients. In the future, we need to examine the measurement parameters in more detail and modify the system to achieve more accurate selection.
We achieved distribution to the most efficient server by HTTP redirection. HTTP redirection functions effectively in a lightly loaded system. Under heavy load, however, the redirection overhead might become a bottleneck.
In this document, we investigated a system for selecting the most efficient server from among distributed Web servers. We configured a site on the actual Internet (set up a Web server site for measurement in Japan and the United States), measured the data transmission time between the server and the client and each network status value, and examined the methods for efficient Web server selection according to these correlations.
We used the newly developed prototype system (NSS, NS-P) to measure the status of the network. We also verified server selection using several parameters and confirmed the redirect action to the selected efficient server in the system.
Future tasks include:
We intend to study the remaining parameters and improve practicability by increasing the accuracy of parameters.
We thank Mr. Kunihiro Ishiguro of Digital Magic Laboratory for his assistance in modifying the RS (Zebra) for the prototype system development.