Design and Implementation of a Cost-Effective

Scalable Web Server

 

Li-Ju Chen and Hsin-Chou Chi

Department of Computer Science and Information Engineering

National Dong Hwa University

Hualien, Taiwan, R.O.C.

 

Address for Correspondence:

Hsin-Chou Chi

Department of Computer Science and Information Engineering

National Dong Hwa University

Hualien, Taiwan, R.O.C.

Phone: +886-3-8662500 ext.1837

E-mail: hcchi@csie.ndhu.edu.tw

 

 

Abstract

 

As the World-Wide Web has become popular in recent years, well-known web sites have to serve tens or hundreds of requests every second. These loads threaten to overwhelm web server systems and have forced service providers to seek faster and more powerful servers for higher performance. In this paper, we address this problem by having a group of workstations alternately mapped to the hostname of the web server. When an end user accesses files on the web server through a single URL, the DNS of the web server responds to the request in one of two ways: (1) if multiple mirrored servers can be placed in different locations around the world, the DNS answers the request by dynamic mapping based on client location; (2) if the multiple mirrored servers are placed as a pool of servers in a subnet, the DNS answers the request by dynamic mapping based on server load. We have designed and implemented a scalable web server based on this approach. Our study shows that the load-balanced configuration of the DNS improves the performance of the web server. Since all one has to do is modify the BIND code and add the IP addresses of the servers to the resource record, a cost-effective scalable web server is accomplished.

 

 

Keywords: Web server, World Wide Web, scalability, load balance, domain name system.

 

  1. Introduction

The World Wide Web, often abbreviated WWW or simply the Web, is one of the hottest things since TV, bringing millions of people onto the Internet for the first time [9]. With a simple “point-and-click” operation and the ability to automatically display a wide variety of multimedia information, the Web browser lets everybody get in on the fun and hides most of the complexity of the Internet [16]. For example, a person may visit a digital library to retrieve digitized documents, to listen to a real-audio orchestra performance, or even to see a movie, all with a simple “point-and-click” action and without regard to the physical details of the computers and networks that make this possible.

With millions of people navigating the Web, there are many more clients than servers. Hence, any web server can get very busy trying to serve dozens or possibly even hundreds of requests every second [8]. These loads can threaten to overwhelm web server systems and have forced service providers to seek the highest possible performance from their servers [16].

For a single-workstation server, there is an upper bound on the number of requests per second that the server can handle [11]. For example, during the Olympic Games in Atlanta, many people used their computers to connect to the Game Report Center in order to watch the games or to get the result of a particular game. The easiest URL for people to memorize is http://www.olympic.org. However, this web server becomes the bottleneck once the number of HTTP requests exceeds its capacity to respond.

Therefore, it is important that web servers can be accessed in a way that scales with the number of requests. This can be achieved via dynamic scalability by having a pool of servers that are alternately mapped to the hostname alias of the web server. The system administrator can add any number of servers to the available pool as needed. For example, during the Olympic Games, the system administrator of the Olympic organization can build a cluster of distributed microcomputers as a pool of web servers that are alternately mapped to the same location, such as http://www.olympic.org. If the number of HTTP requests exceeds the web server’s capacity to respond, the administrator can add servers to the available pool, dynamically increasing the load capacity of the virtual server. After the Olympic Games, the administrator can simply remove some of the servers from the pool according to the number of requests.

Figure 1 shows the system architecture to support a scalable web server. In a scalable web server system, the name of the web server can be mapped to a group of server computers. A new server can join the service whenever the number of requests increases; when the number of requests decreases, a server can be removed from the group. There should be no difference from the users’ point of view when they retrieve data from the web server. In other words, the system should look and feel identical to requesting clients. This means that any user can access any document regardless of the server to which that user is connected [7].

Figure 1: Model of the system to support a scalable web server.

We have developed a load-balanced scalable web server that employs Sun Sparc 20 workstations as the HTTP servers. Through a modification to the domain name server, the HTTP servers’ utilization is improved by scheduling user HTTP requests to a proper node for efficient processing. Server scalability is achieved by actively monitoring the CPU loads, and transmission costs in the routing environment are reduced.

This paper is organized as follows. Previous work on scalable web servers is described in Section 2. Section 3 presents our design and implementation of a scalable web server; the performance evaluation of our web server and its implications are also discussed there. Finally, Section 4 summarizes this work with a discussion of using a cost-effective web server.

  2. Previous Work

This section briefly describes relevant previous work on building a scalable web server. NCSA has built a multi-workstation HTTP server based on a round-robin domain name system (DNS) to assign requests to workstations [7]. The round-robin approach only accomplishes load distribution, not load balancing; hence, scalability of the server is not as good as expected. UCSB has developed a preliminary version of a scalable WWW server, called SWEB, running on a cluster of workstations and parallel machines [1]. Scalability of the server is achieved by actively monitoring the run-time CPU, disk I/O, and network load of system resource units, and dynamically scheduling user HTTP requests to proper nodes for efficient processing. Each of the servers is required to install the related software modules.

There are also commercial products which improve the scalability of the web server. Network Dispatcher is a product from IBM. It scales the server capability at the IP level by modifying the dispatcher node’s kernel. The three key components of the Network Dispatcher (Executor, Advisor, and Manager) interact to balance and dispatch the incoming requests among servers [5, 6]. Cisco’s LocalDirector and DistributedDirector are two other products that transparently and intelligently redirect sessions to other local servers as necessary. This redirection is accomplished by rewriting the IP header information in accordance with a dynamic table of mappings between each session and the server to which it has been redirected [2, 3, 4]. The advantage of this approach is that the kernel does not have to be modified; however, the router might become the bottleneck.

In summary, the NCSA approach only modifies the DNS to include round-robin rotation among multiple hosts with the same name. Besides using the round-robin DNS, SWEB adds a small expert system to each web server to analyze the CPU and I/O loads for the HTML files in order to choose the best host to serve the client. IBM and Cisco provide a modified kernel and a specific router, respectively, to achieve server scalability; however, they require the web provider to buy specific devices to enhance the performance of the server. In the following, we describe a cost-effective scalable web server with a simple load-balanced DNS.

 

  3. Design of a Scalable Web Server

A scalable system can be expanded or reduced easily without significantly reconfiguring the whole system. In other words, by adding more resources such as another server, or more disk or memory, a scalable web service can grow with a seemingly endless increase in the number of users’ requests. While the traditional solution to this problem is to replace an existing server with a faster and more powerful machine, a simpler and more cost-effective approach is to modify the BIND program of the DNS to build a load-balanced scalable web server. Our design based on this approach is presented in this section [17].

The system architecture of our scalable web server is shown in Figure 2. There can be one or more HTTP servers that are alternately mapped to the hostname alias of the WWW server. A load-balancing technique is applied to the DNS server for distributing HTTP requests. There are two kinds of request mapping: one is based on client location and the other is based on the servers’ CPU loads. When the user issues a query to the WWW server, the DNS server dynamically schedules the user’s HTTP requests to a proper node for efficient processing and returns the IP address of that node to the user for the connection.

 

Figure 2: The system architecture of the scalable web server.

In the following sections, we explain our design and implementation of the scalable web server and evaluate its performance.

3.1. Request Mapping Based on Client Location

As many WWW servers exist in the Internet, the demand on the network bandwidth increases as the number of Internet users increases. These users, who request web pages, can be from anywhere in the world. Therefore, we can set up some web servers in different areas of the world with identical data for a popular Web site. When the users want to access the file from this site, they can just connect to the nearest server, and hence the traffic on the network is reduced.

We observed that a DNS server supports multi-homed hosts, that is, hosts with more than one address, which typically act as routers. For example, consider the resource records of the DNS shown in Table 1 [10]. The A stands for address, and each resource record maps a name to an address. Since there are two addresses associated with wormhole.movie.edu, it acts as a router with two address records. Unlike host table lookups, a DNS lookup can return more than one address for a name; in this case, a lookup of wormhole.movie.edu will return two. If the requester is on the same network as one of the host’s addresses, the name server will place that “closest” address first in the response for better performance. This feature is called address sorting.

;
; Host addresses
;
localhost.movie.edu.    IN  A  127.0.0.1
terminator.movie.edu.   IN  A  192.249.249.3
diehard.movie.edu.      IN  A  192.249.249.4
misery.movie.edu.       IN  A  192.253.253.2
shining.movie.edu.      IN  A  192.253.253.3
;
; Multi-homed hosts
;
wormhole.movie.edu.     IN  A  192.249.249.1
wormhole.movie.edu.     IN  A  192.253.253.1
;
; Aliases
;
wh249.movie.edu.        IN  A  192.249.249.1
wh253.movie.edu.        IN  A  192.253.253.1

Table 1: An example of the DNS resource record.

Based on this address-sorting feature, we can reduce traffic on the Internet. We assign several IP addresses, located in different areas, to the unique Uniform Resource Locator (URL) of the web server. Therefore, when a user requests a file, the request will be mapped properly to one of these IP addresses. For example, in Table 2, we map the WWW server to IP addresses in three different areas: Hualien, Taipei, and the USA. All three WWW servers hold exactly the same data. As shown in Figure 3, when a user accesses a file from the NTU campus, the DNS server of NTU will attempt to map the hostname to an IP address. Since WWW has three addresses associated with its name, a DNS lookup of WWW will return all three address records. At this moment, the name server notices that one of the WWW servers is on the requester’s network, namely the NTU campus. Therefore, the DNS places that “closest” address first in the response, which yields better performance and reduces traffic on the Internet.

 

; Resource record for a scalable web server
; http://csl.csie.ndhu.edu.tw
;
www  IN  HINFO  WWW-Server WWW
     IN  A      203.64.88.10    ; at NDHU campus
     IN  A      140.112.19.76   ; at NTU campus
     IN  A      128.32.206.66   ; at UC Berkeley
     IN  WKS    203.64.88.10    TCP http
     IN  WKS    140.112.19.76   TCP http
     IN  WKS    128.32.206.66   TCP http

Table 2: Another example of the DNS resource record.

 

 

 

Figure 3: Request mapping based on client location.

 

By building a scalable web server with request mapping based on client location, a company may place multiple mirrored servers in different locations around the world in order to maintain good service to end users. The system administrator only has to modify the resource record of the DNS. Then, from the list of several geographically dispersed IP addresses, the requester’s DNS will automatically place the “closest” IP address first in the response. This provides scalability and reduces transmission costs. We have implemented this feature in our web server and fully tested it.
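The address-sorting selection described in this section can be sketched in a few lines of Python. This is only an illustrative model, not BIND’s actual implementation: it assumes “same network” means sharing a /24 prefix, whereas BIND consults the configured network masks.

```python
def same_network(ip_a, ip_b, prefix_len=24):
    """Crude test: do two dotted-quad addresses share the same /24 network?"""
    to_int = lambda ip: sum(int(o) << (8 * (3 - i)) for i, o in enumerate(ip.split(".")))
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return (to_int(ip_a) & mask) == (to_int(ip_b) & mask)

def sort_addresses(requester_ip, a_records):
    """Place addresses on the requester's network first, mimicking
    the DNS address-sorting feature discussed in the text."""
    return sorted(a_records, key=lambda ip: not same_network(requester_ip, ip))

# A requester on the NTU campus network sees the NTU mirror first.
records = ["203.64.88.10", "140.112.19.76", "128.32.206.66"]
print(sort_addresses("140.112.19.5", records))
# → ['140.112.19.76', '203.64.88.10', '128.32.206.66']
```

Since Python’s sort is stable, addresses not on the requester’s network keep their original order, which matches the behavior of returning the remaining records unchanged.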

3.2. Request Mapping Based on Server Loads

Besides placing multiple mirrored servers in different locations around the world and mapping requests based on client location, a load-balanced scalable web server with request mapping based on server loads is presented in this section.

The design idea of a load-balanced scalable web server is simple. The user requests are first routed to the web server computers via DNS load-balancing distribution, as shown in Figure 4. Our design modifies the DNS so that a pool of HTTP servers can be mapped to the hostname alias of the WWW server. Such a collection of server machines can have heterogeneous hardware and operating systems. The DNS assigns the requests by consulting dynamically changing system load information, and allocates the IP address of the server with the lowest CPU load to the client [12].

For example, in our laboratory, we have three workstations, located in the same subnet, in the server pool for the scalable web server. The DNS’s resource record is shown in Table 3. Whenever a user issues a request to http://csl.csie.ndhu.edu.tw, the DNS of the client site will ask the DNS of the server site for the hostname-to-IP-address mapping. According to the CPU load information of the servers, the DNS of the server site will return the IP address of the server with the lowest CPU load. The client then makes the connection to the given server and retrieves the data from it.

There are two schemes for gathering the web servers’ system load information. One is to modify the existing BIND 4.9.2 code. When the user sends queries for the URL to the server, named goes through the list of IP addresses that can be mapped to a DNS query for the web server’s URL, checks the CPU load information for each server on the list, and returns the IP address of the server with the lowest CPU load. The client then makes the connection to the host that is assigned as the server. The whole process is depicted in Figure 5.

 

 

Figure 4: Request mapping based on server loads.

 

 

; Resource record for a scalable Web server
; http://csl.csie.ndhu.edu.tw
;
www  IN  HINFO  WWW-Server WWW
     IN  A      203.64.100.117
     IN  A      203.64.100.155
     IN  A      203.64.100.183
     IN  WKS    203.64.100.117  TCP http
     IN  WKS    203.64.100.155  TCP http
     IN  WKS    203.64.100.183  TCP http

Table 3: The resource record for a scalable web server.

 

Figure 5: The DNS asks each server’s load at the time when the name is queried.
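The selection step of this first scheme can be sketched as follows. Our actual implementation is in the modified BIND C code; probe_load here is a hypothetical stand-in that returns canned CPU load readings instead of querying each server over the network.

```python
# Sketch of the query-time polling scheme: for each candidate A record,
# ask the server for its current CPU load and answer with the least loaded.
SERVERS = ["203.64.100.117", "203.64.100.155", "203.64.100.183"]

def probe_load(ip, readings={"203.64.100.117": 0.82,
                             "203.64.100.155": 0.15,
                             "203.64.100.183": 0.47}):
    """Hypothetical stand-in for asking a server its current CPU load."""
    return readings[ip]

def answer_query(servers=SERVERS):
    """Return the IP address of the server with the lowest CPU load."""
    return min(servers, key=probe_load)

print(answer_query())  # → 203.64.100.155
```

Note that polling every server on each query adds one round trip per server to the name resolution time, which motivates the report-based scheme described next.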

 

Another scheme to dynamically monitor the CPU loads of the servers is to have each server report its own system load to the DNS periodically, with the reports recorded in a file as shown in Figure 6. Then, according to the reports, named assigns the IP address of the server with the lowest CPU load to the client. If any of the servers does not report its system load within a certain amount of time, the DNS marks that server as out of service. This particular design also eliminates the extra delay of polling failed servers.
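This second scheme, including the out-of-service timeout, can be sketched as follows. The 30-second timeout and the (load, timestamp) record format are illustrative assumptions rather than the exact values of our implementation.

```python
import time

REPORT_TIMEOUT = 30.0  # seconds without a report → server marked out of service

def pick_server(reports, now=None):
    """Return the live server with the lowest reported CPU load, skipping
    any server whose last report is older than the timeout.
    reports maps server IP → (cpu_load, time of last report)."""
    now = time.time() if now is None else now
    live = {ip: load for ip, (load, t) in reports.items()
            if now - t <= REPORT_TIMEOUT}
    if not live:
        return None  # every server is out of service
    return min(live, key=live.get)

reports = {
    "203.64.100.117": (0.10, 100.0),   # lowest load, but its report is stale
    "203.64.100.155": (0.40, 158.0),
    "203.64.100.183": (0.25, 159.0),
}
print(pick_server(reports, now=160.0))  # → 203.64.100.183
```

The stale entry is skipped even though it reports the lowest load, which is exactly how a crashed server drops out of service without any polling delay.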

Our experimental testbed is a scalable web server consisting of three Sun Sparc 20 workstations running SunOS 4.1.4 on standard 10 Mb/s Ethernet. Each Sun Sparc workstation has a local 1.05 GB hard disk and 32 MB of RAM. The three workstations are placed in the same subnet. It should be noted that the bandwidth of this Ethernet is shared with other machines in NDHU. Each of the servers runs httpd and holds identical data with files of different sizes. The BIND code has been modified to include the load monitoring routines we designed.

Figure 6: Each server reports its load to the DNS server periodically.

 

In our experiment, we first tried to find the maximum requests per second (rps) of a single web server. The maximum rps is determined by fixing the average file size and increasing the rps until requests start to fail; when requests start to fail, the system limit has been reached. As noted in the SWEB experiments, the duration of the test, which simulates a burst of simultaneous requests, also affects the result [1]. Requests arriving within a short period can be queued and processed gradually. However, requests generated continuously over a long period cannot simply be queued, since new requests keep arriving every second while earlier ones are still being processed.

Since our web site is not popular, we use other workstations to generate requests to the server and evaluate the performance of the web server. Tools are available to help evaluate HTTP server performance [13, 14, 15]. We used an evaluation tool called webpest, which allows users to estimate performance metrics using their own mix of resources and estimated request rates. Table 4 shows the maximum rps of a single server for a short test duration and for a long one.

 

 

               1 MB file size   100 KB file size   1 KB file size
 30 seconds         2 r/s            60 r/s           150 r/s
150 seconds         1 r/s            20 r/s            90 r/s

Table 4: The maximum rps of a single server for test durations of 30 s and 150 s.
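The rate-ramping procedure behind Table 4 can be sketched as follows. Since webpest is not publicly documented, fake_burst is a hypothetical stand-in for issuing a one-second burst at a given rate and counting the failed requests.

```python
def find_max_rps(issue_burst, start_rps=1, max_rps=200):
    """Increase the request rate until some request in the burst fails;
    the last fully successful rate is taken as the maximum rps.
    issue_burst(rps) must return the number of failed requests."""
    best = 0
    for rps in range(start_rps, max_rps + 1):
        if issue_burst(rps) == 0:
            best = rps
        else:
            break
    return best

# Stand-in for a real server: it can serve up to 60 requests per
# second of 100 KB files before requests start to fail.
def fake_burst(rps, capacity=60):
    return max(0, rps - capacity)

print(find_max_rps(fake_burst))  # → 60
```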

After we obtained the maximum rps of a single server, we tried to scale the web server to a pool of two workstations. Table 5 shows the maximum rps for test durations of 30 seconds and 150 seconds with a 1 MB file size. It indicates that the 2-node server improves on the maximum rps of the single-node server. However, it is difficult for us to add more servers to the pool, because our client machines cannot generate enough requests to saturate a larger pool of web servers.

 

 

           30 seconds   150 seconds
1-node        2 r/s        1 r/s
2-node      3.6 r/s      1.8 r/s

Table 5: Maximum rps for test durations of 30 s and 150 s with a 1 MB file size.

We have also performed experiments comparing our load-balanced scheme with NCSA’s round-robin approach. We use another workstation as a controller to generate extra requests to one particular server in the pool. Then, we apply the two different approaches to the scalable web server with 2 nodes: NCSA’s round-robin approach and our load-balanced approach. Since the round-robin approach distributes requests to the nodes uniformly, in turn, regardless of the condition of each server, it is expected to reach the system limit faster than our load-balanced approach. In other words, because the load-balanced approach checks the servers’ load information before the DNS assigns an IP address to the client, it chooses the best server for the connection. In this experiment, we generate 0.2 requests per second to one of the web servers from the controller machine; the resulting httpd processes cause that server machine’s CPU load to increase. We then measure the maximum rps under the two approaches. Since our load-balanced approach takes the CPU load information into consideration, its overall performance is better than that of NCSA’s round-robin approach, which evenly distributes the requests to each node. From Table 6, it can be seen that the load-balanced approach achieves 3.3 rps, better than the 2.9 rps of the round-robin approach.

 

2 nodes       server w/o extra load   load-balanced server   round-robin server
30 seconds          3.6 r/s                 3.3 r/s                2.9 r/s

Table 6: Performance comparison between the load-balanced and round-robin schemes (1 MB file size, 30 s duration).
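The effect in Table 6 can be illustrated with a toy simulation in which one of two nodes starts with extra background load; the request counts are made up for illustration and do not model our actual servers.

```python
from itertools import cycle

def simulate(assign, n_requests=100, extra=(40, 0)):
    """Distribute n_requests across two nodes that start with the given
    extra background load; return the resulting per-node totals."""
    loads = list(extra)
    for _ in range(n_requests):
        i = assign(loads)
        loads[i] += 1
    return loads

round_robin = cycle([0, 1])
rr = simulate(lambda loads: next(round_robin))
lb = simulate(lambda loads: loads.index(min(loads)))

print("round-robin:", rr)    # → [90, 50]  (the busy node stays overloaded)
print("load-balanced:", lb)  # → [70, 70]  (work shifts to the idle node)
```

Round-robin keeps sending half the requests to the already busy node, while least-load assignment first fills the idle node until the two even out.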

 

SWEB observed that DNS caching enables a local DNS system to cache the name-to-IP-address mapping, so that recently accessed hosts can be mapped quickly [10]. The downside is that all requests from a DNS server’s domain will go to one particular IP address for a period of time [1]. However, this problem can be mitigated by setting the time-to-live (TTL) of the zone’s data to zero. Since the TTL is attached to each resource record, a zero TTL tells any server that caches the record to remove the entry from its cache immediately; in other words, the data will not be cached by the local DNS server. Every time a user needs the data, the local server has to query our name server again to get the “best” IP address. The cost of this technique is that the delay for domain name resolution may increase.
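For instance, a zero TTL can be attached directly to the load-balanced records, as in the following fragment (BIND 4 zone-file syntax, using our testbed addresses; this is a sketch of the idea rather than our exact zone file):

```
; A TTL of 0 means the answer must not be cached, so every lookup
; reaches the load-balanced name server again.
www  0  IN  A  203.64.100.117
www  0  IN  A  203.64.100.155
www  0  IN  A  203.64.100.183
```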

During our experiments, we encountered some of the anomalous behaviors that Katz et al. mentioned in their paper [7]. These problems include slow response, dropped network connections, and other indications of system degradation, all associated with extremely high rates of opening and closing TCP/IP connections over long periods of time. We also tried to build the scalable web server with a cluster of three or more workstations. The system was tested and works well: the DNS always finds the server with the lowest CPU load to respond to the request. However, it is difficult to get accurate performance data for such larger systems. If we try to retrieve files with small sizes such as 100 KB, we do not have enough client machines to generate the requests needed to determine the maximum requests per second; on the other hand, if we use larger file sizes, the dropped-connection problem mentioned above may appear.

  4. Summary

The architecture described in this paper increases the overall capability of an HTTP server. We address the problem of heavily used Web sites in two ways:

First, if it is possible to place multiple mirrored servers in different locations around the world, the user needs only a single URL for accessing a distributed set of servers. As a result, the DNS transparently redirects end-user requests to the topologically closest server. This achieves better access performance as seen by the end user and reduces transmission costs in the Internet environment.

Second, if the multiple mirrored servers are placed as a pool of servers in a subnet, the load-balanced DNS assigns the one with the lowest CPU load to respond to queries to the same URL. The servers may be different platforms, but each sees exactly the same document tree.

Our load-balanced configuration of DNS, which alleviates the problem of NCSA’s load-distributed prototype, is easily adaptable to other sites. All one has to do is install the modified BIND code and add the IP addresses of the servers to the resource record. With this modification to the DNS, a cost-effective scalable web server is accomplished.

In our scalable web server, only CPU load has been used as the load indicator of the servers. Since other resources on a server can also be used for HTTP connections, we plan to add more load information to our load monitoring routine on the DNS. The additional load information includes memory usage, I/O loads, network conditions, etc. After such information is incorporated, our scalable web server is expected to achieve even better performance.

 

References

  1. D. Andresen, T. Yang, V. Holmedahl, and O. Ibarra, “SWEB: Towards a Scalable World Wide Web Server on Multicomputers,” Dept. of Computer Science Tech Rpt. TRCS95-17, U.C. Santa Barbara, September 1995.
  2. Cisco, “How to Cost-Effectively Scale Web Servers”, http://www.cisco.com/warp/public/784/5.html#DisDir.
  3. Cisco, “Cisco LocalDirector”, http://www.cisco.com/warp/public/751/lodir/lodir_wp.htm.
  4. Cisco, “Cisco DistributedDirector”, http://www.cisco.com/warp/public/751/distdir/dd_wp.htm.
  5. Annette Hamilton, “Network Dispatcher May Solve Your Web Site Woes,” Dec.6, 1996, http://www.ics.raleigh.ibm.com/netdispatch/solved.htm.
  6. IBM, “IBM Interactive Network Dispatcher: Planning for Network Dispatcher,” http://www.ics.raleigh.ibm.com/netdispatch/plan.htm.
  7. E. D. Katz, M. Butler, R. McGrath, “A Scalable HTTP Server: the NCSA Prototype,” Computer Networks and ISDN Systems, vol. 27, pp. 155-164, 1994.
  8. Thomas T. Kwan, Robert E. McGrath, and Daniel A Reed, “NCSA's World Wide Web Server: Design and Performance,” IEEE Computer, vol. 28, no. 11, pp. 68-74, November 1995.
  9. T. Berners-Lee, R. Cailliau, H. F. Nielsen, and A. Secret, “The World-Wide Web,” Communications of the ACM, vol. 37, no. 8, pp. 76-82, August 1994.
  10. P. Albitz and C. Liu, DNS and BIND, O’Reilly & Associates, Sebastopol, CA, 1992.
  11. Open Market, Inc., “Technical Overview: WebServers,” http://www.openmarket.com/Products/WhitePapers/Server/.
  12. B. A. Shirazi, A. R. Hurson, and K. M. Kavi (Eds.), Scheduling and Load Balancing in Parallel and Distributed Systems, IEEE CS Press, 1995.
  13. Silicon Graphics, Inc., “Webstone: World Wide Web Server Benchmarking,” http://www.sgi.com/Products/WebFORCE/WebStone/.
  14. Standard Performance Evaluation Corporation (SPEC), “Webperf: The Next Generation in Web Server Benchmarking,” unpublished, 1996.
  15. Gene Trent and Mark Sake, “WebSTONE: The First Generation in HTTP Server Benchmarking,” February 1995, http://www.sgi.com/Products/WebFORCE/WebStone/.
  16. Nancy J. Yeager and Robert E. McGrath, Web Server Technology: An Advanced Guide for World Wide Web Information Providers, Morgan Kaufmann Publishers, 1996.
  17. Nancy J. Yeager, “Building a Scalable Web Server,” http://www.ncsa.uiuc.edu/InformationServers/Horizon/Scaling/arch.html.