DOI: 10.1145/1323548.1323562

Performance scalability of a multi-core web server

Published: 03 December 2007

ABSTRACT

Today's large multi-core Internet servers support thousands of concurrent connections or flows. The computation ability of future server platforms will depend on increasing numbers of cores. The key to ensuring that performance scales with cores is to design systems software and hardware to fully exploit the parallelism inherent in independent network flows. However, performance scaling on commercial web servers has proven elusive. This paper identifies the major bottlenecks to scalability for a reference server workload on a commercial server platform. We determined that on a web server running a modified SPECweb2005 Support workload, throughput scales only 4.8x on eight cores. Our results show that the operating system, TCP/IP stack, and application exploited flow-level parallelism well, with few exceptions, and that load imbalance and shared caches affected performance little. Having eliminated these potential bottlenecks, we determined that performance scaling was limited by the capacity of the address bus, which became saturated with all eight cores active. If this key obstacle is addressed, commercial web servers and systems software are well-positioned to scale to a large number of cores.



          Reviews

          Carlos Juiz

The Internet provides a computing scenario where clients communicate with Web servers through mutually independent connections. If Internet server application processing and the associated protocol processing of a connection (flow) are done exclusively on a single central processing unit (CPU) core, minimal data sharing and synchronization between flows is expected. The computation ability of future servers will depend on increasing the number of cores.

          This interesting paper "identifies the major bottlenecks to scalability for a reference server workload on a commercial server platform." To test their hypothesis, the authors "set up [a] test server running a well-tuned Apache HTTP server and [the] Linux operating system. The server had eight cores with pairs of cores sharing L2 cache." The experiments show that the test server, running a modified SPECweb2005 Support workload, achieved only a 4.8-times speedup in throughput, compared to the ideal eight times; official SPECweb2005 results show similar scaling problems.

          This work provides "insights on the key causes of poor scalability of a Web server," and also provides "the analysis methodology leading to these insights." This latter feature makes the paper more interesting than the findings themselves, since the main bottleneck of the multicore server is the bus and the snoopy protocol for sharing it. The authors determined that the main cause of poor scaling is the capacity of the address bus: it reached 77 percent utilization on eight cores, which is considered fully saturated. Other results showed that the number of cache misses per byte remained nearly constant as the number of cores increased, and that a cache shared between cores on the same bus had little effect on performance.

          Profiling nevertheless revealed some scalability obstacles in software. "Increasing hash table capacities and reducing dependence on linked lists" as the workload increases should fix these problems. "In the kernel, flow-level parallelism broke down in the file-system directory cache," which was widely shared; the authors propose that "a possible workaround would be to maintain alternate directory trees for each core." In conclusion, the remaining obstacle to scaling performance with the number of cores is address bus capacity. As stated, "directories (and directory caches) can be used to replace snoopy cache coherence," at the price of additional cost and latency. Further studies should verify this last hypothesis for real workloads.

          Online Computing Reviews Service
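The flow-level parallelism discussed above rests on steering every packet of a connection to a single core, as RSS-capable NICs do by hashing the flow's 4-tuple into a per-core queue. A minimal sketch of that idea follows; the function name, the CRC32 hash, and the eight-core count used here are illustrative choices, not details taken from the paper (real NICs typically use a Toeplitz hash with an indirection table).

```python
import zlib

NUM_CORES = 8  # illustrative; matches the eight-core test server discussed above

def core_for_flow(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> int:
    """Map a TCP 4-tuple to a core index, RSS-style.

    Because the hash depends only on the 4-tuple, every packet of a
    given flow lands on the same core, so per-flow state needs no
    cross-core sharing or synchronization.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % NUM_CORES

# Packets of the same flow always map to the same core:
first = core_for_flow("10.0.0.1", 40000, "10.0.0.2", 80)
again = core_for_flow("10.0.0.1", 40000, "10.0.0.2", 80)
assert first == again
assert 0 <= first < NUM_CORES
```

Many independent flows hash roughly uniformly across cores, which is why, as the review notes, load imbalance contributed little to the scaling loss.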

Published in

            ANCS '07: Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
            December 2007
            212 pages
            ISBN:9781595939456
            DOI:10.1145/1323548

            Copyright © 2007 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 3 December 2007


            Qualifiers

            • research-article

            Acceptance Rates

ANCS '07 paper acceptance rate: 20 of 70 submissions, 29%. Overall acceptance rate: 88 of 314 submissions, 28%.
