ABSTRACT
Incoming and outgoing processing for a given TCP connection often execute on different cores: an incoming packet is typically processed on the core that receives the interrupt, while outgoing data processing occurs on the core running the relevant user code. As a result, accesses to read/write connection state (such as TCP control blocks) often involve cache invalidations and data movement between cores' caches. These can take hundreds of processor cycles, enough to significantly reduce performance.
We present a new design, called Affinity-Accept, that causes all processing for a given TCP connection to occur on the same core. Affinity-Accept arranges for the network interface to determine the core on which application processing for each new connection occurs, in a lightweight way; it adjusts the card's choices only in response to imbalances in CPU scheduling. Measurements show that for the Apache web server serving static files on a 48-core AMD system, Affinity-Accept reduces time spent in the TCP stack by 30% and improves overall throughput by 24%.
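The core idea above — keep every packet of a connection on one core so the connection's state stays in one cache — can be sketched with a simple per-flow hash. This is an illustrative model, not the paper's implementation: `flow_hash` and `steer` are hypothetical names, and a real NIC applies its own hash (e.g. Toeplitz, as in RSS) to pick an RX queue.

```python
def flow_hash(src_ip: int, src_port: int, dst_ip: int, dst_port: int) -> int:
    """FNV-1a over the TCP 4-tuple; a stand-in for the NIC's RSS-style hash."""
    h = 0x811C9DC5
    for b in (src_ip.to_bytes(4, "big") + src_port.to_bytes(2, "big") +
              dst_ip.to_bytes(4, "big") + dst_port.to_bytes(2, "big")):
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h

def steer(four_tuple, ncores=48):
    """Pick the core whose RX queue (and per-core accept queue) owns this flow."""
    return flow_hash(*four_tuple) % ncores

# Every packet of one connection hashes to the same core, so interrupt-side
# and application-side processing can share that core's cache instead of
# bouncing the TCP control block between cores.
conn = (0xC0A80001, 34567, 0x0A000002, 80)  # (src IP, src port, dst IP, dst port)
print(steer(conn) == steer(conn))           # stable per-flow mapping
```

Because the mapping is a pure function of the 4-tuple, no shared steering table is needed on the fast path; rebalancing (as Affinity-Accept does in response to CPU scheduling imbalance) only has to adjust which core a hash bucket maps to, not per-packet state.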