DOI: 10.1145/3078597.3078605
research-article

Explaining Wide Area Data Transfer Performance

Published: 26 June 2017

ABSTRACT

Disk-to-disk wide-area file transfers involve many subsystems and tunable application parameters that pose significant challenges for bottleneck detection, system optimization, and performance prediction. Performance models can be used to address these challenges but have not proved generally usable because they require extensive online experiments to characterize subsystems. We show here how to overcome the need for such experiments by applying machine learning methods to historical data to estimate parameters for predictive models. Starting with log data for millions of Globus transfers involving billions of files and hundreds of petabytes, we engineer features for endpoint CPU load, network interface card load, and transfer characteristics, and we use these features in both linear and nonlinear models of transfer performance. We show that the resulting models have high explanatory power. For a representative set of 30,653 transfers over 30 heavily used source-destination pairs ("edges"), totaling 2,053 TB in 46.6 million files, we obtain median absolute percentage prediction errors (MdAPE) of 7.0% and 4.6% when using distinct linear and nonlinear models per edge, respectively; when using a single nonlinear model for all edges, we obtain an MdAPE of 7.8%. Our work broadens understanding of factors that influence file transfer rate by clarifying relationships between achieved transfer rates, transfer characteristics, and competing load. Our predictions can be used for distributed workflow scheduling and optimization, and our features can also be used for optimization and explanation.
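The MdAPE metric quoted above can be illustrated with a minimal sketch. It assumes the standard definition (the median, over all transfers, of the absolute prediction error divided by the observed value, expressed as a percentage); the abstract does not spell out the formula, so this may differ in detail from the authors' computation. The function name `mdape` and the sample rates are hypothetical and are not taken from the paper.

```python
from statistics import median


def mdape(observed, predicted):
    """Median absolute percentage error, in percent.

    Assumes the usual definition: median of 100 * |predicted - observed| / |observed|.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    if any(o == 0 for o in observed):
        raise ValueError("observed values must be nonzero")
    errors = [100.0 * abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return median(errors)


# Hypothetical transfer rates in MB/s (illustrative values, not data from the paper).
observed_rates = [100.0, 200.0, 400.0, 800.0]
predicted_rates = [110.0, 190.0, 400.0, 720.0]
print(mdape(observed_rates, predicted_rates))
```

Because the median is used rather than the mean, a handful of badly mispredicted transfers has little effect on the reported figure, which is worth keeping in mind when comparing the per-edge and single-model results.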

Published in

HPDC '17: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing
June 2017
254 pages
ISBN: 9781450346993
DOI: 10.1145/3078597

Copyright © 2017 ACM

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

HPDC '17 Paper Acceptance Rate: 19 of 100 submissions, 19%
Overall Acceptance Rate: 166 of 966 submissions, 17%
