Survey: Live Migration and Disaster Recovery over Long-Distance Networks

Published: 19 July 2016

Abstract

We study virtual machine live migration (LM) and disaster recovery (DR) from a networking perspective, considering long-distance networks such as those between data centers. These networks are usually constrained by limited available bandwidth, increased latency and congestion, or, when dedicated network resources are used, high cost, and their exact characteristics cannot be controlled. LM and DR are challenging because large amounts of data must be transferred over long-distance networks, and this volume grows with the number of migrated or protected resources. In this context, our work presents how LM and DR are currently performed and how they operate in long-distance networking environments, discussing related issues and bottlenecks and surveying other works. We also present how networks are evolving and the new technologies and protocols (e.g., software-defined networking, or SDN, and flexible optical networks) that can be used to boost the efficiency of LM and DR over long distances. Traffic redirection in a long-distance environment is also an important consideration, since it directly affects the transparency of LM and DR. Related works and solutions from both academia and industry are presented.

  157. VMware. 2015. vSphere Replication. http://www.vmware.com/products/vsphere/features/replication. Retrieved September 2015.Google ScholarGoogle Scholar
  158. VMware. 2015c. vSphere 6.0 Advantages Over Hyper-V. https://www.vmware.com/files/pdf/vSphere-6.0-Advantages-Over-Hyper-V.pdf. Retrieved September 2015.Google ScholarGoogle Scholar
  159. VMWare, VMWare vCenter Site Recovery Manager. 2016. https://www.vmware.com/products/site-recovery-manager.Google ScholarGoogle Scholar
  160. VMWare and Cisco. 2009. Virtual Machine Mobility with Vmware VMotion and Cisco Data Center Interconnect Technologies.Google ScholarGoogle Scholar
  161. G. Wang, D. G. Andersen, M. Kaminsky, K. Papagiannaki, T. S. E. Ng, M. Kozuch, and M. P. Ryan. 2010. c-Through: Part-time optics in data centers. ACM SIGCOMM. 327--338. Google ScholarGoogle ScholarDigital LibraryDigital Library
  162. Y. Wang, E. Keller, B. Biskeborn, J. van der Merwe, and J. Rexford. 2008. Virtual routers on the move: Live router migration as a network management primitive. ACM SIGCOMM Computer Communication Review. 38, 4, 231--242. Google ScholarGoogle ScholarDigital LibraryDigital Library
  163. L. Wang, H. Ramasamy, R. Harper, M. Viswanathan, and E. Plattier. 2015. Experiences with building disaster recovery for enterprise-class clouds. Annual IEEE/IFIP International Conference on Dependable Systems and Networks. 231--238. Google ScholarGoogle ScholarDigital LibraryDigital Library
  164. L. Wang. 2006. Desigh and implementation of TCPHA. Draft Release. http://dragon.linux-vs.org/∼dragonfly/.Google ScholarGoogle Scholar
  165. H. Watanabe, T. Ohigashi, T. Kondo, K. Nishimura, and R. Aibara. 2010. A performance improvement method for the global live migration of virtual machine with IP mobility. International Conference on Mobile Computing and Ubiquitous Networking (ICMU’10).Google ScholarGoogle Scholar
  166. T. Wood, E. Cecchet, K. Ramakrishnan, P. Shenoy, J. Van Der Merwe, and A. Venkataramani. 2010. Disaster recovery as a cloud service: Economic benefits & deployment challenges. 2nd USENIX Workshop on Hot Topics in Cloud Computing. 1--7. Google ScholarGoogle ScholarDigital LibraryDigital Library
  167. T. Wood, H. Lagar-Cavilla, K. Ramakrishnan, P. Shenoy, and J. Van der Merwe. 2011. PipeCloud: Using causality to overcome speed-of-light delays in cloud-based disaster recovery. SoCC. Google ScholarGoogle ScholarDigital LibraryDigital Library
  168. T. Wood, K. Ramakrishnan, P. Shenoy, and J. van der Merwe. 2011. CloudNet: Dynamic pooling of cloud resources by live WAN migration of virtual machines. ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  169. Xen. 2016. http://www.xenproject.org/.Google ScholarGoogle Scholar
  170. R. Xie, Y. Wen, X. Jia, and H. Xie. 2014. Supporting seamless virtual machine migration via named data networking in cloud data center. IEEE Transactions on Parallel and Distributed Systems. Google ScholarGoogle ScholarDigital LibraryDigital Library
  171. K. Ye, X Jiang, R Ma, and F Yan. 2012. VC-migration: Live migration of virtual clusters in the cloud. ACM/IEEE International Conference on Grid Computing (GRID’12). Google ScholarGoogle ScholarDigital LibraryDigital Library
  172. Zerto. 2015. http://www.zerto.com. Retrieved September 2015.Google ScholarGoogle Scholar
  173. W. Zhang, K. T. Lam, and C. L. Wang. 2014. Adaptive live VM migration over a WAN: Modeling and implementation. IEEE International Conference Cloud Computing (CLOUD’13). 368--375. Google ScholarGoogle ScholarDigital LibraryDigital Library
  174. X. Zhang, Z. Huo, J. Ma, and D. Meng. 2010. Exploiting data deduplication to accelerate live virtual machine migration. IEEE International Conference on Cluster Computing. 88--96. Google ScholarGoogle ScholarDigital LibraryDigital Library
  175. J. Zheng, T. Eugene Ng, K. Sripanidkulchai, and Z. Liu. 2014. COMMA: Coordinating the migration of multi-tier applications. ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’14). Google ScholarGoogle ScholarDigital LibraryDigital Library
  176. J. Zheng, T. Sing Eugene Ng, and K. Sripanidkulchai. 2011. Workload-aware live storage migration for clouds. ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’11). Google ScholarGoogle ScholarDigital LibraryDigital Library

    Reviews

    Naga R Narayanaswamy

    This paper treats live migration (LM) as a one-time task: moving a virtual machine from one physical machine to another, located in the same or a different data center, without interrupting its operation. It also introduces disaster recovery (DR) as a set of practices and activities that keep an organization's physical and virtual information technology assets operating. With more and more companies moving to cloud services and deploying redundant data centers so that businesses can operate 24/7 without failures, the paper is relevant for defining the expectations and needs of data center solutions.

    The paper defines terms such as RPO (recovery point objective) and RTO (recovery time objective), which are benchmarks for measuring the effectiveness of LM and DR in systems. Because it is intended to be a very detailed survey of the landscape, comparing companies such as VMware, Cisco, and NetApp, it also covers time-saving techniques such as deduplication and compression. Several research papers are compared, and networking concepts such as BGP multihoming are surveyed extensively. The paper also examines many combinations of industry-leading solutions to see how they offer DR options; one example is how Silver Peak solutions can be combined with NetApp's SnapMirror to provide good DR solutions.

    The paper is targeted at three types of professionals: industry executives and product managers who want to see the landscape and improve their existing products; CIOs and IT professionals interested in which solution is better suited to their companies' LM and DR problems; and academics who want to research the existing solutions and come up with completely new paradigms or solutions that make significant improvements. The paper achieves these objectives by explaining the topic in a very rigorous and detailed manner.

    Online Computing Reviews Service


    • Published in

      ACM Computing Surveys, Volume 49, Issue 2
      June 2017
      747 pages
      ISSN: 0360-0300
      EISSN: 1557-7341
      DOI: 10.1145/2966278
      • Editor: Sartaj Sahni

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 19 July 2016
      • Accepted: 1 April 2016
      • Revised: 1 February 2016
      • Received: 1 April 2015

      Qualifiers

      • survey
      • Research
      • Refereed
