DOI: 10.1145/2882903.2915237
research-article
Public Access

Simba: Efficient In-Memory Spatial Analytics

Published: 14 June 2016

ABSTRACT

Large spatial data has become ubiquitous. As a result, it is critical to provide fast, scalable, and high-throughput spatial queries and analytics for numerous applications in location-based services (LBS). Traditional spatial databases and spatial analytics systems are disk-based and optimized for IO efficiency. Increasingly, however, data are stored and processed in memory to achieve low latency, and CPU time becomes the new bottleneck. We present the Simba (Spatial In-Memory Big data Analytics) system, which offers scalable and efficient in-memory spatial query processing and analytics for big spatial data. Simba is based on Spark and runs over a cluster of commodity machines. In particular, Simba extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and the DataFrame API. It introduces indexes over RDDs in order to work with big spatial data and complex spatial operations. Lastly, Simba implements an effective query optimizer, which leverages its indexes and novel spatial-aware optimizations, to achieve both low latency and high throughput. Extensive experiments over large data sets demonstrate Simba's superior performance compared with other spatial analytics systems.
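
To make the idea of indexing partitioned in-memory data more concrete, here is a minimal, stdlib-only Python sketch. It is not Simba's implementation and uses neither Spark nor Simba's API. It assumes a two-level scheme: points are packed into partitions in an STR-like way (x-strips, then y-cells); a global index made of partition bounding boxes prunes partitions that cannot intersect a range query; and a local index inside each surviving partition narrows the scan. The local index here is a sorted x-array searched with `bisect`, a deliberately simple stand-in for a local R-tree. The names `Partition`, `build_partitions`, `intersects`, and `range_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class Partition:
    """One in-memory partition (hypothetical): points sorted by x plus its bounding box."""
    xs: List[float]      # sorted x coordinates, the local index key
    points: List[Point]  # points sorted by (x, y), aligned with xs
    mbr: Box             # minimum bounding rectangle, kept in the global index


def build_partitions(points: List[Point], strips: int, cells: int) -> List[Partition]:
    """STR-style packing: cut the data into x-strips, then each strip into y-cells."""
    if not points:
        return []
    pts = sorted(points)                     # order by x (ties broken by y)
    strip_size = -(-len(pts) // strips)      # ceiling division
    parts: List[Partition] = []
    for i in range(0, len(pts), strip_size):
        strip = sorted(pts[i:i + strip_size], key=lambda p: p[1])  # order strip by y
        cell_size = -(-len(strip) // cells)
        for j in range(0, len(strip), cell_size):
            cell = sorted(strip[j:j + cell_size])  # re-sort by x for the local index
            xs = [p[0] for p in cell]
            ys = [p[1] for p in cell]
            parts.append(Partition(xs, cell, (min(xs), min(ys), max(xs), max(ys))))
    return parts


def intersects(a: Box, b: Box) -> bool:
    """True if two closed rectangles overlap."""
    return not (a[2] < b[0] or a[0] > b[2] or a[3] < b[1] or a[1] > b[3])


def range_query(parts: List[Partition], q: Box) -> Tuple[List[Point], int]:
    """Return the points inside closed rectangle q and the number of partitions scanned."""
    # Global step: keep only partitions whose bounding box overlaps q.
    candidates = [p for p in parts if intersects(p.mbr, q)]
    result: List[Point] = []
    for p in candidates:
        # Local step: binary-search the x range, then filter on y.
        lo = bisect_left(p.xs, q[0])
        hi = bisect_right(p.xs, q[2])
        result.extend(pt for pt in p.points[lo:hi] if q[1] <= pt[1] <= q[3])
    return sorted(result), len(candidates)


if __name__ == "__main__":
    grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
    parts = build_partitions(grid, strips=4, cells=4)
    hits, scanned = range_query(parts, (2.0, 2.0, 4.0, 4.0))
    print(len(hits), scanned, len(parts))
```

For example, on a 10 × 10 grid of integer points packed into 4 strips of 4 cells (16 partitions), the query rectangle from (2, 2) to (4, 4) returns the 9 points it covers while skipping partitions whose bounding boxes lie entirely outside it. The global pruning step is what lets a range query avoid touching most of the data; in a distributed setting, a check of this kind would decide which partitions need to be scanned at all.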


    Published in

      ACM Conferences
      SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
      June 2016
      2300 pages
      ISBN:9781450335317
      DOI:10.1145/2882903

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States




      Acceptance Rates

      Overall Acceptance Rate: 785 of 4,003 submissions, 20%
