DOI: 10.1145/3183713.3183743
research-article

DITA: Distributed In-Memory Trajectory Analytics

Published: 27 May 2018

ABSTRACT

Trajectory analytics benefits many real-world applications, e.g., navigation systems based on frequent trajectories, road planning, carpooling, and transportation optimization. Existing algorithms focus on solving this problem on a single machine. However, the volume of trajectory data exceeds the storage and processing capability of a single machine, which calls for large-scale trajectory analytics in distributed environments. Distributed trajectory analytics faces four challenges: data-locality-aware partitioning, load balancing, an easy-to-use interface, and the versatility to support various trajectory similarity functions. To address these challenges, we propose DITA, a distributed in-memory trajectory analytics system. We propose an effective partitioning method, together with a global index and a local index, to address the data locality problem. We devise cost-based techniques to balance the workload. We develop a filter-verification framework to improve performance. Moreover, DITA supports most existing similarity functions for quantifying the similarity between trajectories. We integrate our framework seamlessly into Spark SQL so that it supports both the SQL and DataFrame API interfaces. We have conducted extensive experiments on real-world datasets, and the results show that DITA significantly outperforms existing distributed approaches to trajectory similarity search and join.
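The filter-verification idea mentioned in the abstract can be illustrated with a minimal single-machine sketch. It is not DITA's implementation: the abstract does not say which pruning rules the system uses, so the sketch assumes dynamic time warping (DTW) as the similarity function and a simple endpoint-based lower bound as the filter. The names `dtw`, `endpoint_lower_bound`, and `similarity_search` are hypothetical and chosen only for this example.

```python
import math


def point_dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dtw(t, q):
    """Exact DTW distance (sum of point distances along the best warping path).

    Runs in O(len(t) * len(q)) time; this is the expensive verification step.
    """
    inf = float("inf")
    n = len(q)
    prev = [inf] * (n + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, len(t) + 1):
        cur = [inf] * (n + 1)
        for j in range(1, n + 1):
            cost = point_dist(t[i - 1], q[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[n]


def endpoint_lower_bound(t, q):
    """Cheap lower bound on DTW (assumed filter, not taken from the paper).

    Every warping path aligns the two first points and the two last points,
    so their distances are always part of the DTW sum.
    """
    first = point_dist(t[0], q[0])
    if len(t) == 1 and len(q) == 1:
        return first  # first and last alignment are the same cell
    return first + point_dist(t[-1], q[-1])


def similarity_search(trajectories, query, tau):
    """Return (index, distance) for every trajectory with DTW <= tau."""
    results = []
    for idx, traj in enumerate(trajectories):
        # Filter: skip candidates whose lower bound already exceeds tau.
        if endpoint_lower_bound(traj, query) > tau:
            continue
        # Verify: compute the exact distance for surviving candidates.
        dist = dtw(traj, query)
        if dist <= tau:
            results.append((idx, dist))
    return results
```

Because the lower bound never exceeds the true DTW distance, the filter discards only trajectories that cannot qualify, so the result matches a brute-force scan while skipping the quadratic computation for distant candidates. A distributed system would additionally need partitioning and indexing to avoid scanning every trajectory, which this sketch does not attempt.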


      • Published in

        SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
        May 2018
        1874 pages
        ISBN: 9781450347037
        DOI: 10.1145/3183713

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        SIGMOD '18 Paper Acceptance Rate: 90 of 461 submissions, 20%. Overall Acceptance Rate: 785 of 4,003 submissions, 20%.
