ABSTRACT
Trajectory analytics benefits many real-world applications, e.g., navigation systems based on frequent trajectories, road planning, carpooling, and transportation optimization. Existing algorithms focus on optimizing trajectory analytics on a single machine. However, the volume of trajectory data exceeds the storage and processing capability of a single machine, which calls for large-scale trajectory analytics in distributed environments. Distributed trajectory analytics faces four challenges: data-locality-aware partitioning, load balancing, an easy-to-use interface, and the versatility to support various trajectory similarity functions. To address these challenges, we propose DITA, a distributed in-memory trajectory analytics system. We propose an effective partitioning method, together with a global index and a local index, to address the data locality problem. We devise cost-based techniques to balance the workload. We develop a filter-verification framework to improve performance. Moreover, DITA supports most existing similarity functions for quantifying the similarity between trajectories. We integrate our framework seamlessly into Spark SQL so that it supports both the SQL and DataFrame API interfaces. We have conducted extensive experiments on real-world datasets, and the results show that DITA significantly outperforms existing distributed trajectory similarity search and join approaches.
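To make the filter-verification idea concrete, the following is a minimal single-machine sketch in Python; it is an illustration under stated assumptions, not DITA's implementation. It assumes dynamic time warping (DTW) with Euclidean point distance as the similarity function and uses a simple endpoint-based lower bound as the filter. The names `dtw`, `endpoint_lower_bound`, and `similarity_search` are hypothetical and do not come from the paper.

```python
import math


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dtw(a, b):
    """Exact DTW distance (sum of matched point distances) via dynamic programming."""
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cur[j] = dist(a[i - 1], b[j - 1]) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def endpoint_lower_bound(a, b):
    """Cheap lower bound on DTW: every warping path matches first-to-first
    and last-to-last, so those two distances are always part of the sum."""
    first = dist(a[0], b[0])
    if len(a) == 1 and len(b) == 1:
        return first  # first and last pair coincide; count it once
    return first + dist(a[-1], b[-1])


def similarity_search(query, trajectories, tau):
    """Return (matches, verified_count) for trajectories with DTW <= tau.

    Filter step: discard candidates whose lower bound already exceeds tau.
    Verification step: compute exact DTW only for the survivors.
    """
    matches, verified = [], 0
    for tid, traj in trajectories.items():
        if endpoint_lower_bound(query, traj) > tau:
            continue  # pruned without computing DTW
        verified += 1
        d = dtw(query, traj)
        if d <= tau:
            matches.append((tid, d))
    return matches, verified
```

Because the filter is a true lower bound, pruning never discards a correct answer; it only reduces how many quadratic-time DTW computations are run. A distributed system would additionally have to decide how to partition and index trajectories so that most candidates are pruned without being shipped across machines, which this sketch does not attempt.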
Index Terms
- DITA: Distributed In-Memory Trajectory Analytics