Abstract
Due to the high uptake of location-based services (LBSs), large spatio-temporal datasets of moving objects’ trajectories are being created every day. An important task in spatial data analytics is to service range queries by returning trajectory counts within a queried region. The question of how to keep an individual user’s data private whilst enabling spatial data analytics by third parties has become an urgent research direction. Indeed, it is increasingly becoming a concern for users. To preserve privacy we discard individual trajectories and aggregate counts over a spatial and temporal partition. However the privacy gained comes at a cost to utility: trajectories passing through multiple cells and re-entering a query region, lead to inaccurate query responses. This is known as the distinct counting problem. We propose the Connection Aware Spatial Euler (CASE) histogram to address this long-standing problem. The CASE histogram maintains the connectivity of a moving object path, but does not require the ID of an object to distinguish multiple entries into an arbitrary query region. Our approach is to process trajectories offline into aggregate counts which are sent to third parties, rather than the original trajectories. We also explore modifications of our aggregate counting approach that preserve differential privacy. Theoretically and experimentally we demonstrate that our method provides a high level of accuracy compared to the best known methods for the distinct counting problem, whilst preserving privacy. We conduct our experiments on both synthetic and real datasets over two competitive Euler histogram-based methods presented in the literature. Our methods enjoy improvements to accuracy from 10 % up to 70 % depending on trip data and query region size, with the greatest increase seen on the Microsoft T-Drive real dataset, representing a more than tripling of accuracy.
Similar content being viewed by others
Notes
This expectation is taken with respect to the random possibilities.
References
Kerckhoffs A (1883) Journal des sciences militaires IX:5–38
Barak B, Chaudhuri K, Dwork C, Kale S, McSherry F, Talwar K (2007) Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: Proceedings of the Twenty-Sixth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 11-13, 2007, Beijing, China, pp 273–282
Beigel R, Tanin E (1998) The geometry of browsing. In: LATIN ’98: Theoretical Informatics, Third Latin American Symposium, pp 331–340
Beresford AR, Stajano F (2003) Location privacy in pervasive computing. IEEE Pervasive Comput 2(1):46–55
Beresford AR, Stajano F (2004) Mix zones: User privacy in location-aware services. In: 2nd IEEE Conference on Pervasive Computing and Communications Workshops (PerCom 2004 Workshops), pp 127–131
Bogorny V, Shekhar S (2010) Spatial and spatio-temporal data mining. In: ICDM 2010, The 10th IEEE International Conference on Data Mining, p 1217
Braz F, Orlando S, Orsini R, Raffaetà A, Roncato A, Silvestri C (2007) Approximate aggregations in trajectory data warehouses. In: Proceedings of the 23rd International Conference on Data Engineering Workshops, ICDE 2007, pp 536–545
Buchin K, Buchin M, van Kreveld MJ, Löffler M, Luo J, Silveira RI (2012) Processing aggregated data: the location of clusters in health data. GeoInformatica 16 (3):497–521
Chawla S, Dwork C, McSherry F, Talwar K (2005) On the utility of privacy-preserving histograms. In: Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence
Chow CY, Mokbel MF (2011) Privacy of spatial trajectories. In: Computing with Spatial Trajectories, pp 109–141
Chow CY, Mokbel MF (2011) Trajectory privacy in location-based services and data publication. SIGKDD Explorations 13(1):19–29
Dingledine R, Mathewson N, Syverson PF (2004) Tor: The second-generation onion router. In: Proceedings of the 13th USENIX Security Symposium, August 9-13, 2004, San Diego, CA, USA, pp 303–320
Dwork C (2008) Differential privacy: A survey of results. In: Theory and Applications of Models of Computation, 5th International Conference, TAMC 2008, Xi’an, China, April 25-29, 2008. Proceedings, pp 1–19
Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, Lecture Notes in Computer Science, vol 3876, pp 265–284. Springer
Dwork C, Naor M, Pitassi T, Rothblum GN, Yekhanin S (2010) Pan-private streaming algorithms. In: Innovations in Computer Science - ICS 2010, Tsinghua University, Beijing, China, January 5-7, 2010. Proceedings, pp 66–80
Giannotti F, Nanni M, Pinelli F, Pedreschi D (2007) Trajectory pattern mining. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 330–339
Gómez LI, Kuijpers B, Moelans B, Vaisman AA (2011) A state-of-the-art in spatio-temporal data warehousing, OLAP and mining. In: Integrations of Data Warehousing, Data Mining and Database Technologies, pp 200–236
Gruteser M, Liu X (2004) Protecting privacy in continuous location-tracking applications. IEEE Secur Priv 2(2):28–34
Jeung H, Yiu ML, Jensen CS (2011) Trajectory pattern mining. In: Computing with Spatial Trajectories, pp 143–177
Krumm J (2007) Inference attacks on location tracks
Leonardi L, Orlando S, Raffaetà A, Roncato A, Silvestri C, Andrienko GL, Andrienko NV (2014) A general framework for trajectory data warehousing and visual OLAP. GeoInformatica 18(2):273– 312
Loo BP (2006) Validating crash locations for quantitative spatial analysis: A GIS-based approach. Accid Anal Prev 38(5):879–886
López IFV, Snodgrass RT, Moon B (2005) Spatiotemporal aggregate computation: a survey. IEEE Trans Knowl Data Eng, TKDE 17(2):271–286
Marketos G, Frentzos E, Ntoutsi I, Pelekis N, Raffaetà A, Theodoridis Y (2008) Building real-world trajectory warehouses. In: Seventh ACM International Workshop on Data Engineering for Wireless and Mobile Access, Mobide 2008, pp 8–15
MicrosoftNewsCenter (2011) Data privacy day tackles concerns as location-based services grow in popularity. Accessed: 2013-09-15. http://www.microsoft.com/en-us/news/features/2011/jan11/01-26dataprivacyday.aspx
Narayanan A (2009) Data privacy: The non-interactive setting. Ph.D. thesis, Austin, TX, USA. AAI3368859
OpenStreetMap The free wiki world map. Accessed: 2013-05-10. http://www.openstreetmap.org/
Orlando S, Orsini R, Raffaetà A, Roncato A, Silvestri C (2007) Spatio-temporal aggregations in trajectory data warehouses. In: Data Warehousing and Knowledge Discovery, 9th International Conference, DaWaK 2007, pp 66–77
Orlando S, Orsini R, Raffaetà A, Roncato A, Silvestri C (2007) Trajectory data warehouses: Design and implementation issues. J Comput Sci Eng, JCSE 1(2):211–232
Papadias D, Kalnis P, Zhang J, Tao Y (2001) Efficient OLAP operations in spatial data warehouses. In: Advances in Spatial and Temporal Databases, 7th International Symposium, SSTD 2001, pp 443– 459
Pedersen TB, Tryfona N (2001) Pre-aggregation in spatial data warehouses. In: Advances in Spatial and Temporal Databases, 7th International Symposium, SSTD 2001, pp 460–480
Phillips P, Lee I (2011) Crime analysis through spatial areal aggregated density patterns. GeoInformatica 15(1):49–74
Sakr MA, Güting RH (2011) Spatiotemporal pattern queries. GeoInformatica 15(3):497–540
Samet H (2006) Foundations of multidimensional and metric data structures. Morgan Kaufmann
Sun C, Agrawal D, El Abbadi A (2002) Exploring spatial datasets with histograms. In: Proceedings of the 18th International Conference on Data Engineering, ICDE, pp 93–102
Sun C, Bandi N, Agrawal D, El Abbadi A (2006) Exploring spatial datasets with histograms. Distrib Parallel Databases 20(1):57–88
Sweeney L (2002) k-anonymity: A model for protecting privacy. Int J Uncertainty Fuzziness Knowledge Based Syst 10(5):557–570
Tao Y, Kollios G, Considine J, Li F, Papadias D (2004) Spatio-temporal aggregation using sketches. In: Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, pp 214– 225
Tao Y, Papadias D, Zhang J (2002) Aggregate processing of planar points. In: Advances in Database Technology - EDBT 2002, 8th International Conference on Extending Database Technology, pp 682– 700
Timko I, Böhlen MH, Gamper J (2009) Sequenced spatio-temporal aggregation in road networks. In: EDBT 2009, 12th International Conference on Extending Database Technology, pp 48–59
Trudeau R (1993) Introduction to Graph Theory. Dover Books on Mathematics Series. Dover Pub
Viswanathan G, Schneider M (2011) On the requirements for user-centric spatial data warehousing and SOLAP. In: Database Systems for Adanced Applications - 16th International Conference, DASFAA 2011, International Workshops, pp 144–15
Wernke M, Skvortsov P, Du̇rr F, Rothermel K (2014) A classification of location privacy attacks and approaches. Pers Ubiquit Comput 18(1):163–175
Willer DJ (1990) A spatial decision support system for bank location: A case study. Tech. rep., University of New York at Buffalo, Department of Geography State, National Center for Geographic Information and Analysis
Xie H, Kulik L, Tanin E (2010) Privacy-aware traffic monitoring. IEEE Trans Intell Transp Syst 11(1):61–70
Xie H, Tanin E, Kulik L (2007) Distributed histograms for processing aggregate data from moving objects. In: 8th International Conference on Mobile Data Management (MDM 2007), pp 152– 157
Xie H, Tanin E, Kulik L, Scheuermann P, Trajcevski G, Fanaeepour M (2014) Euler histogram tree: A spatial data structure for aggregate range queries on vehicle trajectories. In: 7th ACM SIGSPATIAL International Workshop on Computational Transportation Science, IWCTS 2014
Xue AY, Qi J, Xie X, Zhang R, Huang J, Li Y (2015) Solving the data sparsity problem in destination prediction. The International Journal on Very Large Data Bases, VLDB J. 24(2):219–243
Xue AY, Zhang R, Zheng Y, Xie X, Huang J, Xu Z (2013) Destination prediction by sub-trajectory synthesis and privacy protection against such prediction. 29th IEEE International Conference on Data Engineering, ICDE 2013, pp 254–265
Xue AY, Zhang R, Zheng Y, Xie X, Huang J, Xu Z (2013) Destination prediction by sub-trajectory synthesis and privacy protection against such prediction. In: 29th IEEE International Conference on Data Engineering, ICDE 2013, pp 254–265
Yaagoub A, Liu X, Trajcevski G, Tanin E, Scheuermann P (2012) Materialized views for count aggregates of spatial data. In: Advances in Databases and Information Systems - 16th East European Conference, ADBIS 2012, pp 427–440
Yuan J, Zheng Y, Xie X, Sun G (2011) Driving with knowledge from the physical world. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 316–324
Yuan J, Zheng Y, Zhang C, Xie W, Xie X, Sun G, Huang Y (2010) T-drive: driving directions based on taxi trajectories. In: 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2010, pp 99–108
Acknowledgments
We thank Dr. Hairuo Xie for his helpful comments regarding the experimental study.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Fanaeepour, M., Kulik, L., Tanin, E. et al. The CASE histogram: privacy-aware processing of trajectory data using aggregates. Geoinformatica 19, 747–798 (2015). https://doi.org/10.1007/s10707-015-0228-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10707-015-0228-8