Skip to main content
Log in

The CASE histogram: privacy-aware processing of trajectory data using aggregates

  • Published:
GeoInformatica Aims and scope Submit manuscript

Abstract

Due to the high uptake of location-based services (LBSs), large spatio-temporal datasets of moving objects’ trajectories are being created every day. An important task in spatial data analytics is to service range queries by returning trajectory counts within a queried region. The question of how to keep an individual user’s data private whilst enabling spatial data analytics by third parties has become an urgent research direction. Indeed, it is increasingly becoming a concern for users. To preserve privacy we discard individual trajectories and aggregate counts over a spatial and temporal partition. However the privacy gained comes at a cost to utility: trajectories passing through multiple cells and re-entering a query region, lead to inaccurate query responses. This is known as the distinct counting problem. We propose the Connection Aware Spatial Euler (CASE) histogram to address this long-standing problem. The CASE histogram maintains the connectivity of a moving object path, but does not require the ID of an object to distinguish multiple entries into an arbitrary query region. Our approach is to process trajectories offline into aggregate counts which are sent to third parties, rather than the original trajectories. We also explore modifications of our aggregate counting approach that preserve differential privacy. Theoretically and experimentally we demonstrate that our method provides a high level of accuracy compared to the best known methods for the distinct counting problem, whilst preserving privacy. We conduct our experiments on both synthetic and real datasets over two competitive Euler histogram-based methods presented in the literature. Our methods enjoy improvements to accuracy from 10 % up to 70 % depending on trip data and query region size, with the greatest increase seen on the Microsoft T-Drive real dataset, representing a more than tripling of accuracy.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21
Fig. 22
Fig. 23
Fig. 24
Fig. 25
Fig. 26
Fig. 27
Fig. 28
Fig. 29
Fig. 30

Similar content being viewed by others

Notes

  1. This expectation is taken with respect to the random possibilities.

References

  1. Kerckhoffs A (1883) Journal des sciences militaires IX:5–38

    Google Scholar 

  2. Barak B, Chaudhuri K, Dwork C, Kale S, McSherry F, Talwar K (2007) Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: Proceedings of the Twenty-Sixth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 11-13, 2007, Beijing, China, pp 273–282

  3. Beigel R, Tanin E (1998) The geometry of browsing. In: LATIN ’98: Theoretical Informatics, Third Latin American Symposium, pp 331–340

  4. Beresford AR, Stajano F (2003) Location privacy in pervasive computing. IEEE Pervasive Comput 2(1):46–55

    Article  Google Scholar 

  5. Beresford AR, Stajano F (2004) Mix zones: User privacy in location-aware services. In: 2nd IEEE Conference on Pervasive Computing and Communications Workshops (PerCom 2004 Workshops), pp 127–131

  6. Bogorny V, Shekhar S (2010) Spatial and spatio-temporal data mining. In: ICDM 2010, The 10th IEEE International Conference on Data Mining, p 1217

  7. Braz F, Orlando S, Orsini R, Raffaetà A, Roncato A, Silvestri C (2007) Approximate aggregations in trajectory data warehouses. In: Proceedings of the 23rd International Conference on Data Engineering Workshops, ICDE 2007, pp 536–545

  8. Buchin K, Buchin M, van Kreveld MJ, Löffler M, Luo J, Silveira RI (2012) Processing aggregated data: the location of clusters in health data. GeoInformatica 16 (3):497–521

    Article  Google Scholar 

  9. Chawla S, Dwork C, McSherry F, Talwar K (2005) On the utility of privacy-preserving histograms. In: Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence

  10. Chow CY, Mokbel MF (2011) Privacy of spatial trajectories. In: Computing with Spatial Trajectories, pp 109–141

  11. Chow CY, Mokbel MF (2011) Trajectory privacy in location-based services and data publication. SIGKDD Explorations 13(1):19–29

    Article  Google Scholar 

  12. Dingledine R, Mathewson N, Syverson PF (2004) Tor: The second-generation onion router. In: Proceedings of the 13th USENIX Security Symposium, August 9-13, 2004, San Diego, CA, USA, pp 303–320

  13. Dwork C (2008) Differential privacy: A survey of results. In: Theory and Applications of Models of Computation, 5th International Conference, TAMC 2008, Xi’an, China, April 25-29, 2008. Proceedings, pp 1–19

  14. Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, Lecture Notes in Computer Science, vol 3876, pp 265–284. Springer

  15. Dwork C, Naor M, Pitassi T, Rothblum GN, Yekhanin S (2010) Pan-private streaming algorithms. In: Innovations in Computer Science - ICS 2010, Tsinghua University, Beijing, China, January 5-7, 2010. Proceedings, pp 66–80

  16. Giannotti F, Nanni M, Pinelli F, Pedreschi D (2007) Trajectory pattern mining. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 330–339

  17. Gómez LI, Kuijpers B, Moelans B, Vaisman AA (2011) A state-of-the-art in spatio-temporal data warehousing, OLAP and mining. In: Integrations of Data Warehousing, Data Mining and Database Technologies, pp 200–236

  18. Gruteser M, Liu X (2004) Protecting privacy in continuous location-tracking applications. IEEE Secur Priv 2(2):28–34

    Article  Google Scholar 

  19. Jeung H, Yiu ML, Jensen CS (2011) Trajectory pattern mining. In: Computing with Spatial Trajectories, pp 143–177

  20. Krumm J (2007) Inference attacks on location tracks

  21. Leonardi L, Orlando S, Raffaetà A, Roncato A, Silvestri C, Andrienko GL, Andrienko NV (2014) A general framework for trajectory data warehousing and visual OLAP. GeoInformatica 18(2):273– 312

    Article  Google Scholar 

  22. Loo BP (2006) Validating crash locations for quantitative spatial analysis: A GIS-based approach. Accid Anal Prev 38(5):879–886

    Article  Google Scholar 

  23. López IFV, Snodgrass RT, Moon B (2005) Spatiotemporal aggregate computation: a survey. IEEE Trans Knowl Data Eng, TKDE 17(2):271–286

    Article  Google Scholar 

  24. Marketos G, Frentzos E, Ntoutsi I, Pelekis N, Raffaetà A, Theodoridis Y (2008) Building real-world trajectory warehouses. In: Seventh ACM International Workshop on Data Engineering for Wireless and Mobile Access, Mobide 2008, pp 8–15

  25. MicrosoftNewsCenter (2011) Data privacy day tackles concerns as location-based services grow in popularity. Accessed: 2013-09-15. http://www.microsoft.com/en-us/news/features/2011/jan11/01-26dataprivacyday.aspx

  26. Narayanan A (2009) Data privacy: The non-interactive setting. Ph.D. thesis, Austin, TX, USA. AAI3368859

  27. OpenStreetMap The free wiki world map. Accessed: 2013-05-10. http://www.openstreetmap.org/

  28. Orlando S, Orsini R, Raffaetà A, Roncato A, Silvestri C (2007) Spatio-temporal aggregations in trajectory data warehouses. In: Data Warehousing and Knowledge Discovery, 9th International Conference, DaWaK 2007, pp 66–77

  29. Orlando S, Orsini R, Raffaetà A, Roncato A, Silvestri C (2007) Trajectory data warehouses: Design and implementation issues. J Comput Sci Eng, JCSE 1(2):211–232

    Article  Google Scholar 

  30. Papadias D, Kalnis P, Zhang J, Tao Y (2001) Efficient OLAP operations in spatial data warehouses. In: Advances in Spatial and Temporal Databases, 7th International Symposium, SSTD 2001, pp 443– 459

  31. Pedersen TB, Tryfona N (2001) Pre-aggregation in spatial data warehouses. In: Advances in Spatial and Temporal Databases, 7th International Symposium, SSTD 2001, pp 460–480

  32. Phillips P, Lee I (2011) Crime analysis through spatial areal aggregated density patterns. GeoInformatica 15(1):49–74

    Article  Google Scholar 

  33. Sakr MA, Güting RH (2011) Spatiotemporal pattern queries. GeoInformatica 15(3):497–540

    Article  Google Scholar 

  34. Samet H (2006) Foundations of multidimensional and metric data structures. Morgan Kaufmann

  35. Sun C, Agrawal D, El Abbadi A (2002) Exploring spatial datasets with histograms. In: Proceedings of the 18th International Conference on Data Engineering, ICDE, pp 93–102

  36. Sun C, Bandi N, Agrawal D, El Abbadi A (2006) Exploring spatial datasets with histograms. Distrib Parallel Databases 20(1):57–88

    Article  Google Scholar 

  37. Sweeney L (2002) k-anonymity: A model for protecting privacy. Int J Uncertainty Fuzziness Knowledge Based Syst 10(5):557–570

    Article  Google Scholar 

  38. Tao Y, Kollios G, Considine J, Li F, Papadias D (2004) Spatio-temporal aggregation using sketches. In: Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, pp 214– 225

  39. Tao Y, Papadias D, Zhang J (2002) Aggregate processing of planar points. In: Advances in Database Technology - EDBT 2002, 8th International Conference on Extending Database Technology, pp 682– 700

  40. Timko I, Böhlen MH, Gamper J (2009) Sequenced spatio-temporal aggregation in road networks. In: EDBT 2009, 12th International Conference on Extending Database Technology, pp 48–59

  41. Trudeau R (1993) Introduction to Graph Theory. Dover Books on Mathematics Series. Dover Pub

  42. Viswanathan G, Schneider M (2011) On the requirements for user-centric spatial data warehousing and SOLAP. In: Database Systems for Adanced Applications - 16th International Conference, DASFAA 2011, International Workshops, pp 144–15

  43. Wernke M, Skvortsov P, Du̇rr F, Rothermel K (2014) A classification of location privacy attacks and approaches. Pers Ubiquit Comput 18(1):163–175

    Article  Google Scholar 

  44. Willer DJ (1990) A spatial decision support system for bank location: A case study. Tech. rep., University of New York at Buffalo, Department of Geography State, National Center for Geographic Information and Analysis

  45. Xie H, Kulik L, Tanin E (2010) Privacy-aware traffic monitoring. IEEE Trans Intell Transp Syst 11(1):61–70

    Article  Google Scholar 

  46. Xie H, Tanin E, Kulik L (2007) Distributed histograms for processing aggregate data from moving objects. In: 8th International Conference on Mobile Data Management (MDM 2007), pp 152– 157

  47. Xie H, Tanin E, Kulik L, Scheuermann P, Trajcevski G, Fanaeepour M (2014) Euler histogram tree: A spatial data structure for aggregate range queries on vehicle trajectories. In: 7th ACM SIGSPATIAL International Workshop on Computational Transportation Science, IWCTS 2014

  48. Xue AY, Qi J, Xie X, Zhang R, Huang J, Li Y (2015) Solving the data sparsity problem in destination prediction. The International Journal on Very Large Data Bases, VLDB J. 24(2):219–243

    Article  Google Scholar 

  49. Xue AY, Zhang R, Zheng Y, Xie X, Huang J, Xu Z (2013) Destination prediction by sub-trajectory synthesis and privacy protection against such prediction. 29th IEEE International Conference on Data Engineering, ICDE 2013, pp 254–265

  50. Xue AY, Zhang R, Zheng Y, Xie X, Huang J, Xu Z (2013) Destination prediction by sub-trajectory synthesis and privacy protection against such prediction. In: 29th IEEE International Conference on Data Engineering, ICDE 2013, pp 254–265

  51. Yaagoub A, Liu X, Trajcevski G, Tanin E, Scheuermann P (2012) Materialized views for count aggregates of spatial data. In: Advances in Databases and Information Systems - 16th East European Conference, ADBIS 2012, pp 427–440

  52. Yuan J, Zheng Y, Xie X, Sun G (2011) Driving with knowledge from the physical world. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 316–324

  53. Yuan J, Zheng Y, Zhang C, Xie W, Xie X, Sun G, Huang Y (2010) T-drive: driving directions based on taxi trajectories. In: 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2010, pp 99–108

Download references

Acknowledgments

We thank Dr. Hairuo Xie for his helpful comments regarding the experimental study.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Maryam Fanaeepour.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Fanaeepour, M., Kulik, L., Tanin, E. et al. The CASE histogram: privacy-aware processing of trajectory data using aggregates. Geoinformatica 19, 747–798 (2015). https://doi.org/10.1007/s10707-015-0228-8

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10707-015-0228-8

Keywords

Navigation