Supporting geospatial privacy-preserving data mining of social media

  • Original Article
Social Network Analysis and Mining

Abstract

With the global adoption of smart mobile devices equipped with localization capabilities and the broad popularity of microblogging services such as Twitter, the need for personal privacy has never been greater. This is especially so given computational and data processing infrastructures, such as clouds, that support big data analysis. Applying differential privacy to geospatially tagged data such as tweets can potentially preserve a degree of location privacy while still allowing the information (tweet contents) to be used for research and analysis, e.g., sentiment analysis. In this paper, we evaluate differential location pattern-mining approaches, considering both the privacy and the precision of geo-located tweets clustered according to Geo-Locations of Interest (GLI). We consider both the strength of privacy protection and the accuracy of results, measuring the Euclidean distance between the centroids of real GLIs and obfuscated ones, i.e., those incorporating privacy-preserving noise. We record the performance and sensitivity of the approach. We show that privacy and location precision trade off against each other: the higher the degree of privacy protection, the fewer GLIs are identified. We also quantify these trade-offs and their associated sensitivity levels. We illustrate the work through a big data case study on the use of Twitter data for traffic-related data protection.
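The accuracy measure described in the abstract, the Euclidean distance between a real centroid and an obfuscated one, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes planar coordinates, a publicly known cluster size, coordinates bounded in an interval of length `width`, and a plain Laplace mechanism applied to the centroid. All function names and parameters here are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def private_centroid(points, epsilon, width, rng):
    # Assumptions (not from the paper): the number of points n is public and
    # every coordinate lies in an interval of length `width`, so replacing one
    # point moves each coordinate mean by at most width / n.
    n = len(points)
    cx, cy = centroid(points)
    # Split the privacy budget evenly between the x and y coordinates.
    scale = (width / n) / (epsilon / 2.0)
    return (cx + laplace_noise(scale, rng), cy + laplace_noise(scale, rng))


def centroid_error(points, epsilon, width, rng):
    # Euclidean distance between the real and the obfuscated centroid.
    cx, cy = centroid(points)
    px, py = private_centroid(points, epsilon, width, rng)
    return math.hypot(px - cx, py - cy)


def mean_error(points, epsilon, width, rng, trials=200):
    # Average the distance over repeated noise draws.
    total = sum(centroid_error(points, epsilon, width, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical cluster of 50 points in the unit square.
    cluster = [(rng.random(), rng.random()) for _ in range(50)]
    for eps in (0.1, 1.0, 10.0):
        print(eps, mean_error(cluster, eps, 1.0, rng))
```

In this sketch a smaller `epsilon` (stronger privacy) gives a larger noise scale, so the obfuscated centroid tends to land further from the real one, which is the kind of privacy versus precision trade-off the abstract refers to.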

Acknowledgements

We would like to thank the NeCTAR Research cloud for the (free) use of the cloud resources and the Melbourne eResearch Group for support on Twitter access, use and analysis. Figure 1 was produced as part of the Australian Urban Research Infrastructure Network (AURIN—www.aurin.org.au) project.

Author information

Corresponding author

Correspondence to Shuo Wang.

About this article

Cite this article

Wang, S., Sinnott, R.O. Supporting geospatial privacy-preserving data mining of social media. Soc. Netw. Anal. Min. 6, 109 (2016). https://doi.org/10.1007/s13278-016-0417-y

  • DOI: https://doi.org/10.1007/s13278-016-0417-y
