DOI: 10.1145/2556288.2557139
research-article

Estimating county health statistics with twitter

Published: 26 April 2014

ABSTRACT

Understanding the relationships among environment, behavior, and health is a core concern of public health researchers. While a number of recent studies have investigated the use of social media to track infectious diseases such as influenza, little work has been done to determine if other health concerns can be inferred. In this paper, we present a large-scale study of 27 health-related statistics, including obesity, health insurance coverage, access to healthy foods, and teen birth rates. We perform a linguistic analysis of the Twitter activity in the top 100 most populous counties in the U.S., and find a significant correlation with 6 of the 27 health statistics. When compared to traditional models based on demographic variables alone, we find that augmenting models with Twitter-derived information improves predictive accuracy for 20 of 27 statistics, suggesting that this new methodology can complement existing approaches.
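The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the county values below are synthetic, the feature names (`demographic`, `twitter_feature`, `health_stat`) are hypothetical, and ordinary least squares with leave-one-out error is an assumed stand-in for whatever models and evaluation the paper actually uses. The sketch computes the correlation between a Twitter-derived feature and a health statistic, then checks whether adding that feature to a demographics-only model lowers held-out error.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fit_ols(rows, ys):
    """Least-squares fit with an intercept, solved via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, ys))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coef, row):
    """Apply fitted coefficients (intercept first) to one feature row."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def loo_mae(rows, ys):
    """Leave-one-out mean absolute error of the least-squares model."""
    errors = []
    for i in range(len(rows)):
        coef = fit_ols(rows[:i] + rows[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(abs(predict(coef, rows[i]) - ys[i]))
    return sum(errors) / len(errors)

# Synthetic values for 8 hypothetical counties (NOT data from the paper).
demographic = [0.1, 0.4, 0.3, 0.8, 0.6, 0.2, 0.9, 0.5]      # e.g. a scaled income measure
twitter_feature = [0.7, 0.2, 0.9, 0.4, 0.1, 0.5, 0.8, 0.3]  # e.g. a word-category frequency
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.3, -0.2]
health_stat = [10 + 20 * d + 15 * t + e
               for d, t, e in zip(demographic, twitter_feature, noise)]

r_twitter = pearson(twitter_feature, health_stat)
baseline_mae = loo_mae([[d] for d in demographic], health_stat)
augmented_mae = loo_mae([[d, t] for d, t in zip(demographic, twitter_feature)],
                        health_stat)

print(f"correlation(twitter feature, health stat) = {r_twitter:.3f}")
print(f"demographics-only LOO MAE = {baseline_mae:.3f}")
print(f"demographics + Twitter LOO MAE = {augmented_mae:.3f}")
```

Because the synthetic health statistic is constructed to depend on the Twitter feature, the augmented model wins here by design; on real county data the gain would need to be measured per statistic, as the abstract reports.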

          • Published in

            CHI '14: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
            April 2014
            4206 pages
            ISBN:9781450324731
            DOI:10.1145/2556288

            Copyright © 2014 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            CHI '14 Paper Acceptance Rate: 465 of 2,043 submissions, 23%
            Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
