
Everyday concept detection in visual lifelogs: validation, relationships and trends

Published in Multimedia Tools and Applications

Abstract

The Microsoft SenseCam is a small, lightweight wearable camera that passively captures photos and other sensor readings throughout a user’s day-to-day activities. It captures on average 3,000 images in a typical day, equating to almost 1 million images per year. It can be used to aid memory by creating a personal multimedia lifelog, or visual recording, of the wearer’s life. However, the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. In this work, we explore the applicability of semantic concept detection, a method often used in video retrieval, to the domain of visual lifelogs. Our concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning, and in doing so determines the probability of a concept’s presence. We apply detection of 27 everyday semantic concepts to a lifelog collection of 257,518 SenseCam images from 5 users. The results were evaluated on a subset of 95,907 images to determine the detection accuracy for each semantic concept. We conducted further analysis of the temporal consistency, co-occurrence and relationships within the detected concepts to more extensively investigate the robustness of the detectors in this domain.
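
As a concrete illustration of the detection step described in the abstract, the sketch below trains one probabilistic binary classifier per semantic concept on precomputed low-level feature vectors and returns, for each image, a probability of that concept’s presence. This is a minimal sketch only: the use of scikit-learn, an RBF-kernel SVM, and the particular feature representation are illustrative assumptions, not the authors’ exact pipeline.

    # Minimal sketch (assumption: scikit-learn SVMs as the supervised learner;
    # the paper's actual features and classifier configuration may differ).
    import numpy as np
    from sklearn.svm import SVC

    CONCEPTS = ["indoors", "outdoors", "people", "buildings"]  # 4 of the 27 concepts

    def train_concept_detectors(features, labels):
        """Train one probabilistic classifier per semantic concept.

        features: (n_images, n_dims) array of low-level visual features
        labels:   dict mapping concept name -> (n_images,) binary array
        """
        detectors = {}
        for concept in CONCEPTS:
            clf = SVC(kernel="rbf", probability=True)  # probability of concept presence
            clf.fit(features, labels[concept])
            detectors[concept] = clf
        return detectors

    def detect_concepts(detectors, features):
        """Return an (n_images, n_concepts) matrix of concept-presence probabilities."""
        return np.column_stack(
            [detectors[c].predict_proba(features)[:, 1] for c in CONCEPTS]
        )

In the setting of this paper, per-image concept probabilities of this kind would then feed the temporal-consistency and co-occurrence analyses referred to above.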



Acknowledgements

We are grateful to the AceMedia project and Microsoft Research for their support. This work is supported by the Irish Research Council for Science, Engineering and Technology, by Science Foundation Ireland under grant 07/CE/I1147, and by the EU IST-CHORUS project. We would also like to thank the participants who made their personal lifelog collections available for these experiments and who took part in the annotation effort.

Author information

Corresponding author

Correspondence to Daragh Byrne.


About this article

Cite this article

Byrne, D., Doherty, A.R., Snoek, C.G.M. et al. Everyday concept detection in visual lifelogs: validation, relationships and trends. Multimed Tools Appl 49, 119–144 (2010). https://doi.org/10.1007/s11042-009-0403-8
