Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/103472
Type: Theses
Title: Source profiling for smart city sensing
Author: Zhang, Yihong
Issue Date: 2016
School/Discipline: School of Computer Science
Abstract: Recent years have seen the emergence of smart cities, which utilize various sensing data for applications such as pollution monitoring, infrastructure planning and traffic control. Current sensing projects tend to deploy a large number of low-cost and unreliable sensing sources, rather than a small number of high-quality sensing sources. It is therefore critical to provide data analysis in the face of unreliable sources. This thesis focuses on two types of sensing sources that have been used in smart city sensing projects, namely, environmental sensors and human sensors. Environmental sensors are physical sensors that are made to monitor certain environmental features, such as temperature, humidity, and pollutant concentration. An environmental sensor can fail frequently and will start generating faulty data as a result of chemical compound decay, battery exhaustion, or calibration problems. Human sensors, as recently proposed in a new area called social sensing, are users of online messaging platforms who post observations about their surrounding environments. The data generated by human sensors can be erroneous because the natural language used in their messages does not conform to a machine-readable standard. Based on a survey of existing literature, this thesis presents source profiling-based solutions for three data analysis problems: data cleaning in environmental sensing, observation message classification in social sensing, and message location inference. Each of the solutions is validated with various real-world data and extensive experiments. For data cleaning in environmental sensing, we propose two solutions, one from a frequentist perspective and one from a Bayesian perspective. The frequentist approach determines sensor reliability based on the frequency of reliable behavior in the past, and in each data collection iteration updates a reliability score, which can be used to down-weight or remove the data from unreliable sources.
The Bayesian approach models sensor reliability as a latent variable, and applies the Expectation Maximization framework to discover the latent sensor reliability and correct reading values for the environmental feature. For observation message classification, we propose supervised and unsupervised solutions. We first propose a supervised solution to distinguish messages according to three perspectives, namely, observation, affection, and speculation. We next propose a supervised solution based on user features such as trending activity, communication status, and writing styles. Finally, we propose an unsupervised solution based on lexical analysis and user profiling in four user attributes, namely, originality, interactivity, objectivity, and topic focus. For location inference, we propose a solution based on named entity extraction and user message histories. The proposed solution extracts location names from text messages using a gazetteer, and after retrieving a number of past locations from the message history of a user, it applies outlier removal before inferring the current location. Incorporating observation classification and location inference, we propose an event detection system called Sense and Focus (SNAF), which detects real-world events based on discussions exchanged on Twitter. A prototype implementation of the system has produced a number of detection results, 54% of which correspond to real-world events, in many cases detected earlier than news reports and with less than 1.5 km location error.
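Illustrative sketch (not taken from the thesis): the frequentist reliability score described in the abstract could look roughly like the Python below. The abstract does not say how "reliable behavior" is judged, so this sketch assumes a reading is reliable when it lies within a fixed tolerance of the median of all readings in the same iteration. The function names, the `tolerance` parameter and the `min_score` cut-off are hypothetical.

```python
from statistics import median


def update_reliability(history, readings, tolerance):
    """Update per-sensor reliability scores for one data collection iteration.

    history:   dict sensor_id -> [reliable_count, total_count], mutated in place
    readings:  dict sensor_id -> reading for this iteration
    tolerance: assumed maximum distance from the consensus value for a
               reading to count as reliable (an assumption of this sketch)
    Returns a dict sensor_id -> fraction of past iterations judged reliable.
    """
    # Assumption: the median of the current readings serves as the consensus.
    consensus = median(readings.values())
    for sensor_id, value in readings.items():
        counts = history.setdefault(sensor_id, [0, 0])
        counts[1] += 1
        if abs(value - consensus) <= tolerance:
            counts[0] += 1
    return {s: reliable / total for s, (reliable, total) in history.items()}


def weighted_estimate(readings, scores, min_score=0.5):
    """Combine readings, removing low-score sensors and weighting the rest."""
    kept = {s: v for s, v in readings.items() if scores.get(s, 0.0) >= min_score}
    total_weight = sum(scores[s] for s in kept)
    if total_weight == 0:
        return None  # no sufficiently reliable sensor in this iteration
    return sum(scores[s] * v for s, v in kept.items()) / total_weight


history = {}
for readings in (
    {"a": 20.0, "b": 20.5, "c": 35.0},
    {"a": 21.0, "b": 20.8, "c": 5.0},
):
    scores = update_reliability(history, readings, tolerance=1.0)
    estimate = weighted_estimate(readings, scores)
```

In this toy run, sensor `c` deviates from the consensus in both iterations, so its score falls to zero and it is excluded from the estimate. A score-weighted mean is only one possible way to down-weight unreliable sources.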
Advisor: Szabo, Claudia; Sheng, Michael
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2016.
Keywords: data mining; source profiling; sensor
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
DOI: 10.4225/55/58ae2cd79a387
Appears in Collections: Research Theses

Files in This Item:
File           Description                                     Size        Format
01front.pdf                                                    126.11 kB   Adobe PDF
02whole.pdf                                                    6.02 MB     Adobe PDF
Permissions    Restricted Access (library staff access only)   224.54 kB   Adobe PDF
Restricted     Restricted Access (library staff access only)   8.74 MB     Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.