When things matter: A survey on data-centric internet of things

https://doi.org/10.1016/j.jnca.2015.12.016

Abstract

With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, but several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy and continuous. This paper reviews the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed.

Introduction

The Internet is a global system of networks that interconnect computers using the standard Internet protocol suite. It has a significant impact on the world, serving billions of users worldwide. Millions of private, public, academic, business, and government networks, of local to global scope, all contribute to the formation of the Internet. The traditional Internet focuses on computers and can be called the Internet of Computers. In contrast, the Internet of Things (IoT), which evolved from the Internet of Computers, emphasizes things rather than computers (Ashton, 2009). It aims to connect everyday objects, such as coats, shoes, watches, ovens, washing machines, bikes, cars, and even humans, plants, animals, and changing environments, to the Internet to enable communication and interaction between these objects. The ultimate goal of IoT is to enable computers to see, hear and sense the real world. Ericsson predicts that the number of Internet-connected things will reach 50 billion by 2020. Electronic devices and systems surround us, providing different services to people in different situations: at home, at work, in the office, or while driving a car on the street. IoT also enables a close relationship between humans and the opportunistic connection of smart things (Guo et al., 2013).

There are several definitions or visions of IoT from different perspectives. From the viewpoint of services provided by things, IoT means “a world where things can automatically communicate to computers and each other providing services to the benefit of the human kind” (CASAGRAS, 2000). From the viewpoint of connectivity, IoT means “from anytime, anyplace connectivity for anyone, we will now have connectivity for anything” (ITU, 2005). From the viewpoint of communication, IoT refers to “a world-wide network of interconnected objects uniquely addressable, based on standard communication protocols” (INFSO, 2008). Finally, from the viewpoint of networking, IoT is the Internet evolved “from a network of interconnected computers to a network of interconnected objects” (European Commission, 2009).

We focus our study of the Internet of Things on the data perspective. As shown in Fig. 1, data is processed differently in the Internet of Things and in the traditional Internet environment (i.e., the Internet of Computers). In the Internet of Computers, the main data producers and consumers are both human beings. In the Internet of Things, however, the main actors are things, which means that things are the majority of data producers and consumers. Therefore, we define the Internet of Things as follows:

“In the context of the Internet, addressable and interconnected things, instead of humans, act as the main data producers, as well as the main data consumers. Computers will be able to learn and gain information and knowledge to solve real world problems directly with the data fed from things. As an ultimate goal, computers enabled by the Internet of Things technologies will be able to sense and react to the real world for humans.”

As of 2012, 2.5 quintillion (2.5×10^18) bytes of data are created daily. In IoT, connecting all of the things that people care about in the world becomes possible. All these things would be able to produce much more data than is produced today. The volumes of data are vast, the generation speed of data is fast and the data/information space is global (James et al., 2009). Indeed, IoT is one of the major driving forces for big data analytics. Given the scale of IoT, topics such as storage, distributed processing, real-time data stream analytics, and event processing are all critical, and we may need to revisit these areas to improve upon existing technologies for applications of this scale.

In this paper, we systematically investigate the key technologies related to the development of IoT and its applications, particularly from a data-centric perspective. The aim of this work is to provide a better understanding of current research activities and issues. Fig. 2 shows the roadmap of this paper. As can be seen from the figure, we review and compare technologies including data streams, data storage models, searching, and event processing, which play a vital role in enabling the vision of IoT. We also describe some relevant applications from several representative areas. Although some reviews of IoT have been conducted recently (e.g., Atzori et al., 2010; Zeng et al., 2011; An et al., 2013; Perera et al., 2013; Li et al., 2016; Yan et al., 2014), they focus on high-level general issues and are mostly fragmented. In addition, these articles do not specifically cover techniques for data processing and management, which are fundamental to fully embracing IoT. To the best of our knowledge, this is the first article that studies and discusses state-of-the-art IoT techniques from the data-centric perspective.

The remainder of the article is organized as follows. Section 2 identifies an IoT data taxonomy. Section 3 reviews data streaming techniques and Section 4 focuses on the data models and storage technologies for IoT. Search and event processing technologies are discussed in Sections 5 and 6, respectively. Section 7 describes some typical ongoing and/or potential IoT applications where data techniques for IoT can bring significant changes. Finally, Section 8 highlights some open research issues on IoT from the data perspective and Section 9 offers some concluding remarks.

IoT data taxonomy

In this section, we identify the intrinsic characteristics of IoT data and classify them into three categories, namely Data Generation, Data Quality, and Data Interoperability. We also identify specific characteristics of each category. The overall IoT data taxonomy is shown in Fig. 3.

Data streams

A data stream is a potentially unbounded sequence of data objects, which may be generated continuously at a rapid rate. Each data object in the stream can be described by a multi-dimensional attribute vector within a continuous, categorical, or mixed attribute space (de Andrade Silva et al., 2013). Data streams have some typical characteristics:

  • Continuous arrival of data objects.

  • Disordered arrival of data objects.

  • Potentially unbounded size of a
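The characteristics listed above can be illustrated with a minimal Python sketch. It is not taken from the surveyed systems: the `(timestamp, value)` record layout, the function names `reorder` and `sliding_mean`, and the `max_delay` bound on lateness are all assumptions made for illustration. The sketch restores timestamp order with a small buffer and summarizes the stream with a fixed-size window, so memory use stays bounded even if the stream never ends.

```python
import heapq
from collections import deque
from typing import Iterable, Iterator, Tuple

Reading = Tuple[int, float]  # (timestamp, value); hypothetical record layout


def reorder(stream: Iterable[Reading], max_delay: int) -> Iterator[Reading]:
    """Emit readings in timestamp order.

    Assumes (for this sketch) that no reading arrives more than
    `max_delay` time units after a reading with a newer timestamp.
    """
    heap: list = []
    newest = None
    for ts, value in stream:
        heapq.heappush(heap, (ts, value))
        newest = ts if newest is None else max(newest, ts)
        # Readings this old can no longer be overtaken by a late arrival.
        while heap and heap[0][0] <= newest - max_delay:
            yield heapq.heappop(heap)
    while heap:  # flush the buffer when a finite (test) stream ends
        yield heapq.heappop(heap)


def sliding_mean(stream: Iterable[Reading], window: int) -> Iterator[Reading]:
    """Yield the mean of the last `window` values using O(window) memory."""
    recent: deque = deque(maxlen=window)
    total = 0.0
    for ts, value in stream:
        if len(recent) == window:
            total -= recent[0]  # this value is about to fall out of the window
        recent.append(value)
        total += value
        yield ts, total / len(recent)


if __name__ == "__main__":
    arrivals = [(1, 10.0), (3, 30.0), (2, 20.0), (5, 50.0), (4, 40.0)]
    for ts, mean in sliding_mean(reorder(arrivals, max_delay=2), window=2):
        print(ts, mean)
```

Both functions are generators, so they consume one object at a time and never hold the whole stream. The trade-off is that a reading later than `max_delay` would be emitted out of order; a real system would need a policy for such stragglers.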

Data storage models

The nature of data produced by the Internet of Things calls for a revisit of data storage techniques, which will be further discussed in this section.

Search techniques

Searching for and finding relevant objects among billions of things is one of the major challenges for the future Internet of Things, and solving it could have a huge impact on humans. Supporting technologies for searching things in the IoT are very different from those used in searching Web documents because things are tightly bound to contextual information (e.g., location) and have no easily indexable properties (e.g., human-readable text in the case of Web documents). In addition, the state

Complex event processing

Data streaming techniques typically process incoming data through a sequence of transformations based on common SQL operators, such as selection, aggregation, and join, which are generally defined by relational algebra. By contrast, the complex event processing (CEP) model views the information in the streams as events in the physical world. These events must be filtered, combined and transformed into higher-level events for better understanding by computers and humans. Similar to
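The filter, combine and transform steps of the CEP model can be sketched in a few lines of Python. This is an illustrative toy, not the design of any CEP engine covered by the survey: the `Event` record, the event kinds `"temperature"`, `"smoke"` and `"fire_alert"`, the function `detect_fire`, and the threshold and window values are all hypothetical, and events are assumed to arrive in time order.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    kind: str     # hypothetical event types, e.g. "temperature" or "smoke"
    room: str
    time: int     # seconds since some reference point
    value: float


def detect_fire(events: Iterable[Event], window: int = 60,
                hot: float = 50.0) -> Iterator[Event]:
    """Derive a higher-level "fire_alert" event when a hot temperature
    reading and a smoke reading occur in the same room within `window`
    seconds. Assumes events arrive in time order."""
    last_hot: Dict[str, int] = {}
    last_smoke: Dict[str, int] = {}
    for e in events:
        # Filter: keep only primitive events relevant to the pattern.
        if e.kind == "temperature" and e.value >= hot:
            last_hot[e.room] = e.time
        elif e.kind == "smoke" and e.value > 0:
            last_smoke[e.room] = e.time
        else:
            continue
        # Combine: both conditions seen in this room, close together in time.
        t_hot = last_hot.get(e.room)
        t_smoke = last_smoke.get(e.room)
        if t_hot is not None and t_smoke is not None \
                and abs(t_hot - t_smoke) <= window:
            # Transform: emit one composite event, then reset the room state.
            yield Event("fire_alert", e.room, e.time, 1.0)
            del last_hot[e.room], last_smoke[e.room]


if __name__ == "__main__":
    feed = [
        Event("temperature", "kitchen", 0, 22.0),
        Event("temperature", "kitchen", 10, 61.0),
        Event("smoke", "kitchen", 40, 1.0),
    ]
    for alert in detect_fire(feed):
        print(alert)
```

The consumer of `detect_fire` sees only the composite event, not the individual sensor readings, which is the sense in which low-level events are lifted into higher-level ones.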

Potential IoT applications

As Ashton (2009) pointed out, IoT “has the potential to change the world, just as the Internet did”. The ongoing and/or potential IoT applications show that IoT can bring significant changes in many domains, including cities and homes, environment monitoring, health, energy, and business. IoT can bring the ability to react to events in the physical world in an automatic, rapid and informed manner. This also opens up new opportunities for dealing with complex or critical situations and

Open issues

The development of IoT technologies and applications is only just beginning. Many new challenges and issues remain unaddressed and require substantial efforts from both academia and industry. In this section, we identify some key directions for future research and development from a data-centric perspective:

  • Data quality and uncertainty: In IoT, as data volume increases, inconsistency and redundancy within data would become paramount issues. One of the central problems for data quality is

Summary

It is predicted that the next generation of the Internet will be composed of trillions of connected computing nodes at a global scale. Through these nodes, everyday objects in the world can be identified and connected to the Internet, and can make decisions independently. In this context, the Internet of Things (IoT) is considered a new revolution of the Internet. In IoT, the possibility of seamlessly merging the real and the virtual worlds, through the massive deployment of embedded devices, opens up many

Acknowledgements

Quan Z. Sheng's work has been supported by the Australian Research Council Discovery Grant DP140100104. We express our gratitude to the anonymous reviewers for their comments and suggestions which have greatly helped us to improve this work.

References (91)

  • Artikis A, Paliouras G. Tutorial: formal methods for event processing. In: Proceedings of the 17th international...
  • Ashton K. ‘That internet of things’ thing 〈http://www.rfidjournal.com/article/view/4986〉;...
  • Babcock B, Olston C. Distributed top-k monitoring. In: Proceedings of the 2003 ACM SIGMOD international conference on...
  • Barbieri DF, Braga D, Ceri S, Grossniklaus M. An execution environment for C-SPARQL queries. In: Proceedings of the...
  • Bolles A, Grawunder M, Jacobi J. Streaming SPARQL-Extending SPARQL to Process Data Streams. In: Proceedings of the...
  • Calbimonte J-P, Corcho Ó, Gray AJG. Enabling Ontology-Based Access to Streaming Data Sources. In: Proceedings of the...
  • Cao Z, et al. Distributed inference and query processing for RFID tracking and monitoring. Proc VLDB Endow 2011.
  • CASAGRAS. CASAGRAS (Coordination and support action for global rfid-related activities and standardisation);...
  • Chang F, et al. Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst 2008.
  • Chaudhuri S, Ganti V, Xin D. Exploiting web search to generate synonyms for entities. In: Proceedings of the 18th...
  • Chen C, Li F, Ooi BC, Wu S. TI: an efficient indexing mechanism for real-time search on tweets. In: Proceedings of the...
  • Chen M, et al. Big data: a survey. Mob Netw Appl J 2014.
  • de Andrade Silva J, et al. Data stream clustering: a survey. ACM Comput Surv 2013.
  • DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, et al. Dynamo: amazon's highly available...
  • Dong A, Zhang R, Kolari P, Bai J, Diaz F, Chang Y, et al. Time is of the Essence: improving recency ranking using...
  • Elahi BM, Römer K, Ostermaier B, Fahrmair M, Kellerer W. Sensor ranking: a primitive for efficient content-based sensor...
  • European Commission. European Commission: internet of things, an action plan for Europe...
  • Fan W, Geerts F, Ma S, Müller H. Detecting Inconsistencies in Distributed Data. In: Proceedings of the 26th...
  • Fazzinga B, Flesca S, Furfaro F, Parisi F. Cleaning trajectory data of RFID-monitored objects through conditioning...
  • Gama J. Knowledge discovery from data streams. Chapman and Hall/CRC data mining and knowledge discovery series. Boca...
  • Ganti RK, Pham N, Ahmadi H, Nangia S, Abdelzaher TF. GreenGPS: a participatory sensing fuel-efficient maps application....
  • Gerber D, Hellmann S, Bühmann L, Soru T, Usbeck R, Ngomo A-CN. Real-time RDF extraction from unstructured data streams....
  • Gilbert S, et al. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 2002.
  • Guha S, Plarre K, Lissner D, Mitra S, Krishna B, Dutta P, et al. AutoWitness: locating and tracking stolen property...
  • Guinard D. A web of things for smarter cities. In: Technical talk; 2010. p....
  • Gupta M, Intille SS, Larson K. Adding GPS-control to traditional thermostats: an exploration of potential energy...
  • Hasan S, O'Riain S, Curry E. Approximate semantic matching of heterogeneous events. In: Proceedings of the sixth ACM...
  • Hasan S, O'Riain S, Curry E. Towards unified and native enrichment in event processing systems. In: Proceedings of the...
  • He Y, Barman S, Naughton JF. On load shedding in complex event processing. In: Proceedings of the 17th international...
  • Heinze T, Ji Y, Pan Y, Grüneberger FJ, Jerzak Z, Fetzer C. Elastic complex event processing under varying query load....
  • INFSO. INFSO D.4 Networked enterprise & RFID INFSO G.2 micro & nanosystems. In: Co-operation with the working group...
  • ITU. International telecommunication union (ITU) internet reports. The internet of things; November...
  • James AE, Cooper J, Jeffery KG, Saake G. Research directions in database architectures for the internet of things: a...
  • Jeffery SR, Garofalakis MN, Franklin MJ. Adaptive cleaning for RFID data streams. In: Proceedings of the 32nd...
  • Kadambi S, et al. Where in the world is my data? Proc VLDB Endow 2011.