ODT FLOW: Extracting, analyzing, and sharing multi-source multi-scale human mobility

  • Zhenlong Li,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Project administration, Resources, Software, Supervision, Visualization, Writing – original draft, Writing – review & editing

    zhenlong@sc.edu

    Affiliation Geoinformation and Big Data Research Lab, Department of Geography, University of South Carolina, Columbia, South Carolina, United States of America

  • Xiao Huang,

    Roles Formal analysis, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Geosciences, University of Arkansas, Fayetteville, Arkansas, United States of America

  • Tao Hu,

    Roles Methodology, Software, Visualization, Writing – original draft

    Affiliation Center for Geographic Analysis, Harvard University, Cambridge, Massachusetts, United States of America

  • Huan Ning,

    Roles Methodology, Visualization, Writing – original draft

    Affiliation Geoinformation and Big Data Research Lab, Department of Geography, University of South Carolina, Columbia, South Carolina, United States of America

  • Xinyue Ye,

    Roles Validation, Writing – review & editing

    Affiliation Department of Landscape Architecture & Urban Planning, Texas A&M University, Texas, United States of America

  • Binghu Huang,

    Roles Validation, Writing – review & editing

    Affiliation College of Oceanography and Space Informatics, China University of Petroleum, Qingdao, Shandong, China

  • Xiaoming Li

    Roles Conceptualization, Funding acquisition, Writing – review & editing

    Affiliation Department of Health Promotion, Education, and Behavior, University of South Carolina, Columbia, South Carolina, United States of America

Abstract

In response to the soaring need for human mobility data, especially during disaster events such as the COVID-19 pandemic, and the associated big data challenges, we develop a scalable online platform for extracting, analyzing, and sharing multi-source multi-scale human mobility flows. Within the platform, an origin-destination-time (ODT) data model is proposed to work with scalable query engines to handle heterogeneous mobility data in large volumes with extensive spatial coverage, which allows for efficient extraction, query, and aggregation of billion-level origin-destination (OD) flows in parallel at the server-side. An interactive spatial web portal, ODT Flow Explorer, is developed to allow users to explore multi-source mobility datasets with user-defined spatiotemporal scales. To promote reproducibility and replicability, we further develop ODT Flow REST APIs that provide researchers with the flexibility to access the data programmatically via workflows, codes, and programs. Demonstrations are provided to illustrate the potential of integrating the APIs with scientific workflows and with the Jupyter Notebook environment. We believe the platform coupled with the derived multi-scale mobility data can assist human mobility monitoring and analysis during disaster events such as the ongoing COVID-19 pandemic and benefit both scientific communities and the general public in understanding human mobility dynamics.

1. Introduction

Reliable monitoring of human mobility, referring to the movement of human beings (individuals as well as groups) in space and time, plays a fundamental role in a variety of fields, such as tourist management [1, 2], migration [3, 4], urban planning [5–7], demand forecasting [8, 9], disaster management [10, 11], and epidemic modelling [12–14], to name a few. The ongoing COVID-19 pandemic (as of the time of writing) uniquely highlights the necessity of human mobility monitoring in a rapid and comprehensive manner.

Traditionally, human mobility studies rely on either aggregated and temporally sparse governmental statistics (e.g., census data) [15] or selective, small-scale surveys (e.g., local travel survey and evacuation survey) [16, 17]. While these well-documented mobility records from official statistics and surveys facilitate our understanding of spatial interactions, their intrinsic limitations stand out: the former provides only coarse-grained spatiotemporal human movement patterns while the latter is limited by the spatial scale. With the advent of Global Positioning System (GPS) technologies, game-changing data sources of fine-scale human mobility have become available. Fostered by the concept of “Citizen as Sensors” [18] and the popularity of mobile devices, timely geospatial information can be extracted from the enormous sensing network constituted by millions and even billions of mobile device holders in both passive (e.g., wireless networks and cellphone GPS) and active (e.g., navigation services and social media posts) fashions. Despite the availability of mobility data on a fine-grained spatiotemporal scale, new challenges have started to appear, as such detailed human mobility records fall in the category of Big Data that can be characterized by the 5Vs, i.e., Volume, Velocity, Variety, Veracity, and Value [19–22], thus demanding a paradigm shift in data handling from a traditional static approach to one that keeps pace with rapidly growing data.

Digitally traced human mobility records generally come in large Volume. For example, the size of daily geolocation reports from smartphones and other mobile devices is at the GB and even TB level [23]. In addition, massive numbers of social media users produce hundreds of millions of posts every day [19, 24, 25]. Although only a small proportion of them are geotagged, such an amount already exceeds the capability of traditional data handling approaches and demands improved strategies to tackle the computing, storing, and analyzing issues. Velocity refers to the fast generation of human mobility data, as human mobility records from digital devices are constantly being produced. Such a high velocity in the data flow demands the capability of cyberinfrastructures to organize, summarize, visualize, and analyze data in a rapid manner [26]. Variety, on the one hand, points to the multi-faceted nature of human mobility originating from its sources, whose population penetration rates and covered population spectrums are not necessarily the same. Studies revealed that mobility records from different sources present a certain level of similarity in general, but with notably unique and even contrasting characteristics, reflecting human mobility from varying yet valuable perspectives [27, 28]. On the other hand, Variety also highlights the multi-scale nature of mobility data. During the COVID-19 pandemic, for example, mobility records are reported at a variety of levels that include the country [29, 30], state/province [31], county [32, 33], and even finer census levels [34]. Veracity refers to issues such as biases, noise, and abnormality [35], which can be traced back to the source provenance in mobility datasets. In the context of human mobility datasets, Veracity not only demands proper cleaning, preprocessing, and aggregating procedures, but also highlights the importance and necessity of fusing and sharing multiple data sources for human mobility studies.
Last but not least, Value refers to the insights derived from human mobility patterns to support decision-making and policy implementation. As the knowledge of human mobility patterns plays an essential role in a wide range of research domains, designing platforms that facilitate human mobility data analysis and sharing is much needed, especially during disaster events like the COVID-19 pandemic that requires a rapid sharing of such knowledge. In tandem with the challenges of 5Vs in big mobility data is the growing awareness of reproducibility and replicability [36, 37], as the extensive usage of locational data in various domains uniquely highlights such concerns. The 5V challenges, coupled with the necessity of facilitating reproducibility and replicability, largely raise the bar for not just mobility visualization platforms but data sharing platforms in general.

The COVID-19 pandemic has fostered many open-sourced mobility datasets and online mobility visualization platforms, as scholars realize the importance of fast insights from mobility patterns for better, timely decision-making. Some notable open-sourced mobility datasets include social media derived mobility datasets, Apple/Google mobility reports, SafeGraph social distancing metrics, Cuebiq Mobility Index, Baidu Mobility Index, Descartes Labs Mobility Index, University of Maryland Mobility Metrics and Social Distancing Index, and Unacast Social Distancing Index. These datasets differ in sources, spatiotemporal scales, and geographic coverage [28]. In terms of online visualization platforms, one notable effort is by [38], who designed a dashboard to present mobility dynamics at the U.S. county level using mobility records from Descartes Labs. Cuebiq developed a mobility dashboard featuring the visualization of Cuebiq’s Mobility Index (https://www.cuebiq.com/visitation-insights-covid19/), which enables brands, researchers, government agencies, and health professionals to understand shifts in mobility trends across the U.S. at multiple levels (nationally, state-wide, county and industry vertical). In collaboration with the COVID-19 Mobility Data Network (https://www.covid19mobility.org/), Stamen, a data visualization and cartography design studio, established a visualization platform that provides mobility insights at the country level using aggregated population movement data from Facebook’s Data for Good program. However, report-based mobility data (e.g., Google and Apple mobility reports), although easy to handle, do not provide origin-destination (OD) flows that are essential for certain location-based studies. In addition, dashboards that facilitate OD visualization often lack flexibility, as most dashboards are constructed from pre-computed statistics with highly aggregated results, posing challenges for their practicality in multi-scale studies.
After reviewing existing efforts in open-source mobility datasets and visualization platforms, we notice that few frameworks have been designed with the capacity to integrate/compare multi-source and multi-scale global mobility data as well as to provide interactive query, visualization, download, and programmatic access options according to user-defined spatiotemporal restrictions (e.g., [39]). A notable study is by [40], who developed and open sourced a multiscale dynamic human mobility flow dataset using SafeGraph data. However, because it relies on a single data source, the dataset only covers the U.S., and it is shared as individual downloadable files without interactive data access.

In response to the soaring need for human mobility data during the COVID-19 pandemic and to fill the aforementioned gaps, we develop a scalable online platform for extracting, analyzing, and sharing multi-source multi-scale human mobility data. At the core of the platform is an origin-destination-time (ODT) data model designed to work with scalable query engines to handle heterogeneous mobility data in large volumes with extensive spatial coverage, which allows for efficient extraction, query, and aggregation of billion-level OD flows in parallel at the server-side. Built upon the ODT model and scalable query engines, an online ODT Flow Explorer is developed that allows researchers to explore multi-source mobility datasets with flexible configurations. In response to the challenges in reproducibility and replicability, we develop ODT Flow REST APIs that provide researchers with the flexibility to access the data programmatically via workflows, codes, and programs. In addition, we provide case studies to demonstrate the usage and potential of the APIs along with scientific workflows and Jupyter Notebooks to facilitate on-demand mobility data access, analysis and visualization.

2. Methodology

2.1 System design

The ODT Flow platform consists of five layers: the data source layer, data processing and management layer, web server layer, user interface layer, and user community layer (Fig 1). The system is designed to tackle the big data challenges in a scalable and open computing environment, aiming to serve as a bridge to connect heterogeneous big human movement data sources (the first layer) to user communities for research and applications (the fifth layer). The movement data sources include data that contain or can be used to extract OD flows with a time interval such as daily. Such data sources include, for example, geotagged Twitter data, mobile phone data, and transportation data. Different from the index-based mobility data that are highly aggregated without OD information, these data sources are big in volume, heterogeneous in format and spatiotemporal resolution, and noisy and inconsistent in data quality, which exemplifies the well-defined Big Data challenges [28]. Besides, these data also pose shareability, reproducibility, and replicability challenges in the Big Data era that have drawn the attention of the scientific community [41].

In the system, the volume challenge of the data is addressed with scalable distributed storage (HDFS, Hadoop Distributed File System), coupled with a spatiotemporal partition mechanism for efficient data query and access. The variety challenge is handled with an ODT data model by extracting spatiotemporally standardized OD flows from heterogeneous movement data sources to construct a unified ODT cube. The computing- and data-intensive OD flow extraction process is carried out in a parallel computing cluster using the data source-specific human mobility extraction algorithms. The ability to integrate multi-source mobility data also helps tackle the veracity challenge, as fusing multiple mobility sources can provide a more comprehensive picture of human mobility [27]. The shareability and usability of the big movement data are facilitated with a spatial web portal [42] allowing for interactive query, visualization, and download of ODT flows. RESTful Application Programming Interfaces (ODT Flow REST APIs) are developed to allow users to query and access the data in a flexible programmatic way using, for example, scientific workflows and Jupyter notebooks. Accessing data with APIs can help to achieve research reproducibility and replicability [36].

2.2 ODT cube

The ODT cube, a conceptual data model, is designed to work with a scalable computing cluster to efficiently manage, query, and aggregate billions of OD flows at different spatial and temporal scales (Fig 2). The ODT cube was previously introduced in our research proposal [43] and is implemented in this study. Data cubes have been widely used to model and visualize multi-dimension and multi-scale data [44–46]. In the geospatial domain, a special data cube called the Space-Time Cube is often used to model time series spatial data and visualize movement data [47–49]. Different from the space-time cube, where space is treated as two dimensions (X and Y, or latitude and longitude) in the cube, the ODT cube is place-based with a time dimension added to the OD matrix. For example, instead of using latitude and longitude in the data cube, we use places such as a city or a county as origin and destination locations. The ODT cube does not directly capture the movement trajectory, because the OD matrix for each timestamp is independent: it records only the number of population flows between places during a specific time period (e.g., in an hour or day).

Fig 2. Illustration of origin-destination-time (ODT) cube for big OD data query analytics.

https://doi.org/10.1371/journal.pone.0255259.g002

To construct the ODT cube, an individual level 4D data cube (entity, origin, destination, time) is first constructed for each entity to extract the individual OD flows from millions or billions of data points. The entity can be a social media user, a cell phone user, or a place if individual flows are not available. As different movement data sources have different data formats, structures, and spatiotemporal resolutions, a data source-specific algorithm/program is needed to extract individual OD flows. Section 2.3 elaborates on our approaches of extracting OD flows from Twitter data and SafeGraph data. The 4D data cubes are then aggregated along the origin, destination, and time dimensions to construct the ODT cube. Each cell of the ODT cube records the number of people that moved from the origin location to the destination location during a time period. The ODT cube can be further used to derive cubes at different geographic scales using spatial aggregation. This process can be performed on-the-fly based on the request or pre-computed and cached to optimize the performance. By slicing the ODT cube from different dimensions, three types of matrices can be derived: OD matrix recording population flows between places during a time period; destination-time (DT) matrix recording population inflows; and origin-time (OT) matrix recording population outflows. The diagonal cells of the OD matrix record the intra-movements for a place (O = D) during a time period.
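
The aggregation and slicing steps described above can be sketched on a single machine as follows. This is a minimal illustration only, assuming a handful of made-up entity-level records and hypothetical function names (`build_odt_cube`, `od_matrix`, `ot_matrix`, `dt_matrix`); it also assumes that intra-place cells (O = D) are excluded from outflow and inflow totals. The platform itself performs these operations with distributed query engines, not with in-memory Python.

```python
from collections import Counter

# Hypothetical entity-level 4D records: (entity, origin, destination, day).
records = [
    ("u1", "A", "B", "2020-03-01"),
    ("u2", "A", "B", "2020-03-01"),
    ("u3", "B", "A", "2020-03-01"),
    ("u1", "A", "A", "2020-03-02"),  # intra-place movement (O = D)
    ("u2", "A", "B", "2020-03-02"),
]


def build_odt_cube(rows):
    """Aggregate over the entity dimension: cell (o, d, t) -> number of movers."""
    cube = Counter()
    for _entity, origin, dest, day in rows:
        cube[(origin, dest, day)] += 1
    return cube


def od_matrix(cube, days):
    """OD matrix: flows between places summed over the selected days."""
    od = Counter()
    for (o, d, t), n in cube.items():
        if t in days:
            od[(o, d)] += n
    return od


def ot_matrix(cube):
    """OT matrix: outflow per (origin, day); diagonal cells are skipped (assumption)."""
    ot = Counter()
    for (o, d, t), n in cube.items():
        if o != d:
            ot[(o, t)] += n
    return ot


def dt_matrix(cube):
    """DT matrix: inflow per (destination, day); diagonal cells are skipped (assumption)."""
    dt = Counter()
    for (o, d, t), n in cube.items():
        if o != d:
            dt[(d, t)] += n
    return dt


cube = build_odt_cube(records)
od = od_matrix(cube, {"2020-03-01", "2020-03-02"})
ot = ot_matrix(cube)
dt = dt_matrix(cube)
```

Keeping the cube as a sparse mapping from (origin, destination, day) to a count mirrors the flat-table storage described in this section: only non-empty cells are stored, and each matrix is a grouped sum over one of the dimensions.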

Constructing the ODT cube requires point-in-polygon spatial operations for billions of data points and thus is extremely data- and computation-intensive. For example, creating an ODT cube using 2019 geotagged Twitter data requires point-in-polygon checking for 1.4 billion geotagged tweets (posted by 17 million unique Twitter users) against hundreds or thousands of places depending on the geographic level. To address this challenge, the cube building process is carried out in a scalable parallel computing environment based on a stack of open-source solutions, including Apache HDFS, Hive, Impala, and GIS Tools for Hadoop by Esri. The environment can be configured and reproduced using the open-source Cloudera Distribution Hadoop [50] on either an on-premise or a cloud-based computing cluster. HDFS is used as scalable data storage for big movement datasets. Hive, coupled with GIS Tools for Hadoop, is used for performing spatial operations in parallel for billions of data points to extract OD flows. The generated ODT cube, which may contain billions of cells, is stored in HDFS as a big table distributed across computer nodes.
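
To make the cost of this step concrete, the sketch below shows what a single point-in-polygon check involves, using the standard ray-casting test on simple polygons. It is an illustration under stated assumptions, not the platform's implementation: the function names and the two square "places" are hypothetical, polygons are single rings without holes, and points exactly on a boundary are not handled specially. In the platform, the equivalent operation is run in parallel by Hive with GIS Tools for Hadoop.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test. `ring` is a list of (lon, lat) vertices, closed implicitly."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def assign_place(lon, lat, places):
    """Return the name of the first place whose polygon contains the point, else None."""
    for name, ring in places.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None


# Two hypothetical, adjacent square places.
places = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}

place_1 = assign_place(1, 1, places)
place_2 = assign_place(3, 1, places)
place_3 = assign_place(5, 5, places)
```

Each check is linear in the number of polygon vertices, and a naive scan tests every point against every place, which is why billions of points call for parallel execution (and, in practice, spatial indexing).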

2.3 Human mobility extraction

In this study, we generate ODT cubes by extracting population flows from two data sources: worldwide geotagged tweets collected using the Twitter public API [51], and Social Distancing Metrics (SDM) provided by SafeGraph based on U.S. mobile devices [34]. As these two data sources are different in data structure, format, and spatiotemporal resolution, we develop two computational approaches for the data sources, respectively.

2.3.1 ODT cube construction from Twitter data.

To generate the ODT cube from Twitter data, two types of OD flows are extracted from geotagged tweets and combined: Twitter users’ single-day movement and cross-day movement. The concept of single-day and cross-day movements is introduced in [30]. In general, the single-day movement represents the users’ daily maximum travel distance of all locations relative to the initial location, and cross-day movement measures the mean center shift between two consecutive days. In this study, instead of computing the movement distance as in [30], we are interested in each user’s origin location and destination location on a daily basis to construct the entity-level 4D cube, denoted as (user, origin, destination, day). Note that the Twitter-derived OD flows do not consider users’ home location. The movements were directly derived from the locations of geotagged tweets at the user level on a daily basis. Based on the 4D cube, ODT cubes are derived by aggregating the individual flows at specific geographic levels (e.g., county, state, world first-level subdivision, or country) as explained in section 2.2. Non-human tweets (posted by bots, such as weather reports and job advertisements) need to be filtered out when computing the OD flows as they are irrelevant to human mobility [10]. These tweets are removed by checking the tweet source. For example, tweets automatically posted for jobs from the source TweetMyJOBS were excluded. A list of Twitter sources that indicate human-posted tweets and the descriptive statistics of the worldwide geotagged tweets collected using the Twitter public API in 2019 are detailed in [52].
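
The source-based filtering and the construction of (user, origin, destination, day) records can be illustrated with a deliberately simplified sketch. Several things here are assumptions rather than the method used in this study: the allowlist `HUMAN_SOURCES` is a small illustrative set (the actual list is given in [52]), each tweet is assumed to be already assigned to a place, timestamps are assumed to be ISO 8601 strings in one time zone, and a user's first and last place of the day stand in for origin and destination. The single-day and cross-day procedure of [30] is not reproduced.

```python
from collections import defaultdict

# Illustrative allowlist of tweet sources treated as human-posted (assumption).
HUMAN_SOURCES = {"Twitter for iPhone", "Twitter for Android", "Instagram"}


def daily_od_flows(tweets):
    """Return sorted (user, origin, destination, day) records from place-tagged tweets.

    Each tweet is (user, source, iso_timestamp, place). Simplification: the place
    of the first tweet of the day is the origin and the last is the destination.
    """
    by_user_day = defaultdict(list)
    for user, source, ts, place in tweets:
        if source not in HUMAN_SOURCES:
            continue  # drop bot-like sources such as job or weather feeds
        day = ts[:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
        by_user_day[(user, day)].append((ts, place))

    flows = []
    for (user, day), visits in sorted(by_user_day.items()):
        visits.sort()  # chronological order within the day
        flows.append((user, visits[0][1], visits[-1][1], day))
    return flows


tweets = [
    ("u1", "Twitter for iPhone", "2019-05-01T08:00:00", "A"),
    ("u1", "Twitter for iPhone", "2019-05-01T18:00:00", "B"),
    ("u2", "Twitter for Android", "2019-05-01T09:00:00", "A"),
    ("bot", "TweetMyJOBS", "2019-05-01T10:00:00", "C"),
]

flows = daily_od_flows(tweets)
```

In this toy input the TweetMyJOBS record is discarded, user u1 contributes an A-to-B flow, and user u2, who tweeted from one place only, contributes an intra-place record.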

2.3.2 ODT cube construction from SafeGraph data.

To generate the ODT cube using SafeGraph data, we first extract the daily OD flows from SDM [34]. There are 23 fields in the SDM table, and we use 3 of them to extract the daily population movement: origin_census_block_group, destination_cbgs, and date_range_start. The origin_census_block_group is the unique 12-digit FIPS (Federal Information Processing Standards) code for the Census Block Group (cbg). destination_cbgs contains a list of key-value pairs with the key indicating the destination census block group (from the origin census block group) and “value is the number of devices with a home in census_block_group that stopped in the given destination census block group for >1 minute during the time period” (https://docs.safegraph.com/docs/social-distancing-metrics). The date_range_start was used to extract the date information. Based on the three fields, we generate the entity-level 4D cube, denoted as (cbg, origin, destination, day), with each cell showing the number of devices from an origin block group to a destination block group on a daily basis. Note that the SafeGraph-derived OD flows consider devices’ home location (the movements originate from home). For example, a flow of 100 devices (users) from county A to county B indicates that the home location of the 100 devices is in county A. Based on the 4D cube, ODT cubes are derived by aggregating the block group level flows into other geographic levels (e.g., county and state) in the U.S.
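
A small sketch of this extraction is given below, assuming that destination_cbgs arrives as a JSON object string and that date_range_start is an ISO 8601 timestamp whose first ten characters are the date. The sample row and the helper names (`cbg_flows`, `county_cube`) are hypothetical. The county roll-up relies on the structure of the 12-digit block group code, whose first two digits identify the state and first five digits identify the county; codes are kept as strings so that leading zeros survive.

```python
import json
from collections import Counter


def cbg_flows(row):
    """Yield (origin_cbg, destination_cbg, day, devices) from one SDM-style record."""
    origin = row["origin_census_block_group"]
    day = row["date_range_start"][:10]  # keep the "YYYY-MM-DD" part
    destinations = json.loads(row["destination_cbgs"])
    for dest, devices in destinations.items():
        yield origin, dest, day, devices


def county_cube(rows):
    """Aggregate block-group flows to county level via the 5-digit FIPS prefix."""
    cube = Counter()
    for row in rows:
        for origin, dest, day, devices in cbg_flows(row):
            cube[(origin[:5], dest[:5], day)] += devices
    return cube


# One made-up record using only the three fields named in the text.
rows = [
    {
        "origin_census_block_group": "450790001001",
        "date_range_start": "2020-03-08T00:00:00-05:00",
        "destination_cbgs": '{"450790001001": 40, "450790002002": 10, "450630201001": 5}',
    }
]

cube = county_cube(rows)
```

Replacing the 5-character prefix with a 2-character prefix gives the state-level cube, and an 11-character prefix gives the census tract level.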

The processes of constructing the ODT cubes from Twitter data and SafeGraph data are carried out using Impala and Hive coupled with Esri GIS tools for Hadoop on the scalable computing environment. New computational approaches can be developed to create ODT cubes from other movement data sources following the two examples.

2.4 ODT cube-based human mobility analysis

Once the ODT cube is built, the traditional data cube atomic operations including slice, dice, drill up, drill down, and pivot can be applied to conduct various spatiotemporal query, analysis, extraction, and aggregation of billions of flows to support interactive exploration of human mobility at various geographic scales. These atomic cube operations are detailed in [53]. For example, the slice operation can be used to derive three matrices (OD, OT, and DT) from ODT cube to capture different aspects of the flows. From the performance perspective, as the ODT cube is stored and managed in HDFS, these cube operations can be efficiently conducted using Structured Query Language (SQL) with Apache Impala, an open-source parallel computing engine that offers low latency and high concurrency for analytic queries on Hadoop. Here, we provide four application scenarios to illustrate how ODT cube coupled with the cube operations can help in exploring and analyzing mobility data (Fig 3).

  • Computing the number of population flows (inflow, outflow, or both) between a place (e.g., county) and all other places in the study area during a selected time period. This information can be applied in infectious disease modeling and other applications that benefit from the knowledge of population movement between places. This computation can be achieved by slicing the ODT cube to create an OT or DT matrix and then performing a temporal aggregation over the selected time period. The result can be visualized as choropleth maps.
  • Computing the number of daily movements (inflow, outflow, or intraflow) for a place. This information reveals the population movement trend over a time period, which can be used to examine, for example, the response to stay-at-home orders during the COVID-19 pandemic. This computation can be achieved by slicing the ODT cube to create an OT or DT matrix and then performing a spatial aggregation for the selected place. The result can be visualized as time series charts.
  • Computing the number of population flows among all places (or selected places) in the study area during a specific time period. This computation can be achieved by dicing the ODT cube and aggregating along the time dimension. The result is an OD matrix that can be visualized as flow maps.
  • Extracting a subset of the mobility data based on the user-defined criteria including the area of interest, geographic levels, time period, and so on. This operation can be achieved by dicing the ODT cube into a subcube. The result can be returned as CSV (comma-separated values) data files for further analysis. The ability to extract customized subcubes on-the-fly is essential as it is often challenging and time-consuming to download the entire big movement dataset to extract the needed information.
Fig 3. Four application scenarios exemplifying how ODT cube coupled with the traditional cube operations and a scalable parallel computing environment can help analyze big mobility data.

https://doi.org/10.1371/journal.pone.0255259.g003
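
Because the cube is stored as a flat table, each of the four scenarios in this section reduces to a short SQL statement. The sketch below uses Python's built-in sqlite3 module as a stand-in for Impala so that it can run anywhere; the table name `odt`, its columns (`o`, `d`, `day`, `cnt`), and the sample rows are hypothetical, and the queries are meant to convey the shape of the operations rather than the platform's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE odt (o TEXT, d TEXT, day TEXT, cnt INTEGER)")
con.executemany(
    "INSERT INTO odt VALUES (?, ?, ?, ?)",
    [
        ("A", "B", "2020-03-01", 5),
        ("A", "B", "2020-03-02", 3),
        ("B", "A", "2020-03-01", 2),
        ("C", "A", "2020-03-02", 4),
        ("A", "A", "2020-03-01", 7),  # intra-place movement
    ],
)
period = ("2020-03-01", "2020-03-02")

# Scenario 1: inflow to place A from each other place over a period (choropleth map).
inflow_to_a = con.execute(
    "SELECT o, SUM(cnt) FROM odt "
    "WHERE d = 'A' AND o <> 'A' AND day BETWEEN ? AND ? "
    "GROUP BY o ORDER BY o",
    period,
).fetchall()

# Scenario 2: daily outflow from place A (time series chart).
daily_outflow_a = con.execute(
    "SELECT day, SUM(cnt) FROM odt "
    "WHERE o = 'A' AND d <> 'A' GROUP BY day ORDER BY day"
).fetchall()

# Scenario 3: OD matrix aggregated along the time dimension (flow map).
od = con.execute(
    "SELECT o, d, SUM(cnt) FROM odt "
    "WHERE day BETWEEN ? AND ? GROUP BY o, d ORDER BY o, d",
    period,
).fetchall()

# Scenario 4: dice a subcube for selected places and one day (CSV download).
subcube = con.execute(
    "SELECT o, d, day, cnt FROM odt "
    "WHERE o IN ('A', 'B') AND d IN ('A', 'B') AND day = ? "
    "ORDER BY o, d",
    ("2020-03-01",),
).fetchall()
```

Scenarios 1 to 3 are grouped sums over different dimensions, while scenario 4 is a pure filter, which is why all four can be served on demand by the same query engine.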

3. Results

3.1 Global multi-scale daily OD flows

In this study, the daily OD flows in 2019 and 2020 were extracted using worldwide geotagged tweets collected with the Twitter public API and U.S. SafeGraph social distancing metrics data downloaded from the SafeGraph website (Table 1). For Twitter data, about 2.7 billion geotagged tweets (excluding non-human tweets) posted by about 25 million Twitter users were used in the flow extraction computation, resulting in about 637 million entity-level daily OD flows (user, O, D, T). For SafeGraph data, over 160 million social distancing metrics records were used in the flow extraction computation, resulting in over 11 billion entity-level daily OD flows (cbg, O, D, T).

Table 1. Statistics of the derived daily OD flows from Twitter data and SafeGraph data.

https://doi.org/10.1371/journal.pone.0255259.t001

The entity level OD flows were further aggregated into ODT cubes with five geographic scales, including world country/territory (Twitter), world first-level subdivision (Twitter), U.S. state (Twitter and SafeGraph), U.S. county (Twitter and SafeGraph), and U.S. census tract (SafeGraph). The number of daily OD flows (cube cells) for each geographic level and data source are listed in Table 1. These multi-scale OD flow datasets can be queried, extracted, and visualized using the ODT Flow Explorer interactively and the ODT Flow REST APIs programmatically.

3.2 ODT Flow Explorer

The ODT Flow Explorer is an interactive spatial web portal for on-demand querying, slicing, aggregating, and visualizing the multi-scale daily population flows with a few clicks (Fig 4). The front-end of the portal was developed with JavaScript, HTML, and CSS using a stack of open-source libraries including jQuery (https://jquery.com), Bootstrap (for web interface, https://getbootstrap.com), and Leaflet (for mapping, https://leafletjs.com). The backend of the portal was developed using Java and is hosted by Apache Tomcat (http://tomcat.apache.org). The portal is connected to query engines (Hive and Impala) on a scalable computing cluster to access and analyze billion-level OD flows in parallel. The web portal has attracted over 2,200 visits from 31 countries according to the RevolverMaps real-time visitor statistics widget [54] as of the time of writing.

Fig 4. The user interface of the interactive spatial web portal.

The world country and world first-level subdivision boundary data are derived from https://gadm.org.

https://doi.org/10.1371/journal.pone.0255259.g004

3.3 ODT Flow REST APIs

Representational State Transfer (REST) is a widely used architectural style for developing web applications and web services [55]. The ODT Flow REST APIs provide functions for on-demand query, extraction, and aggregation of the ODT cube, and deliver the results as a flat data table that other applications can consume programmatically. Three types of APIs have been implemented in this study at the time of writing: 1) get the aggregated movement between a selected place and other places during a time period, 2) get the daily inter-unit movements between the selected place and other places or the selected place’s daily intra-unit movements, and 3) extract the OD flows in either temporally aggregated format (OD matrix) or daily format (ODT cube) for a specified geographic area and time period.

As the APIs allow access to billion-level flows programmatically, they can be integrated with other computing environments such as KNIME workflows and Jupyter Notebooks to develop shareable and reproducible analyses that involve human mobility, which is demonstrated with the case studies in the following section. More APIs will be added in the future development of the ODT Flow platform.
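
As a rough illustration of how a client might work with such an API, the sketch below builds a query URL and parses a flat CSV table. Everything service-specific in it is a placeholder and should not be read as the actual ODT Flow interface: the base URL, the operation name `flows`, the parameter names (`source`, `scale`, `begin`, `end`), and the response columns (`o_place`, `d_place`, `cnt`) are all hypothetical, and no network request is made. The platform's own API documentation is the authority for the real endpoints and parameters.

```python
import csv
import io
from urllib.parse import urlencode

# Placeholder endpoint for illustration only; not the real ODT Flow service.
BASE_URL = "https://example.org/odtflow/api"


def build_query(operation, **params):
    """Build a GET URL for a hypothetical operation with sorted query parameters."""
    return f"{BASE_URL}/{operation}?{urlencode(sorted(params.items()))}"


def parse_flat_table(text):
    """Parse a CSV response body into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


url = build_query(
    "flows",
    source="twitter",
    scale="us_county",
    begin="2020-01-01",
    end="2020-01-05",
)

# A made-up response body in the flat-table form described above.
sample_response = "o_place,d_place,cnt\nA,B,12\nB,A,7\n"
rows = parse_flat_table(sample_response)
total = sum(int(r["cnt"]) for r in rows)
```

Because a request is fully described by its URL, the same query can be stored in a workflow or notebook and rerun later, which is the property that supports reproducibility.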

4. Demonstration

4.1 Using the ODT Flow Explorer to explore, visualize, and download human mobility at different geographic scales

To use the portal, users start by choosing a mobility dataset, a geographic scale, and a time period, and then choose what to do with the selected data. Four options are available (as of version 0.8): Choropleth Map, Flow Map, Daily Movements, and Download, corresponding to the four application scenarios explained in section 2.4.

The Choropleth Map function visualizes the aggregated flows between a selected place and other places during a time period as a choropleth map. Three types of flow directions can be configured: Inflow, Outflow, and In&Out. Inflow refers to the number of users/devices from other places moving to the selected place during the selected time period. Outflow refers to the number of users/devices moving from the selected place to other places. In&Out contains the movements from both directions. Fig 5A and 5B show the SafeGraph-derived county population flows to New York County (Manhattan) from 03/08/2020 to 03/14/2020 and for the following week (03/15/2020 to 03/21/2020). Fig 5C and 5D show the Twitter-derived flows between England, UK and other first-level administrative units in Europe for 01/01/2020–02/29/2020 and 03/01/2020–04/30/2020. The impact of the COVID-19 pandemic on human mobility can be observed in both areas and at both scales.

Fig 5.

SafeGraph-derived county population flows to New York County from (a) 03/08/2020 to 03/14/2020 and (b) for the following week (03/15/2020 to 03/21/2020). Twitter-derived in & out flows between England, UK and other first-level administrative units in the Europe area for (c) 01/01/2020 to 02/29/2020, and (d) 03/01/2020 to 04/30/2020.

https://doi.org/10.1371/journal.pone.0255259.g005

The Flow Map function visualizes the OD lines based on the selected dataset, geographic level, and time period. Users can choose the area of interest (AOI) by drawing a bounding box on the map or use the full spatial coverage of the data. Flow direction (Inflow, Outflow, and In&Out) and flow color can also be configured. The width of each flow is weighted based on the number of device/user movements for display only. Fig 6 shows county-level population movement from 01/01/2020 to 01/05/2020 derived from Twitter (Fig 6A) and SafeGraph (Fig 6B). Note that the flow map function aims to provide a quick overview of the selected flows. To interactively visualize massive flows, the ODT Flow REST APIs can be used along with kepler.gl (a WebGL-enabled mapping library) in the Jupyter Notebook environment (see section 4.3.2).

Fig 6.

County-level population flows from 01/01/2020 to 01/05/2020 derived from (a) Twitter and (b) SafeGraph. Note: For SafeGraph-derived mobility data, only flows with an aggregated device count greater than 20 within the selected time period are displayed for performance reasons.

https://doi.org/10.1371/journal.pone.0255259.g006

The Daily Movements function shows the trend of daily movements for a selected place. Besides Inflow, Outflow, and In&Out, the direction option also includes Intraflow, which indicates the number of daily movements within the selected place (flows with a movement distance greater than zero that do not cross the unit boundary). Fig 7 demonstrates the daily population movement trends at the census tract, county, and country scales from 01/01/2019 to 12/31/2020. Fig 7A shows the country-level daily intra-movements for Spain and Argentina (based on Twitter-derived OD data). The two countries exhibit a similar mobility reduction in March and April 2020 but show different recovery patterns starting from May 2020: Spain’s mobility gradually rebounded until mid-August, when it started to drop again, whereas Argentina’s mobility remained at a relatively low level. Fig 7B shows the U.S. county-level daily inflows to New York County from 01/01/2019 to 12/31/2020 (based on SafeGraph-derived OD data). At both geographic scales and locations, the impact of the COVID-19 pandemic on human mobility is well reflected by the sharp drop around March 11, 2020. Fig 7C and 7D show the daily intraflow and interflow trends for two census tracts in Columbia, South Carolina from 01/01/2019 to 12/31/2020: one tract is part of the campus of the University of South Carolina (Fig 7C) and the other is a residential area (Fig 7D). For the campus tract, the normal summer holiday season (May 15 to August 31) in 2019 is revealed, and the campus closures due to COVID-19 are also reflected by the extremely low mobility from March to August 2020. The residential tract, on the other hand, shows no obvious seasonal pattern, and the impact of COVID-19 is less dramatic.
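How a daily series for each direction, including Intraflow, could be derived from daily OD records is sketched below. The record layout (day, origin, destination, count), the function name `daily_movements`, and the sample values are assumptions made for illustration only.

```python
from collections import defaultdict


def daily_movements(records, place, direction):
    """Sum daily counts for one place and one direction.

    `records` is an iterable of (day, origin, destination, count) tuples.
    Supported directions: "inflow", "outflow", "in&out", "intraflow".
    """
    totals = defaultdict(int)
    for day, origin, destination, cnt in records:
        is_intra = origin == place and destination == place
        is_in = destination == place and origin != place
        is_out = origin == place and destination != place
        selected = {
            "intraflow": is_intra,
            "inflow": is_in,
            "outflow": is_out,
            "in&out": is_in or is_out,
        }[direction]  # raises KeyError for an unsupported direction
        if selected:
            totals[day] += cnt
    return dict(sorted(totals.items()))


# Made-up daily records, for illustration only.
sample_records = [
    ("2020-03-10", "Spain", "Spain", 900),
    ("2020-03-10", "France", "Spain", 40),
    ("2020-03-10", "Spain", "France", 35),
    ("2020-03-11", "Spain", "Spain", 610),
    ("2020-03-11", "France", "Spain", 12),
]
intra_series = daily_movements(sample_records, "Spain", "intraflow")
inout_series = daily_movements(sample_records, "Spain", "in&out")
```

Plotting `intra_series` against the date would give a trend line comparable in spirit to the curves in Fig 7.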

Fig 7. Daily population movement in different geographic scales.

(a) Intraflow for Spain (top line) and Argentina (bottom line) in 2019 and 2020; (b) Inflow for New York County, U.S. in 2019 and 2020; (c) Intraflow for a census tract in Columbia, South Carolina (mainly located within the University of South Carolina) from 01/01/2019 to 02/24/2021; (d) Interflow (In&Out) for a census tract in a residential area of Columbia from 01/01/2019 to 02/24/2021.

https://doi.org/10.1371/journal.pone.0255259.g007

Finally, the mobility data can be downloaded as CSV (comma-separated values) files for further analysis using the Download function by selecting the data source, geographic level, time period, and aggregation type of interest. Each row in the CSV file contains the origin place (o_place), destination place (d_place), date (year, month, and, for daily data, day), number of users/devices that moved from origin to destination (cnt), and the mean center of all flow origins (o_lat, o_lon) and flow destinations (d_lat, d_lon). Fig 8 shows over 160,000 global daily flows that were extracted and downloaded using the portal (Fig 8A) and visualized in the third-party mapping library kepler.gl (www.kepler.gl) as a flow map (Fig 8B) and a point density map (origin locations, Fig 8C). Section 4.3.2 details how to programmatically extract flow data and visualize them in kepler.gl.
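A downloaded file with the fields listed above could be read along the following lines. The exact header order, the separate year/month/day columns, the place identifiers, and all values in the embedded sample are assumptions for illustration; a real download may differ.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Illustrative sample only: the header layout and the values are assumed.
SAMPLE_CSV = """o_place,d_place,year,month,day,cnt,o_lat,o_lon,d_lat,d_lon
US-NY,US-NJ,2020,1,1,12,40.71,-74.00,40.22,-74.76
US-NY,US-NJ,2020,1,2,9,40.73,-73.99,40.20,-74.70
US-NJ,US-NY,2020,1,1,7,40.22,-74.76,40.71,-74.00
"""


def read_flows(handle):
    """Parse daily flow rows into dictionaries with typed values."""
    flows = []
    for row in csv.DictReader(handle):
        flows.append({
            "o_place": row["o_place"],
            "d_place": row["d_place"],
            "date": date(int(row["year"]), int(row["month"]), int(row["day"])),
            "cnt": int(row["cnt"]),
            "origin": (float(row["o_lat"]), float(row["o_lon"])),
            "destination": (float(row["d_lat"]), float(row["d_lon"])),
        })
    return flows


def total_by_pair(flows):
    """Sum counts over all days for each (origin, destination) pair."""
    totals = defaultdict(int)
    for flow in flows:
        totals[(flow["o_place"], flow["d_place"])] += flow["cnt"]
    return dict(totals)


flows = read_flows(io.StringIO(SAMPLE_CSV))
pair_totals = total_by_pair(flows)
```

For a file on disk, `read_flows(open(path, newline=""))` would replace the in-memory sample.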

Fig 8.

(a) Extract and download the Twitter-derived mobility data in ODT flow explorer at the world first-level subdivision from 01/01/2020 to 03/31/2020; (b) visualize the data in kepler.gl as a flow map; (c) visualize the origin locations as a point density map.

https://doi.org/10.1371/journal.pone.0255259.g008

4.2 Integrating the ODT Flow APIs in the KNIME workflow environment

To demonstrate how the ODT Flow APIs can be used with scientific workflows to produce reproducible, replicable, and expandable mobility data analyses, we developed two groups of case studies: 1) dynamic map visualization using human mobility data; and 2) correlation analysis between human mobility and COVID-19 infections. All case studies are developed using KNIME, a free and open-source visual workflow builder [56].

4.2.1 Dynamic map visualization using human mobility data.

The first case study uses the ODT Flow APIs to extract daily intra-state mobility values and visualizes the mobility trends from March 1, 2020 to March 30, 2020 with dynamic choropleth maps. Fig 9 shows the implementation of the visualization in the scientific workflow tool KNIME. Each workflow consists of various types of nodes, such as data reader, table joiner, Python script, and line plot. In the demonstrated workflow, the procedures are presented intuitively, including 1) input data; 2) variable settings; 3) data processing; 4) map visualization; and 5) chart visualization. The ODT Flow APIs are used in the first step (Input Data) to programmatically extract the mobility data. More importantly, to replicate the analysis, users only need to change the input variable settings (e.g., select a different time period) without extra operations. Fig 10 displays the interactive dynamic map visualization results and compares intra-state human mobility before and during the early stage of the COVID-19 pandemic in the U.S. The left map displays the choropleth map of intra-state human mobility as of March 6, 2020, while the right map displays the mobility choropleth map as of March 29, 2020. Both the maps and the line charts show that human mobility has decreased in most states since mid-March 2020.

Fig 9. The workflow of human mobility intensity trends visualization.

https://doi.org/10.1371/journal.pone.0255259.g009

Fig 10. Visualization results of human mobility trends from the workflow.

https://doi.org/10.1371/journal.pone.0255259.g010

4.2.2 Correlation analysis between human mobility and the COVID-19 infections.

The magnitude and scale of human movement are critical for disease transmission prediction, risk area identification, and decision-making about control measures [57]. The second case study builds a workflow that integrates population flows retrieved with the ODT Flow APIs and COVID-19 case data to investigate how human movements affect disease transmission. New York State reported its first case on March 1, 2020 and soon became the epicenter of the pandemic in the U.S., so we take New York as the study area to explore the impact of New York population outflows on the spread of COVID-19 in other states. The U.S. confirmed cases are obtained from the shared COVID-19 datasets on Harvard Dataverse [58]. Since New York was locked down on March 22, 2020, we compute the sum of outflows from New York to each state from March 20 through March 21, 2020 and estimate the daily correlation coefficients between total flows and confirmed cases. Furthermore, to evaluate the quality of the Twitter-derived mobility data, we replicate the process using SafeGraph-derived mobility data, which only requires changing the source variable of the API from twitter to safegraph (Fig 11). The results from the workflow indicate that the correlation coefficients estimated from Twitter and SafeGraph follow the same trend: the values increase from March 22 and reach their peaks (0.84 and 0.81) on March 30, which is 9 days from the lockdown day (Fig 12). The high correlations indicate a close relationship between human movement and COVID-19 transmission. This case study also verifies the similarity of the human mobility data derived from Twitter and SafeGraph.
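The daily correlation step can be sketched in plain Python as follows. The sketch assumes that the correlation in question is the Pearson coefficient computed across states, with one fixed outflow total per state and one confirmed-case count per state and day; the section does not name the coefficient, and the function names and all numbers below are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def daily_correlations(outflows, cases_by_day):
    """Correlate per-state outflow totals with per-state cases for each day."""
    states = sorted(outflows)
    x = [outflows[state] for state in states]
    return {
        day: pearson(x, [cases[state] for state in states])
        for day, cases in cases_by_day.items()
    }


# Made-up numbers: summed outflows from New York and confirmed cases per state.
outflows_from_ny = {"NJ": 500, "CT": 200, "PA": 300, "FL": 100}
cases_by_day = {
    "2020-03-22": {"NJ": 30, "CT": 40, "PA": 10, "FL": 20},
    "2020-03-30": {"NJ": 1000, "CT": 400, "PA": 600, "FL": 200},
}
correlations = daily_correlations(outflows_from_ny, cases_by_day)
```

Repeating the call with outflow totals derived from the other data source would give the second curve of a comparison like the one in Fig 12.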

Fig 11. The correlation analysis of human mobility and COVID-19 infection cases in the U.S.

https://doi.org/10.1371/journal.pone.0255259.g011

Fig 12. The correlation coefficients between outflows from New York and COVID-19 infection cases.

https://doi.org/10.1371/journal.pone.0255259.g012

4.3 Integrating the ODT Flow APIs in the Jupyter Notebook environment

Jupyter Notebook (jupyter.org) is an open-source web application for interactive computing. It allows users to create and share documents that contain live code, its output, and comments. We present two case studies that use Jupyter Notebook to access human mobility data via the ODT Flow APIs and then conduct further analysis and visualization. In both case studies, the analysis, from data fetching to results visualization, can be completed in minutes using dozens of lines of Python code.

4.3.1 Visual analytics of the impact of COVID-19 on human mobility in France.

In this case study, we visualize and analyze the impact of COVID-19 on human mobility in France’s 13 administrative regions. Such spatial analyses could benefit policymaking at different jurisdictional levels. The first step is data preparation. Daily intra-flows of these 13 regions in 2020 were first extracted using the ODT Flow API. Fig 13 shows sample code for this step, which reads the boundary file (for mapping) and obtains human mobility data using the API. We then computed the monthly change rates of flows compared with January 2020 for each region. Finally, the change rates of each month were rendered on maps so that the spatial differences between regions can be visually investigated.

Fig 13. Sample code for reading the boundary file (for mapping) and obtaining human mobility data using the ODT Flow API.

https://doi.org/10.1371/journal.pone.0255259.g013

We compute the mobility reduction rate for each month of each region (Ri) with Eq 1 using the number of January 2020 flows (MJan.) as the baseline.

(1)
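As a rough illustration of the computation described above, the following sketch derives monthly rates relative to the January baseline. It assumes the rate is the relative change (M_i - M_Jan.) / M_Jan. expressed in percent, so that negative values indicate reduced mobility; the sign convention and scaling used in Eq 1 may differ, and the function name and sample numbers are hypothetical.

```python
def monthly_change_rates(monthly_flows, baseline_month=1):
    """Percent change of each month's flow count relative to the baseline month.

    `monthly_flows` maps month number (1-12) to the total number of flows.
    Assumed convention: negative values mean lower mobility than the baseline.
    """
    baseline = monthly_flows[baseline_month]
    if baseline <= 0:
        raise ValueError("baseline flow count must be positive")
    return {
        month: (count - baseline) * 100.0 / baseline
        for month, count in sorted(monthly_flows.items())
    }


# Made-up monthly intra-region flow totals for one region.
region_flows = {1: 1000, 3: 800, 4: 450, 7: 950}
rates = monthly_change_rates(region_flows)
```

Applying the function to each of the 13 regions would give one value per region and month, which is the quantity rendered on monthly maps.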

The maps in Fig 14 show the monthly mobility reduction rates of the 13 French regions. A nationwide drop in intra-region mobility started to appear in March following the nationwide mandatory home lockdown on March 16, 2020, when France had over 5,500 COVID-19 cases and 127 deaths [59] (Fig 15). The reduction rate peaked in April 2020, when most regions showed a mobility reduction of over 50%, and started to decrease in May following the end of the lockdown on May 11, 2020. In July and August, mobility in the southern regions recovered, coinciding with low numbers of daily infection cases. However, in November and December, mobility in most regions of France experienced another dramatic reduction following the second nationwide lockdown on October 28, which was imposed due to the second wave of infection and the high number of daily new deaths [60]. To visualize and analyze the mobility changes for another country, we only need to set target_place in the code to the country to be analyzed and re-run the notebook to reproduce the maps, charts, and tables within a minute.

Fig 14. Monthly mobility change rates in 13 French administrative regions.

https://doi.org/10.1371/journal.pone.0255259.g014

Fig 15. Daily COVID-19 new cases in France in 2020.

(Data source: World Health Organization Coronavirus Dashboard, https://covid19.who.int/info/. Accessed on March 19, 2021).

https://doi.org/10.1371/journal.pone.0255259.g015

4.3.2 Interactive visualization of massive flows using ODT Flow APIs and kepler.gl.

The ODT Flow APIs allow users to query and extract large amounts of ODT flows on demand at different geographic scales for a designated study area (bounding box) and time period. As the extracted flow data contain the latitude and longitude of the origin and destination locations, they can be directly loaded and visualized in third-party mapping libraries. Kepler.gl is an open-source, WebGL-enabled, high-performance mapping library that supports interactive and responsive visualization of large location datasets. Fig 16 shows the entire code (except module imports) for extracting the world first-level administrative OD flow matrix using the API and visualizing it as an interactive flow map using kepler.gl. The OD matrix is aggregated from January 1 to 5, 2020, with the geographic area set to the whole world. New flow maps can be quickly regenerated by changing the API parameters. For example, by changing the year from 2020 to 2019 and the type from “aggregated” to “daily”, we can quickly generate a new world flow map showing 2019 daily flows. The daily flows can be further animated using kepler.gl’s Time Playback function.
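The loading step can be sketched as follows, under stated assumptions. The API request itself is omitted because the endpoint and parameter names are not given in this section; the flow records, the helper names `flows_to_csv` and `build_map`, and the dataset name "od_flows" are hypothetical. The only kepler.gl calls used are the `KeplerGl` constructor and its `add_data` method from the keplergl Python package, which is imported lazily so the data preparation runs without it.

```python
import csv
import io

FIELDS = ["o_place", "d_place", "cnt", "o_lat", "o_lon", "d_lat", "d_lon"]


def flows_to_csv(flows):
    """Serialize flow dictionaries to CSV text with origin/destination coordinates."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for flow in flows:
        writer.writerow({name: flow[name] for name in FIELDS})
    return buffer.getvalue()


def build_map(csv_text, height=600):
    """Load the CSV text into a kepler.gl map; returns None if keplergl is missing."""
    try:
        from keplergl import KeplerGl  # optional dependency
    except ImportError:
        return None
    flow_map = KeplerGl(height=height)
    flow_map.add_data(data=csv_text, name="od_flows")
    return flow_map


# Made-up records standing in for flows returned by the API.
sample_flows = [
    {"o_place": "England", "d_place": "Ile-de-France", "cnt": 42,
     "o_lat": 52.5, "o_lon": -1.5, "d_lat": 48.8, "d_lon": 2.4},
    {"o_place": "Ile-de-France", "d_place": "England", "cnt": 37,
     "o_lat": 48.8, "o_lon": 2.4, "d_lat": 52.5, "d_lon": -1.5},
]
csv_text = flows_to_csv(sample_flows)
```

In a notebook cell, `build_map(csv_text)` would display the widget, after which an arc or line layer can be configured on the origin and destination coordinate columns in the kepler.gl side panel.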

Fig 16. Python code for extracting the world first-level administrative OD flow matrix using the ODT Flow API and visualizing the data as an interactive flow map with kepler.gl.

https://doi.org/10.1371/journal.pone.0255259.g016

5. Discussions

5.1 How does the ODT Flow platform address the challenges of big data?

The overwhelming volume of mobility datasets has been noted by many. In our case, the Twitter-derived and SafeGraph-derived daily OD flows contain 637 million and 11 billion Entity-O-D-T records, respectively. Such an amount demands not only massive data storage but also efficient processing techniques (which intertwines with the velocity challenge). In the designed system, we aim to address the volume/velocity challenge by integrating our ODT data model with high-performance computing techniques. The datasets are stored and handled in a scalable big data computing environment to facilitate efficient processing, querying, extracting, and summarizing of a large number of OD flows at the server side. While the system is implemented on our in-house computing cluster, such a computing environment can be quickly provisioned in a cloud environment (e.g., Amazon EC2).

We tackle the variety challenge of big mobility data via the ODT cube, a unified conceptual data model that supports the extraction and visualization of spatiotemporally standardized OD flows from heterogeneous movement data sources. In addition, the unified ODT cube allows queries at multiple spatial scales (see Table 1) within a user-defined temporal period. These features of the proposed ODT data model respond well to the multi-source and multi-scale characteristics of mobility data that give rise to the variety challenge. In a similar manner, we aim to tackle the veracity challenge using the ODT model’s capability to integrate multiple data sources in one unified framework, allowing for easy comparison and synthesis of mobility data at different spatiotemporal scales and capturing the multiple facets of human mobility, thus leading to a more comprehensive understanding of human movement that potentially benefits a wide range of research domains. While two mobility data sources (i.e., Twitter and SafeGraph, chosen for their cost-free availability) are included in this study, other mobility data sources can be included and handled by the proposed framework in a similar way. Although different data sources require different human mobility extraction algorithms, they can be incorporated as plug-and-play components of the system.

Numerous studies have demonstrated the important role (value) of mobility datasets in fields such as migration, urban planning, and disaster management, to name a few. In this study, we direct our attention to demonstrating the utility of the proposed ODT framework in addressing the challenges of the ongoing COVID-19 pandemic (as of the time of writing). Our demonstrations reveal the similarity and dissimilarity of the selected mobility sources and confirm the close relationship between human movement and COVID-19 transmission. In this context, the proposed ODT Flow platform serves the soaring need for human mobility data during disaster events such as the COVID-19 pandemic we are facing. The interactive web portal and APIs (integrated in KNIME workflows and Python Jupyter Notebooks) meet the needs of different user communities, which maximizes the value of mobility data by reaching a broader range of users.

5.2 Reproducibility, replicability, and privacy

There is a growing awareness of the challenges in reproducibility and replicability facing the academic community [36, 37]. The extensive use of locational data for applications such as disaster and humanitarian response raises the issue of reproducibility and replicability from the competing perspectives of location privacy and geospatial data quality [61]. Following the guidelines proposed by [36], we aim to facilitate reproducibility and replicability by 1) providing a unified and well-documented online human mobility data repository and an interactive web portal that allows querying, visualizing, and downloading multi-source mobility data at different geographic scales, 2) providing APIs that facilitate easy programmatic access to the datasets to ease the reproducibility of studies that demand mobility data, and 3) providing mobility analysis demonstrations using the accessible and open computing environments of KNIME workflow and Jupyter Notebook.

Mobility data consist of individuals’ location stamps, thus posing challenges to privacy protection, as human movements are regarded as unique and predictable [62]. Scholars have voiced concerns about whether mobility data sharing is appropriate, even in a time of crisis like the COVID-19 pandemic [30, 63, 64]. Nonetheless, mobility records that have been properly aggregated and anonymized (e.g., Google Community Mobility Reports, Apple Mobility Trends Reports, Descartes Mobility Records, and SafeGraph mobility records) have become acceptable and popular among academic communities [38, 65–68]. SafeGraph mobility records, one of the demonstrated mobility data sources in this study, are derived using a panel of GPS points from anonymous mobile devices. To enhance privacy, SafeGraph excludes census block group (cbg) information if fewer than five devices from a given cbg visited an establishment in a month [34]. In our designed ODT framework, we further aggregate SafeGraph’s cbg-level records to coarser geographic scales (census tract and above) before sharing them with the public. As for Twitter data, we anonymize the collected tweets and aggregate them to geographic levels no finer than the U.S. county. In future development, we will continue to follow standard privacy protection guidelines, ensuring that records of individuals’ movements are protected.
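The aggregation from block groups to coarser units can be sketched by truncating standard U.S. Census GEOIDs: a 12-digit block group identifier begins with the 11-digit tract identifier, which in turn begins with the 5-digit county and 2-digit state codes. The function name `aggregate_flows` and the sample identifiers and counts are made up for illustration, and the platform's own aggregation procedure may differ.

```python
from collections import defaultdict

# Number of leading GEOID digits that identify each coarser unit.
LEVEL_PREFIX = {"tract": 11, "county": 5, "state": 2}


def aggregate_flows(cbg_flows, level="tract"):
    """Sum block-group-level OD counts at a coarser census level.

    `cbg_flows` is an iterable of (origin_geoid, destination_geoid, count)
    tuples with 12-digit block group GEOIDs given as strings.
    """
    width = LEVEL_PREFIX[level]
    totals = defaultdict(int)
    for origin, destination, cnt in cbg_flows:
        if len(origin) != 12 or len(destination) != 12:
            raise ValueError("expected 12-digit block group GEOIDs")
        totals[(origin[:width], destination[:width])] += cnt
    return dict(totals)


# Made-up block-group flows (identifiers are illustrative only).
sample_cbg_flows = [
    ("450790028001", "450790029001", 7),
    ("450790028002", "450790029001", 5),
    ("450790028001", "450790028002", 3),
]
tract_flows = aggregate_flows(sample_cbg_flows, "tract")
county_flows = aggregate_flows(sample_cbg_flows, "county")
```

Note that a plain sum of this kind carries the caveat discussed in section 5.3.2: a device that appears in several block-group records can be counted more than once after aggregation.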

5.3 Data limitations

In this study, the proposed ODT framework includes two data sources: 1) worldwide geotagged tweets collected using the Twitter public API [51] and 2) Social Distancing Metrics (SDM) provided by SafeGraph based on U.S. mobile devices [34]. We acknowledge their intrinsic data limitations as below.

5.3.1 Twitter-derived population flows.

The limitations of Twitter data have been documented by a number of studies [69–71]. First, Twitter is not proportionally used by different population groups, thus presenting notable demographic and socioeconomic biases. For example, according to Statista [72], 28.4% of global Twitter users were aged between 35 and 49 years, 59.6% were younger than 35, and 12% were aged over 50. Second, geotagged tweets collected from the free public Twitter API (about 1% of the entire Twitter stream) are sparse and may not be enough to capture the temporal patterns at the daily level for less populated areas. This is particularly the case when deriving county-level daily population flows, as a Twitter user was included only when that user posted at least two tweets on a single day or posted tweets on at least two consecutive days. Third, the dynamics of people’s tweeting activities (e.g., people tend to tweet more during big events), as well as changes to Twitter’s internal API, affect the daily number of tweets being collected. Thus, we advise that studies using the Twitter-derived flow data be aware of these limitations when interpreting results and reaching conclusions. Nevertheless, a recent study suggests that Twitter-derived mobility data are able to capture the general human movement dynamics during the COVID-19 pandemic and present considerable similarity with other mobility data sources [27].
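The user inclusion rule stated above can be written as a small predicate. This sketch only encodes the rule as worded in the text (two or more tweets on one day, or tweets on at least two consecutive days); the function name `user_is_included` and the example dates are hypothetical, and the full flow extraction algorithm is not reproduced here.

```python
from collections import Counter
from datetime import date, timedelta


def user_is_included(tweet_dates):
    """Return True if a user's tweet dates satisfy the stated inclusion rule.

    `tweet_dates` holds one `datetime.date` per geotagged tweet.
    """
    per_day = Counter(tweet_dates)
    # Condition 1: at least two tweets on a single day.
    if any(count >= 2 for count in per_day.values()):
        return True
    # Condition 2: tweets on at least two consecutive days.
    days = set(per_day)
    return any(day + timedelta(days=1) in days for day in days)


# Made-up examples.
same_day = [date(2020, 1, 1), date(2020, 1, 1)]
consecutive = [date(2020, 1, 31), date(2020, 2, 1)]
with_gap = [date(2020, 1, 1), date(2020, 1, 3)]
```

The `with_gap` example shows why sparse posting in less populated areas removes users from the daily flow computation.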

5.3.2 SafeGraph-derived population flows.

SafeGraph data have a high penetration rate (~10% of mobile devices in the U.S.) and represent the U.S. population groups well according to [73]. As a result, flows derived from SafeGraph are considerably denser than Twitter-derived flows, which helps overcome the sparsity limitation of the Twitter data. Compared with Twitter data, one downside of the SafeGraph-derived mobility data is their spatiotemporal coverage, as the SafeGraph SDM data only date back to 2019 and cover only the U.S. In addition, as SafeGraph does not provide individual-level data (the finest resolution is the census block group level), aggregating the data to a larger spatial scale (e.g., county level) would exaggerate the actual number of visitors, as a visitor could be counted multiple times during the aggregation. Finally, the ability to continue updating the daily OD flows from both Twitter data and SafeGraph data depends on data availability, which is determined by the data providers. These limitations further highlight the importance and necessity of sharing and fusing multiple data sources for human mobility studies.

6. Conclusion

Human mobility dynamics provide fundamental knowledge regarding spatial interactions, benefiting a wide range of applications in need of such prior knowledge. The COVID-19 pandemic, to some degree, re-emphasizes the importance of human mobility monitoring and the value of mobility records. With the advent of the big data era, new challenges have appeared, as fine-grained, large-scale human mobility records fall squarely into the category of big data, characterized by the 5V challenges that demand a shift in the data handling and sharing paradigm.

In response to the soaring need for human mobility data, especially during disaster events such as the COVID-19 pandemic, and to the 5V challenges in big mobility data, we develop a scalable platform for extracting, querying, visualizing, and sharing multi-source multi-scale human mobility data. The ODT data model is designed to work with parallel query engines (Apache Hive and Impala) to handle mobility data in large volumes with extensive spatial coverage. We process the human mobility data in a high-performance computing environment with source-specific human mobility extraction algorithms, which achieves efficient extraction of billions of OD flows at the server side. To enhance end users’ experience, we develop the ODT Flow Explorer, which allows users to intuitively and interactively explore multi-source mobility datasets at user-defined spatiotemporal scales and is expected to benefit both scientific communities and the general public in understanding human mobility dynamics. To promote reproducibility and replicability, we further develop the ODT Flow REST APIs, which give researchers the flexibility to access the data programmatically via workflows, code, and programs. In the presented case studies, we demonstrate the potential of the ODT Flow APIs, coupled with KNIME scientific workflows and with Jupyter Notebooks, to help researchers access and analyze massive OD flows.

While the ODT Flow platform (at the time of writing) features two mobility data sources, Twitter and SafeGraph, other mobility data sources, such as taxi trip data and cell phone call detail record (CDR) data, can be included and handled by the proposed framework in a similar way as plug-and-play components of the system. In future development, we will add more interactive visual analytics functions to the ODT Flow Explorer, integrate WebGL support (such as kepler.gl) into the mapping component so that it can visualize large datasets more efficiently, and develop more REST APIs to better support data extraction and analysis.

Supporting information

References

  1. 1. Lamsfus C, Martín D, Alzua-Sorzabal A, Torres-Manzanera E. Smart Tourism Destinations: An Extended Conception of Smart Cities Focusing on Human Mobility. In: Tussyadiah I, Inversini A, editors. Information and Communication Technologies in Tourism 2015. Cham: Springer International Publishing; 2015. pp. 363–375. https://doi.org/10.1186/s12936-015-0628-0 pmid:25889522
  2. 2. Hall CM. Reconsidering the geography of tourism and contemporary mobility. Geographical Research. 2005;43: 125–139.
  3. 3. Sirkeci I, Cohen JH. Cultures of Migration and Conflict in Contemporary Human Mobility in Turkey. European Review. 2016;24: 381–396.
  4. 4. Afifi T, Milan A, Etzold B, Schraven B, Rademacher-Schulz C, Sakdapolrak P, et al. Human mobility in response to rainfall variability: opportunities for migration as a successful adaptation strategy in eight case studies. Migration and Development. 2016;5: 254–274.
  5. 5. Hillier B, Turner A, Yang T, Park H-T. Metric and topo-geometric properties of urban street networks: some convergences, divergences and new results. Journal of Space Syntax Studies. 2009.
  6. 6. Bhat CR, Guo JY, Srinivasan S, Sivakumar A. Comprehensive econometric microsimulator for daily activity-travel patterns. Transportation Research Record. 2004;1894: 57–66.
  7. 7. Wu L, Zhi Y, Sui Z, Liu Y. Intra-Urban Human Mobility and Activity Transition: Evidence from Social Media Check-In Data. PLOS ONE. 2014;9: e97010. pmid:24824892
  8. 8. Kitamura R, Chen C, Pendyala RM, Narayanan R. Micro-simulation of daily activity-travel patterns for travel demand forecasting. Transportation. 2000;27: 25–51.
  9. 9. Smolak K, Kasieczka B, Fialkiewicz W, Rohm W, Sila-Nowicka K, Kopańczyk K. Applying human mobility and water consumption data for short-term water demand forecasting using classical and machine learning models. Urban Water Journal. 2020;17: 32–42.
  10. 10. Martín Y, Cutter SL, Li Z, Emrich CT, Mitchell JT. Using geotagged tweets to track population movements to and from Puerto Rico after Hurricane Maria. Popul Environ. 2020;42: 4–27.
  11. 11. Jiang Y, Li Z, Cutter SL. Social Network, Activity Space, Sentiment, and Evacuation: What Can Social Media Tell Us? Annals of the American Association of Geographers. 2019;109: 1795–1810.
  12. 12. Oliver N, Lepri B, Sterly H, Lambiotte R, Deletaille S, Nadai MD, et al. Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle. Science Advances. 2020;6: eabc0764. pmid:32548274
  13. 13. Kumar A, Gupta PK, Srivastava A. A review of modern technologies for tackling COVID-19 pandemic. Diabetes & Metabolic Syndrome: Clinical Research & Reviews. 2020;14: 569–573. pmid:32413821
  14. 14. Ye X, Li S, Peng Q. Measuring interaction among cities in China: A geographical awareness approach with social media data. Cities. 2021;109: 103041.
  15. 15. Santos A, McGuckin N, Nakamoto HY, Gray D, Liss S. Summary of travel trends: 2009 national household travel survey. United States. Federal Highway Administration; 2011.
  16. 16. Barbosa H, Barthelemy M, Ghoshal G, James CR, Lenormand M, Louail T, et al. Human mobility: Models and applications. Physics Reports. 2018;734: 1–74.
  17. 17. Olabarria M, Pérez K, Santamariña-Rubio E, Aragay JM, Capdet M, Peiró R, et al. Work, family and daily mobility: a new approach to the problem through a mobility survey. Gaceta Sanitaria. 2013;27: 433–439. pmid:23122515
  18. 18. Goodchild MF. Citizens as sensors: the world of volunteered geography. GeoJournal. 2007;69: 211–221.
  19. 19. Yang C, Huang Q, Li Z, Liu K, Hu F. Big Data and cloud computing: innovation opportunities and challenges. International Journal of Digital Earth. 2017;10: 13–53.
  20. 20. Marr B. Big Data: Using SMART big data, analytics and metrics to make better decisions and improve performance. John Wiley & Sons; 2015.
  21. 21. Wang Y, Jiang W, Liu S, Ye X, Wang T. Evaluating Trade Areas Using Social Media Data with a Calibrated Huff Model. ISPRS International Journal of Geo-Information. 2016;5: 112.
  22. 22. Li Z. Geospatial Big Data Handling with High Performance Computing: Current Approaches and Future Directions. In: Tang W, Wang S, editors. High Performance Computing for Geospatial Applications. Cham: Springer International Publishing; 2020. pp. 53–76. https://doi.org/10.1007/978-3-030-47998-5_4
  23. 23. Warren MS, Skillman SW. Mobility Changes in Response to COVID-19. arXiv:200314228 [cs]. 2020 [cited 17 Jul 2021]. Available: http://arxiv.org/abs/2003.14228
  24. 24. Li Z, Wang C, Emrich CT, Guo D. A novel approach to leveraging social media for rapid flood mapping: a case study of the 2015 South Carolina floods. Cartography and Geographic Information Science. 2018;45: 97–110.
  25. 25. Ye X, Gong J, Li S. Analyzing Asymmetric City Connectivity by Toponym on Social Media in China. Chin Geogr Sci. 2021;31: 14–26.
  26. 26. Khan N, Yaqoob I, Hashem IAT, Inayat Z, Mahmoud Ali WK, Alam M, et al. Big data: survey, technologies, opportunities, and challenges. The scientific world journal. 2014;2014.
  27. 27. Huang X, Li Z, Jiang Y, Ye X, Deng C, Zhang J, et al. The characteristics of multi-source mobility datasets and how they reveal the luxury nature of social distancing in the US during the COVID-19 pandemic. International Journal of Digital Earth. 2021;14: 424–442.
  28. 28. Hu T, Wang S, She B, Zhang M, Huang X, Cui Y, et al. Human Mobility Data in the COVID-19 Pandemic: Characteristics, Applications, and Challenges. Applications, and Challenges (May 24, 2021). 2021.
  29. 29. da Câmara Ribeiro-Dantas M, Alves G, Gomes RB, Bezerra LC, Lima L, Silva I. Dataset for country profile and mobility analysis in the assessment of COVID-19 pandemic. Data in brief. 2020;31: 105698. pmid:32405515
  30. 30. Huang X, Li Z, Jiang Y, Li X, Porter D. Twitter reveals human mobility dynamics during the COVID-19 pandemic. PloS one. 2020;15: e0241957. pmid:33170889
  31. 31. Descartes Labs. Aggregated Mobility Index. 2020 [cited 17 Feb 2020]. Available: https://mktg.descarteslabs.com/mobility-tracking
  32. 32. COVID-19 Community Mobility Report. In: COVID-19 Community Mobility Report [Internet]. 2020 [cited 17 Feb 2020]. Available: https://www.google.com/covid19/mobility?hl=en
  33. 33. COVID‑19—Mobility Trends Reports. In: Apple [Internet]. 2020 [cited 17 Feb 2021]. Available: https://www.apple.com/covid19/mobility
  34. 34. Social Distancing Metrics. In: SafeGraph [Internet]. 2020 [cited 12 Mar 2021]. Available: https://docs.safegraph.com/docs/social-distancing-metrics
  35. 35. Rubin V, Lukoianova T. Veracity Roadmap: Is Big Data Objective, Truthful and Credible? Advances In Classification Research Online. 2013;24: 4–15. http://doi.org/10.7152/acro.v24i1.14671
  36. 36. Choi Y-D, Goodall JL, Sadler JM, Castronova AM, Bennett A, Li Z, et al. Toward open and reproducible environmental modeling by integrating online data repositories, computational environments, and model Application Programming Interfaces. Environmental Modelling & Software. 2021;135: 104888.
  37. 37. Sui D, Kedron P. Reproducibility and Replicability in the Context of the Contested Identities of Geography. Annals of the American Association of Geographers. 2021;111: 1275–1283.
  38. 38. Gao S, Rao J, Kang Y, Liang Y, Kruse J. Mapping county-level mobility pattern changes in the United States in response to COVID-19. SIGSpatial Special. 2020;12: 16–26.
  39. 39. AL-Dohuki S, Kamw F, Zhao Y, Ye X, Yang J, Jamonnak S. An Open Source TrajAnalytics Software for Modeling, Transformation and Visualization of Urban Trajectory Data. 2019 IEEE Intelligent Transportation Systems Conference (ITSC). 2019. pp. 150–155.
  40. 40. Kang Y, Gao S, Liang Y, Li M, Rao J, Kruse J. Multiscale dynamic human mobility flow dataset in the US during the COVID-19 epidemic. Scientific data. 2020;7: 1–13. pmid:31896794
  41. 41. Kedron P, Li W, Fotheringham S, Goodchild M. Reproducibility and replicability: opportunities and challenges for geospatial research. International Journal of Geographical Information Science. 2021;35: 427–445.
  42. 42. Yang C, Cao Y, Evans J, Kafatos M, Bambacus M. Spatial Web Portal for Building Spatial Data Infrastructure. Geographic Information Sciences. 2006;12: 38–43.
  43. 43. Li Z, Li X, Porter D, Zhang J, Jiang Y, Olatosi B, et al. Monitoring the Spatial Spread of COVID-19 and Effectiveness of Control Measures Through Human Movement Data: Proposal for a Predictive Model Using Big Data Analytics. JMIR Res Protoc. 2020;9: e24432. pmid:33301418
  44. 44. Stolte C, Tang D, Hanrahan P. Multiscale visualization using data cubes. IEEE Transactions on Visualization and Computer Graphics. 2003;9: 176–187.
  45. 45. Guo D, Chen J, MacEachren AM, Liao K. A visualization system for space-time and multivariate patterns (VIS-STAMP). IEEE Transactions on Visualization and Computer Graphics. 2006;12: 1461–1474. pmid:17073369
  46. 46. Li W, Wu S, Song M, Zhou X. A scalable cyberinfrastructure solution to support big data management and multivariate visualization of time-series sensor observation data. Earth Sci Inform. 2016;9: 449–464.
  47. 47. Kraak M-J. The space-time cube revisited from a geovisualization perspective. Proc 21st International Cartographic Conference. Citeseer; 2003. pp. 1988–1996.
  48. 48. Kveladze I, Kraak M-J, van Elzakker CPJM. The space-time cube as part of a GeoVisual analytics environment to support the understanding of movement data. International Journal of Geographical Information Science. 2015;29: 2001–2016.
  49. 49. Yang L, Kwan M-P, Pan X, Wan B, Zhou S. Scalable space-time trajectory cube for path-finding: A study using big taxi trajectory data. Transportation Research Part B: Methodological. 2017;101: 1–27.
  50. 50. Cloudera. CDH Components. In: Cloudera [Internet]. 2021 [cited 12 Mar 2021]. Available: https://www.cloudera.com/products/open-source/apache-hadoop/key-cdh-components.html
  51. 51. Twitter. Twitter API Documentation. 2021 [cited 17 Mar 2021]. Available: https://developer.twitter.com/en/docs/twitter-api
  52. 52. Li Z, Huang X, Ye X, Jiang Y, Martin Y, Ning H, et al. Measuring Global Multi-Scale Place Connectivity using Geotagged Social Media Data. Scientific Reports. 2021;11: 14694. pmid:34282241
  53. 53. Datta A, Thomas H. The cube data model: a conceptual model and algebra for on-line analytical processing in data warehouses. Decision Support Systems. 1999;27: 289–301.
  54. 54. RevolverMaps. RevolverMaps Live Statistics. 2021 [cited 17 Jul 2021]. Available: https://www.revolvermaps.com/livestats/map/5cxm0tqf9wo/
  55. 55. Fielding RT. Architectural styles and the design of network-based software architectures. University of California, Irvine; 2000.
  56. 56. KNIME. 2020 [cited 14 Mar 2021]. Available: https://www.knime.com/
  57. 57. Yang C, Sha D, Liu Q, Li Y, Lan H, Guan WW, et al. Taking the pulse of COVID-19: a spatiotemporal perspective. International Journal of Digital Earth. 2020;13: 1186–1211.
  58. 58. Hu T, Guan WW, Zhu X, Shao Y, Liu L, Du J, et al. Building an open resources repository for COVID-19 research. 2020.
  59. 59. Cuthbertson A. France imposes 15-day lockdown and mobilises 100,000 police to enforce coronavirus restrictions. In: The Independent [Internet]. 16 Mar 2020 [cited 17 Jul 2021]. Available: https://www.independent.co.uk/news/world/europe/coronavirus-france-lockdown-cases-update-covid-19-macron-a9405136.html
  60. 60. Ledsom A. France Back In Nationwide Lockdown In ‘Worse’ Second Wave: What You Now Can And Can’t Do. In: Forbes [Internet]. 28 Oct 2020 [cited 17 Jul 2021]. Available: https://www.forbes.com/sites/alexledsom/2020/10/28/france-back-in-nationwide-lockdown-in-worse-second-wave-what-you-now-can-and-cant-do/
  61. 61. Tullis JA, Kar B. Where Is the Provenance? Ethical Replicability and Reproducibility in GIScience and Its Critical Applications. Annals of the American Association of Geographers. 2021;111: 1318–1328.
  62. 62. De Montjoye Y-A, Hidalgo CA, Verleysen M, Blondel VD. Unique in the crowd: The privacy bounds of human mobility. Scientific Reports. 2013;3: 1–5. pmid:23524645
  63. 63. Bengio Y, Janda R, Yu YW, Ippolito D, Jarvie M, Pilat D, et al. The need for privacy with public digital contact tracing during the COVID-19 pandemic. The Lancet Digital Health. 2020;2: e342–e344. pmid:32835192
  64. 64. Cho H, Ippolito D, Yu YW. Contact tracing mobile apps for COVID-19: Privacy considerations and related trade-offs. arXiv preprint arXiv:2003.11511. 2020.
  65. 65. Huang X, Li Z, Lu J, Wang S, Wei H, Chen B. Time-series clustering for home dwell time during COVID-19: what can we learn from it? ISPRS International Journal of Geo-Information. 2020;9: 675.
  66. 66. Kogan NE, Clemente L, Liautaud P, Kaashoek J, Link NB, Nguyen AT, et al. An early warning approach to monitor COVID-19 activity with multiple digital traces in near real time. Science Advances. 2021;7: eabd6989. pmid:33674304
  67. 67. Chang S, Pierson E, Koh PW, Gerardin J, Redbird B, Grusky D, et al. Mobility network models of COVID-19 explain inequities and inform reopening. Nature. 2021;589: 82–87. pmid:33171481
  68. 68. Huang X, Lu J, Gao S, Wang S, Liu Z, Wei H. Staying at Home Is a Privilege: Evidence from Fine-Grained Mobile Phone Location Data in the United States during the COVID-19 Pandemic. Annals of the American Association of Geographers. 2021;0: 1–20.
  69. 69. Li L, Goodchild MF, Xu B. Spatial, temporal, and socioeconomic patterns in the use of Twitter and Flickr. Cartography and Geographic Information Science. 2013;40: 61–77.
  70. 70. Malik MM, Lamba H, Nakos C, Pfeffer J. Population Bias in Geotagged Tweets. Ninth International AAAI Conference on Web and Social Media. 2015. Available: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10662
  71. 71. Jiang Y, Li Z, Ye X. Understanding demographic and socioeconomic biases of geotagged Twitter users at the county level. Cartography and Geographic Information Science. 2019;46: 228–242.
  72. 72. Tankovska T. As of April 2021, Twitter global audience was composed of 38.5 percent of users aged between 25 and 34 years old. In: Statista [Internet]. 2021 [cited 17 Jul 2021]. Available: https://www.statista.com/statistics/283119/age-distribution-of-global-twitter-users/
  73. 73. Squire R. What about bias in the SafeGraph dataset? 2019 [cited 8 Nov 2020]. Available: https://www.safegraph.com/blog/what-about-bias-in-the-safegraph-dataset