1 Introduction

In recent years we have witnessed the spread of knowledge-intensive applications that rely on the flourishing of datasets available in the Linked Data (LD) Cloud (footnote 1). The richness of the semantic data they expose paves the way to a new generation of services and tools that exploit the ontological knowledge they encode as well as the possibility to easily mash up data coming from different sources. Among them, geographical datasets are becoming more and more important for delivering location-aware services.

The availability of a common query interface, i.e., SPARQL, and the crowd-driven standardization of ontological vocabularies allow an intelligent application to grab data from diverse datasets and join them. We may imagine different scenarios where such a feature is an important asset for providing a high-quality service, such as context-aware recommendation systems [15] and on-line shopping, or public services needed in disaster management situations, where the quality and timeliness of the data are of crucial importance.

In many application scenarios, geographical information is a key factor in improving system answers to user requests. In a movie recommendation scenario, for example, the system can suggest a movie not only according to user preferences but also according to the closeness of the cinema.

Geographical knowledge bases and geo-spatial reasoning may also play a key role in emergency response or transportation planning [3, 5, 16], where it is useful to combine knowledge from different types of datasets in order to get informative answers from the system. As an example, while organizing a trip, the user could combine information about cultural heritage sites and means of transport with hotel accommodations, taking into account their proximity. Analogously, in emergency response situations, data can be combined to obtain a helpful and timely response. In an emergency scenario, e.g., an earthquake, the user should be able to query a knowledge base to look for collection camps, hospitals, rescue places, and areas for helicopter landing, as well as informal camps where the homeless have set up tents.

In this paper we present LOSM (Linked Open Street Map), a service that acts as a SPARQL endpoint on top of OSM. LOSM allows one to query Open Street Map data via SPARQL by translating each query into a set of calls to the OSM APIs. This exposes all the information contained in the Open Street Map geographical database as Linked Data, thus making OSM a first-class citizen in the Linked Data Cloud.

LOSM supports on-the-fly integration of OSM data with data coming from other sources. Unlike similar projects (see, for example, LinkedGeoData (footnote 2) [2]), LOSM does not rely on a periodic dump of the data; it always exposes fresh and up-to-date data.

Another strength of our tool is the possibility to merge different datasets of the Linked Data cloud, linking geo-spatial knowledge with knowledge coming from various other knowledge bases, as we show in Sect. 3.1.

The remainder of this paper proceeds as follows. In Sect. 2 we start by briefly describing Open Street Map and the reason why we chose it as a provider of linked geo-spatial data; we then describe the system architecture and the query language implemented in LOSM. In Sect. 3 we describe two use cases through some sample queries highlighting the capabilities of LOSM, with particular reference to an emergency management scenario. Finally, in Sect. 4 we review related work relying on the use of geo-spatial data. Conclusion and future work close the paper.

2 LOSM: Linked Open Street Map

In the “geo-data” arena, the crowdsourced project Open Street Map [10] is currently playing a primary role due to its openness, ease of use and ease of integration in third-party applications.

OSM is a geographical database maintained by Web users containing a huge amount of data that can also be displayed on a map. Its database is updated every 15 minutes and, as of today, it contains 5,027,330,590 GPS points and counts 2,445,598 users who contribute to the project.

All this data is available either via weekly dumps or through a public API. In particular, overpass (footnote 3) is a read-only API that allows the user to query Open Street Map by means of at least two different languages: XML and Overpass QL. By means of an overpass query, the API can retrieve nodes within an area and recognize streets or relations.

The query language is very expressive and makes it possible to perform spatial reasoning by imposing constraints within the query. For instance, the user may impose relationships among nodes through filters such as around, bounding box and the poly function. Having such data available in the Linked Open Data cloud would clearly enrich the amount and quality of the information available within the so-called Web of Data.

This is the rationale behind the LinkedGeoData project [2]. It aims at triplifying Open Street Map dumps every six months by mapping OSM tags and sourceKey properties with reference to a publicly available ontology. This is a very useful resource because it makes available classes that map keys and tags used in Open Street Map nodes.

Despite the considerable effort put into developing and maintaining the datasets behind the project, LinkedGeoData suffers from a misalignment between the data available via its SPARQL endpoint (based on a dump) and the data available in Open Street Map. Indeed, the updates made by users become available as RDF triples only when the dump is processed and loaded into the LinkedGeoData triple store.

As a consequence, the LinkedGeoData approach cannot be used in scenarios where timeliness and freshness of information are a must. A flagship example is disaster recovery, where information about collection camps, rescue places, temporary hospitals, passable roads, etc. needs to be available as soon as possible.

Starting from this observation we developed LOSM, a SPARQL endpoint that acts as a translator from a SPARQL query to a set of overpass API calls. In this way the data we retrieve is always fresh and up-to-date, as it comes directly from the OSM database, which is constantly updated by a crowd of volunteers all around the world.

The scheme in Fig. 1 shows an overview of the service architecture. In short, the system translates a SPARQL query into a sequence of (iterative) overpass API calls, collects the data, joins it, and returns it to the client. We currently support SPARQL queries via HTTP GET.
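As a minimal sketch of how a client could prepare such a GET request, the following Python snippet builds the request URL. It assumes that the endpoint follows the usual SPARQL Protocol convention of passing the query text in a parameter named query; the endpoint address and the helper name build_get_url are hypothetical and are not taken from LOSM.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address, used only for illustration.
EXAMPLE_ENDPOINT = "http://example.org/losm/sparql"


def build_get_url(endpoint: str, sparql: str) -> str:
    """Return a GET URL carrying the SPARQL text in the 'query' parameter.

    Assumption: the endpoint accepts the standard SPARQL Protocol
    'query' parameter; this is not confirmed by the text above.
    """
    return endpoint + "?" + urlencode({"query": sparql})


if __name__ == "__main__":
    print(build_get_url(EXAMPLE_ENDPOINT, "SELECT * WHERE { ?s ?p ?o }"))
```

The URL can then be fetched with any HTTP client; percent-encoding the query text is what keeps characters such as braces and question marks from breaking the request.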

The Parser uses a scanner for the recognition of lexemes in a SPARQL query and creates the data-structures needed by the Query Manager. This module is in charge of breaking the query into sub-queries according to the remote functions available in the overpass API.

The Result Manager handles the sub-queries and the results they generate to create the final Result map. The Result Manager breaks the graph pattern described in the SPARQL query into a set of connected sub-graphs by identifying their mutual relations. Each sub-query goes through the Translator which is in charge of creating the overpass calls.

Fig. 1. Overall representation of the system architecture

The system also exposes a Web page with a query editor that offers autocompletion over the LinkedGeoData ontology classes.

LOSM is available at http://sisinflab.poliba.it/semanticweb/lod/losm/.

2.1 The SPARQL Sublanguage Implemented in LOSM

In its current version, LOSM implements a subset of the full SPARQL 1.0 specification plus some non-standard features (footnote 4) that are very useful when querying geographical data. We currently support only the SELECT query form and the Jena Spatial (footnote 5) extension, also available in GeoSPARQL [3]. We support simple graph patterns, which we nevertheless consider representative of a large number of queries over geographical data. The implemented spatial functions are:

  • spatial:nearby (latitude longitude radius [units]) (footnote 6) returns the URIs of the nodes (Open Street Map URIs) within the radius distance of the location given by the specified latitude and longitude.

  • spatial:withinCircle (latitude longitude radius [units]) computes a circle centered at the specified latitude and longitude with the given radius and returns the OSM nodes within the circle.

  • spatial:withinBox (latitude_min longitude_min latitude_max longitude_max) computes a rectangle from the listed coordinates of its edges, which must follow the order given in the signature.

  • spatial:within("POLYGON((Point1_lat Point1_lon,...,PointN_lat PointN_lon))") computes the polygon area expressed by a Well-Known Text (WKT) literal and returns the OSM nodes available within it.
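To make the correspondence between these functions and the overpass filters mentioned earlier (around, bounding box, poly) more concrete, here is a hedged Python sketch that renders each function as an Overpass QL node statement. It is not the LOSM Translator: the function names are hypothetical, the radius is assumed to be already expressed in meters, and the polygon literal is assumed to list latitude before longitude, as in the signature above.

```python
def nearby(lat: float, lon: float, radius_m: float) -> str:
    """Overpass QL 'around' filter: nodes within radius_m meters of a point."""
    return f"node(around:{radius_m},{lat},{lon});"


def within_box(lat_min: float, lon_min: float,
               lat_max: float, lon_max: float) -> str:
    """Overpass QL bounding box filter, ordered south, west, north, east."""
    return f"node({lat_min},{lon_min},{lat_max},{lon_max});"


def within_polygon(wkt: str) -> str:
    """Overpass QL 'poly' filter built from a POLYGON((lat lon, ...)) literal.

    Assumption: each point is written as 'lat lon', following the
    signature in the text rather than the usual WKT 'x y' order.
    """
    text = wkt.strip()
    prefix, suffix = "POLYGON((", "))"
    if not (text.startswith(prefix) and text.endswith(suffix)):
        raise ValueError("expected a literal of the form POLYGON((...))")
    points = [p.split() for p in text[len(prefix):-len(suffix)].split(",")]
    coords = " ".join(f"{lat} {lon}" for lat, lon in points)
    return f'node(poly:"{coords}");'


if __name__ == "__main__":
    print(nearby(41.1, 16.8, 200))
    print(within_box(41.0, 16.7, 41.2, 16.9))
    print(within_polygon("POLYGON((41.0 16.7, 41.2 16.7, 41.2 16.9))"))
```

In this sketch spatial:withinCircle would map to the same around filter as spatial:nearby, since both select nodes inside a circle around a point.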

Regarding the URIs of classes and properties used in the graph pattern of SPARQL queries, LOSM may refer to the LinkedGeoData Ontology controlled vocabulary as well as to an ontological vocabulary that maps one-to-one to the OSM system of tags (footnote 7). The rationale behind the introduction of this new vocabulary is the main goal we had in mind while developing LOSM: to have a SPARQL endpoint able to expose in a timely manner all the changes that continually happen in Open Street Map, even at the semantic level (represented by the tags).

Although very useful and structured, a static ontology such as the one modelled within the LinkedGeoData project cannot follow continuous data variations due to users’ freedom in inserting new tags and values. In each community, many linguistic phenomena occur over time that change the frequency of a term and hence its importance and adoption by the community itself. To address this problem we introduced a LOSM prefix <http://sisinflab.poliba.it/semanticweb/lod/losm/ontology/> (shortened to losm) based on the same crowdsourcing concept. The Parser recognizes the use of this prefix and prepares the overpass query accordingly. This lets the user employ any term she considers reasonable, and the existence of the term is evaluated only against the real data coming from Open Street Map. As an example, if the user wants to retrieve information classified with the key-value pair key="refugee" value="yes" she will refer to the corresponding property represented by the URI <http://sisinflab.poliba.it/semanticweb/lod/losm/ontology/refugee> or equivalently by the CURIE losm:refugee (see the example in Sect. 3.1).
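The mapping from a losm term to an OSM key can be illustrated with a short Python sketch. It only shows the idea of stripping the namespace and emitting an Overpass QL tag filter; the helper names osm_key and tag_filter are hypothetical, and the actual behaviour of the LOSM Parser may differ.

```python
# Namespace taken from the text above.
LOSM_NS = "http://sisinflab.poliba.it/semanticweb/lod/losm/ontology/"


def osm_key(term: str) -> str:
    """Extract the OSM key from a full losm URI (with or without <>) or a CURIE."""
    term = term.strip("<>")
    if term.startswith(LOSM_NS):
        return term[len(LOSM_NS):]
    if term.startswith("losm:"):
        return term[len("losm:"):]
    raise ValueError(f"not a losm term: {term}")


def tag_filter(term: str, value=None) -> str:
    """Build an Overpass QL tag filter; with no value, only the key must exist."""
    key = osm_key(term)
    if value is None:
        return f'["{key}"]'
    return f'["{key}"="{value}"]'


if __name__ == "__main__":
    print(tag_filter("losm:refugee", "yes"))
```

Because no fixed list of keys is consulted, any key that volunteers start using in Open Street Map becomes addressable without changing the vocabulary.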

Some keys are reserved to provide advanced features in LOSM that ease the integration with external knowledge graphs exposing a SPARQL endpoint, such as DBpedia (footnote 8) or Wikidata (footnote 9). The losm:dbpedia property acts as a converter from the Wikipedia page or Wikipedia id associated with an Open Street Map node to the corresponding DBpedia resource. Analogously, losm:wikipedia returns the complete URI of the corresponding Wikipedia page. In both cases, the output may refer to the main English version of DBpedia/Wikipedia or to a local version, depending on the value of the OSM key wikipedia.
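A rough Python sketch of such a conversion is given below. It assumes that the OSM wikipedia tag has the common form "lang:Title", that English resources live under dbpedia.org while other languages use a language-specific host, and that spaces become underscores; the function names are hypothetical and the rules actually applied by LOSM are not specified in the text.

```python
def split_wikipedia_tag(value: str):
    """Split an OSM wikipedia tag such as 'it:Bari' into (language, title).

    Assumption: a value without a language prefix refers to the English
    edition. Titles that themselves contain a colon are not handled here.
    """
    lang, sep, title = value.partition(":")
    if not sep:
        return "en", value
    return lang, title


def wikipedia_uri(value: str) -> str:
    """Return the Wikipedia page URI for the given tag value."""
    lang, title = split_wikipedia_tag(value)
    return f"https://{lang}.wikipedia.org/wiki/{title.replace(' ', '_')}"


def dbpedia_uri(value: str) -> str:
    """Return the (assumed) DBpedia resource URI for the given tag value."""
    lang, title = split_wikipedia_tag(value)
    host = "dbpedia.org" if lang == "en" else f"{lang}.dbpedia.org"
    return f"http://{host}/resource/{title.replace(' ', '_')}"


if __name__ == "__main__":
    print(dbpedia_uri("it:Bari"))
    print(wikipedia_uri("en:Sendai Airport"))
```

Note that not every Wikipedia language edition has a matching DBpedia chapter, so a real converter would need a fallback for missing local versions.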

3 Use Case

The first use case we present in this section serves only to explain how LOSM works, showing the steps performed by the Query Manager (Fig. 1).

Suppose the following situation: the day is over in our laboratory and the crew wants to find nearby restaurants (within 200 m) together with the cinemas that are at most 1 km from each restaurant. They want to know the names of the restaurants and cinemas together with the URIs of the latter. This use case can be modeled by the SPARQL query:

figure a

The query is processed based on a priority system relying on a weighted dependency graph. The triples composing the graph pattern are analysed and grouped into the corresponding sub-graphs by looking at their subject. The system attaches to each triple a value depending on its degree of connection with other groups, measured by taking into account shared variables and predicates. Each group is then labeled with a weight proportional to the values of its triples. The group with the lowest weight is the first sent to the Translator component. The Translator converts the set of sub-queries into its overpass equivalent, starting with the triple with the lowest value. Once the results from the overpass API are returned, they are used to update the initial query, so new weights are computed. The process iterates over the groups of triples until the last sub-query has been translated.
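The exact weighting function is not given in the text, so the following Python sketch is only one possible reading of the scheme. It groups triples by subject and, as a simplifying assumption, weights a group by the number of still unbound variables used as arguments of its spatial predicates; the group with the lowest weight is scheduled first, its variables are then treated as bound, and the weights are recomputed. All names are hypothetical and no overpass call is actually made.

```python
from collections import defaultdict


def is_var(term) -> bool:
    """A term is a variable when it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def group_by_subject(triples):
    """Group (subject, predicate, arguments) triples by their subject."""
    groups = defaultdict(list)
    for triple in triples:
        groups[triple[0]].append(triple)
    return dict(groups)


def weight(group, bound) -> int:
    """Assumed weight: unbound variables among spatial predicate arguments."""
    total = 0
    for _subject, predicate, args in group:
        if predicate.startswith("spatial:"):
            total += sum(1 for a in args if is_var(a) and a not in bound)
    return total


def variables(group):
    """All variables mentioned by a group, including its subject."""
    found = set()
    for subject, _predicate, args in group:
        found.add(subject)
        found.update(a for a in args if is_var(a))
    return found


def schedule(triples):
    """Return the order in which the subject groups would be translated."""
    pending = group_by_subject(triples)
    bound, order = set(), []
    while pending:
        # Pick the group with the lowest weight, then recompute on the next pass.
        subject = min(pending, key=lambda s: weight(pending[s], bound))
        order.append(subject)
        bound |= variables(pending.pop(subject))
    return order


if __name__ == "__main__":
    pattern = [
        ("?cinema", "spatial:nearby", ("?rlat", "?rlon", 1000)),
        ("?cinema", "rdfs:label", ("?cname",)),
        ("?restaurant", "spatial:nearby", (41.1, 16.8, 200)),
        ("?restaurant", "geo:lat", ("?rlat",)),
        ("?restaurant", "geo:long", ("?rlon",)),
    ]
    print(schedule(pattern))
```

In this toy pattern the restaurant group depends only on constants, so it is scheduled before the cinema group, whose spatial filter needs the restaurant coordinates.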

figure b

Based on the above grouping, the Query Manager selects first the ?link group and generates the Overpass QL expression representing the first query to the overpass API:

figure c

Then the system executes the overpass query related to the ?object group which is composed by taking into account the results of the previous one.

figure d

The final sub-graph is converted into a set of overpass API calls, one for each node returned by the previous query. As an example we have:

figure e
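The per-node fan-out described above can be sketched in Python as follows. The snippet generates one Overpass QL query per restaurant location, each looking for cinemas within 1 km. The use of the OSM tag amenity=cinema, the function name and the sample coordinates are assumptions made for illustration; the calls actually produced by LOSM may look different.

```python
def cinema_queries(restaurant_coords, radius_m=1000):
    """Build one Overpass QL query per (lat, lon) pair from the previous step.

    Assumption: cinemas are identified by the OSM tag amenity=cinema.
    """
    return [
        f'node["amenity"="cinema"](around:{radius_m},{lat},{lon});out;'
        for lat, lon in restaurant_coords
    ]


if __name__ == "__main__":
    # Hypothetical coordinates standing in for the restaurants found earlier.
    for query in cinema_queries([(41.1, 16.8), (41.11, 16.87)]):
        print(query)
```

Issuing one call per node keeps each overpass query simple, at the cost of a number of requests that grows with the size of the previous result set.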

3.1 Emergency Management Scenario

We now present a use case in emergency management where the usefulness of LOSM is twofold. On the one hand, we may have access to always fresh and timely information in the context of an unpredictable disaster. On the other hand, we can exploit a third party endpoint supporting SPARQL 1.1 to perform a federated query among LOSM, DBpedia and Wikidata thus mashing up the knowledge coming from the three sources. Indeed, in such a context relevant data rapidly changes over time and the system capability of linking information from different knowledge bases is crucial.

An Italian manager is on a business trip in the Miyagi Prefecture when an earthquake happens. The damage all around is severe. Aftershocks may follow and he has to find a way to reach safety in a foreign country whose language, Japanese, he does not speak. In the meantime, news about the event is reaching every corner of the world and mechanisms of international assistance are already on the move. Volunteers are populating Open Street Map with fresh data about collection camps, rescue places and temporary hospitals (footnote 10). The manager has two primary needs: reaching a nearby refugee camp and then finding an airport to go back to Italy. He has a mobile phone with GPS and Internet connectivity, so he looks for nearby refugee camps mapped on Open Street Map.

This request can be translated into the following SPARQL query (footnote 11):

figure f

This query allows the user to retrieve any item containing the tag refugee within a circle with a radius of 5000 m and returns the Open Street Map node, the name, and the GPS coordinates (lat, long). Obviously, the user does not have to write the SPARQL query himself; he can rely on an end-user interface that allows him to build a SPARQL query without knowing the SPARQL language (see, for example, the tools presented in [7, 8]).

In order to show the capability of the system to link information coming from different knowledge bases, we give an example of a more complex and exhaustive query that can be posed to the Linked Data Cloud thanks to the use of LOSM.

From the previous query, the manager has found a refugee camp whose name he cannot understand, as it is returned in Japanese. Nevertheless, based on the result of the previous query, he wants to retrieve information about the nearest cities (within 10 km) and about airports from which to go back to Italy. He wants the information about the nearest cities in Italian and the names of the airports (together with their coordinates) in English, so that he can pronounce them in an understandable way.

figure g

The previous query, by exploiting the SERVICE keyword from SPARQL 1.1, is able to combine information from different knowledge graphs with the information coming from LOSM. The first service invoked is the LOSM endpoint: the query returns cities within a radius of 10 km from the refugee camp found in the previous query (footnote 12). Additionally, the DBpedia resource URI and the Wikidata ID are returned. The second service invoked is the DBpedia endpoint. Here the query returns the Wikipedia URI, the Italian description of the city, the English name and the latitude and longitude of the airport. The last service is the Wikidata endpoint, from which we get the identifiers of the same city in the Freebase and Geonames knowledge bases.

Summing up, the previous example shows how it is possible to get data from six different data sources (Open Street Map, Wikipedia, DBpedia, Wikidata, Freebase and Geonames) having only the two values of latitude and longitude available. It is worth noting that most of these data sources have a crowdsourcing nature, which usually weakens the integration between datasets because of the heterogeneity of the contributions. The problem is greatly mitigated in this scenario thanks to spatial queries that can retrieve the outgoing references from points near the starting one.

4 Related Work

In this section we first briefly describe various approaches and systems that deal with and expose geo-spatial data in a static way in the Web of Data. Then, we review some approaches that deal with and expose dynamic data sources as Linked Data.

In recent years several ontologies and languages have been proposed to model and query datasets related to geo-spatial knowledge and to extract information from these knowledge bases. The first attempts are the Basic Geo Vocabulary and the GeoOWL ontology. Basic Geo Vocabulary [9] is a simple RDF Schema vocabulary able to represent latitude, longitude and altitude information in the WGS84 reference system. The Basic Geo Vocabulary has then been extended with GeoRSS to include various geometric objects such as points, lines and polygons, and their associated feature descriptions [6]. A more structured and ontological representation of the GeoRSS vocabulary is available in the GeoOWL ontology (footnote 13). Although these two projects were developed by W3C groups, they have never become W3C recommendations (and they are not widely used by the community).

GeoSPARQL [3] from the Open Geospatial Consortium (OGC) is a standard that aims to provide a way to represent and query geospatial data in the Semantic Web. GeoSPARQL addresses this task by providing a small ontology to represent features and geometries and a number of SPARQL query predicates and functions. The ontology can be combined with other ontologies representing other domains, thus enhancing the latter with spatial information. GeoSPARQL allows systems to infer topological information through qualitative spatial reasoning, e.g., if a monument is inside a park, and the park is in a city, then the monument is in that city [3], as well as through quantitative spatial reasoning (e.g., measuring distances). A plus of GeoSPARQL is the possibility to infer qualitative knowledge starting from quantitative knowledge, using a single language for both types of reasoning. The GeoSPARQL standard is supported by the triple store Parliament [4] to query spatial data via RDF properties; it is able to answer queries like “find all items located within a region X”. Parliament does not support the Basic Geo Vocabulary, unlike the OWLIM-SE (now GraphDB) triple store [1], which however supports only points for storage, thus allowing queries to find points within ad-hoc polygons and circles and to compute distances between points. GraphDB data types and queries are not compliant with GeoSPARQL [16].
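The qualitative inference in the monument example amounts to computing the transitive closure of a containment relation. The short Python sketch below illustrates that single reasoning step only; it is not GeoSPARQL and does not use any triple store, and the function name and the sample facts are hypothetical.

```python
def containment_closure(facts):
    """Return the transitive closure of a set of (inner, outer) pairs."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for inner, middle in list(closure):
            for start, outer in list(closure):
                # inner is in middle, middle is in outer: inner is in outer.
                if middle == start and (inner, outer) not in closure:
                    closure.add((inner, outer))
                    changed = True
    return closure


if __name__ == "__main__":
    facts = {("monument", "park"), ("park", "city")}
    print(("monument", "city") in containment_closure(facts))
```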

Strabon [12] is a semantic spatiotemporal RDF store that can be used to store linked geospatial data and to query them using an extension of SPARQL named stSPARQL. stSPARQL can be used to query data represented in an extension of RDF called stRDF, which models geospatial data that changes over time (e.g., the growth of a city over the years). Strabon supports spatial datatypes enabling the serialization of geometric objects in the OGC standards WKT and GML, as well as a subset of GeoSPARQL.

USeekM (footnote 14) is an extension library for semantic databases that adds efficient geospatial support. The module supports OpenGIS geometry types (such as Point, Line, Polygon) and functions (such as Within, Intersects, Overlaps, Crosses) as standardized in the OGC GeoSPARQL standard.

Among database engines, Virtuoso Universal Server [18] can handle 2-dimensional points expressed with WGS84 coordinates, as well as storage of geometric shapes (lines, polygons, etc.). In order to check if two geometries are related, Virtuoso uses some built-in predicates (e.g. ST_contains, ST_within, ST_intersect) and supports some geometric functions (e.g., ST_distance, ST_x, ST_y, ST_z). At the moment Virtuoso is not fully compliant with GeoSPARQL.

Oracle Spatial and Graph [14] supports, among others, RDF Semantic Graph data management and analysis, its applications ranging from semantic data integration to linked open data and network graphs used in transportation, utilities, energy and telcos. Oracle Spatial and Graph uses GeoSPARQL for representing and querying spatial data, even if it is not fully compliant with it.

A native RDF triple store implementation with spatial query functionality is described by Brodt et al. [5]. They model spatial features in RDF as typed complex literals and define spatial predicates as filter functions in SPARQL. However, their approach is optimized for storing and querying static RDF data with rare updates, as changes and updates in the location data can have an impact on their indexing and data processing.

Other works show how to expose dynamic data sources as Linked Data. Harth et al. [11] present an approach to expose data coming from information services as Linked Data to support their integration. Mapping is performed by using a tool to map RESTful services to a reference ontology [20]. Although OSM is considered as one source, only queries based on a bounding box are supported. Thus this work does not propose a general approach to expose OSM data as Linked Data.

Speiser et al. [19] and Norton et al. [13] present general approaches to expose data provided by services as Linked Data when invoked with a proper input, with [19] providing a more complete approach than [13]. The examples provided in the papers consider geospatial services like GeoNames [19] or OSM [13]. These general approaches are interesting but can hardly support the large variety of spatial queries over OSM that are supported by LOSM. In addition, the vocabulary of the service is not mapped to widely adopted vocabularies as we did in LOSM.

5 Conclusion and Future Work

We presented LOSM, a service that acts as a SPARQL endpoint on top of Open Street Map data. Unlike LinkedGeoData, it does not work on dumps of the OSM datasets; it queries the OSM database directly by means of a translation from SPARQL to overpass API calls. The implementation currently works on a subset of the SPARQL language plus the geographical query constructs from the Jena Spatial extension. We showed how fresh and timely geographical data exposed via a SPARQL endpoint, in combination with information coming from multilingual knowledge graphs, can affect the search for information in a disaster recovery scenario. We are currently working to extend the expressiveness of the SPARQL sub-language supported by LOSM.