Introduction

Multiplex networks1 are useful representations of systems in which the same set of nodes may be connected by different types of relationships. Examples of systems that can be modeled as multiplex networks include social networks, transportation systems with multiple transportation modes or biological systems in which different types of interactions are accounted for1. In a multiplex network, nodes and links are grouped in layers according to their nature. Layers can be interdependent and they contain information which would be lost if we only considered the corresponding aggregated network. It has also been shown recently that different types of dynamics that are run on top of multilayer systems also provide new insights into the problems being modeled2,3,4.

As discussed in previous papers5, most available studies on urban transportation consider either a single transportation mode or many modes merged into one aggregated network. Thus, introducing the framework of multiplex networks for the analysis of urban transportation systems might allow us to better understand complex issues such as how to accurately account for the interplay between different transport modes. However, even though a few works have started to use a multiplex representation to study failures6 and efficiency5,7 in transportation systems, they still represent isolated cases. For instance, there are very recent studies that rely on a complex notation to incorporate multiple modes8 or time9, or that simply aggregate the whole network, losing information regarding transfer times10. Very recent reviews in which the term multilayer is either absent or used in a completely different way11,12 can also be found in the specialized literature.

The few previous studies on urban transportation systems as multiplex networks focus on addressing their multimodal nature, considering each layer as a transportation mode, to study their resilience6 or their coupling13. In this way, all the lines of the same mode (i.e. buses, metro and tram) are aggregated in a sort of superlayer. This representation is extremely compact, with only a few layers, but it totally neglects transfer and waiting times between lines of the same mode, which could eventually lead to a wrong estimation of shortest paths or travel times. Another solution is to consider each line of each mode as a single layer. While in this case we can preserve transfer times and synchronization between stops on different lines, it is not possible to quantify the importance of a transportation mode for the mobility in the system.

To reconcile both approaches, in this paper we propose an urban transportation model based on multiplex networks in which both representations are used to extract different information from the system. We show that superlayers are fundamental to study the interdependency and resilience of the system, whereas a realistic model of human mobility requires the single-line-per-layer perspective. We test our model using 9 different urban transportation networks, ranging from small cities of a few hundred thousand inhabitants to megacities like London or Berlin. Finally, for a medium-sized city we introduce detailed data about schedules and transfer times to create a more realistic dynamical scenario, which we test against real-world experimental data and facts. The remainder of the text is organized as follows. First, we give an overview of our multiplex representation in the Methods section and then use it to study the structure of 9 urban transportation networks in the first subsection of Results. Finally, in the second part of the Results section, we focus on one case study (city) for a deeper analysis. Specifically, we test whether the model is able to reproduce experimental data and explore its potential for studying service disruptions and network improvements.

Methods

We start by considering each line of each mode of transport as a single layer. Each stop is a node, and there is a weighted link between two nodes on a layer if the corresponding line passes through both of them, the weight being the great-circle distance between them. Although the same bus stop may be present in multiple layers, allowing transfers between them, this might not be the case between layers of different modes. To solve this problem, we connect (with inter-layer links) each node of one mode to the closest one of each of the other modes as long as the distance between them is less than 100 m.
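The construction of this first model can be sketched as follows. This is only a minimal illustration, assuming that stops are given as (latitude, longitude) pairs in degrees and that a line links consecutive stops along its route; the function names, the Earth-radius constant and the toy coordinates are hypothetical and are not taken from the data pipeline used in the paper.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)


def great_circle_m(a, b):
    """Haversine distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def intra_layer_links(line_stops, coords):
    """Weighted links between consecutive stops of one line (one layer)."""
    return [(u, v, great_circle_m(coords[u], coords[v]))
            for u, v in zip(line_stops, line_stops[1:])]


def inter_mode_links(stops_by_mode, coords, max_dist_m=100.0):
    """Link each stop to the closest stop of every other mode within max_dist_m.

    Links are returned as directed triples, so a mutual nearest pair
    appears once in each direction.
    """
    links = []
    for mode, stops in stops_by_mode.items():
        for other, candidates in stops_by_mode.items():
            if other == mode or not candidates:
                continue
            for s in stops:
                nearest = min(candidates,
                              key=lambda c: great_circle_m(coords[s], coords[c]))
                d = great_circle_m(coords[s], coords[nearest])
                if d < max_dist_m:
                    links.append((s, nearest, d))
    return links


# Toy data: two bus stops and one tram stop (hypothetical coordinates).
coords = {"B1": (41.6520, -0.8810), "B2": (41.6620, -0.8810),
          "T1": (41.6525, -0.8810)}
bus_line = intra_layer_links(["B1", "B2"], coords)
transfers = inter_mode_links({"bus": ["B1", "B2"], "tram": ["T1"]}, coords)
```

In this toy case only B1 and T1 are close enough to be joined by an inter-layer link, while B2 remains reachable from the tram only through the bus layer.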

In the second part of the Results section, however, we follow a slightly different scheme to add more realistic features to the model. In this case, we add a new layer that represents the land, to introduce the possibility of moving through the city on foot. To create this layer we first took the population density grid of the European Union14, which is composed of cells of roughly 100 × 100 m, and extracted those inside the area of interest. Then, we took the cells with a population density greater than zero and placed a node at the center of each. Finally, we linked together nodes belonging to neighboring cells and added the corresponding weight (distance in meters). To connect this layer with the rest of the system we simply determined which cell each stop belongs to and established a link between that stop and the corresponding node of the land layer. Once this was done we added, again, the distance between stops as a weight to their links. This distance is simply the geographical distance in the case of tram and metro, but for bus stops it was calculated taking into account street patterns using the Google Maps API15. In this way, the second model is much more powerful, allowing us not only to check the validity of the conclusions extracted from the first analyses in a more realistic scenario but also to study more complex phenomena such as service disruptions.
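A rough sketch of how such a land layer could be assembled is given below. It assumes that the populated cells are already available as integer (row, column) indices on a regular grid, that "neighboring" means sharing a side (4-neighborhood) and that stop positions are expressed in projected metres; all names and these choices are hypothetical simplifications.

```python
def build_land_layer(populated_cells, cell_size_m=100.0):
    """Link nodes placed at the centers of side-adjacent populated cells.

    populated_cells: set of (row, col) indices of cells with density > 0.
    Returns undirected links (cell_a, cell_b, distance_m), each pair once.
    """
    links = []
    for (r, c) in populated_cells:
        # Looking only right and down lists every adjacent pair exactly once.
        for dr, dc in ((0, 1), (1, 0)):
            neighbour = (r + dr, c + dc)
            if neighbour in populated_cells:
                links.append(((r, c), neighbour, cell_size_m))
    return links


def cell_of(x_m, y_m, cell_size_m=100.0):
    """Grid cell (row, col) containing a point given in projected metres."""
    return (int(y_m // cell_size_m), int(x_m // cell_size_m))


# Toy grid: three connected populated cells and one isolated cell.
cells = {(0, 0), (0, 1), (1, 1), (3, 3)}
land_links = build_land_layer(cells)
stop_cell = cell_of(250.0, 30.0)  # cell to which a stop would be attached
```

Attaching a stop to the land layer then amounts to adding one link between the stop and the node of the cell returned by `cell_of`.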

Results

Structure of urban transportation networks

In Table 1 we present the principal characteristics of the networks we are going to analyze. Data were obtained from a variety of sources, from each company’s website to cities’ open data portals, and then arranged as described in the model discussed in Methods. As we can see, they are very different in size and composition. For example, while Vitoria has a population of roughly 250,000 individuals and a small transportation network consisting of 302 nodes and 16 layers, London is one of the biggest cities in Europe, with more than $8 \times 10^{6}$ inhabitants and a network consisting of 19,459 nodes and 555 layers.

Table 1 Principal characteristics of the networks under study, ordered by decreasing population29,30,31,32,33,34,35,36,37.

We start our analysis with one of the most basic measures of graph theory, the degree. In multiplex networks it can be defined in multiple ways. A straightforward approach is to consider the degree of node $i$ as a vector $\mathbf{k}_i$ of length $M$ (the number of layers) in which element $j$ represents the degree of node $i$ in layer $j$. However, in this particular case this measure does not provide much information because, as each layer represents a single line, a node that is present in a layer will have degree 2 in that layer (or 1 in special cases such as the first/last stop of a line, although in our networks both stops are always the same one). What we can do instead is examine the overlapping degree16, $o_i$, which is simply the sum of the elements of $\mathbf{k}_i$ (Fig. 1).

Figure 1: Structural measures of the networks.

Left: overlapping degree distribution of each network. Right: edge overlap distribution of each network. Despite these networks being quite different in composition and size they share some universal properties.

As we can see, although the cities are very different from each other, their overlapping degree distributions are quite similar: most of the nodes are present in only one layer ($o_i = 2$), some are present in two layers ($o_i = 4$) and only a few can be found in three or more layers. Given the nature of the system one could expect a low level of overlap between layers. However, what is really interesting is the fact that the maximum overlap is quite similar in all the networks, even though their number of layers (and thus their theoretical maximum overlapping degree) differs a lot. Similar results are found if we look at the edge overlap distribution, where the edge overlap, $o_{ij}$, is defined as the number of layers in which a link between nodes $i$ and $j$ exists16. This kind of universal behavior can be understood if we take into consideration that these networks are embedded in city space and thus the real theoretical maxima for both the overlapping degree and the edge overlap are given not by the number of layers but by either physical constraints or citizens’ interest.
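Both quantities are simple to compute once each line is stored as a list of links. The following sketch assumes a hypothetical representation in which the multiplex is a dictionary mapping each layer to its undirected links; the two toy lines are circular, so that every stop has degree 2 in each layer where it is present.

```python
from collections import Counter


def overlapping_degree(layers):
    """Sum, over all layers, of the degree of each node."""
    o = Counter()
    for edges in layers.values():
        for u, v in edges:
            o[u] += 1
            o[v] += 1
    return o


def edge_overlap(layers):
    """Number of layers in which each undirected link is present."""
    o = Counter()
    for edges in layers.values():
        # The set removes duplicates of the same link inside one layer.
        for edge in {frozenset(e) for e in edges}:
            o[edge] += 1
    return o


# Two toy circular lines sharing stops b and c (hypothetical data).
layers = {
    "line_A": [("a", "b"), ("b", "c"), ("c", "a")],
    "line_B": [("b", "c"), ("c", "d"), ("d", "b")],
}
o_node = overlapping_degree(layers)
o_edge = edge_overlap(layers)
```

Here stops a and d belong to a single line and have overlapping degree 2, whereas b and c belong to both lines and have overlapping degree 4; the link between b and c has edge overlap 2.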

An interesting feature of multiplex networks is that the distribution of a quantity across the layers is at least as important as its overall value. For example, a node can have a high overlapping degree either because it has a low value in all the layers or because it has a high value in just a few layers. However, in our model this does not apply due to the singularities of public transportation networks that we discussed before. Nevertheless, to gain insight into the importance of one transportation mode over the others we can switch perspective and consider the superlayer representation (see Fig. 2). As in recent works6,7, we propose to group together lines belonging to the same transport mode, ending up in most cases with three superlayers representing bus, tram and metro lines respectively. Apart from this modification, the other elements of the model remain untouched, with inter-layer links connecting nearby stops of different modes.

Figure 2: Superlayer representation of the Madrid transportation system.

The figure represents the three transportation modes considered: tram (yellow nodes, upper layer), metro (purple nodes, mid layer) and buses (white nodes, bottom layer). See Table 1 for statistics of these layers.

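To start this new analysis let us denote by $C_b$, $C_t$ and $C_m$ the subsets of layers corresponding to bus lines, tram lines and metro lines respectively. Now we can redefine the overlapping degree as $o_i = o_i^{b} + o_i^{t} + o_i^{m}$, where $o_i^{x} = \sum_{\alpha \in C_x} k_i^{[\alpha]}$ with $x \in \{b, t, m\}$ and $k_i^{[\alpha]}$ representing the degree of node $i$ in layer $\alpha$. Then, instead of considering the activity distribution across layers17, $B_i = \sum_{\alpha} \left(1 - \delta_{0, k_i^{[\alpha]}}\right)$, we can study the activity distribution across superlayers, $B_i^{s} = \sum_{x} \left(1 - \delta_{0, o_i^{x}}\right)$, which we represent versus the overlapping degree in Fig. 3.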
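As an illustration, the sketch below groups the degree of each node by transportation mode and counts in how many superlayers the node is active. It reads the superlayer activity as the number of modes in which a node has non-zero degree, which is our interpretation of the text; the data layout and all names are hypothetical.

```python
from collections import defaultdict


def superlayer_degrees(layers, layer_mode):
    """Degree of each node accumulated per mode: {node: {mode: degree}}."""
    o = defaultdict(lambda: defaultdict(int))
    for layer, edges in layers.items():
        mode = layer_mode[layer]
        for u, v in edges:
            o[u][mode] += 1
            o[v][mode] += 1
    return o


def superlayer_activity(degrees_by_mode):
    """Number of superlayers in which the node has non-zero degree."""
    return sum(1 for degree in degrees_by_mode.values() if degree > 0)


# Toy system: three bus lines meeting at stop q, plus one tram line at r.
layers = {
    "bus1": [("p", "q"), ("q", "r")],
    "bus2": [("q", "s")],
    "bus3": [("q", "t")],
    "tram1": [("r", "u")],
}
layer_mode = {"bus1": "bus", "bus2": "bus", "bus3": "bus", "tram1": "tram"}

o = superlayer_degrees(layers, layer_mode)
overlap_q = sum(o["q"].values())
activity_q = superlayer_activity(o["q"])
activity_r = superlayer_activity(o["r"])
```

In this toy example stop q has the largest overlapping degree although it is active in the bus superlayer only, whereas stop r has a smaller overlapping degree but is active in two superlayers.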

Figure 3

Superlayer activity versus overlapping degree for each network.

At first one may think that nodes with the highest overlapping degree would also be those with the highest superlayer activity, but as we can see this is not the case. In fact, stops belonging to just one superlayer are the ones which tend to have the maximum overlapping degree. Those nodes, despite being present in only one superlayer, surely play a major role in the mobility of the system. On the other hand, if we think of a disruption of the system, it will be much easier to, for example, temporarily move a bus stop to a nearby street, even if many lines pass through it, than to cope with a disruption at a metro or tram stop. Thus, there is no clear answer to the question of which node is the most important in these networks just by looking at the structure, as it depends on what we consider important.

To end this structural study we focus on another measure used in multiplex network analyses: interdependence18. The interdependence of node $i$, $\lambda_i$, is defined as the sum over every other node $j$ of the fraction of shortest paths between nodes $i$ and $j$ which go through two or more layers over the total number of shortest paths between them. Hence, if $\lambda_i$ is close to 0 it means that most of the paths go through just one layer, while if it is close to 1 most of the paths go through 2 or more layers. The network interdependence is obtained as the average over all nodes. However, even though this measure is quite interesting, as it provides some information which cannot be obtained using the aggregated network alone, in our system it is not necessary because we already know that layers are interdependent. Indeed, if we look at Table 1 we can see that most of the networks have a number of nodes of the order of 1000, but a single bus line usually has around 50 stops, 100 at most, and tram and metro lines have even fewer. This means that to reach every node of the network we will surely have to cross multiple layers regardless of our starting position. One possible solution would be to work with superlayers instead of single layers, as they would be denser. But then we would face a similar problem, as the bus superlayer is much bigger than the other two.

Therefore, we slightly modify this metric to take into account the specific nature of transportation networks. Let us denote by $\psi_{ij}$ the number of shortest paths between nodes $i$ and $j$ which go across two or more layers and by $\psi_{ij}^{(x)}$ the number of those shortest paths in which at least one of the layers belongs to superlayer $x$. Then, we can define the interdependency of superlayer $x$ as:

$$\lambda^{x} = \frac{\sum_{i} \sum_{j \neq i} \psi_{ij}^{(x)}}{\sum_{i} \sum_{j \neq i} \psi_{ij}} \qquad (1)$$

which tells us what fraction of the shortest paths that go across two or more layers in the system have at least one link in superlayer $x$. Note that we normalize only over those shortest paths which use two or more layers because we are interested in figuring out which modes are used given the need to change to another line, and not the need for change itself.
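To make the definition concrete, the following sketch computes, for a small toy system, the fraction of shortest paths crossing two or more layers that have at least one link in each superlayer. For brevity it measures path length in number of links (breadth-first search) rather than in great-circle distance, and it enumerates every shortest path explicitly, which is only feasible for small networks; the data layout and function names are hypothetical.

```python
from collections import defaultdict, deque


def layer_sets_of_shortest_paths(adj, source):
    """For every reachable node, the set of layers used by each shortest path.

    adj: {node: [(neighbour, layer), ...]}. Path length is the number of links.
    Returns {node: [frozenset_of_layers, ...]} with one entry per shortest path.
    """
    dist = {source: 0}
    paths = {source: [frozenset()]}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, layer in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = []
                queue.append(v)
            if dist[v] == dist[u] + 1:
                # Every shortest path to u extends to v through this link.
                paths[v].extend(p | {layer} for p in paths[u])
    return paths


def superlayer_interdependency(layers, layer_mode):
    """Fraction of multi-layer shortest paths that use each superlayer."""
    adj = defaultdict(list)
    for layer, edges in layers.items():
        for u, v in edges:
            adj[u].append((v, layer))
            adj[v].append((u, layer))
    multi_layer_paths = 0
    paths_with_mode = defaultdict(int)
    for i in list(adj):
        for j, layer_sets in layer_sets_of_shortest_paths(adj, i).items():
            if j == i:
                continue
            for used in layer_sets:
                if len(used) >= 2:
                    multi_layer_paths += 1
                    for mode in {layer_mode[layer] for layer in used}:
                        paths_with_mode[mode] += 1
    if multi_layer_paths == 0:
        return {}
    return {m: n / multi_layer_paths for m, n in paths_with_mode.items()}


# Toy system: two bus lines and a one-link tram line meeting at stop c.
layers = {
    "bus1": [("a", "b"), ("b", "c")],
    "bus2": [("c", "e")],
    "tram1": [("c", "d")],
}
layer_mode = {"bus1": "bus", "bus2": "bus", "tram1": "tram"}
result = superlayer_interdependency(layers, layer_mode)
```

In this example every multi-layer shortest path has a link in the bus superlayer, while 60% of them also use the single tram link, even though the tram covers only two of the five stops.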

Results of this measure are shown in Fig. 4a. Firstly, we note that almost all the shortest paths under consideration have at least one link in the bus superlayer, but this is quite logical given that the bus superlayer is much bigger than the others. Thus, most of the shortest paths will start or end in this superlayer. However, a closer look reveals an interesting result. Take the case of Madrid, for example. Even though its metro superlayer has only 16 layers and 241 nodes, while its bus superlayer has 177 layers and 4590 nodes, more than 40% of the shortest paths have at least one link in the metro superlayer. Similar results are found in the rest of the networks; for example, in Zaragoza 20% of the shortest paths make use of the tram, which has only 1 line and 50 nodes, whereas its bus superlayer is composed of 35 layers and 902 nodes.

Figure 4: Superlayer interdependency of the networks.

Left: Superlayer interdependency as defined in Eq. (1). Right: Superlayer interdependency divided by $n_x$, which is the number of nodes inside superlayer $x$ over the total number of nodes in the system.

As can be seen, to fully understand these results it is necessary to take into account the size of the superlayers; the problem is how to define it. In this case, as we are exploring paths from node to node, we consider the fraction of all the nodes that lie in each superlayer as a measure of size. Thus, if $n_x$ is the fraction of nodes in superlayer $x$, we divide the superlayer interdependency by $n_x$ to obtain the desired result. This procedure has its drawbacks, as the result is not upper bounded, but on the other hand it allows us to extract information on the importance of each layer more easily.
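As a rough worked example of this normalization, take the Zaragoza figures quoted earlier: about 20% of the shortest paths under consideration use the tram, which has 50 nodes, while the bus superlayer has 902. The symbol $\lambda^{\text{tram}}$ for the tram interdependency and the assumption that the total number of nodes is simply 50 + 902 are made here for illustration only, so the result need not coincide exactly with the value plotted in Fig. 4b.

```latex
\begin{align}
n_{\text{tram}} &= \frac{50}{50 + 902} \approx 0.053, \\
\frac{\lambda^{\text{tram}}}{n_{\text{tram}}} &\approx \frac{0.20}{0.053} \approx 3.8 .
\end{align}
```

A value well above 1 indicates that the tram takes part in far more multi-layer shortest paths than its share of nodes alone would suggest.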

From this modified measure (see Fig. 4b), we observe that the tram and metro modes are of utmost importance for the mobility of these systems, as they are part of many more shortest paths than their size would suggest. Note that we have not taken into account that they may have, on average, higher speed or greater carrying capacity than buses; hence, the previous results are obtained from a purely topological point of view. The reason seems clear: metro and tram are usually used to connect distant points with straighter routes than bus lines. Another interesting conclusion, which follows from the previous one, is that urban transportation is quite multimodal, although most of the networks examined rely mainly on two transportation modes. A good example of this is Madrid’s network, where bus nodes cover most of the city and metro nodes connect distant locations with straighter paths across it. However, only one of the three tram lines overlaps with bus nodes; the others seem to go to locations that are not covered by bus. Thus, the tram mode is not used as a way to reach certain locations faster but just to connect distant locations at the periphery of the city.

Case study: towards a data-driven modeling of an urban transportation system

In this section, we study a detailed model for urban transportation that includes not only the structure of the networks but also data about frequencies and traveling speeds. Our aim here is to realistically mimic a scenario that allows us to study the efficiency of an urban transportation system and its response to malfunctions or improvements. To do so, we also include the land layer, as discussed in Methods, allowing passengers to get to a stop that might be just across the street even if there is no direct link between the different lines. In this way, we are able to simulate paths starting or ending at any point of the city and not only at line stops.

We take the transportation network of the city of Zaragoza (Saragossa) in Spain as our case study, since we have data regarding the average speed and frequency of each line19,20. This allows us to consider time, so that the shortest path is now the one that takes the least time rather than the one covering the shortest geographical distance. To this end, we divide the weight of each link by the average speed on its layer, so that the weight of a link now represents time. Moreover, all weights are fixed throughout the simulation, as we consider that the speed is always the same, except for the links connecting land nodes and stops. The weight of the latter is the time at which the next vehicle will arrive minus the current time. Therefore, given a certain path, the sum of the weights of the links used that belong to the land layer, to any other layer or to the set of inter-layer links gives the total time spent walking, in a vehicle or waiting, respectively.
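The two kinds of weight described above can be sketched as follows. The example assumes, for simplicity, that each line runs with a fixed headway from a known first departure, which is a hypothetical simplification of the real schedules; the function names and numbers are illustrative only.

```python
def travel_time_min(distance_m, speed_kmh):
    """Fixed link weight: distance divided by the average speed of the layer."""
    return distance_m / 1000.0 / speed_kmh * 60.0


def waiting_time_min(now_min, first_departure_min, headway_min):
    """Time-dependent weight of a land-to-stop link.

    Returns the minutes until the next vehicle, with times expressed as
    minutes after midnight and vehicles leaving every headway_min minutes.
    """
    if now_min <= first_departure_min:
        return first_departure_min - now_min
    elapsed = now_min - first_departure_min
    # Remaining time until the next multiple of the headway.
    return (-elapsed) % headway_min


# A 1 km link walked at 5 km/h, and a passenger arriving at 08:37 at a
# stop served every 10 minutes since 08:00 (hypothetical schedule).
walk = travel_time_min(1000.0, 5.0)
wait = waiting_time_min(8 * 60 + 37, 8 * 60, 10)
```

With these numbers the walk takes about 12 minutes and the passenger waits 3 minutes for the 08:40 vehicle.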

Moreover, following the construction of the tramway in 2011, a survey was carried out to assess the impact of this new transportation mode on the mobility of the city21 (see Table 2). Thus, we are able to test whether our model can provide results that are (qualitatively) similar to passengers’ experience, namely: (i) mobility hinges on the tram, (ii) a lot of transfers are needed, (iii) if there is a disruption in the tram line the whole system is affected, (iv) bus stops are far from passengers’ destinations and (v) bus frequencies are low. At the same time, the previous effects can be quantified, albeit approximately.

Table 2 Methodology of the survey performed by the regional taxi association to assess the impact of the new tram line on urban mobility.

As we do not have data regarding passenger flows, we do not take into account the carrying capacity of the vehicles and thus consider a free-flow regime. We study two different scenarios: movement from any point of the city to the city center (coordinates 41.652, −0.881) and vice versa, and movement between any two points located at least 2 km away from each other, which has been reported as the minimum distance a passenger has to travel to consider using public transportation22.

In Fig. 5 we show an overview of the results obtained considering 1,000 individuals per minute with random origin and destination between 08:30 am and 10:00 am and a walking speed23 of 5 km/h. Fig. 5(a) shows the distribution of the number of transfers; note that only 35% of the individuals reach their destination without transfers, which agrees with (ii). In Fig. 5(b) we represent the distributions of the waiting time, the total time and the distance covered by walking. These results agree with (iv), as with a walking speed of 5 km/h 1 km is equivalent to 12 minutes and thus walking represents approximately one third of the total time, but not with (v). Now, suppose that an individual needs to make one transfer to reach their destination. If one of the two frequencies is too low, it might be faster to walk to the second line instead of getting there using another line or, equivalently, to walk from the transfer point to the destination instead of waiting for the second vehicle. This individual would perceive two problems, a long walk and low frequencies, but since no time has actually been spent waiting, the second problem is not reflected in our model.

Figure 5

Overview of the mobility of the system with random origins and destinations: (a) number of transfers made by individuals to reach their destination; (b) time each individual has spent waiting at any stop (left), distance covered by walking (center) and total time of the trip (right); (c) fraction of individuals who have used each line. Note that the latter is normalized over the number of trips; thus, as a passenger may use more than one line, the total sum is not 1. The parameter δ denotes a penalization of the walking speed in order to force the use of transport.

To avoid this problem we introduce the parameter $\delta$ to modulate the walking speed $v_w$ such that $v_w = (1 - \delta) \cdot 5$ km/h. By tuning its value we can discourage individuals from avoiding transfers while still allowing them to do so if the distance is small enough. As we can see in Fig. 5(a), with $\delta = 0.5$ the number of transfers increases greatly, which means that the system tended to avoid one transfer. Looking at time, we see that, as expected, the waiting time increases and the distance covered by walking decreases. The total time has been readjusted to account for the decrease in walking speed, so that any variation is due only to the new transfer and, surprisingly, it hardly changes. This means that there is a duality in the system, as we can choose between walking and waiting while keeping the total time constant, which indeed agrees with (iv) and (v). Note also that, although the waiting times may not seem very high compared with the total times, several studies have shown that waiting time is usually overestimated by passengers, especially when the real time has been quite low24,25,26. Besides, this increase in waiting time is in closer agreement with previous studies5. However, as they only took into account the walking time associated with transfers between modes, they did not observe this duality.
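The effect of the penalization can be illustrated with a single decision: finishing a trip on foot or waiting for a second vehicle. The sketch below is a simplified stand-in for the shortest-path computation of the model, and the distances and times used are hypothetical.

```python
BASE_WALK_KMH = 5.0  # unpenalized walking speed used in the text


def walking_speed_kmh(delta):
    """Penalized walking speed v_w = (1 - delta) * 5 km/h."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must satisfy 0 <= delta < 1")
    return (1.0 - delta) * BASE_WALK_KMH


def walk_minutes(distance_m, delta=0.0):
    """Walking time in minutes as perceived by the router."""
    return distance_m / 1000.0 / walking_speed_kmh(delta) * 60.0


def prefers_walking(distance_m, wait_min, ride_min, delta=0.0):
    """True if walking the last leg beats waiting for and riding a vehicle."""
    return walk_minutes(distance_m, delta) < wait_min + ride_min


# Last leg of 800 m versus an 8-minute wait plus a 3-minute ride.
choice_without_penalty = prefers_walking(800.0, 8.0, 3.0, delta=0.0)
choice_with_penalty = prefers_walking(800.0, 8.0, 3.0, delta=0.5)
# Time reported for the walk once readjusted to the real 5 km/h speed.
reported_walk = walk_minutes(800.0, delta=0.0)
```

Without penalization the walk (9.6 minutes) beats the 11 minutes of waiting and riding, so the transfer is skipped; with δ = 0.5 the walk is perceived as 19.2 minutes and the transfer is made instead, while the two real trip times differ by less than two minutes.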

Finally, Fig. 5(c) shows the fraction of individuals who make use of each line, normalized over the total number of trips. In both scenarios almost half of the trips use the tram line, which completely agrees with (i). Moreover, it also agrees with what we saw when we studied the structure of the network in the previous subsection (Fig. 4b).

The last item left to check is (iii). For simplicity, we focus only on trips to the city center (Fig. 6). In the top left panel we show the average time it takes to get to the city center between 08:30 and 09:00 am. Our model allows us to easily test the behavior of the system during service disruptions by just removing the affected nodes or layers. In the top right panel of Fig. 6 we show the increase in the total time if we remove the tram line completely, a result that agrees with (iii). Although the tram line is quite vulnerable, as a disruption on a single node may cause the whole line to fail (since it follows a fixed path), it is important to note that this line has a couple of links which allow it to run in loops and thus, in this particular scenario, only the northern or the southern part would be affected and not the whole line. In contrast, the bottom left panel of Fig. 6 shows that the situation is completely different if we remove the two bus lines most used in our model to get to the city center (lines 22 and 35). As the bus mode is more redundant, the effect is much smaller, at least under free-flow conditions. Besides, a disruption on a single bus node does not cause as many problems for the whole line, as the stop can easily be moved to a nearby location. Thus, we can say that the bus network is much more resilient, while tram lines speed up the complete network.
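A disruption experiment of this kind reduces to recomputing fastest paths after dropping the links of the affected layers. The sketch below uses a plain Dijkstra search on a tiny hypothetical network with fixed travel times in minutes; the layer names and weights are invented for illustration and waiting times are ignored.

```python
import heapq


def fastest_time(edges, source, target, removed_layers=()):
    """Dijkstra travel time in minutes, ignoring links of removed layers.

    edges: list of undirected links (u, v, minutes, layer).
    Returns float('inf') if the target cannot be reached.
    """
    adj = {}
    for u, v, minutes, layer in edges:
        if layer in removed_layers:
            continue
        adj.setdefault(u, []).append((v, minutes))
        adj.setdefault(v, []).append((u, minutes))
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, minutes in adj.get(u, []):
            candidate = d + minutes
            if candidate < best.get(v, float("inf")):
                best[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return float("inf")


# Toy network: one fast tram link and two redundant bus routes.
edges = [
    ("home", "center", 10, "tram"),
    ("home", "mid", 12, "bus_a"),
    ("mid", "center", 6, "bus_a"),
    ("home", "center", 25, "bus_b"),
]
baseline = fastest_time(edges, "home", "center")
delay_without_tram = fastest_time(edges, "home", "center", {"tram"}) - baseline
delay_without_bus_a = fastest_time(edges, "home", "center", {"bus_a"}) - baseline
```

In this toy case removing the tram adds 8 minutes to the trip, whereas removing one of the bus lines changes nothing because another route remains available. Testing a network improvement works the same way in reverse, by appending the links of the proposed line to `edges` before recomputing.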

Figure 6: Service disruptions and network improvements.

Top left: average time it takes to get to the city center between 08:30 and 09:00. Top right: difference with the original case when the tram is removed. Bottom left: difference when the 2 most used lines to get to the city center are removed (lines 22 and 35). Bottom right: difference if we add a new tram line. These pictures were made using tiles from OpenStreetMap. The cartography in the OpenStreetMap map tiles is licensed under CC BY-SA (www.openstreetmap.org/copyright). The license terms can be found at the following link: http://creativecommons.org/licenses/by-sa/2.0/ (Accessed: 1st June 2015). © OpenStreetMap contributors.

To conclude, our model can also be used to easily test network improvements such as the addition of new lines. In the bottom right panel of Fig. 6 we show the differences in the average time to get to the city center if we add a new tram line from east to west, as has recently been proposed by the city council27. As we can see, the western part of the city would benefit the most from this addition, as the decrease in time is larger there than in the rest of the city, all the more so given that, as shown in the top left panel of Fig. 6, that part of the city was further away from the city center. Note that we have not removed any bus line, and thus this result shows us again how tram and metro lines naturally speed up the network.

Discussion

In this paper, we have proposed a method to model public transportation systems as multiplex networks, which allows one either to gain more insight into their network properties or to extract new conclusions of practical value. We have analyzed the structure of 9 urban transportation networks and found some regularities among the systems studied that can be related to the underlying structure of the cities. We have also shown that the per-line and per-transportation-mode representations are both useful and complementary for extracting information about the functioning of transportation systems and assessing their vulnerabilities. Finally, using detailed data about service schedules and waiting times, we created a realistic model for urban mobility. Despite its relative simplicity, we showed that our model not only reproduces real-world facts but can also be used to explore important issues such as the impact of service disruptions and ways of improving the network, using information which, perhaps with the exception of the average speed of each line, should be publicly available for most major cities. Needless to say, a deeper analysis would require including the street network as the land layer and some information that might be harder to find (such as traffic lights or mobility patterns), or specialized software like MATSim28, but that is beyond the scope of this study. In conclusion, the proposed model can be used for a first diagnosis of the state of any urban transportation network using publicly available information and few computational resources.

Additional Information

How to cite this article: Aleta, A. et al. A Multilayer perspective for the analysis of urban transportation systems. Sci. Rep. 7, 44359; doi: 10.1038/srep44359 (2017).
