1 Introduction

Hubs are facilities that reduce the number of connections in a network. Instead of a direct link between pairs of Origin/Destination (O/D) nodes, each node connects to a hub and, through it, with indirect and fewer connections, connects to other nodes.

Hub-and-spoke design problem, also called hub location problem (HLP), seeks to find a set of nodes named hub to minimize the defined cost in the problem in a way that there is at least one route between each pair of O/D nodes. Three main characteristics are attributed to the classic HLP. First, a complete architecture is assumed for a hub network, i.e., every two hubs are connected via a link. Second, economies of scale on transportation costs, between hubs and between O/D nodes and hubs, are made possible by the consolidation of flows. Third, direct connection between O/D pairs without the use of hub(s) is not allowed.

The telecommunication networks, airlines, and cargo delivery systems are some well-known instances for the applications of the HLP (Nasiri et al. 2018b; Wasner & Zäpfel 2004). This research mainly considers a freight transport company in Turkey that cares about sustainability. A typical freight transport system is composed of area and central offices. Area offices receive and deliver shipments from/to customers directly, and central offices receive and deliver shipments from/to area offices or send shipments to another central office. Although more than one area office can exist in a town, it may have no central office. Therefore, each area office should be allocated to a central office.

As a result of the above description about a freight transport system, freight transport networks are very similar to hub networks (Arbabi et al. 2021). In this way, area and central offices in freight transport networks correspond to demand nodes and hubs, respectively. Moreover, economies of scale are obtained due to the consolidation of shipments that are supposed to be transported between central offices. Consequently, the freight transport problem can be recognized as a HLP (Alumur et al. 2012b; Dukkanci and Kara 2017).

Road freight transport has been a rapidly growing contributor to carbon dioxide (CO2) emissions (European_Commission 2015; Ahmad et al. 2022) and requires effective policy towards achieving zero carbon (Kannan et al. 2022; Karthick and Uthayakumar 2022). Moreover, social responsibility is a major concern of the companies, and they take the social issues into account in their decision-making (Govindan 2022a; Zarbakhshnia et al. 2022). However, classic HLPs traditionally focus on minimizing the total transportation cost.

Consider a freight shipping (or parcel delivery) company that uses ground and air transportation modes in a hierarchical structure to transport the customer’s demand from an origin to a destination. The model presented in this paper can help the manager of this company to minimize the current investment costs. In addition, pursuing sustainable development, the company intends to create at least a certain number of jobs and to ensure its produced emissions will not exceed a predetermined level. Also, to improve the level of service, the maximum travel time in its network should be minimized (as an index of responsiveness). Note that opening the hubs creates fixed job opportunities such as managerial positions, which do not depend on the transportation flow in a particular hub. In addition, by opening the hubs, variable job opportunities are created that depend on the amount of incoming flow to the hub (such as workers operating on the incoming flows). In this study, it is assumed that only one shipping company is responsible for the freight. Also, for each shipment, from origin to destination, only one shipment contract is concluded. For this reason, the network under consideration is multi-modal.

If a freight company wants to increase its level of service and to improve its social and environmental sustainability aspects while paying attention to the minimization of economic costs of network design, the decisions resulting from separately solving each of these problems will not necessarily be the optimal decisions for the company. The objectives of these two problems are in conflict with each other, so the best decisions that can be made for each problem are not necessarily desirable or even feasible for the other problem. In this research, the hierarchical HLP and sustainable HLP are considered as an integrated problem. A freight company that only seeks to minimize transportation costs and fixed network design costs should design a network that selects facilities with lower fixed costs. In addition, the routes should be selected in such a way that reduces transportation costs. But if the company pays attention to sustainability aspects in addition to the economic costs, it should design the network so that the travel time is reduced for the most distant demands, routes are selected that reduce emissions, and hubs are selected that increase employment levels. Moreover, in order to increase the level of employment, more flows should enter the hubs. These are not necessarily choices that will minimize economic costs. Therefore, in the present study, the aim is to provide decisions to the company manager in which the above objectives are met simultaneously.

The remainder of this paper is organized as follows. Section 2 gives the literature review, and Sect. 3 presents a mathematical model for Sustainable Hierarchical Multi-modal Hub Network Design Problem (SHMHNDP). In Sect. 4, an alternative formulation for the problem is presented. In Sect. 5, multi-objective optimization methods are proposed to solve the problem. Section 6 presents the numerical experiments. Finally, conclusions and future research are discussed in Sect. 7.

2 Literature review

The concept of hub was first developed by Goldman, (1969), which was an extension of Hakimi (1964)’s switching centers’ location problem. The first formulation of HLP was developed as a quadratic model by O'kelly (1987). Moreover, the linear model of HLP was presented by (Campbell 1994, 1996). (Aykin 1994, 1995a, 1995b) and (Klincewicz 1991, 1992) proposed various extensions of HLP. The application of the hub location idea in telecommunication and air transportation was examined by Bryan and O'kelly (1999). A survey on hub location models, classification, solution methods, and applications were carried out by Farahani et al., (2013).

A common assumption in many HLPs is that only one transportation mode exists between demand nodes and hubs and between hubs. A limited number of studies relaxed this assumption and took multiple transportation modes into account (Alumur et al. 2012a, 2012b; Ambrosino and Sciomachen 2016; Karimi et al. 2018; Maiyar and Thakkar 2018).

In recent years, considering sustainability in any new design and innovation is converted into an important issue (Govindan 2022a, b; Sharma et al. 2022; Kannan 2021). In particular, several studies take sustainability into account in HLPs. Mohammadi et al. (2014) presented a hub covering location problem considering environmental objectives. The objectives include minimizing greenhouse gas emissions and noise pollution. Moreover, they investigated the importance of considering vehicle speed in a sustainable HLP. Zhalechian et al. (2017b) presented a mathematical model for a multi-modal HLP considering the following objectives: minimization of total transportation and investment costs, minimization of the maximum transportation time between nodes, and minimization of noise pollution costs. To optimize the social and environmental impacts and costs of a biofuel supply chain, (Roni et al. 2017) developed a mixed-integer linear mathematical model to minimize the hub location cost and the unmet demand penalty cost. In their study, environmental and social issues have been taken into account, and a case of the Midwest region of United States was examined. Xu et al. (2018) dealt with designing an intermodal transportation network, as a special case of HLP, associated with the port competition while considering environmental concerns of stakeholders. As can be seen from researches mentioned above, although considering sustainability in multi-modal networks is more practical, it is not yet widely studied.

Moreover, in the traditional HLP, only two layers exist in the network: one among hubs and the other between hubs and demand nodes. However, real-world networks are more complex and involve more than two levels. A hierarchical hub structure is, hence, a more practical network. Yaman (2009) proposed a model for the single assignment hierarchical hub location in a three-level network. In this model, regarding delivery time constraints, the objective was to minimize the costs of transportation and hub establishment. Lin (2010) proposed a model for the design of a hierarchical hub network for a dual express system with the aim of minimizing fixed costs and transportation costs. The novelty of this model was to consider time constraints for demand pairs.

Later on, Alumur et al. (2012b) developed a hierarchical multi-modal hub location model with time-definite deliveries in a star network. The objectives in this model were the minimization of transportation costs and fixed costs of the links. The novelty was to consider a defined time for departing vehicles from the hubs. Yaman and Elloumi (2012) introduced a star p-hub center problem with the aim of minimizing the maximum path in the network and a star p-hub median problem with the aim of minimizing routing costs in a star/star network with a central hub. Davari et al. (2013) studied an incomplete hub covering location problem with the aim of maximizing the credibility of satisfying each flow in the network in less than a predefined time. Karimi et al. (2014) studied a capacitated single allocation hierarchical hub median location problem with the aim of minimizing total routing costs. Rajabi and Avakh Darestani (2015), regarding dispersion criterion for central hubs in a three-level network, attempted to locate hierarchical hubs in order to minimize transportation costs. Dukkanci and Kara (2017) studied routing and scheduling issues in the hierarchical HLP. Zhong et al. (2018) proposed a hybrid method for using hierarchical hub locations with the aim of integrating urban and rural public transportation.

In the context of hub center location problems, which are related to the present study, we refer to the research by Campbell (1994) and Kara and Tansel (2000). In these studies, minimization of service or cost measures is considered. These measures include the maximum travel time or flow cost between all demand nodes or arcs in the network and the maximum travel time or flow cost associated with an access arc. Campbell et al. (2007) and Ernst et al. (2009) also played an important role in developing mathematical models for this problem. Table 1 shows the studies which considered sustainability aspects or hierarchical network for HLP.

Table 1 Comparison of some related hub location researches

Recent works on hierarchical multi-modal HLP include the following: Shang et al. (2020) proposed a memetic algorithm and a Monte Carlo simulation to solve a stochastic hierarchical multi-modal HLP in a ring-star-star network. In addition, Ma et al. (2020) proposed a mixed-integer programming model for hierarchical multi-modal HLP considering time restrictions for a China railway express network. Moreover, Korani et al. (2020) developed mathematical models and a Lagrangian relaxation solution method for a reliable hierarchical multi-modal HLP. Later on, Shang et al. (2021) proposed two heuristic solution methods based on variable neighborhood search and multi-objective genetic algorithm for a bi-objective hierarchical multi-modal HLP. The objectives of the problem were minimization of overall system-wide costs and the maximum delivery time.

To the best of our knowledge, sustainability aspects have not been considered in hierarchical hub location models. The main contribution of this study, which distinguished our efforts from related studies, is designing a hierarchical multi-modal hub network considering economic, environmental, and social aspects. Given that the present model is multi-objective, the modelling approaches to multi-objective HLPs and the proposed solution methods for these problems are listed in Table 2. According to Table 2, researchers have used various methods, including exact, heuristic, metaheuristic, and hybrid methods, to solve the multi-objective HLP. In this research, to solve the present problem, we use three exact solution methods (AUGMECON2, TH, and AWTP) and two metaheuristic solution methods (NSGA-II and NRGA).

Table 2 Solution methods for multi-objective HLPs

3 Mathematical modelling

In this section, a mathematical formulation for SHMHNDP is proposed. In this model, two transportation modes are studied: truck and airplane. The customers are served by a set of trucks and airplanes from \(P\) opened ground or airport hubs. There is an airport as a central hub, and other airport hubs are directly connected to the central hub. The transportation at the links between airport hubs and the central hub is done by airplane, and transport at other links is performed using the trucks. An instance of the hierarchical network is shown in Fig. 1.

Fig. 1
figure 1

An instance of the network of the considered problem

In Fig. 1, there is a central hub (circle), three airport hubs (triangles) that are directly connected to the central hub, seven ground hubs (squares), and the other demand nodes allocated to the hubs. The network studied in this research is the same network used by Alumur et al. (2012b), with the difference that in this network, if two ground hubs are assigned to an airport hub, the two hubs will be connected to each other. While in the network used by Alumur et al. (2012b), this is not necessarily the case (i.e., two ground hubs can be connected or not).

3.1 Assumptions

In the proposed formulation for SHMHNDP, the following assumptions are considered:

  • The number of hubs is predetermined.

  • The location of the central hub is specified.

  • Two transportation modes (airplane and truck) are considered.

  • There are direct links between the central hub and the other airport hubs.

  • Each demand node is allocated to exactly one hub (single allocation).

  • There is no direct link between the airport hubs.

  • The flow quantity from the origin to the destination is determined.

  • All of the parameters are considered to be deterministic.

  • The capacity of the hubs is assumed unlimited.

4 Notations

In this subsection, sets, parameters, and variables used in the mathematical modelling are presented. The order of presentation is as follows: sets used in hub network design (including demand nodes, candidate nodes for hubs, and central hub), parameters (except parameters related to the calculation of emissions), decision variables, variables used in the linearization of nonlinear terms of the model, and parameters used in the calculation of emissions.


Sets

\(N\)

Demand nodes

\(H\subseteq N\)

Candidate nodes for hubs

\(A\subseteq H\)

Candidate nodes for airport hubs

\(o\in A\)

Central hub


Parameters

\(P\)

The number of hubs (central and non-central)

\({w}_{ij}\)

Demand flow from node \(i\in N\) to node \(j\in N\)

\({t}_{ij}\)

Travel time by trucks between nodes

\({t}_{ij}^{P}\)

Travel time by airplanes between nodes

\(\alpha\)

Discount factor for travel time using trucks on inter-hub links

\({c}_{ij}\)

Unit cost of routing from node \(i\in N\) to node \(j\in N\) if one of nodes is hub and another is non-hub

\({c}_{jo}^{P}\)

Routing cost from node \(j\in A\backslash \left\{o\right\}\) to the central hub

\({c}_{oj}^{P}\)

Routing cost from the central hub to node \(j\in A\backslash \left\{o\right\}\)

\({c}_{jk}^{T}\)

Routing cost (for trucks) from node \(j\in H\) to node \(k\in H\) if both nodes are hubs

\({FE}_{ij}\)

Fixed cost of establishing a link between node \(i\in N\) and node \(j\in H\)

\({FE}_{jk}^{T}\)

Fixed cost of establishing a truck link between node \(j\in H\) and \(k\in H\backslash \left\{j\right\}\)

\({FE}_{j}^{P}\)

Fixed cost of establishing a link between node \(j\in A\backslash \left\{o\right\}\) and the central hub

\({FJG}_{j}\)

Fixed job opportunities created by the establishment of ground hub \(j\in H\)

\({FJA}_{l}\)

Fixed job opportunities created by the establishment of airport hub \(l\in A\)

\({UJG}_{j}\)

Unit job opportunities (related to flow) in ground hub \(j\in H\)

\({UJA}_{l}\)

Unit job opportunities (related to flow) in airport hub \(l\in A\)

\({OF}_{i}=\sum_{s\in N}{w}_{is}\)

Total flow originated from node \(i\in N\)

\({DF}_{i}=\sum_{s\in N}{w}_{si}\)

Total flow delivered in node \(i\in N\)

\(Emissions\)

The maximum allowable amount of emissions

\(Jobs\)

The number of job opportunities to be created


Decision variables

\({z}_{ij}\)

Binary variable for allocation of node \(i\in N\) to hub \(j\in H\)

\({h}_{jl}\)

Binary variable for allocation of hub \(j\in H\) to airport hub \(l\in A\)

\({r}_{jk}^{l}\)

Binary variable that is equal to 1 if \(j\in H\) and \(k\in H\backslash \left\{j\right\}\) are hubs and both are allocated to the same airport hub \(l\in A\)

\({Y}_{jk}^{i}\)

Flow amount, initiated from node \(i\in N\) and passes node \(j\in H\) and \(k\in H\backslash \left\{j\right\}\) by trucks

\({X}_{jo}^{i}\)

Flow amount, initiated from node \(i\in N\) and passes node \(j\in A\backslash \left\{o\right\}\) toward the central hub

\({X}_{oj}^{i}\)

Flow amount, initiated from node \(i\in N\) and passes the central hub toward node \(j\in A\backslash \left\{o\right\}\)

\(\phi\)

Variable that denotes the maximum travel time in the network


Variables for the linearization

\({\tau }_{ijl}\)

Binary variable that is equal to 1 if node \(i\in N\) is allocated to hub \(j\in H\) and hub \(j\in H\) is allocated to hub \(l\in A\)

Parameters used for modelling the emission of trucks

\(M\)

Truck mass (summation of carried and empty load) (\(\mathrm{kg}\))

\(g\)

Constant of gravity (9.81 \(\mathrm{m}/{\mathrm{s}}^{2}\))

\(a\)

Acceleration of truck (\(\mathrm{m}/{\mathrm{s}}^{2}\))

\({v}_{ij}\)

Velocity of truck in arc \((i,j)\) (\(\mathrm{m}/\mathrm{s}\))

\({\theta }_{ij}\)

Angle of road in arc \((i,j)\) (degree)

\(\rho\)

Air density (\(\mathrm{kg}/{\mathrm{m}}^{3}\))

\(AR\)

Frontal surface area of the truck (\({\mathrm{m}}^{2}\))

\({C}_{d}\)

Factor of the drag resistance

\({C}_{r}\)

Factor of the rolling resistance

\(LW\)

Weight of load carried by each truck (\(\mathrm{kg}\))

\(EW\)

Weight of empty truck (\(\mathrm{kg}\))

\({d}_{ij}\)

Distance between nodes \(i\) and \(j\) (\(\mathrm{m}\))

\(\beta\)

Particular fixed value of the truck

\({\gamma }_{ij}\)

Particular fixed value of arc \((i,j)\)

The particular fixed value of arc \((i,j)\) and the particular fixed value of the truck are calculated as Eqs. (1) and (2), respectively:

$${\gamma }_{ij}=a+gsin{\theta }_{ij}+g{C}_{r}cos{\theta }_{ij}$$
(1)
$$\beta =0.5{C}_{d}AR\rho$$
(2)

Finally, the amount of emission produced by trucks can be calculated to be used as the left-hand side of constraint (25) of Sect. 3.3.

4.1 Mathematical model

Here, a mathematical formulation is suggested for SHMHNDP.

$$Min\sum_{i\in N}(\sum_{j\in H}({c}_{ij}{OF}_{i}+{c}_{ji}{DF}_{i}){z}_{ij}+\sum_{j\in H}\sum_{k\in H\backslash \left\{j\right\}}{c}_{jk}^{T}{Y}_{jk}^{i}+\sum_{j\in A\backslash \left\{o\right\}}({c}_{jo}^{P}{X}_{jo}^{i}+{c}_{oj}^{P}{X}_{oj}^{i}))+\sum_{i\in N}\sum_{j\in H}{FE}_{ij}{z}_{ij}+\sum_{l\in A}\sum_{j\in H}\sum_{k\in H\backslash \left\{j\right\}}{FE}_{jk}^{T}{r}_{jk}^{l}+\sum_{j\in A\backslash \left\{o\right\}}{FE}_{j}^{P}{h}_{jj}$$
(3)
$$Min\phi$$
(4)

subject to:

$$\sum_{j\in H}{z}_{ij}=1 \quad\quad \forall i\in N$$
(5)
$${z}_{ij}\le {z}_{jj} \quad\quad \forall i\in N,j\in H$$
(6)
$$\sum_{l\in A}{h}_{jl}={z}_{jj} \quad\quad \forall j\in H$$
(7)
$${h}_{jl}\le {h}_{ll} \quad\quad \forall j\in H,l\in A$$
(8)
$$\sum_{j\in H}{z}_{jj}=P$$
(9)
$${h}_{oo}=1 \quad\quad \forall o\in A$$
(10)
$${r}_{jl}^{l}={h}_{jl} \quad\quad \forall j\in H,l\in A\backslash \left\{j\right\}$$
(11)
$${r}_{lj}^{l}={h}_{jl} \quad\quad \forall j\in H,l\in A\backslash \left\{j\right\}$$
(12)
$${r}_{jk}^{l}\le {h}_{jl} \quad\quad \forall j\in H,k\in H\backslash \left\{j\right\},l\in A$$
(13)
$${r}_{jk}^{l}\le {h}_{kl} \quad\quad \forall j\in H,k\in H\backslash \left\{j\right\},l\in A$$
(14)
$${r}_{jk}^{l}\ge {h}_{jl}+{h}_{kl}-{h}_{ll} \quad\quad \forall j\in H,k\in H\backslash \left\{j\right\},l\in A$$
(15)
$$\sum_{k\in H\backslash \left\{j\right\}}{Y}_{jk}^{i}-\sum_{k\in H\backslash \left\{j\right\}}{Y}_{kj}^{i}=\sum_{s\in N}{w}_{is}\left({z}_{ij}-{z}_{sj}\right) \quad\quad \forall i\in N,j\in H\backslash A$$
(16)
$$\sum_{k\in H\backslash \left\{j\right\}}{Y}_{jk}^{i}+{X}_{jo}^{i}-{X}_{oj}^{i}-\sum_{k\in H\backslash \left\{j\right\}}{Y}_{kj}^{i}=\sum_{s\in N}{w}_{is}\left({z}_{ij}-{z}_{sj}\right) \quad\quad \forall i\in N,j\in A\backslash \left\{o\right\}$$
(17)
$$\sum_{k\in H\backslash \left\{o\right\}}{Y}_{ok}^{i}+\sum_{j\in A\backslash \left\{o\right\}}{X}_{oj}^{i}-\sum_{j\in A\backslash \left\{o\right\}}{X}_{jo}^{i}-\sum_{k\in H\backslash \left\{o\right\}}{Y}_{ko}^{i}=\sum_{s\in N}{w}_{is}\left({z}_{io}-{z}_{so}\right) \quad\quad \forall i\in N$$
(18)
$${Y}_{jk}^{i}\le \sum_{s\in N}\sum_{l\in A}{w}_{is}{r}_{jk}^{l} \quad\quad \forall i\in N,j\in H,k\in H\backslash \left\{j\right\}$$
(19)
$${X}_{jo}^{i}+{X}_{oj}^{i}\le \sum_{s\in N}{w}_{is}{h}_{jj} \quad\quad \forall i\in N,j\in A\backslash \left\{0\right\}$$
(20)
$$\phi \ge {t}_{ij}{z}_{ij}+{t}_{js}{z}_{sj} \quad\quad \forall i\in N,s\in N\backslash \left\{i\right\},j\in H$$
(21)
$$\phi \ge \left({t}_{ij}+\alpha {t}_{jl}\right){z}_{ij}{h}_{jl}+\left(\alpha {t}_{lk}+{t}_{ks}\right){z}_{sk}{h}_{kl} \quad\quad \forall i\in N,s\in N\backslash \left\{i\right\}, j\in H,k\in H\backslash \left\{j\right\},l\in A\backslash \left\{o\right\}$$
(22)
$$\phi \ge {t}_{ij}{z}_{ij}+\alpha {t}_{jk}{r}_{jk}^{l}+{t}_{ks}{z}_{sk} \quad\quad \forall i\in N,s\in N\backslash \left\{i\right\}, j\in H,k\in H\backslash \left\{j\right\},l\in A$$
(23)
$$\phi \ge \left({t}_{ij}+\alpha {t}_{jl}+{t}_{lo}^{P}\right){z}_{ij}{h}_{jl}+\left({t}_{ou}^{P}+\alpha {t}_{uk}+{t}_{ks}\right){z}_{sk}{h}_{ku} \quad\quad \forall i\in N,s\in N\backslash \left\{i\right\}, j\in H,k\in H\backslash \left\{j\right\},l\in A\backslash \left\{o\right\}, u\in A\backslash \left\{l\right\},\left\{o\right\}$$
(24)
$$\left(EW+LW\right)\left(\sum_{i\in N}\sum_{k\in H}\sum_{k\in H\backslash \left\{j\right\}}{\gamma }_{jk}{Y}_{jk}^{i}{d}_{jk}+\sum_{i\in N}\sum_{j\in H}\left(O{F}_{i}+D{F}_{i}\right){d}_{ij}{z}_{ij}\right)+\beta \left(\sum_{i\in N}\sum_{j\in H}{v}_{ij}^{2}{d}_{ij}{z}_{ij}+\sum_{i\in N}\sum_{j\in H}\sum_{k\in H\backslash \left\{j\right\}}\sum_{l\in A}{v}_{jk}^{2}{d}_{jk}{r}_{jk}^{l}\right)\le Emissions$$
(25)
$$\sum_{j\in H\backslash A}FJ{G}_{j}{z}_{jj}+\sum_{l\in A}FJ{A}_{l}{h}_{ll}+\sum_{i\in N}\sum_{j\in H\backslash \left\{A\right\}}UJG({OF}_{i}+{DF}_{i}){z}_{ij}+\sum_{i\in N}\sum_{j\in A}UJA({OF}_{i}+{DF}_{i}){z}_{ij}\ge Jobs$$
(26)
$${z}_{ij}\in \left\{\mathrm{0,1}\right\} \quad\quad \forall i\in N,j\in H$$
(27)
$${h}_{jl}\in \left\{\mathrm{0,1}\right\} \quad\quad \forall j\in H,l\in A$$
(28)
$${r}_{jk}^{l}\in \left\{\mathrm{0,1}\right\} \quad\quad \forall j,k\in H\backslash \{j\},l\in A$$
(29)
$${Y}_{jk}^{i}\ge 0 \quad\quad \forall i\in N,j\in H,k\in H\backslash \left\{j\right\}$$
(30)
$${X}_{jo}^{i},{X}_{oj}^{i}\ge 0 \quad\quad \forall i\in N,j\in A\backslash \left\{o\right\},o\in A$$
(31)

In this formulation, the summation of transportation costs in the hierarchical network and fixed costs of the links constitute the first objective function in Eq. (3). The costs of transportation are calculated according to the flows. These flows include flows between demands and hubs, flows in track routes, and flows in air routes. The second objective function, Eq. (4), minimizes the maximum travel time in the network.

Constraint (5) ensures that every demand node is exactly allocated to one hub (single allocation). By constraint (6), if and only if a node is selected to be a hub, demand nodes can be assigned to it. Constraint (7) expresses single allocation in the first level (allocation to airport hubs). By constraint (8), ground hubs can be assigned to a node if and only if it is selected as an airport hub. Constraint (9) ensures that P hubs (in total) must be located. Due to constraint (10), node o must be the central hub. Constraints (11)–(14) specify how to choose truck routes. By constraint (15), ground hubs assigned to a common airport hub must be connected to each other. Constraints (16)–(20) are related to routing flows in a hierarchical network. Constraint (21) specifies routes that consist of two links and have a hub. In constraint (22), routes consist of an airport hub and two ground hubs, and ground hubs are allocated to the airport hub.

Constraint (23) includes the routes in which there are two ground hubs that are connected to each other. Constraint (24) is related to the routes in which there are a central hub, two airport hubs, and two ground hubs.

In constraints (21)–(24), the travel times for all routes are calculated. Variable ϕ must be greater than or equal to all calculated travel times. Constraint (25) states that the total amount of emissions produced by the trucks must not be greater than a specified value. Constraint (26) guarantees that at least a specified number of jobs must be created. Constraints (27)–(31) describe the type of decision variables.

In constraints (22) and (24), the nonlinear term \({z}_{ij}{h}_{jl}\) can be linearized as Eqs. (32) and (33):

$${\tau }_{ijl}\ge {z}_{ij}+{h}_{jl}-1 \forall i\in N,j\in H,l\in A\backslash \{o\}$$
(32)
$${\tau }_{ijl}\le ({z}_{ij}+{h}_{jl})/2\forall i\in N,j\in H,l\in A\backslash \{o\}$$
(33)

The proposed SHMHNDP after linearization, in the worst case, when we have |N| =|H| =|A|= n, involves a total of 3n3 + n2-2n + 1 decision variables. Of these, 2n3 variables are binary variables, and n3 + n2-2n + 1 variables are continuous variables. Thus, the number of decision variables in the worst case is O(n3). Furthermore, the number of problem constraints equals n6-3n5 + 4n4 + 4n3-n + 4. Of these, there are n6 − 3n5 + 4n4 − 2n3 constraints related to the calculation of the maximum travel time (constraints (21)–(24)), and 6n3 − n + 4 constraints constitute other constraints. Thus, the number of constraints in the worst case is O(n6). It is observed that a high percentage of the problem constraints are related to the maximum travel time constraints. For example, if there are only ten demand nodes on the network, in the worst case, the problem would be 743,994 constraints. Of these, 738,000 constraints are related to the calculation of the maximum travel time on the network, and the number of other constraints is 5994.

5 Alternative formulation

In this section, we present an alternative formulation to reduce the number of constraints of the initial formulation and to solve more instances. First, we introduce some valid inequalities to improve the initial formulation. These inequalities are implied from constraints (21), (23), and (24).

Valid inequality 1. For each feasible solution of the problem, Eq. (34) is valid:

$$\begin{array}{*{20}l} {\phi \ge t_{{ij}} z_{{ij}} h_{{jl}} + \alpha t_{{jl}} z_{{jj}} h_{{jl}} + t_{{ks}} z_{{sk}} h_{{kl}} } \hfill & {\forall i \in N,s \in N\backslash \left\{ i \right\},} \hfill \\ {\quad \quad \quad \quad \, + \alpha t_{{lk}} z_{{kk}} h_{{kl}} } \hfill & {j \in H,k \in H\backslash \left\{ j \right\},l \in A\backslash \left\{ o \right\}} \hfill \\ \end{array}$$
(34)

Proof

If \({z}_{ij}=1\) and \({z}_{sk}=1\), then the right-hand side of Eq. (34) is the same as the right-hand side of Eq. (22). Because in this case \({z}_{jj}=1\) and \({z}_{kk}=1\). If \({z}_{ij}=1\) and \({z}_{sk}=0\), two cases occur: if \({z}_{kk}=1\), the above relation provides a lower bound for \(\phi\). If \({z}_{kk}=0\), the above relation provides a lower bound for the travel time between node \(i\) and any hub allocated to node \(l\). The case \({z}_{ij}=0\) and \({z}_{sk}=1\) is the same as before. If \({z}_{ij}=0\) and \({z}_{sk}=0\), then two cases occur. If \({z}_{jj}=1\) and \({z}_{kk}=1\), the right-hand side indicates the travel time between the hubs \(j\) and \(l\). If \({z}_{jj}=1\) and \({z}_{kk}=0\), the right-hand side of the above relation provides a lower bound for the travel time between node \(j\) and any hub assigned to node \(l\). The case \({z}_{jj}=0\) and \({z}_{kk}=1\) is shown similarly, and the case \({z}_{jj}=0\) and \({z}_{kk}=0\) is easy.

Valid inequality 2. Constraint (21) can be replaced by Eq. (35):

$$\phi \ge \sum_{j\in H}\left({t}_{ij}{z}_{ij}+{t}_{js}{z}_{sj}\right) \quad\quad \forall i\in N,s\in N\backslash \left\{i\right\}$$
(35)

Proof

If \({z}_{ij}=1\) and \({z}_{sj}=1\), the right-hand side of Eq. (35) is equal to the right-hand side of Eq. (21). But if \({z}_{ij}=1\) and \({z}_{sk}=1\) for \(j,k\in H,j\ne k\), the right-hand side of the above relation is less than or equal to the right-hand side of Eq. (23). The above relation has fewer constraints than Eq. (21).

Valid inequality 3. Constraint (23) can be replaced by Eq. (36):

$$\phi \ge {t}_{ij}{z}_{ij}+\sum_{l\in A}\alpha {t}_{jk}{r}_{jk}^{l}+{t}_{ks}{z}_{sk} \quad\quad \forall i\in N,s\in N\backslash \left\{i\right\}, j\in H,k\in H\backslash \left\{j\right\}$$
(36)

Proof

Hub \(j\in H\) and hub \(k\in H\backslash \left\{j\right\}\) can only be allocated to one common airport hub \(l\in A\). Therefore, for index \(l\) of the common airport hub, the right-hand side of Eq. (36) and the right-hand side of Eq. (23) are the same.

Equation (36) has fewer constraints than Eq. (23).

In order to provide an alternative formulation, we define some variables. Variable \({\mu }_{j}\) represents the value of the longest travel time from the nodes assigned to node \(j\). We also consider variable \({\theta }_{j}\) as the value of the longest travel time from node \(j\) to nodes assigned to node \(j\). Variable \({\delta }^{l}\) represents the longest travel time from demand nodes to airport hubs, and variable \({\eta }^{l}\) represents the longest travel time from airport hubs to demand nodes. Variable \({\delta }^{o}\) represents the longest travel time from the demand nodes to the central hub, and variable \({\eta }^{o}\) represents the longest travel time from the central hub to the demand nodes. According to these definitions, the constraints related to the calculation of the maximum travel time can be replaced by Eqs. (37)–(47).

$${\mu }_{j}\ge {t}_{ij}{z}_{ij} \quad\quad \forall i\in N,j\in H$$
(37)
$${\theta }_{j}\ge {t}_{ji}{z}_{ij} \quad\quad \forall i\in N,j\in H$$
(38)
$${\delta }^{l}\ge {\mu }_{j}+\sum_{l\in A\{o\}}\alpha {t}_{jl}{h}_{jl} \quad\quad \forall j\in H\backslash \{o\}$$
(39)
$${\eta }^{l}\ge \sum_{l\in A\{o\}}\alpha {t}_{lj}{h}_{lj}+{\theta }_{j} \quad\quad \forall j\in H\backslash \{o\}$$
(40)
$$\phi \ge {\mu }_{j}+\sum_{l\in A}\alpha {t}_{jk}{r}_{jk}^{l}+{\theta }_{k} \quad\quad \forall j\in H,k\in H\backslash \left\{j\right\}$$
(41)
$${\delta }^{o}\ge {\mu }_{j}+\sum_{l\in A\{o\}}\left(\alpha {t}_{jl}+{t}_{lo}^{P}\right){h}_{jl} \quad\quad \forall j\in H\backslash \{o\}$$
(42)
$${\eta }^{o}\ge \sum_{l\in A\{o\}}\left({t}_{ol}^{P}+\alpha {t}_{lj}\right){h}_{jl}+{\theta }_{j} \quad\quad \forall j\in H\backslash \{o\}$$
(43)
$$\phi \ge {\delta }^{l}+{\eta }^{l}$$
(44)
$$\phi \ge {\delta }^{o}+{\eta }^{o}$$
(45)
$${\mu }_{j},{\theta }_{j}\ge 0 \quad\quad \forall j\in H$$
(46)
$${\delta }^{l},{\eta }^{l},{\delta }^{o},{\eta }^{o}\ge 0$$
(47)

Thus, the alternative formulation simultaneously optimizes the first and second objective functions (Eqs. (3) and (4)) subject to Eqs. (5)–(20), (25)–(31), (35), and (37)–(47).

The alternative formulation of the SHMHNDP, in the worst case, when we have |N| =|H| =|A|= n, involves a total of 2n3 + n2 + 5 decision variables. Of these, n3 variables are binary variables, and n3 + n2 + 5 variables are continuous variables. Thus, the number of decision variables in the worst case is O(n3). Also, the number of problem constraints equals 4n3 + 6n2 + n + 2. Thus, the number of constraints in the worst case is O(n3). Note that in the initial formulation, the number of constraints related to the calculation of maximum travel time was n6-3n5 + 4n4-2n3, that is O(n6). While, in the alternative formulation, the number of constraints of calculating the maximum travel time equals 4n2 + 2n-2, that is O(n2). For example, if |N|= 10, then in the initial formulation, we have 738,000 constraints, and in the alternative formulation, we have 418 constraints related to the calculation of the maximum travel time.

6 Multi-objective optimization methods

Multi-objective optimization seeks to find one or more optimal solutions in problems with more than one objective. These objectives are often in conflict with each other, and thus finding an optimal solution for all objectives at the same time is usually impossible.

6.1 Domination and Pareto optimality

Domination is the basic concept considered in multi-objective optimization (all of the min type), which is defined as follows:

Consider a problem with \(\kappa\) objectives. The solution p dominates the solution \(q\) \((p\prec q\)) if the following conditions are met:

(1) In none of the objective functions, the value of \(p\) is worse than the value of \(q\):

$${f}_{i}\left(p\right)\le {f}_{i}\left(q\right) \forall i\in \left\{\mathrm{1,2},...,\kappa \right\}$$
(48)

(2) At least in one of the objective functions, the value of \(p\) is strictly better than the value of \(q\):

$$\exists i\in \left\{\mathrm{1,2},...,\kappa \right\} | {f}_{i}\left(p\right)<{f}_{i}\left(q\right)$$
(49)

In this case, the solution \(p\) is non-dominated, and the solution \(q\) is dominated. In a set of solutions, a member is a non-dominated solution if no other member dominates it. In addition, the set of non-dominated solutions is called the Pareto set.

6.2 The Epsilon constraint method

The epsilon constraint method can find efficient solutions for multi-objective problems. In this method, we optimize one of the objective functions while the other objective functions are included in the problem constraints. An improved version of the original epsilon constraint method, named augmented epsilon constraint method (AUGMECON), was proposed by (Mavrotas 2009). Then, (Mavrotas and Florios 2013) presented a novel version of the epsilon constraint method, namely, augmented epsilon constraint method version 2 (AUGMECON2). In this method, the information of slack variables is used to reduce computational time by avoiding redundant iterations. More details about the AUGMECON2 method are described in Appendix A.

6.3 Multi-objective genetic algorithm

The classic HLP is NP-hard (Skorin-Kapov et al. 1996). The hierarchical HLP is also NP-hard, because the classic HLP is a special case of the hierarchical HLP (Yaman 2009). In addition, in our problem, there are many constraints related to the calculation of maximum travel time on the network, which can increase the complexity of the problem in terms of the solution space and solution time. Besides, as the size of the problem increases, the solution time increases exponentially, and general solvers cannot solve the problem in a reasonable time. Therefore, in this section, we present two metaheuristic algorithms to solve the problem in larger scales. Among the developed metaheuristic algorithms, the multi-objective genetic algorithm is popular because of its capability in providing Pareto solutions that are uniformly distributed in the search space and are more desirable for the decision-maker. This algorithm maintains the diversity among non-dominated solutions through the crowding mechanism and does not require other control tools. Further, this algorithm is elitist and retains the Pareto solutions found in previous generations. Two of the most powerful algorithms for solving multi-objective problems are Non-dominated Sorting Genetic Algorithm version 2 (NSGA-II) and Non-dominated Rank-based sorting Genetic Algorithm (NRGA), which were proposed by (Deb et al. 2002). The steps of these algorithms are as follows:

6.3.1 Solution representation

At this stage, the way of coding the solutions of the problem is determined. The string or vector of genes, which shows a coded solution, is called a chromosome. An example of the solution representation for the present model is shown in Fig. 2.

Fig. 2
figure 2

The solution representation

In this solution representation, each chromosome consists of two parts. The first part shows how to allocate demand nodes to the located hubs (both ground and airport hubs). The second part describes how to allocate ground hubs to airport hubs. In the second part, when separating airport hubs from ground hubs, the numbers related to airport hubs appear twice, and the numbers related to ground hubs appear only once. Note that the content of the last two genes in each chromosome is always equal to 0 (node 0 is the central hub and it is not allocated to any other hub). According to Fig. 2, because the numbers 2 and 0 appear twice in the second part, these two nodes are selected as airport hubs, and other hubs that appear only once include ground hubs (hubs 1, 8, and 3). The numbers that appear between the two airport hubs show how the ground hubs are assigned to the right-hand side airport hub. Therefore, ground hub 3 is assigned to the central hub, and ground hubs 1 and 8 are assigned to airport hub 2. Once the hubs are located, and the ground hubs are allocated to the airport hubs, it is possible to determine how the other demand points are allocated to the hubs. In the first part of the chromosome, the numbers between the two hubs indicate how the other demand nodes are allocated to the right-hand side hub. If no number is seen between the two hubs, it means that no node is assigned to the hub on the right-hand side. Therefore, in Fig. 2, the nodes are allocated to the located hubs as follows: nodes 4 and 5 to hub 3, node 7 to hub 1, node 6 to hub 2, and node 9 to hub 8. A schematic representation of the above solution is depicted in Fig. 3.

Fig. 3
figure 3

Schematic representation of the solution in Fig. 2

6.3.2 Creating an initial population

After identifying an appropriate mechanism to convert any solution to a chromosome, a set of chromosomes creates the initial population. In this study, the initial population is generated randomly, and the number of its members is considered to be (npop).

6.3.3 Fitness evaluation

In this step, the fitness of each individual is calculated. As we discussed in Sect. 3, the complexity of the problem is mainly related to the high number of constraints for calculating the maximum travel time on the network. Therefore, for specific values of binary variables that are determined from the chromosome, in calculating the value of the first objective function, we relax these constraints because they only include binary variables and maximum travel time variables that are not entered in calculating the first objective function value (continuous variables \({Y}_{jk}^{i}\), \({X}_{jo}^{i}\), and \({X}_{oj}^{i}\) do not exist in constraints of calculating the maximum travel time). In addition, to calculate the maximum travel time, we use a Dijkstra algorithm with polynomial time complexity. In this study, for calculating the value of the first objective, we use the following approach. At first, the values of binary variables are determined based on the individual. Then, these values of binary variables are entered in a commercially available mathematical programming solver (e.g., CPLEX) as parameters, and the first objective function subject to the constraints (16)–(20), (25), (26), (30), and (31) is solved. The reason for the omission of constraints (5)–(15) is that they are already considered in the solution representation. In addition, constraints (21)–(24) are only used for obtaining the second objective; they are not required for calculating the first objective. Because no integer variable exists in the resulting model, it is an LP instead of a MIP problem, and therefore, it can be solved in polynomial time. After solving the LP, if the problem is infeasible, a penalty is considered for the objective functions.

To calculate the second objective function value, the values of binary variables are determined based on the individual, and the corresponding graph structure related to that chromosome is specified. Then, the adjacency matrix of the graph and the travel times between the vertices of the graph (after applying parameter α) are given as inputs to the Dijkstra algorithm. The Dijkstra algorithm calculates the minimum travel time between each pair of vertices of the graph, and the maximum of the minimum travel times (for all pairs) is considered as the second objective function value.

6.3.4 Non-dominating sorting

In this step, each solution is ranked based on the times it is dominated by the other solutions. The solutions that are not dominated by any of the other solutions are located on the first front and assigned rank one. The solutions dominated by at least one of the first front solutions are placed on the second front and assigned rank two, and likewise, all the solutions are sorted out.

6.3.5 Calculation of the crowding distance

After fronting the solutions, the crowding distance criterion is used to evaluate the solutions at a front. In this step, the crowding distance for each solution in each front is calculated, and the closeness of the solution to the other solutions is determined. The solution has a greater crowding distance if it is located in a less crowded space. To determine the crowding distance of the solution i, the average distance of the solution to the two adjacent solutions (in a front) for each objective (m) is calculated, and the summation for all objectives is shown by di as Eq. (50).

$${d}_{i}=\sum_{m=1}^{\kappa }\frac{{f}_{m}^{i+1}-{f}_{m}^{i-1}}{{f}_{m}^{max}-{f}_{m}^{min}}$$
(50)

6.3.6 Selection mechanism

The only difference between the two algorithms NSGA-II and NRGA used in this study is the implementation of this step outlined below:

  • The NSGA-II selection method: In the NSGA-II algorithm, the binary tournament selection method is used. In this method, the selection is based on the following two conditions:

  • Population rank: The population is selected at lower front ranks.

  • Crowding distance: Assuming that p and q are two members of the same front rank, a member is selected that has a greater crowding distance. In other words, the selection priority is based on the rank and crowding distance, respectively.

  • The NRGA selection method: The selection mechanism in the NRGA algorithm is based on a roulette wheel. In the roulette wheel, the selection mechanism is designed such that better possible members are selected. Each member of the population has two attributes: (a) rank of front, which is located on, and (b) member rank within the front based on the crowding distance. Thus, to select a solution, first, a non-dominated front should be selected. Then, a solution must be selected within that front. The probability of selecting the non-dominated front \(i\) is calculated as Eq. (51):

$${P}_{i}=\frac{2\times ran{k}_{i}}{{N}_{f}\times ({N}_{f}+1)}$$
(51)
  • where \({rank}_{i}\) is the rank of the front i, and Nf is the number of fronts determined in the non-dominated sorting step.

The probability of selecting the solution j in the non-dominated front i is calculated as Eq. (52):

$${P}_{ji}=\frac{2\times ran{k}_{ji}}{{N}_{i}\times ({N}_{i}+1)}$$
(52)

where \({N}_{i}\) represents the number of solutions on the front i, and \(ran{k}_{ji}\) denotes the rank of the solution j in the front i, based on the crowding distance.

By defining s1 = ∑Pi and s2 = ∑Pji, the roulette wheel will be defined on two real intervals: [0,s1] and [0,s2]. Based on the calculated probability values, each front occupies a value in the interval [0,s1], and each solution occupies a value in the interval [0,s2]. Then, two random numbers between zero and one are selected. The first random number is used to select the front in [0,s1], and the second random number is used to select one of the solutions in the selected front in the interval [0,s2].

  • Implementing crossover and mutation to produce new offspring

A crossover operator for information exchange between chromosomes in genotype space is used to create offspring chromosomes. The crossover operation can be done in different ways: single-point, two-point, multi-point, and uniform. In this study, the single-point crossover (like Fig. 4) is used. In our implementation, a random point is selected from the allocation part of chromosomes, and the contents of genes before and after that point are exchanged.

Fig. 4
figure 4

The single-point crossover

In the crossover operation, there may be some repetitive numbers, or some numbers may not appear in the permutation. Hence, repairing of offspring is needed. Thus, repetitive numbers should be removed, and numbers that are not in the permutation need to be added (Fig. 5).

Fig. 5
figure 5

Repairing the result of crossover

The mutation operator prevents the algorithm from being trapped in local optimum points. In the mutation operation, offspring receives different characteristics from parents. In this study, in order to perform mutation, two points from the allocation part of a chromosome are selected randomly, and contents of genes are replaced (see Fig. 6).

Fig. 6
figure 6

The mutation operator

6.3.7 Combining offspring and parent population

In this step, by merging the offspring and parent population, a new population is created. The new population is first sorted based on ranking, during which solutions with worse ranks are eliminated. The remaining population is also sorted according to the crowding distance. Then, the first (npop) individuals are transferred to the next generation of the algorithm.

6.3.8 Termination condition

The number of function evaluations (NFE) is used in this study as the termination condition of the algorithms.

6.3.9 Performance criteria

Because in multi-objective problems, a set of non-dominated solutions is selected as optimal solutions for setting parameters and comparing their efficiency, four comparison criteria (multi-objective indexes) are considered; they are summarized as follows:

  • Quality metric (QM)

The most important criterion for comparing the quality of the solution methods is QM, which is simply obtained in three steps. At first, the results obtained through algorithms are placed in a new archive. Then, all the solutions are again mutually compared to each other to update the archive. Finally, QM is calculated as the number of non-dominated solutions that belong to the results of each algorithm to the number of non-dominated solutions in the archive. The higher percentage indicates the higher quality of the algorithm.

  • Mean ideal distance (MID)

This criterion value is equal to the distance of the Pareto solutions of the algorithm from the ideal point. For a bi-objective problem, MID is calculated as Eq. (53):

$$MID=\frac{\sum_{i=1}^{n}\sqrt{{\left(\frac{{f}_{1i}-{f}_{1}^{Best}}{{f}_{1,total}^{max}-{f}_{1,total}^{min}}\right)}^{2}+{\left(\frac{{f}_{2i}-{f}_{2}^{Best}}{{f}_{2,total}^{max}-{f}_{2,total}^{min}}\right)}^{2}}}{n}$$
(53)

In which \(n\) is the number of Pareto solutions, \({f}_{i,total}^{max}\) is the maximum value of objective functions in all algorithms, and \({f}_{i,total}^{min}\) is the minimum value of objective functions in all algorithms. In the above equation, the ideal solution is \(\left({f}_{1}^{Best},{f}_{2}^{Best}\right)\).

  • Spacing metric (SM)

This criterion measures the uniformity of set points in the solution space. Equation (54) presents how to calculate the criterion:

$$SM = \frac{{\sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n - 1} \left( {d_{i} - \overline{d}} \right)^{2} }}{n}} }}{{\overline{d}}}$$
(54)

where \(n\) is the number of Pareto solutions, di is the Euclidean distance between solution \(i\) and solution \((i+1)\) in solution space, and \(\overline{d}\) is the mean of distances di.

  • Diversification metric (DM)

This criterion shows the extent of Pareto solutions of an algorithm. DM is calculated using Eqs. (55) and (56).

$${d}_{i}^{^{\prime}}=\underset{j}{max}\left\{\sum_{m=1}^{\kappa }{\left({f}_{m}^{i}-{f}_{m}^{j}\right)}^{2}\right\}$$
(55)
$$DM=\sqrt{\sum_{i=1}^{n}{d}_{i}^{^{\prime}}}$$
(56)

In which \(\kappa\) is the number of objective functions. Also, \({f}_{m}^{i}\) and \({f}_{m}^{j}\) are the values of the objective function \(m\) for the two Pareto solutions \(i\) and \(j\).

6.4 Torabi-Hassini method and augmented weighted Tchebycheff procedure

In the following, we will use two exact solution methods to solve the SHMHNDP. The methods include the Torabi and Hassini (TH) method and the Augmented Weighted Tchebycheff Procedure (AWTP). The TH method is an interactive method to solve multi-objective optimization problems. This method was introduced by Torabi and Hassini, (2008). The TH method can determine the satisfaction degree of objective functions with respect to decision-maker preferences. This method solves a single-objective parametric mathematical programming problem to get compromise solutions. The TH method details are presented in Appendix B. The AWTP is one of the developed methods for solving multi-objective optimization problems based on reference points. The AWTP was introduced by Steuer and Choo, (1983). This method can find efficient Pareto solutions. The AWTP details are described in Appendix C.

7 Numerical experiments

In this section, the results of the proposed solution methods are presented. Computations are carried out on a laptop with the following characteristics: processor Intel(R) Core (TM) i5-5200U CPU @ 2.20 GHz and 6 GB of RAM. The AUGMECON2, TH, and AWTP methods are coded in GAMS software version 24.1.2 and the CPLEX solver version 12.5.1.0 is applied to solve instances. NSGA-II and NRGA are implemented in MATLAB software version 2014a.

7.1 Test problems and case study

To validate the proposed model, six test problems are solved and the results are reported. Test problems #1 to #5 are generated based on the Turkish dataset (with 81 demand nodes) in Operations Research (OR) library (Beasley 1990), while test problem #6 (hereafter called the case study) is exactly the case study of the Turkish network for freight transportation mentioned by Alumur et al. (2012b). In addition, test problems #1-#6 consist of |N|= 10, 12, 15, 17, 50, 81 demand nodes, respectively. Candidate nodes for hubs consist of |H|= 5, 6, 7, 11, 17, 22 nodes, and candidate nodes for airport hubs consist of |A|= 4, 4, 5, 7, 15, 19 nodes, respectively. Also, the number of hubs that must be located are P = 3, 3, 4, 5, 6, 8, respectively. Demand flows, travel times using ground transportation, transportation costs (distances between the nodes), and fixed costs of establishing the links between the nodes are available in OR library. Other parameters of the model that we needed for test problems are not available in OR library. So, the values of some of these parameters are generated based on the work of Alumur et al. (2012b). The demand points and the candidate nodes for hubs are shown in Fig. 7, and Ankara is selected as the central hub. All of the candidate nodes for hubs are candidates for airports except nodes 3, 68, and 81. Travel time between the central hub and other airport hubs are calculated based on the average speed of airplane (700 km/h). It should be noted that data on the distance is updated due to the construction of new highways within Turkey, and we use the updated data given in OR library via a link. The value of parameters related to emissions and jobs and other characteristics of the generated test problems are presented as online supplementary in Appendixes D and E.

Fig. 7
figure 7

Candidate nodes for hubs and location of central hub in the case study

7.2 Parameter setting

Parameters like crossover probability and mutation probability can significantly affect the performance of NSGA-II and NRGA (Gupta et al. 2019). In order to set the parameters of the proposed methods, the Taguchi design of experiment is applied. The way it works is that first, different levels are considered for the parameters that need to be set. Then, the experiments suggested by the Taguchi method should be performed. These experiments involve solving the problem by both algorithms considering different levels for the parameters (according to the proposed levels of the Taguchi method). After solving the problem, the values of multi-objective indicators must be calculated. In order to calculate a response for each experiment, the indicators need to be normalized according to their nature, and a specific weight must be considered for each indicator. After determining the responses, the best levels for the parameters are calculated using the S/N ratio.

Hence, the input parameters of the proposed algorithms and their levels are determined according to Table 3. To determine the most efficient levels of each parameter in the Taguchi method, by using the Minitab software, the orthogonal array L9 is used and nine experiments are set.

Table 3 Levels of factors

By solving the problem, with respect to the Pareto solutions obtained by NSGA-II and NRGA, the four defined criteria are calculated for NSGA-II and NRGA (Table 4). Note that each experiment is repeated six times, and the average of indexes is calculated.

Table 4 Result of the criteria

Subsequently, in order to create an output of each experiment, all criteria will be converted to a response by using the following method:

(1) At first, the positive or negative nature of each criterion must be specified. Here, the QM and DM are positive in nature, and thus, higher values are better for them. Instead, the MID and SM are negative in nature. Figure 8 shows the criteria and different situations (the designed experiments).

Fig. 8
figure 8

The matrix of results for experiments

In Fig. 8, \({O}_{i}\) and \({X}_{j}\) represent situation \(i\) and criterion \(j\), respectively. Also, \({r}_{ij}\) is the value of situation \(i\) in criterion \(j\). The number of experiments is \(\eta\).

(2) The obtained values of the parameters are normalized using the linear normalization technique (Çelen 2014) as Eqs. (57) and (58).

$${X}_{j}^{+}\to {R}_{ij}=\frac{{r}_{ij}-\underset{i=1:n}{min}\left({r}_{ij}\right)}{\underset{i=1:n}{max}\left({r}_{ij}\right)-\underset{i=1:n}{min}\left({r}_{ij}\right)}$$
(57)
$${X}_{j}^{-}\to {R}_{ij}=\frac{\underset{i=1:n}{max}\left({r}_{ij}\right)-{r}_{ij}}{\underset{i=1:n}{max}\left({r}_{ij}\right)-\underset{i=1:n}{min}\left({r}_{ij}\right)}$$
(58)

In this technique, criteria with negative nature become positive.

(3) In this step, a Total Weighted Normalized Indicator (TWNI) of the four criteria (Nasiri et al. 2018a) is calculated according to their importance and considered as response. In this problem, the following weights based on the importance of each criterion are intended.

$$Respons{e}_{i}=\sum_{j=1}^{n}{R}_{ij}{w}_{j}$$
(59)

(4) where (wQM,wMID,wSM,wDM) = (102,10,1,1). The results of normalization and calculation of the responses are collected in Table 5.

Table 5 Normalization for NSGA-II and NRGA

After obtaining the values for four multi-objective criteria, the response values of each experiment for NSGA-II and NRGA are calculated (column “response” in Table 5). Next, the S/N ratio is used to obtain the suitable level of each input parameter, which is shown in Table 6. The results of Taguchi experiments are reported in Appendix F.

Table 6 The suitable levels for parameters

7.3 The performance of multi-objective optimization methods

7.3.1 Results of AUGMECON2 for the alternative formulation

The alternative formulation has fewer constraints than the initial formulation, and it is computationally more efficient. The test problems #4, #5, and #6 (case study), which could not be solved with the initial formulation, can be solved with this formulation. The results of solving test problems #1-#6, including the payoff table and Pareto-optimal solutions obtained by AUGMECON2 method, are presented in Tables 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, and 18. The solution times for test problems #1-#6 are 0.72, 1.52, 2.14, 11.13, 69.08, and 538.94 s, respectively. The Pareto-optimal solution #4 for test problem #6 (case study) is shown in Fig. 9.

Table 7 Payoff table for test problem 1
Table 8 Pareto optimal set for test problem 1
Table 9 Payoff table for test problem 2
Table 10 Pareto optimal set for test problem 2
Table 11 Payoff table for test problem 3
Table 12 Pareto optimal set for test problem 3
Table 13 Payoff table for test problem 4
Table 14 Pareto optimal set for test problem 4
Table 15 Payoff table for test problem 5
Table 16 Pareto set for test problem 5
Table 17 Payoff table for test problem 6
Table 18 Pareto set for test problem 6
Fig. 9
figure 9

Representation of Pareto-optimal solution #4 for case study obtained by AUGMECON2

7.3.2 Results of NSGA-II and NRGA

The Pareto solutions obtained by AUGMECON2, NSGA-II, and NRGA for test problems #1-#3, and the Pareto solutions obtained by NSGA-II and NRGA for test problems #4-#6 are shown in Fig. 10. According to Fig. 10a, the number of Pareto solutions obtained from AUGMECON2 and NSGA-II is greater than the number of Pareto solutions obtained from NRGA. A Pareto solution has been found by all three algorithms. Also, AUGMECON2 covers a larger area of the solution space. According to Fig. 10b, AUGMECON2 has found more solutions than the other two algorithms. According to Fig. 10c, the number of Pareto solutions generated by AUGMECON2 is greater than NSGA-II and NRGA. Also, the Pareto solutions generated by AUGMECON2 dominate all the Pareto solutions of NRGA and NSGA-II.

Fig. 10
figure 10

Pareto frontier for test problems a #1, b #2, c #3, d #4, e #5, f #6 (case study)

In order to compare the Pareto solutions obtained from AUGMECON2, NSGA-II, and NRGA (depicted in Fig. 10), multi-objective indexes for these solution methods are reported in Table 19. According to Table 19, AUGMECON2 outperforms NSGA-II and NRGA in terms of QM.

Table 19 Results of indexes for test problems 1, 2, and 3

7.3.3 A comparison between NSGA-II and NRGA

In the following, the performance of the two metaheuristic algorithms (NSGA-II and NRGA) are compared. The purpose of this section is to investigate which of the metaheuristic algorithms performs better in terms of multi-objective indicators (including QM, MID, SM, and DM). For this purpose, it is necessary to prepare different instances of the problem with different sizes and use these instances to make statistical inferences on the performance of these algorithms. After identifying instances, each of these instances must be solved by both algorithms, and the values of multi-objective indicators for each instance are calculated. Then, by statistical tests, the performance of these two algorithms is compared.

To compare the efficiency of the developed metaheuristic algorithms, twenty-two instances with different sizes (small, medium, and large) are generated. We use three datasets for generating instances with different sizes: CAB (O'kelly, 1987), for generating instances with 18 to 25 nodes, IAD (Karimi and Bashiri 2011), for generating instances with 26–37 nodes, and TR (Tan and Kara 2007), for generating instances with 40 to 81 nodes. The reason for using different datasets is to make a more comprehensive comparison of the mean of multi-objective indexes in the two metaheuristic algorithms. The number of demand nodes, candidate nodes for hubs, candidate nodes for airport hubs, central hub, and the number of hubs that must be located are presented in Table 20. Note that sets N, H, and A consist of the first |N|, |H|, and |A| nodes from that dataset. Other details of generating instances are provided in Appendix G. The obtained indexes are presented in Tables 21, 22, and 23. It should be noted that the results are the average of the six times of running the proposed metaheuristics.

Table 20 Characteristics of test instances on three datesets
Table 21 Calculated indexes for the small-sized instances
Table 22 Calculated indexes for the medium-sized instances
Table 23 Calculated indexes for the large-sized instances

In order to compare the performance of the two metaheuristics statistically, we use a T-test. In this way, before using the T-test, a “Two-sample F-test” is used to investigate the equality of variances. If the variances are equal, a “Two-sample T-test assuming equal variances” is applied, and if the variances are not equal, a “Two-sample T-test assuming unequal variances” is applied. In all tests, the null hypothesis is that the variances or averages are equal, and the alternative hypothesis is that they are different. Results of statistical tests and a graphical representation of indexes are shown in Fig. 11.

Fig. 11
figure 11

Indexes of NSGA-II and NRGA for various instances and statistical comparison

As shown in Fig. 11, there is no significant difference between the performance of the two proposed metaheuristics in the four indexes. A comparison of the solution times in the proposed metaheuristics with increasing the size of the problem is demonstrated in Fig. 12.

Fig. 12
figure 12

A comparison between solution times of NSGA-II and NRGA for various datasets

According to Fig. 12, it can be concluded that by increasing the size of the problem, solution times in NSGA-II and NRGA increase but an increase in solution times is reasonable (the test instance with 81 nodes is solved in about four hours). Also, NRGA is faster than NSGA-II except for two instances (instances #1 and #2). However, in small problem instances (with 15 nodes or fewer), AUGMECON2 outperforms NSGA-II and NRGA in terms of the quality of solutions.

7.3.4 Results of TH method

Tables 24, 25, 26, 27, 28, and 29 show the Pareto solutions obtained by the TH method for test problems #1-#6. According to Tables 24, 25, 26, 27, 28, and 29, for a certain value for \(\psi\), with increasing the importance coefficient of an objective function, the satisfaction degree of that objective function does not decrease. Also, for a certain value for importance coefficient, with increasing \(\psi\), the satisfaction degree of the second objective function does not decrease, while the satisfaction degree of the first objective function decreases except in one case. In Table 29, for \({\pi }_{1}=0.2\) and \({\pi }_{2}=0.8\), by increasing \(\psi\) from 0.2 to 0.4, the satisfaction degree of the first objective function increases and the satisfaction degree of the second objective function decreases. It can be concluded that the higher value of the parameter \(\psi\) means more attention to the acquisition of a higher lower bound for the satisfaction degree of the objective functions and thus more balanced solutions are obtained. Conversely, lower values of \(\psi\) mean more attention to the objective function with a higher importance coefficient, and, as a result, unbalanced solutions are obtained.

Table 24 Results of TH method for test problem 1
Table 25 Results of TH method for test problem 2
Table 26 Results of TH method for test problem 3
Table 27 Results of TH method for test problem 4
Table 28 Results of TH method for test problem 5
Table 29 Results of TH method for test problem 6 (case study)

7.3.5 Results of AWTP method

Tables 30, 31, 32, 33, 34, and 35 show the Pareto solutions obtained by the AWTP method for test problems #1-#6. According to Tables 30, 31, 32, 33, 34, and 35, with increasing the weights of the objective functions, the value of that objective either improves or remains unchanged except in one case (Table 33). In Table 33, by increasing the weight of the first objective function from 0.4 to 0.5, the first objective function value worsens. Also, by decreasing the weight of the second objective function from 0.6 to 0.5, the second objective function value improves.

Table 30 Results of AWTP for test problem 1
Table 31 Results of AWTP for test problem 2
Table 32 Results of AWTP for test problem 3
Table 33 Results of AWTP for test problem 4
Table 34 Results of AWTP for test problem 5
Table 35 Results of AWTP for test problem 6 (case study)

7.3.6 A comparison between the proposed solution methods

The results of solving test problems #1-#6 by AUGMECON2, TH, and AWTP methods are shown in Fig. 13. According to Fig. 13, the number of solutions obtained by the AWTP is more than the other two methods.

Fig. 13
figure 13

Representation of solutions obtained by AUGMECON2, TH, and AWTP for test problems a #1, b #2, c #3, d #4, e #5, f #6 (case study)

The results of multi-objective indicators for the proposed solution methods on test problems #1-#6 are shown in Table 36. The results of the TWNI based on the normalized indexes are reported in the last column of Table 36. On test problems #1-#6, the average values of the TWNI for AUGMECON2, NSGA-II, NRGA, TH, and AWTP are 77.75, 7.92, 1.54, 65.62, and 44.73, respectively. Therefore, in terms of TWNI, the AUGMECON2 has the best performance among the proposed algorithms. The worst performance is related to the NRGA.

Table 36 Results of normalized indexes for test problems #1 to #6

7.3.7 A comparison between the proposed formulations

In this subsection, the performance of the proposed formulations (initial and alternative) in terms of solution time are compared. To do this, we optimize the test problems #1, #2, and #3 using the proposed exact solution methods (AUGMECON2, AWTP, and TH) for the initial and alternative formulations. Note that in the TH method, the compensation coefficient is considered equal to 0.5 and the importance coefficients of the objectives are considered from 0.1 to 0.9 (with a moving step of 0.1). The reported solution time is the sum of the solution time of finding the PIS and NIS values and solving the single-objective problems of the TH method. Also, in the AWTP, the weights of the objective functions are considered from 0.1 to 0.9 (with a moving step of 0.1). The results are reported in Table 37. According to Table 37, the alternative formulation in all three solution methods improves the solution time of the initial formulation. The lowest and highest percentages of improvement are 59.79 and 95.62%, respectively, and the average percentage of improvement is 79.02%.

Table 37 A comparison between solution times of the exact methods for the proposed formulations

7.4 Sensitivity analysis

The most important parameters of the model that affect costs and maximum travel time are the number of hubs, the allowable amount of emissions, the number of jobs that must be created, and discount factor for travel time. Thus, we make a sensitivity analysis on these parameters to show their effects on the model. For investigating the effect of the number of hubs on the costs and level of service in the hierarchical network, we solve the proposed bi-objective model of Sect. 3.3 or Sect. 4 using the AUGMECON2 method for different values of parameter P (number of hubs) and get the Pareto frontier. Table 38 demonstrates the conflict between objectives using the Pareto solutions of test problem #3 with four hubs.

Table 38 Pareto set for test problem 3 with four hubs

In Fig. 14, the graphical representation of the optimal solutions obtained by considering only the first objective, only the second objective, and Pareto-optimal solution #4 (Table 38) for test problem #3 with four hubs are shown. According to Fig. 14, if we optimize just the first objective, nodes 1, 7, and 16 are selected as airports, and other demand nodes are allocated to the central hub and airports in a way that transportation costs and fixed costs of links are minimized. If we optimize just the second objective, airports 7 and 16 and ground hub 5 are selected in order to minimize maximum travel time in the network. In this case, hubs have a central location, and we pay attention to the far demand points like 8 and 4. In bi-objective optimization, both criteria (minimizing costs and minimizing the maximum travel time) are considered, and the obtained solution is a combination of locations and allocations in two previous cases.

Fig. 14
figure 14

Representation of solutions for test problem 3 obtained by minimizing a first objective, b second objective, c Pareto solution #4

The results of sensitivity analysis of parameter P are shown in Fig. 15.

Fig. 15
figure 15

Sensitivity analysis of parameter P for test problem 3

According to Fig. 15, by increasing the number of hubs, the costs of the hierarchical network and also the maximum travel time in the network (which is interpreted to the level of service) decrease. In addition, as demonstrated in Fig. 15, one can say that by increasing the number of hubs, the Pareto frontier is improved. However, the amount of improvement can be negligible when the number of hubs increases from five hubs. As the number of hubs increases from two hubs to four hubs, the number of Pareto solutions increases, which allows the decision-maker to reduce transportation costs as much as possible. Also, by increasing the number of hubs from two hubs to four hubs, the minimum value for the second objective function decreases, which means that a higher level of service is achieved to customers. But by increasing the number of hubs from four hubs to six hubs, the minimum value for the second objective function does not change, and therefore, the highest level of customer service does not increase.

To analyze the sensitivity of objective values concerning parameters for \(Emissions\) and \(Jobs\), we consider different values and solve the bi-objective model to get the Pareto-optimal solutions for test problems. Since each of these two parameters appears only in one constraint, the effect of changing their values can be seen in a limited number of points (only two or three values are reported for each test problem). The results of sensitivity analysis are reported in Tables 39, 40, and 41. In Table 39, the average objective function values of all obtained Pareto solutions are reported in columns 3 and 4. It can be observed that increasing the allowable level of emissions increases the costs of the hierarchical network and decreases the maximum travel time in the hierarchical network, which means an improvement in the level of service. This phenomenon can be described considering that by altering the feasible region, new Pareto solutions with a significantly greater value of the first objective can be obtained. Therefore, the average of the first objective values of the new Pareto solutions may worsen despite the expansion of the feasible region.

Table 39 Sensitivity analysis of parameter \(Emissions\)
Table 40 Sensitivity analysis of parameter \(Jobs\)
Table 41 Results of test problem 3 for different values of parameters \(Emissions\) and \(Jobs\)

In Table 40, in all cases, by increasing the value of parameter \(Jobs\), the maximum travel time in the network increases, and the level of service decreases. Also, system costs decrease except in test problem #2.

In Table 41, in a specified level of allowable emissions, by increasing the parameter Jobs, in case Emissions = 7 × 1015, the maximum travel time and system costs remain constant. In the other cases, system costs decrease and the maximum travel time increases. Also, in case Jobs = 10,300, by increasing the allowable level of emissions, system costs increase, and the maximum travel time decreases.

Next, to examine the changes in the second objective function value relative to the discount factor for travel time, we optimize the second objective function for different values of the parameter α. The results are shown in Fig. 16. According to Fig. 16, as the value of α increases, a non-decreasing trend is observed in the curve. In two cases, as the value of α increases, there is no change in the second objective function value (from α = 0.4 to α = 0.5 and from α = 0.8 to α = 0.9). Between the values of α = 0.6 and α = 0.8, the objective value increases with a greater slope. It can be concluded that for increasing the level of service, the value of the parameter α should be reduced.

Fig. 16
figure 16

Sensitivity analysis of parameter α for test problem 5

8 Conclusions and future research

In this paper, a mathematical model for the design and optimization of a sustainable hierarchical hub network was presented. The model considers two objective functions, the first of which seeks to minimize transportation costs and the costs of establishing links in the network. The second objective is dedicated to minimizing the maximum travel time in the network. Sustainability aspects (emissions and employment) and responsiveness (level of service) were considered in the model. This model can be applied in a freight or passenger transport network with two transport modes (air and ground), in which there are different levels of service (hierarchical), and customers are varied in choosing their desired services. Moreover, concerning the problem objectives, the maximum travel time is minimized in the second objective function, and as a result, the level of service increases.

For solving the model, the AUGMECON2 method was used. Tthe solution time of AUGMECON2 was increased exponentially by increasing the size of the problem. Therefore, two multi-objective genetic algorithms (NSGA-II and NRGA) were applied to solve more instances of the problem. Several criteria and statistical tests were applied to compare the efficiency of the two proposed metaheuristics. Experimental results demonstrated that NRGA can find solutions with the same quality as the NSGA-II in less solution time. Also, an alternative formulation for the problem was proposed, which was computationally more efficient than the initial formulation. With the help of this formulation, medium and large-sized problems that could not be solved with the initial formulation were solved by three exact multi-objective solution methods, including AUGMECON2, TH, and AWTP, and Pareto-optimal solutions were reported. In addition, the performance of the proposed solution methods was measured by some multi-objective indicators. The results showed the superiority of AUGMECON2 over the proposed solution methods.

For extending the model, other assumptions that exist in real world can be included in the model. In this model, capacity constraints for ground and airport hubs have not been considered. This is a limitation of the current model and can be considered in future research. Also, in the present study, air pollution has not been considered for the airplanes, which may be a fruitful search area. Furthermore, a similar model can be proposed for other variants of HLP such as the p-hub center location problem, hub covering location problem, and continuous HLP. In addition, other solution methods for reducing solution time in large scales of the problem can be developed by researchers.