Elsevier

Computer Networks

Volume 103, 5 July 2016, Pages 67-83
Efficient Hash-routing and Domain Clustering Techniques for Information-Centric Networks

https://doi.org/10.1016/j.comnet.2016.04.001

Abstract

Hash-routing is a well-known technique used in server-cluster environments to direct content requests to the servers hosting the requested content. In this work, we look at hash-routing from a different angle and apply the technique to Information-Centric Networking (ICN) environments, where in-network content caches serve as temporary storage for content. In particular, edge-domain routers redirect requests to in-network caches, more often than not off the shortest path, according to the hash-assignment function. Although the benefits of this off-path in-network caching scheme are significant (e.g., high cache hit rate with minimal co-ordination overhead), the basic scheme comes with a disadvantage: in very large domains, the off-path detour of requests might increase latency to prohibitive levels. To deal with extensive detour delays, we investigate nodal/domain clustering techniques, according to which large domains are split into clusters, each of which applies hash-routing within its own subset of nodes. We model and evaluate the behaviour of nodal clustering and report significant improvement in delivery latency, which comes at the cost of a slight decrease in cache hit rates (i.e., up to 50% improvement in delivery latency for less than 10% decrease in cache hit rate compared to the original hash-routing scheme applied in the whole domain).

Introduction

Internet usage patterns have changed constantly over the last decades, reaching a situation that was not foreseen when the Internet was originally designed. The engineering principles underpinning today’s Internet architecture were created in the 1960s and 1970s with the assumption that the Internet would be mainly used for host-to-host communications. Instead, nowadays, the Internet is increasingly being used for content dissemination and retrieval, and this trend is forecast to continue in the foreseeable future [1].

This mismatch between the original design assumptions and current usage patterns has partially been addressed through application layer solutions such as Content Delivery Networks (CDN) and Peer-to-peer (P2P) overlays, which have retrofitted some desirable content-aware functionalities on top of the existing architecture. However, the lack of native network support for content distribution restricts the efficiency of such approaches, and also potentially hinders the evolution of the Internet as a whole.

This has created a trend towards content-oriented networking, which has recently been realised through the Information-Centric Networking (ICN) paradigm. Information-Centric Networking, similarly to P2P and CDNs, puts content itself in the forefront of attention when it comes to content delivery. That is, content can be delivered from any network location/device, provided that this device holds a valid copy of the requested content.

Extending the P2P paradigm, where mainly end-user devices can serve requests for content, the ICN paradigm also includes in-network devices, that is, router caches, as potential content servers. Based on this principle, a new field of research has emerged, coined “in-network caching”. The challenges of addressing content temporarily stored in router caches (from now on referred to as routers, or caches), resolving the location of the cache and fetching the content from the corresponding network location have recently attracted considerable attention [2]. Such challenges include caching redundancy, efficiency in utilising in-network storage [3] and replacement of content in caches according to its popularity [4], to name a few. Last but not least, an ICN network operates on packet-sized content chunks instead of content objects (or files), as is the norm in overlay/proxy caching. This adds one more requirement to the operation of in-network content caches: that of line-speed operation.

In this paper, we deal with the resolution of requests to in-network content caches in domain-wide environments and the optimisation of the resolution process in order to increase cache hits while keeping the latency needed to reach the cache under certain limits. Our cache-aware routing scheme utilises hash-routing techniques, which have been proposed in the past for mapping requests to physically co-located servers [5], [6]. According to hash-routing, each element within the network (be it a server in a server rack or a router within a domain network) is assigned a range of the hash space and stores the content items whose hashed identifiers fall within that range. In contrast to alternative architectural proposals in which extra resolution steps are essential (e.g., [7], [8], [9]), this operation avoids complex request-to-cache resolution and minimises signalling overhead (the only overhead is the calculation of the hash function itself).

Similar to the work in [10], we also target here domain-wide ICN deployments, where a content naming scheme, flat or hierarchical, is in place. Also, the edge-domain routers implement a hash function that determines both the content placement and the request-to-cache routing process. In particular, when an edge router receives a content request, it calculates the hash of the content identifier and redirects the request to the responsible cache. If the requested content is cached in the corresponding router, the content is returned to the client; otherwise, the request is forwarded towards the origin server. In a similar way, incoming content items returned by the origin server are forwarded for caching (or not) according to the hash of their identifier. As described in [10], the main concept underpinning our approach is that a content item can be opportunistically found in a domain only in the cache calculated by the hash function.
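The request flow described above can be sketched as follows. This is only an illustrative model, not the implementation evaluated in [10]: the class name `HashRoutedDomain`, the CRC32-plus-modulo assignment, the per-cache item capacity, the LRU replacement policy and the `origin` callable are all assumptions made for the example.

```python
import zlib
from collections import OrderedDict


class HashRoutedDomain:
    """Toy model of a domain in which each content item has one responsible cache."""

    def __init__(self, cache_nodes, capacity, origin):
        self.cache_nodes = list(cache_nodes)
        self.capacity = capacity   # items per cache (assumption: unit-sized items)
        self.origin = origin       # hypothetical callable: content_id -> content
        self.stores = {node: OrderedDict() for node in self.cache_nodes}

    def responsible_cache(self, content_id):
        # Hash the identifier and map it to a cache node (assumed: CRC32 + modulo).
        digest = zlib.crc32(content_id.encode("utf-8"))
        return self.cache_nodes[digest % len(self.cache_nodes)]

    def request(self, content_id):
        """Return (content, cache_node, hit) for a request entering at an edge router."""
        node = self.responsible_cache(content_id)
        store = self.stores[node]
        if content_id in store:
            store.move_to_end(content_id)   # refresh LRU position (assumed policy)
            return store[content_id], node, True
        # Cache miss: forward towards the origin, then cache at the responsible node.
        content = self.origin(content_id)
        store[content_id] = content
        if len(store) > self.capacity:
            store.popitem(last=False)       # evict the least recently used item
        return content, node, False


domain = HashRoutedDomain(["r1", "r2", "r3"], capacity=2,
                          origin=lambda cid: f"data for {cid}")
first = domain.request("/videos/intro")
second = domain.request("/videos/intro")
```

In this sketch the first request misses and the second one hits at the same node, and the item is stored in exactly one cache of the domain.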

The hash-routing schemes proposed in this paper require edge-domain routers and cache nodes to implement a hash function which maps content identifiers to cache nodes. This function is used: (i) by cache nodes to identify the set/range of content identifiers or names that they are responsible for, and (ii) by edge routers to route requests to the corresponding cache node (see Fig. 1(a)). As a result of this approach, each content object can be cached in a domain at most once, thus preventing redundant replication of cached content and resulting in more efficient utilisation of cache space. This approach also allows edge routers to forward content requests to the designated cache directly, without performing any lookup. In addition, the intra-domain forwarding procedure is performed without requiring any sort of inter-cache co-ordination since the hash function can be computed in a distributed manner by edge routers and caches, thus being scalable to any domain size.

The hash function maps a content identifier (flat or hierarchical) to a caching node of the domain. Such a function does not need to produce a cryptographic hash. In fact, it is desirable for its output to be produced with minimal processing, as long as it maps content items to cache nodes in a way that spreads the load evenly across caches.

For example, in case of human-readable identifiers, like URLs (RFC 3986, [11]), content items can be mapped to caching nodes by hashing their identifiers using fast non-cryptographic hashing functions such as Murmur, Jenkins, xxHash, CityHash and CRC32 and then applying modulo hashing over the number of caching nodes. In case of binary content identifiers, such as those defined by RFC 6920 [12], modulo hashing can be applied directly on content names.
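A minimal sketch of the modulo assignment for human-readable identifiers is shown below. It uses CRC32 from the Python standard library, one of the non-cryptographic functions listed above; the function name `assign_cache`, the node labels and the example URLs are hypothetical.

```python
import zlib


def assign_cache(content_id, caches):
    """Map a content identifier to one cache node via CRC32 followed by modulo."""
    digest = zlib.crc32(content_id.encode("utf-8"))
    return caches[digest % len(caches)]


caches = ["r1", "r2", "r3", "r4"]   # hypothetical caching nodes of the domain
load = {node: 0 for node in caches}
for i in range(1000):
    load[assign_cache(f"http://example.com/item/{i}", caches)] += 1
```

Every edge router and cache that evaluates `assign_cache` with the same node list obtains the same answer, which is why no lookup or inter-cache signalling is needed. Inspecting `load` gives a rough view of how evenly a given hash function spreads items over the caches.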

Consistent hashing [6] may also be used to minimise the number of items that have to be remapped when caching nodes fail, are added or are removed. The choice of the specific hash function is out of the scope of this paper, since it does not affect the optimisation process proposed here, and each network manager can choose the one that fits their own requirements.
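The remapping property of consistent hashing can be illustrated with a small hash ring. This is a generic sketch rather than the construction of [6]: the class name `HashRing`, the use of 64 virtual points per node and the choice of MD5 (used here only because it is available in the standard library, not because a cryptographic hash is required) are assumptions.

```python
import bisect
import hashlib


def _position(label):
    """Place a label on the ring using the first 8 bytes of its MD5 digest."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, replicas=64):
        # Each node owns several virtual points to smooth out the load.
        self._ring = sorted(
            (_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, content_id):
        """Return the node owning the first ring point clockwise of the item."""
        idx = bisect.bisect_right(self._points, _position(content_id))
        return self._ring[idx % len(self._ring)][1]


before = HashRing(["r1", "r2", "r3", "r4"])
after = HashRing(["r1", "r2", "r4"])          # r3 has failed or been removed
items = [f"/video/{i}" for i in range(500)]
moved = [cid for cid in items if before.lookup(cid) != after.lookup(cid)]
```

Only the items that were previously assigned to `r3` appear in `moved`; with plain modulo hashing, changing the number of nodes would typically remap most items.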

In [10], we have designed and evaluated the performance of five different algorithms that take advantage of hash-routing techniques in ICN environments. The differences between the five algorithms proposed in [10] lie in the routing and replication process followed for requests and content objects, respectively. In particular, it is clear from Fig. 1(b) that hash-routing techniques can follow symmetric or asymmetric paths to deliver the content. In the interests of space, we omit the details of the different hash-routing techniques investigated in [10] and refer the reader to that paper for further details.

Hash-routing falls in the group of off-path in-network caching techniques, as opposed to on-path in-network caching (e.g., [3], [13] - see further discussion in Section 2), according to which content is fetched from caches, only if the request “hits” the content along the path to the content origin. Although on-path in-network caching techniques by definition do not require any co-ordination between cache nodes, they result in suboptimal performance, as we have already demonstrated in [10]. In contrast, off-path caching techniques improve performance in terms of cache hits, but come at the cost of extra coordination, e.g., [7], [14].

Hash-routing techniques clearly improve the performance of the cache network in terms of cache hits, as shown extensively in [10] and at the same time require minimal co-ordination among network nodes. However, this increase in cache-hits comes at the cost of increased latency caused by the detouring required to look up the responsible cache. This tradeoff is clearly affected by the number of extra hops to be travelled off-path to find the cached content. In case of a small network/domain, this latency might be negligible, but as the size of the network increases (and depending also on topological characteristics, such as the density of the network graph, its diameter, etc.), the increase in latency might become prohibitive.
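To make the notion of extra off-path hops concrete, the sketch below counts the detour incurred when a request travels from its ingress router to the responsible cache and then on to the egress router, compared with the direct shortest path. The topology, the node names and the helper names `hops` and `detour_hops` are hypothetical, and the model ignores link delays and the different (symmetric or asymmetric) delivery options of [10].

```python
from collections import deque


def hops(adj, src, dst):
    """Shortest-path hop count between src and dst via breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("dst is unreachable from src")


def detour_hops(adj, ingress, cache, egress):
    """Extra hops of ingress -> cache -> egress over the direct shortest path."""
    via_cache = hops(adj, ingress, cache) + hops(adj, cache, egress)
    return via_cache - hops(adj, ingress, egress)


# Hypothetical topology: a chain a-b-c-d, with node e attached to b.
adj = {
    "a": ["b"],
    "b": ["a", "c", "e"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["b"],
}
off_path = detour_hops(adj, "a", "e", "d")   # responsible cache lies off the path
on_path = detour_hops(adj, "a", "c", "d")    # responsible cache lies on the path
```

Here the off-path cache `e` adds two hops, while the on-path cache `c` adds none. In larger or sparser topologies the same calculation yields larger detours.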

Building on this tradeoff, in this paper we investigate the potential of partitioning the network using node clustering techniques to keep the latency under certain bounds. We introduce the system model in Section 3. We make use of the well-known “k-split clustering” [15], [16] and “k-medoids clustering” [17] techniques and we also introduce a new “Bin packing content assignment function” (Section 4). Our results show that by splitting large domains into clusters the latency to hit and deliver cached content indeed drops. Inevitably and expectedly, this comes at the cost of lower cache hit rates (up to 50% improvement in delivery latency for less than 10% decrease in cache hit rate). The choice of whether an ISP makes use of hash-route clustering, and of which clustering technique to use, depends on the interests and the business model of the particular ISP. Finally, we lay down the options different ISPs might have according to domain sizes.
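The effect of clustering on the assignment can be sketched as follows. The partition is taken as given (in practice it would be produced by a clustering technique such as those named above); the function name `cluster_cache`, the CRC32-plus-modulo assignment and the assumption that a request is resolved within the cluster of its ingress router are illustrative choices, not details confirmed by the text.

```python
import zlib


def cluster_cache(content_id, ingress, clusters):
    """Return the cache responsible for content_id within the ingress node's cluster."""
    for members in clusters:
        if ingress in members:
            digest = zlib.crc32(content_id.encode("utf-8"))
            return members[digest % len(members)]
    raise KeyError(f"{ingress} does not belong to any cluster")


# Hypothetical partition of a six-router domain into two clusters.
clusters = [["r1", "r2", "r3"], ["r4", "r5", "r6"]]
near = cluster_cache("/news/today", "r2", clusters)
far = cluster_cache("/news/today", "r5", clusters)
```

Requests for the same item entering at `r2` and `r5` are resolved to caches in different clusters, so detours stay within a cluster. The same item may then be cached once per cluster, which is the source of the lower hit rate.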

Section snippets

Related work

The topic of in-network content caching has received wide attention recently in the context of Information-Centric Networks. Generally speaking, and according to [2], the in-network caching problem can be split in three distinct subproblems, which we next discuss in turn. These are the allocation of caches to network routers and the economic incentives of ISPs and other market players to adopt the new networking paradigm; the placement of content into the caches and the subsequent discovery and

System model and problem formulation

We consider an information-centric network of arbitrary topology, which can be represented as a graph $G=(\mathcal{V},\mathcal{E})$. Let $\mathcal{V}$ denote the set of ICN routers and $\mathcal{E}$ the set of communication links connecting them. Let also $C_v$ be the storage capacity (in bits) of router $v \in \mathcal{V}$. Throughout the paper we use calligraphic letters to denote sets and the corresponding capitals for cardinality; for example, $|\mathcal{V}|=V$.

Let $\mathcal{M}$ denote a given and fixed set of $M$ content items that have to be delivered over the network.

Nodal partitioning and hash routing

From the above analysis it is evident that there exists a trade-off between the cache hit ratio (the probability of finding a requested item cached within the domain, i.e., Eq. (1)) and the incurred latency for the retrieval of an item (i.e., Eq. (3)). In particular, for a given hash function ρ and a given total amount of cache deployed in some domain, the larger the domain, the larger the overall cache capacity (assuming equal cache capacity among the network nodes), which means fewer items assigned

Evaluation setup

In this Section, we evaluate through simulations the performance of hash routing in combination with the offline partitioning/clustering schemes presented in Section 4. We compare the performance of clustering techniques with the original hash-routing proposal [10] applied in the whole network/domain. Moreover, we examine two different content-to-cache assignment functions namely the modulo assignment function (Mod) and the newly proposed Bin packing assignment function (Bin). For comparison we

Conclusions

The process of resolving requests to in-network caches has concerned the ICN community so far and has resulted in several proposals to deal with this issue. However, the tradeoff between performance (in terms of cache hits) and co-ordination overhead (which raises scalability concerns) is not easy to balance. We believe that hash-routing techniques offer a very easy to implement, efficient and scalable way of assigning content items to network caches and redirecting content requests to the

Acknowledgements

V. Sourlas's work is supported by the European Commission through the FP7-PEOPLE-IEF INTENT project, Grant Agreement no. 628360. Also, the research leading to these results was funded by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant no. EP/K019589/1 (the COMIT Project) and the EU-Japan initiative under European Commission FP7 grant agreement no. 608518 and NICT contract no. 167 (the GreenICN Project).

References (63)

  • K.W. Ross

    Hash-routing for collections of shared web caches

    IEEE Netw.

    (1997)
  • D. Karger et al.

    Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the world wide web

    ACM STOC

    (1997)
  • Y. Wang et al.

    Advertising cached contents in the control plane: necessity and feasibility

    IEEE INFOCOM WKSHPS

    (2012)
  • S. Eum et al.

    CATT: potential based routing with content caching for ICN

    ACM ICN Workshop

    (2012)
  • H.g. Choi et al.

    CoRC: coordinated routing and caching for named data networking

    ACM/IEEE ANCS

    (2014)
  • L. Saino et al.

    Hash-routing schemes for information centric networking

    ACM ICN Workshop

    (2013)
  • T. Berners-Lee et al.

    Uniform resource identifier (URI) generic syntax

    RFC 3986

    (2005)
  • S. Farrell et al.

    Naming things with hashes

    RFC 6920

    (2013)
  • V. Sourlas et al.

    Distributed cache management in information-centric networks

    IEEE Trans. Netw. Serv. Manag.

    (2013)
  • Y. Chen et al.

    Efficient and adaptive web replication using content clustering

    IEEE JSAC

    (2003)
  • L. Kaufman et al.

    Clustering by means of medoids, reports of the faculty of mathematics and informatics

    Facul. Math. Inform.

    (1987)
  • P.K. Agyapong et al.

    Economic incentives in information-centric networking: implications for protocol design and public policy

    IEEE Commun. Mag.

    (2012)
  • G. Dán

    Cache-to-cache: could ISPs cooperate to decrease peer-to-peer content distribution costs?

    IEEE Trans. Parallel Distrib. Syst.

    (2011)
  • V. Pacifici et al.

    Content-peering dynamics of autonomous caches in a content-centric network

    IEEE INFOCOM

    (2013)
  • F. Kocak et al.

    The effect of caching on a model of content and access provider revenues in information-centric networks

    IEEE SocialCom

    (2013)
  • T.M. Pham et al.

    Pricing in information-centric network interconnection

    IFIP Networking

    (2013)
  • A. Araldo et al.

    Cost-aware caching: caching more (costly items) for less (ISPs operational expenditures)

    IEEE Trans. Parallel Distrib. Syst.

    (2015)
  • S. Borst et al.

    Distributed caching algorithms for content distribution networks

    IEEE INFOCOM

    (2010)
  • I.D. Baev et al.

    Approximation algorithms for data placement in arbitrary networks

    ACM-SIAM Symposium on Discrete Algorithms

    (2001)
  • J. Li et al.

    Popularity-driven coordinated caching in named data networking

    ACM/IEEE ANCS

    (2012)
  • G. Tyson et al.

    A trace-driven analysis of caching in content-centric networks

    ICCCN

    (2012)
    Vasilis Sourlas received his Diploma degree from the Computer Engineering and Informatics Department, University of Patras, Greece, in 2004 and the M.Sc. degree in Computer Science and Engineering from the same department in 2006. In 2013 he received his PhD from the Department of Electrical and Computer Engineering, University of Thessaly (Volos), Greece. From 2013 to 2014 he was a post-doc research associate at the Centre for Research and Technology, Hellas (CERTH). In Jan. 2015 he joined the Electronic and Electrical Engineering Department, UCL, London to pursue his two-year Marie Curie IEF fellowship. His main interests are in the area of Information-Centric Networks and Future Internet.

    Ioannis Psaras is an EPSRC Fellow at the Electrical and Electronic Engineering Department of UCL. He is interested in resource management techniques for current and future networking architectures with particular focus on routing, replication, caching and congestion control. Before joining UCL in 2010, he held positions at the University of Surrey, and Democritus University of Thrace, Greece, where he also obtained his PhD in 2008. In 2004 he won the Ericsson Award of Excellence in Telecommunications for his diploma dissertation. He has held research intern positions at DoCoMo Eurolabs and Ericsson Eurolabs.

    Lorenzo Saino received a B.S. in Telecommunications Engineering from Politecnico di Milano (Italy) in 2007 and a M.S. in Telecommunications from University College London (UK) in 2008. In 2015 he received his PhD from the Department of Electronic and Electrical Engineering at University College London, UK. From 2008 to 2011 he was a research engineer at Orange Labs where he carried out research in various subjects, including network mobility, mobile service design, smart cards, mobile cloud computing and information security. He was the recipient of the Orange Labs best patent of the year award in 2011. His current research interests include distributed systems and computer networks with particular focus on networked caching systems.

    George Pavlou is Professor of Communication Networks in the Department of Electronic and Electrical Engineering, University College London, UK where he co-ordinates research activities in networking and network management. He received a Diploma in Engineering from the National Technical University of Athens, Greece and M.S. and Ph.D. degrees in Computer Science from University College London, UK. His research interests focus on networking and network management, including aspects such as traffic engineering, quality of service management, policy-based systems, autonomic networking, information-centric networking and software-defined networks. He has been instrumental in a number of European and UK research projects that produced significant results with real-world uptake and has contributed to standardisation activities in ISO, ITU-T and IETF. He has been the technical program chair of several conferences and in 2011 he received the Daniel Stokesbury award for “distinguished technical contribution to the growth of the network management field”.
