ABSTRACT
Apache Hadoop is a predominant software framework for distributed compute and storage, capable of handling huge amounts of data, usually referred to as Big Data. Data collected from different enterprises and government agencies often includes private and sensitive information, which must be secured against unauthorized access. This paper proposes extensions to the current authorization capabilities offered by Hadoop core and other ecosystem projects, specifically Apache Ranger and Apache Sentry. We present a fine-grained attribute-based access control model, referred to as HeABAC, catering to the security and privacy needs of a multi-tenant Hadoop ecosystem. The paper reviews the current multi-layered access control model used primarily in Hadoop core (2.x), Apache Ranger (version 0.6), and Apache Sentry (version 1.7.0), as well as a previously proposed RBAC extension (OT-RBAC). It then presents a formal attribute-based access control model for the Hadoop ecosystem, including the novel concept of cross Hadoop services trust. It further highlights different trust scenarios, presents an implementation approach for HeABAC using Apache Ranger, and discusses the administration requirements of the HeABAC operational model. Comprehensive real-world use cases are also discussed to show how the proposed HeABAC model is applied and enforced in the Hadoop ecosystem.
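To make the idea of attribute-based authorization with cross-service trust more concrete, the following is a minimal sketch in Python. It is not the formal HeABAC model and does not use the Apache Ranger API; the names `Policy`, `attrs_match`, `is_authorized`, and the representation of trust as a set of `(trustor, trustee)` service pairs are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical policy: grants `operation` on `service` when the
    user's and the object's attributes carry all required values."""
    service: str
    operation: str
    required_user_attrs: dict    # attribute name -> required value
    required_object_attrs: dict  # attribute name -> required value


def attrs_match(required, actual):
    """True if every required attribute is present with the required value."""
    return all(actual.get(name) == value for name, value in required.items())


def is_authorized(user_attrs, object_attrs, service, operation,
                  policies, trust, via_service=None):
    """Decide an access request (illustrative only).

    `trust` is an assumed set of (trustor, trustee) pairs. If the request
    reaches `service` through a different service `via_service`, the target
    service must trust that service before any policy is evaluated.
    """
    if via_service is not None and via_service != service:
        if (service, via_service) not in trust:
            return False
    return any(
        p.service == service
        and p.operation == operation
        and attrs_match(p.required_user_attrs, user_attrs)
        and attrs_match(p.required_object_attrs, object_attrs)
        for p in policies
    )


# Example data (hypothetical attribute names and values).
policies = [Policy("HDFS", "read", {"dept": "finance"}, {"tag": "PII"})]
trust = {("HDFS", "Hive")}  # assumed meaning: HDFS trusts Hive

direct = is_authorized({"dept": "finance"}, {"tag": "PII"},
                       "HDFS", "read", policies, trust)
through_hive = is_authorized({"dept": "finance"}, {"tag": "PII"},
                             "HDFS", "read", policies, trust,
                             via_service="Hive")
```

In this sketch the trust check is evaluated before the attribute policies, so a request arriving through an untrusted service is denied regardless of the user's attributes. Whether trust should gate or merely supplement policy evaluation is a design choice that the sketch fixes arbitrarily.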
Index Terms
- An Attribute-Based Access Control Model for Secure Big Data Processing in Hadoop Ecosystem