Abstract
The size of data sets collected and analyzed in industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [3] is a popular open-source map-reduce implementation that is used as an alternative for storing and processing extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs that are hard to maintain and reuse.
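To illustrate what "low level" means here, the following is a minimal sketch of a per-user count written in the map-reduce style. It is plain Python that simulates the map, shuffle, and reduce phases in memory. It does not use Hadoop's API, and the function names and sample records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map step: emit a (key, 1) pair for every input record."""
    for user, _text in records:
        yield user, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(records):
    """Chain the three phases, as a map-reduce framework would."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical input: (user, status text) pairs.
    updates = [("alice", "hello"), ("bob", "lunch"), ("alice", "bye")]
    print(run_job(updates))
```

Even this simple aggregation requires the developer to spell out key emission, grouping, and summation by hand, and the resulting functions are tied to one record layout. A declarative query language could express the same computation as a single grouped count.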
References
- A. Pavlo et al. A Comparison of Approaches to Large-Scale Data Analysis. Proc. ACM SIGMOD, 2009.
- R. Chaiken et al. SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. Proc. VLDB Endow., 1(2):1265-1276, 2008.
- Apache Hadoop. Available at http://wiki.apache.org/hadoop.
- Hive Performance Benchmark. Available at https://issues.apache.org/jira/browse/HIVE-396.
- Hive Language Manual. Available at http://wiki.apache.org/hadoop/Hive/LanguageManual.
- Facebook Lexicon. Available at http://www.facebook.com/lexicon.
- Apache Pig. Available at http://wiki.apache.org/pig.
- Apache Thrift. Available at http://incubator.apache.org/thrift.
Index Terms
- Hive: a warehousing solution over a map-reduce framework