Friday, September 4, 2015

Running SQL Queries on Hadoop: Apache Hive Alternatives

Hive is a SQL-programmer-friendly tool for running SQL queries on data stored in the Hadoop HDFS file system. When a query runs, Hive converts the SQL-like query into MapReduce jobs.
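
To make the SQL-to-MapReduce idea concrete, here is a minimal sketch in plain Python of how a query such as SELECT dept, SUM(amount) FROM orders GROUP BY dept could break down into map, shuffle, and reduce steps. The table name, column names, sample rows, and function names are all assumptions made up for illustration; this is not Hive's actual planner or generated code, only a toy model of the idea.

```python
from collections import defaultdict

# Hypothetical rows standing in for an "orders" table stored on HDFS.
rows = [
    {"dept": "sales", "amount": 100},
    {"dept": "ops", "amount": 40},
    {"dept": "sales", "amount": 60},
]


def map_phase(rows):
    """Emit one (key, value) pair per row, as a mapper would."""
    for row in rows:
        yield row["dept"], row["amount"]


def shuffle(pairs):
    """Group values by key, mimicking the shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each key's values, as a reducer would (here: SUM)."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(rows):
    """Chain the three steps for the GROUP BY / SUM query."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(rows))
```

On a real cluster the map and reduce steps run in parallel across many nodes, and the job startup and intermediate disk writes are a large part of why the alternatives below aim for lower latency.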

Hive is not the only tool that does this. This post gives a synopsis of the open source alternatives to Hive.


1) Spark SQL (previously Shark - SQL on Spark) - the best alternative to Hive when running on Spark. Spark SQL is Spark's module for working with structured data.
2) Cloudera Impala - like Hive, but it uses its own execution daemons, which need to be installed on every DataNode in the Hadoop cluster. Impala runs BI-style queries on Hadoop.
3) Facebook Presto - like Impala, it needs to be installed on the DataNodes. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes.
4) Apache Drill - schema-free SQL for Hadoop. It supports multiple datastores, including HDFS, MongoDB, and HBase.

