Saturday, December 27, 2014

Apache Flink setup on Ubuntu

    Apache Flink

  • Combines features from RDBMS (query optimization capabilities) and MapReduce (scalability)
  • Write like a programming language, execute like a database
  • Like Spark, Flink has an execution engine that aggressively uses in-memory execution but degrades gracefully to disk-based execution when memory is not sufficient
  • Flink supports several storage systems: HDFS, HBase, local FS, S3, and JDBC
  • Runs in local, cluster, and YARN modes
In this blog we will see how to set up Apache Flink in local mode; once that is done, we will run a Flink job on files stored in HDFS.

#Download the latest Flink and un-tar the file.
bdalab@bdalabsys:/$ tar -xvzf flink-0.8-incubating-SNAPSHOT-bin-hadoop2.tgz
#rename the folder
bdalab@bdalabsys:/$ mv flink-0.8-incubating-SNAPSHOT/ flink-0.8
#move the working dir into flink_home
bdalab@bdalabsys:/$ cd flink-0.8
#start Flink on local mode
bdalab@bdalabsys:flink-0.8/$ ./bin/start-local.sh
#The JobManager will be started by the above command. Check its status with
bdalab@bdalabsys:flink-0.8/$ jps
6740 Jps
6725 JobManager
#The JobManager web UI starts by default on port 8081. Now we have everything up and running, so let's try to run a job.
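Before submitting a job, we can sanity-check that the web frontend responds; a quick probe from the shell, assuming the default port:
bdalab@bdalabsys:flink-0.8/$ curl -s http://localhost:8081 | head
#or simply open http://localhost:8081 in a browser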
As we are all aware of the familiar WordCount example in distributed computing, let's begin with WordCount in Flink.
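The run below expects an input file at /home/ipPath, so let's create one with some sample text first; a minimal sketch, assuming /home is writable for the current user (any text file will do, the path just has to match the command that follows):
bdalab@bdalabsys:flink-0.8/$ echo "to be or not to be that is the question" > /home/ipPath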

#The *-WordCount.jar file is available under $FLINK_HOME/examples
bdalab@bdalabsys:flink-0.8/$ bin/flink run examples/flink-java-examples-0.8-incubating-SNAPSHOT-WordCount.jar /home/ipPath /home/flinkop
The above command reads the input file from the local file system and stores the result back to the local file system.
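To inspect the result, we can simply print the output; a sketch, assuming the job ran with the default parallelism so the output is a single file:
bdalab@bdalabsys:flink-0.8/$ cat /home/flinkop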

#If we want to process the same in HDFS
bdalab@bdalabsys:flink-0.8/$ bin/flink run examples/flink-java-examples-0.8-incubating-SNAPSHOT-WordCount.jar hdfs://localhost:9000/ip/tvvote hdfs://localhost:9000/op/
Make sure the HDFS daemons are up and running; otherwise the job will fail with an error.
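If the daemons are not running yet, start HDFS and stage the input file first; a sketch, assuming $HADOOP_HOME is set, the Hadoop binaries are on the PATH, and the paths from the command above:
bdalab@bdalabsys:flink-0.8/$ $HADOOP_HOME/sbin/start-dfs.sh
bdalab@bdalabsys:flink-0.8/$ jps #NameNode and DataNode should now be listed
bdalab@bdalabsys:flink-0.8/$ hdfs dfs -mkdir -p /ip
bdalab@bdalabsys:flink-0.8/$ hdfs dfs -put /home/ipPath /ip/tvvote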
#bin/flink has 4 major actions:
  • run #runs a program
  • info #displays information about a program
  • list #lists running and scheduled programs (-r for running, -s for scheduled)
  • cancel #cancels a running program (identified via -i)
#Display the running and scheduled JobIDs with
bdalab@bdalabsys:flink-0.8/$ bin/flink list -r -s
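The JobID printed by list can be passed to the other actions; for example, to cancel a running job (the ID below is just a placeholder):
bdalab@bdalabsys:flink-0.8/$ bin/flink cancel -i <JobID>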


In the next blog, I will explain how to set up Flink in cluster mode.

Tuesday, December 16, 2014

Simple way to configure MySQL or PostgreSQL RDBMS as Hive metastore


Hive stores its metadata (i.e., the table and column information, much like an RDBMS does) outside of HDFS, while it processes the data available in HDFS.

By default, Hive stores its metastore in Derby, a lightweight database that serves only a single instance at a time. If you try to start multiple instances of Hive, you will get an error like
"Another instance of Derby may have already booted the database".

In this post we will see how to configure another RDBMS (MySQL or PostgreSQL) as the Hive metastore.


Copy hive-default.xml.template to hive-site.xml under $HIVE_HOME/conf, then edit it
hadoop@solai# cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml
hadoop@solai# vim.tiny $HIVE_HOME/conf/hive-site.xml


Change the values of the following properties:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hivedb</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>mysqlroot</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive@123</value>
</property>
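For PostgreSQL, only the connection URL and driver class differ; a sketch of the two changed properties, assuming PostgreSQL listens on its default port 5432 with a metastore database also named hivedb (the PostgreSQL JDBC jar goes into $HIVE_HOME/lib in the same way, and a matching schema script ships under $HIVE_HOME/scripts/metastore/upgrade/postgres/):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/hivedb</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>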




Download "mysql-connector-java-5.x.xx-bin.jar" and place it under $HIVE_HOME/lib. The download is a tarball, so extract it and copy only the jar:
hadoop@solai# tar -xzf /home/hadoop/Downloads/mysql-connector-java-5.1.31.tar.gz -C /home/hadoop/Downloads
hadoop@solai# cp /home/hadoop/Downloads/mysql-connector-java-5.1.31/mysql-connector-java-5.1.31-bin.jar $HIVE_HOME/lib
In MySQL, create the database "hivedb" and load the Hive schema into it
mysql> create database hivedb;
mysql> use hivedb;

## The following will create the Hive schema in the MySQL database.
## Note: the mysql client does not expand shell variables, so replace $HIVE_HOME with its actual path.
mysql> SOURCE $HIVE_HOME/scripts/metastore/upgrade/mysql/hive-schema-0.12.0.mysql.sql
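Alternatively, the schema can be loaded from the shell, where $HIVE_HOME does expand; a sketch, assuming the MySQL root account:
hadoop@solai# mysql -u root -p hivedb < $HIVE_HOME/scripts/metastore/upgrade/mysql/hive-schema-0.12.0.mysql.sql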

It is important to restrict this user so it cannot alter or drop hivedb; grant only the privileges Hive needs.
mysql> CREATE USER 'mysqlroot'@'localhost' IDENTIFIED BY 'hive@123';
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'mysqlroot'@'localhost';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON hivedb.* TO 'mysqlroot'@'localhost';
mysql> FLUSH PRIVILEGES;
mysql> quit;
Enter the Hive CLI and create a table
hadoop@solai# $HIVE_HOME/bin/hive

hive> create table testHiveMysql(uname string, uplace string);
Go back into mysql to check the schema information that Hive created. The following queries return the table and column information:
mysql> select * from TBLS;
mysql> select * from COLUMNS_V2;
mysql> show tables;
show tables will return all the tables pertaining to the Hive schema.