Wednesday, April 30, 2014

Configuring Eclipse for Apache Hadoop 1.x/2.x : Debian/BOSS Operating System

How to configure the Eclipse IDE to work with Apache Hadoop on BOSS/Debian OS
Why Configure Eclipse for Apache Hadoop?

  1. Work with the Hadoop Distributed File System (HDFS) directly from Eclipse: 
    1. Create New Directory
    2. Upload files to HDFS
    3. Upload Directory to HDFS
    4. Download from HDFS
  2. Write and execute MapReduce programs that run on the Hadoop cluster
Steps to Integrate Eclipse with a Hadoop Cluster
  1. Make sure Eclipse is installed; if not, download and install it.
  2. Make sure your Hadoop cluster is running; if not, set up Hadoop first.
  3. Download hadoop-eclipse-plugin-1.2.1.jar and place the jar in $ECLIPSE_HOME/plugins (instead of downloading it, you can also build the plugin jar yourself using "ant"; see the build sketch after this list)
  4. Start Eclipse: 
    1. $ECLIPSE_HOME/eclipse
  5. In the Eclipse menu, click Window --> Open Perspective --> Other --> Map/Reduce
  6. In the Map/Reduce Locations view at the bottom of the window, click the icon to add a new Hadoop location
  7. In the dialog that opens, enter the ports on which MapReduce and HDFS are running (to double-check the values, see the configuration sketch after this list):
    1. As a reminder, the MapReduce (JobTracker) port (9001) is specified in $HADOOP_HOME/conf/mapred-site.xml 
    2. As a reminder, the HDFS (NameNode) port (9000) is specified in $HADOOP_HOME/conf/core-site.xml
    3. Enter the Hadoop user name
  8. Once the Hadoop location is added, DFS Locations will be displayed in the Eclipse Project Explorer window (Window --> Show View --> Project Explorer), showing the directories in HDFS
  9. Right-click the DFS location and click Connect
  10. Once connected successfully, it will display all the HDFS folders.
  11. You can create a directory, upload files or a directory to HDFS, or download files to the local machine by right-clicking any of the listed directories.  
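
If you prefer building the plugin jar yourself (step 3), the Hadoop 1.2.1 source tree ships the plugin sources under src/contrib/eclipse-plugin. A rough sketch, assuming the source is unpacked under /opt/hadoop-1.2.1 and Eclipse lives in /opt/eclipse (both paths are examples, and the exact ant properties can vary between Hadoop releases):

    # Hypothetical paths; adjust to your environment
    $ cd /opt/hadoop-1.2.1/src/contrib/eclipse-plugin
    # Point ant at your Eclipse installation and the Hadoop version
    $ ant jar -Declipse.home=/opt/eclipse -Dversion=1.2.1
    # The built jar should land under build/contrib/eclipse-plugin
    $ cp /opt/hadoop-1.2.1/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-1.2.1.jar /opt/eclipse/plugins/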
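
To double-check the ports before filling in the dialog (step 7), you can grep the configuration files. A quick sketch; the output shown is what a default single-node setup would print, assuming the standard Hadoop 1.x property names fs.default.name and mapred.job.tracker:

    $ grep -A1 "fs.default.name" $HADOOP_HOME/conf/core-site.xml
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>

    $ grep -A1 "mapred.job.tracker" $HADOOP_HOME/conf/mapred-site.xml
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>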
HDFS File Management Commands
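
For reference, everything the plugin does on HDFS can also be done from the command line with the hadoop fs utility. A minimal sketch; /user/hduser and the file names are hypothetical examples:

    # Create a new directory in HDFS
    $ hadoop fs -mkdir /user/hduser/input
    # Upload a file to HDFS
    $ hadoop fs -put data.txt /user/hduser/input/
    # Upload a whole local directory to HDFS
    $ hadoop fs -put ./logs /user/hduser/logs
    # Download a file from HDFS to the local machine
    $ hadoop fs -get /user/hduser/input/data.txt .
    # List the contents of an HDFS directory
    $ hadoop fs -ls /user/hduser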


Possible Error You May Get

ERROR
              Error: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException

SOLUTION
             Make sure all the Hadoop daemons are up and running; you can verify this as shown below.
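
To check, run jps, which lists the running Hadoop daemon JVMs. On a Hadoop 1.x single-node setup you should see all five daemons:

    # List running Hadoop daemon JVMs
    $ jps
    # Expect: NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker
    # If any daemon is missing, (re)start the cluster
    $ $HADOOP_HOME/bin/start-all.sh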