Monday, August 31, 2015

Installing Mahout with Apache Spark 1.4.1 : Issues and Solution


In this post I will discuss the errors you may get during the installation and how to resolve them.

The errors are listed here in the order in which I hit them during my installation.

Cannot find Spark class path. Is 'SPARK_HOME' set?

cd $MAHOUT_HOME

bin/mahout spark-shell

I got the error: Cannot find Spark class path. Is 'SPARK_HOME' set?

Solution
The issue is in the bin/mahout script: it points to compute-classpath.sh under the $SPARK_HOME/bin directory, but my $SPARK_HOME/bin contained no such file.

Add compute-classpath.sh to the $SPARK_HOME/bin directory.

In my case I simply copied it from an older version, i.e. Spark 1.1.
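The copy step can be scripted. This is a minimal sketch, assuming you still have an older Spark release (such as 1.1) unpacked somewhere; the function name ensure_compute_classpath and its second argument are hypothetical, not part of Mahout or Spark.

```shell
#!/usr/bin/env bash
# Sketch: make sure $SPARK_HOME/bin/compute-classpath.sh exists, since bin/mahout expects it.
# ensure_compute_classpath is a hypothetical helper; the second argument is the
# directory of an older Spark release (e.g. 1.1) that still ships the script.
ensure_compute_classpath() {
  local spark_home="$1"
  local old_spark_home="$2"
  local target="$spark_home/bin/compute-classpath.sh"
  local src="$old_spark_home/bin/compute-classpath.sh"

  # Nothing to do if the script is already there.
  if [ -f "$target" ]; then
    echo "already present: $target"
    return 0
  fi

  # The older release must actually contain the script.
  if [ ! -f "$src" ]; then
    echo "not found: $src" >&2
    return 1
  fi

  # Copy it over and make it executable; stop if either step fails.
  cp "$src" "$target" && chmod +x "$target" || return 1
  echo "copied $src -> $target"
}
```

Usage: `ensure_compute_classpath "$SPARK_HOME" /path/to/spark-1.1.0`, where the second path is a placeholder for wherever your older Spark lives.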


ERROR: Could not find mahout-examples-*.job

cd $MAHOUT_HOME

bin/mahout spark-shell

ERROR: Could not find mahout-examples-*.job in /media/bdalab/bdalab/sw/mahout or /media/bdalab/bdalab/sw/mahout/examples/target, please run 'mvn install' to create the .job file

Solution
Set the MAHOUT_LOCAL variable to true to avoid the error. With it set, bin/mahout runs locally instead of looking for the mahout-examples job file:

export MAHOUT_LOCAL=true
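If you are not sure whether the job file exists, you can check the two directories named in the error message before falling back to MAHOUT_LOCAL. A sketch, assuming the glob from the error message is all that matters; use_local_mode_if_no_job is a hypothetical helper, not part of Mahout.

```shell
#!/usr/bin/env bash
# Sketch: look for mahout-examples-*.job in the two places the error message names,
# and fall back to local mode (MAHOUT_LOCAL=true) if none is found.
# use_local_mode_if_no_job is a hypothetical helper, not part of Mahout.
use_local_mode_if_no_job() {
  local mahout_home="$1"
  local dir f
  for dir in "$mahout_home" "$mahout_home/examples/target"; do
    for f in "$dir"/mahout-examples-*.job; do
      # An unmatched glob stays literal, so check that the file really exists.
      if [ -f "$f" ]; then
        echo "found job file: $f"
        return 0
      fi
    done
  done
  echo "no mahout-examples-*.job found; setting MAHOUT_LOCAL=true"
  export MAHOUT_LOCAL=true
}
```

Call it as `use_local_mode_if_no_job "$MAHOUT_HOME"` in the current shell, not inside `$(...)`, otherwise the export does not survive.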


Error: Could not find or load main class org.apache.mahout.driver.MahoutDriver

cd $MAHOUT_HOME

bin/mahout spark-shell

MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.

MAHOUT_LOCAL is set, running locally

Error: Could not find or load main class org.apache.mahout.driver.MahoutDriver

Solution
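This indicates that Mahout has not been built yet, so the MahoutDriver class is not on the classpath. Mahout needs to be built and installed with Maven: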

root@solai[bin]# mvn -DskipTests -X clean install

[INFO] Scanning for projects...

[INFO] ------------------------------

[ERROR] BUILD FAILURE

[INFO] ---------------------------------

[INFO] Unable to build project '/media/bdalab/bdalab/sw/mahout/pom.xml; it requires Maven version 3.3.3
I downloaded the latest version of Maven (3.3.3), unpacked it, and ran the previous command again using the new Maven's bin directory:

root@solai[bin]# $MAVEN_HOME/bin/mvn -DskipTests -X clean install

org.apache.maven.enforcer.rule.api.EnforcerRuleException: Detected JDK Version: 1.8.0-60 is not in the allowed range [1.7,1.8).
I then switched from Java 1.8.0-60 to Java 1.7 and ran the build again, which gave this error:
root@solai[bin]# $MAVEN_HOME/bin/mvn -DskipTests -X clean install

[INFO] Mahout Build Tools ..... SUCCESS [02:42 min]

[INFO] Apache Mahout ..... SUCCESS [ 0.041 s]

[INFO] Mahout Math ......FAILURE [01:45 min]

[INFO] Mahout HDFS ........ SKIPPED

[INFO] Mahout Map-Reduce ..... SKIPPED

[INFO] Mahout Integration ..... SKIPPED

[INFO] Mahout Examples .........SKIPPED

[INFO] Mahout Math Scala bindings ..... SKIPPED

[INFO] Mahout H2O backend ...... SKIPPED

[INFO] Mahout Spark bindings ..... SKIPPED

[INFO] Mahout Spark bindings shell ..... SKIPPED

[INFO] Mahout Release Package ..... SKIPPED

Caused by: org.eclipse.aether.transfer.ArtifactTransferException: Could not transfer artifact org.apache.maven:maven-core:jar:2.0.6 from/to central (https://repo.maven.apache.org/maven2): GET request of: org/apache/maven/maven-core/2.0.6/maven-core-2.0.6.jar from central failed
I suspected the error was caused by a network issue, so I ran the same command again.
As I guessed, the installation then completed successfully.
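Two of the build failures above came from version requirements: this Mahout checkout required Maven 3.3.3, and its enforcer rule only allowed JDKs in [1.7,1.8). The sketch below checks both before starting a long build. The helper names are hypothetical, version_ge assumes GNU `sort -V`, and parsing the `mvn -v` and `java -version` output assumes their usual format, so treat it as a convenience rather than a guarantee.

```shell
#!/usr/bin/env bash
# Sketch: check the Maven and JDK versions the Mahout build enforced in this post
# (Maven 3.3.3, JDK in [1.7,1.8)) before running "mvn clean install".
# Helper names are hypothetical; version_ge relies on GNU "sort -V".

# True if version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# True if JDK version $1 lies in [1.7, 1.8), the range from the enforcer error.
jdk_in_range() {
  version_ge "$1" 1.7 && ! version_ge "$1" 1.8
}

check_build_tools() {
  local mvn_cmd="${1:-mvn}"
  # Use $JAVA_HOME/bin/java when JAVA_HOME is set, otherwise java from the PATH.
  local java_cmd="${JAVA_HOME:+$JAVA_HOME/bin/}java"
  local mvn_version java_version status=0

  # "mvn -v" normally starts with "Apache Maven X.Y.Z (...)".
  mvn_version=$("$mvn_cmd" -v 2>/dev/null | awk '/Apache Maven/ {print $3; exit}')
  # "java -version" writes to stderr, e.g. java version "1.7.0_80".
  java_version=$("$java_cmd" -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')

  if [ -n "$mvn_version" ] && version_ge "$mvn_version" 3.3.3; then
    echo "Maven $mvn_version: OK"
  else
    echo "Maven '${mvn_version:-not found}': need 3.3.3 or later" >&2
    status=1
  fi
  if [ -n "$java_version" ] && jdk_in_range "$java_version"; then
    echo "JDK $java_version: OK"
  else
    echo "JDK '${java_version:-not found}': need a version in [1.7,1.8)" >&2
    status=1
  fi
  return "$status"
}
```

Usage: `check_build_tools "$MAVEN_HOME/bin/mvn" && "$MAVEN_HOME/bin/mvn" -DskipTests clean install`. The network failure has no such check; re-running the build, as above, was enough in my case.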

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
After the successful installation I tried to get to the mahout> prompt:

cd $MAHOUT_HOME

bin/mahout spark-shell

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"

Solution
export JAVA_TOOL_OPTIONS="-Xmx2048m -XX:MaxPermSize=1024m -Xms1024m"
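Like MAHOUT_LOCAL, this export only lasts for the current shell session. A minimal sketch for making such settings persistent is to append them to a shell startup file exactly once; persist_export is a hypothetical helper, and ~/.bashrc is only an example target, so adjust it for your shell.

```shell
#!/usr/bin/env bash
# Sketch: append "export NAME=VALUE" to a startup file unless NAME is already exported there.
# persist_export is a hypothetical helper; ~/.bashrc is only an example target file.
persist_export() {
  local file="$1" name="$2" value="$3"
  touch "$file"
  if grep -q "^export ${name}=" "$file"; then
    echo "$name already set in $file; leaving it unchanged"
    return 0
  fi
  # printf %q quotes the value so spaces (as in JAVA_TOOL_OPTIONS) survive re-sourcing.
  printf 'export %s=%q\n' "$name" "$value" >> "$file"
  echo "added $name to $file"
}
```

Usage: `persist_export ~/.bashrc JAVA_TOOL_OPTIONS "-Xmx2048m -XX:MaxPermSize=1024m -Xms1024m"` and `persist_export ~/.bashrc MAHOUT_LOCAL true`. Note that -XX:MaxPermSize only has an effect on Java 7 and earlier; Java 8 ignores it with a warning.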

Sunday, August 30, 2015

Installing Mahout on Spark 1.4.1

Installing Mahout and Spark

In this post I will describe the steps to install Mahout with Apache Spark 1.4.1 (the latest version at the time of writing), and also list the possible errors and their remedies.

Installing Mahout & Spark on your local machine

1) Download Apache Spark 1.4.1 and unpack the archive file

2) Change to the directory where you unpacked Spark and type sbt/sbt assembly to build it

3) Make sure the right version of Maven (3.3.3) is installed on your system. If not, install it before building Mahout

4) Create a directory for Mahout somewhere on your machine, change to it, and check out the master branch of Apache Mahout from GitHub: git clone https://github.com/apache/mahout mahout

5) Change to the mahout directory and build Mahout using mvn -DskipTests clean install
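Steps 2 to 5 can be collected into one script. This is a sketch, assuming Spark 1.4.1 is already downloaded and unpacked (step 1) and that Maven and git are on the PATH; the function names and the DRY_RUN switch are hypothetical. With DRY_RUN=1 it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of installation steps 2-5. install_spark_and_mahout, run and DRY_RUN are
# hypothetical names; step 1 (downloading and unpacking Spark) is assumed done.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_spark_and_mahout() {
  local spark_dir="$1" mahout_parent="$2"

  # Step 2: build Spark with its bundled sbt launcher (in a subshell, so cd stays local).
  (run cd "$spark_dir" && run sbt/sbt assembly) || return 1

  # Step 3: confirm Maven is installed; "mvn -v" prints its version.
  run mvn -v || return 1

  # Step 4: clone the master branch of Apache Mahout.
  run mkdir -p "$mahout_parent" || return 1
  (run cd "$mahout_parent" && run git clone https://github.com/apache/mahout mahout) || return 1

  # Step 5: build Mahout without running the tests.
  (run cd "$mahout_parent/mahout" && run mvn -DskipTests clean install)
}
```

Usage: `DRY_RUN=1 install_spark_and_mahout /path/to/spark-1.4.1 /path/to/src` to preview, then run it again without DRY_RUN (both paths are placeholders).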


Starting Mahout's Spark shell

1) Go to the directory where you unpacked Spark and type sbin/start-all.sh to start Spark locally

2) Open a browser and point it to http://localhost:8080/ to check whether Spark started successfully. Copy the URL of the Spark master shown at the top of the page (it starts with spark://)

3) Define the following environment variables:

export MAHOUT_HOME=[directory into which you checked out Mahout]

export SPARK_HOME=[directory where you unpacked Spark]

export MASTER=[url of the Spark master]

4) Finally, change to the directory into which you checked out Mahout and type bin/mahout spark-shell. You should see the shell start and get the prompt mahout>
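Before running bin/mahout spark-shell it can help to confirm that the three variables from step 3 point at the right places. A sketch; check_mahout_env is a hypothetical helper, and its checks (bin/mahout under MAHOUT_HOME, sbin/start-all.sh under SPARK_HOME, a MASTER of the form spark://host:port) are assumptions drawn from the steps above.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check MAHOUT_HOME, SPARK_HOME and MASTER before starting the shell.
# check_mahout_env is a hypothetical helper, not part of Mahout or Spark.
check_mahout_env() {
  local status=0
  if [ ! -x "${MAHOUT_HOME:-}/bin/mahout" ]; then
    echo "MAHOUT_HOME='${MAHOUT_HOME:-}' does not contain bin/mahout" >&2
    status=1
  fi
  if [ ! -x "${SPARK_HOME:-}/sbin/start-all.sh" ]; then
    echo "SPARK_HOME='${SPARK_HOME:-}' does not contain sbin/start-all.sh" >&2
    status=1
  fi
  # The master URL copied from http://localhost:8080/ should look like spark://host:port
  case "${MASTER:-}" in
    spark://?*:[0-9]*) ;;
    *) echo "MASTER='${MASTER:-}' does not look like spark://host:port" >&2; status=1 ;;
  esac
  return "$status"
}
```

Usage: `check_mahout_env && "$MAHOUT_HOME/bin/mahout" spark-shell`.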



In the next post I will discuss the errors that may occur while installing Mahout, along with their solutions.

Next : Resolved issues - Installing Mahout 0.11.0 with Spark 1.4.1