Issue
What I am trying to accomplish is the following:
Have Eclipse run the Spark code
Have the master set as "spark://spark-master:7077"
The spark-master is set on a virtual machine by going to a sbin directory and executing:
sh start-all.sh
and then
/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
This is what UI displays:
The version I have on a virtual machine is: Spark 1.3.1, Hadoop 2.6
on Eclipse (with Maven), I have installed: spark-core_2.10, Spark 1.3.10
When I set the master to be "local", there are no errors.
When I try to run the simple PI example setting the master to "spark://spark-master:7077", I get the error:
15/04/21 16:49:20 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, spark-master): java.lang.ClassNotFoundException: mavenj.testing123$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
15/04/21 16:49:20 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:20 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123$1) [duplicate 1]
15/04/21 16:49:21 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:21 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123$1) [duplicate 2]
15/04/21 16:49:21 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 4, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:21 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123$1) [duplicate 3]
15/04/21 16:49:21 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 5, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:21 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 5) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123$1) [duplicate 4]
15/04/21 16:49:21 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 6, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:22 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123$1) [duplicate 5]
15/04/21 16:49:22 ERROR TaskSetManager: Task 1 in stage 0.0 failed 4 times; aborting job
15/04/21 16:49:22 INFO TaskSchedulerImpl: Cancelling stage 0
15/04/21 16:49:22 INFO TaskSchedulerImpl: Stage 0 was cancelled
15/04/21 16:49:22 INFO DAGScheduler: Stage 0 (reduce at testing123.java:35) failed in 9.762 s
15/04/21 16:49:22 INFO DAGScheduler: Job 0 failed: reduce at testing123.java:35, took 9.907884 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, spark-master): java.lang.ClassNotFoundException: mavenj.testing123$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Solution
To answer the question (somehow I always find the answer whenever I ask on StackOverflow), to make it work, all the worker-code (if I can call it like that) needs to be put onto a JAR first, together with all the other JARs that are relevant. After the SparkContext is initiated, it is just a matter of making a note of a path using:
sc.addJar("PATH TO JAR");
PS I have tested it with several versions, and it worked. Latest version I tested it with was 1.3.1
EDIT: Make sure that ports are not in a conflict.
Answered By - 3xCh1_23
Answer Checked By - Marilyn (JavaFixing Volunteer)