Issue
I have a simple Java Spark application that basically reads data from Kafka:
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class App {
    public static void main(String[] args) {
        System.out.println("Hello World!");
        SparkSession spark = SparkSession.builder()
                .appName("Kafka_Load")
                .config("spark.driver.allowMultipleContexts", "true")
                .config("spark.master", "local")
                .getOrCreate();
        Dataset<Row> df = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "my_topic")
                .load();
        System.out.println("Hello World1!");
    }
}
It runs fine when I launch it from Eclipse as a Java application, but when I run "java -jar my_jar.jar" it throws the following exception:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".;
My pom.xml is the following:
<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.0</version>
    <scope>compile</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.4.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.4.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.4.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.2.0</version>
  </dependency>
</dependencies>
How can I solve this?
Solution
First - you need an uber JAR that actually bundles the spark-sql-kafka-0-10 package (refer to the Maven Assembly plugin documentation and its "jar-with-dependencies" descriptor). You should also remove spark-streaming-kafka-0-10, since you aren't using it.
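A minimal sketch of the build section for that, using the Assembly plugin's built-in "jar-with-dependencies" descriptor. The plugin version and the main class App are assumptions based on the snippet above; adjust them to your project:

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>3.1.1</version>
      <configuration>
        <descriptorRefs>
          <!-- built-in descriptor that unpacks all compile-scope
               dependencies into the output JAR -->
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
          <manifest>
            <!-- assumption: App is in the default package, as in the question -->
            <mainClass>App</mainClass>
          </manifest>
        </archive>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

Running mvn package will then produce an extra target/<artifactId>-<version>-jar-with-dependencies.jar alongside the regular JAR, with spark-sql-kafka-0-10 and its transitive dependencies included.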
Second - you must use spark-submit to run Spark applications, so that the correct program classpath is established.
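For example (a sketch; the JAR file name and the local master are assumptions based on the snippet above):

spark-submit --class App --master "local[*]" target/my_jar-jar-with-dependencies.jar

Alternatively, spark-submit can fetch the Kafka source at launch time instead of you bundling it, via --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0, which is the approach described in the "Structured Streaming + Kafka Integration Guide" referenced by the error message.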
runs well when I run by eclipse
That works because Eclipse puts the dependencies listed in your POM on the classpath by default when it runs the program from the IDE.
Answered By - OneCricketeer