Issue
This question aims to better understand the Java/Scala classpath and the resolution of conflicting dependencies, by asking about an issue that was unexpected (to me):
I've packed two separate Scala codebases into two different uber-jars. I packed them as uber-jars because they have to run on the same machine but have conflicting dependencies between each other.
What did I expect? Each uber-jar resolves dependencies only from itself, hence no conflicts, no shading issues, etc.
What actually happened? Conflicting dependencies. One of the uber-jars ended up pointing to a different version of a dependency, the version that exists in the second uber-jar.
I expected each jar to always try to resolve dependencies from the closest classpath entry (I may be mixing up some terms here) before "looking around". This may be a naive expectation, and I would highly appreciate a clear explanation (and/or learning resources).
Details:
- The uber-jars were packaged using sbt-assembly:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.0.0")
- with the following options:
assembly / assemblyOption ~= {
  _.withIncludeScala(false)
}

assembly / assemblyMergeStrategy := {
  case "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister" => MergeStrategy.concat
  case "META-INF/services/com.fasterxml.jackson.databind.Module" => MergeStrategy.concat
  case "META-INF/services/com.fasterxml.jackson.core.JsonFactory" => MergeStrategy.concat
  case "META-INF/services/com.fasterxml.jackson.core.ObjectCodec" => MergeStrategy.concat
  case "META-INF/services/org.apache.spark.deploy.history.EventFilterBuilder" => MergeStrategy.concat
  case "META-INF/services/org.glassfish.jersey.internal.spi.AutoDiscoverable" => MergeStrategy.concat
  case PathList("org", "apache", "spark", "unused", xs @ _*) => MergeStrategy.discard
  case "UnusedStubClass.class" => MergeStrategy.discard
  case "module-info.class" => MergeStrategy.rename
  case "META-INF/MANIFEST.MF" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "NOTICE" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "LICENSE" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "pom.properties" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "pom.xml" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "git.properties" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "io.netty.versions.properties" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "DUMMY.SF" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "DUMMY.DSA" => MergeStrategy.discard
  case "META-INF/DEPENDENCIES" => MergeStrategy.discard
  case x => MergeStrategy.deduplicate
}
The two uber-jars are Spark jobs that run one after the other on the same cluster.
When I ran them on the same cluster, uber-jar2 failed with dependency conflicts; when I ran them on different clusters, both worked perfectly fine.
Solution
A JAR is just a ZIP archive of directories containing .class files and application resources.
An uber JAR simply takes all of your dependencies (other JARs, extracted) together with your compilation output and puts everything into a single archive, so that whatever uses it doesn't have to fetch any other JARs.
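Because every uber JAR carries full copies of its dependencies, two uber JARs built from overlapping dependency trees will contain many identically named .class entries. A quick way to see that overlap (a sketch; the jar paths below are hypothetical):

import java.util.jar.JarFile

// Lists the .class entries of both uber-jars and prints the class files whose
// names appear in both archives (same class name, possibly different versions).
object DuplicateEntries {
  private def classEntries(path: String): Set[String] = {
    val jar = new JarFile(path)
    try {
      val names = Set.newBuilder[String]
      val entries = jar.entries()
      while (entries.hasMoreElements) {
        val name = entries.nextElement().getName
        if (name.endsWith(".class")) names += name
      }
      names.result()
    } finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    val inBoth = classEntries("/opt/jobs/uber-jar1.jar") // hypothetical paths
      .intersect(classEntries("/opt/jobs/uber-jar2.jar"))
    inBoth.toSeq.sorted.take(20).foreach(println)
  }
}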
If you build two uber JARs with different versions of the same dependency, then trying to load both of them in the same ClassLoader at once will cause issues, because there will be two .class files for the same class name, and only one of them, the first found on the classpath, will actually be loaded.
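A minimal sketch of that behaviour (jar paths and class name are hypothetical): a single URLClassLoader given both jars resolves every class name against the first matching entry in classpath order, so the copy bundled in the second jar is silently shadowed.

import java.net.{URL, URLClassLoader}

object FirstJarWinsDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical paths: both jars bundle the same dependency, in different versions.
    val jars = Array(
      new URL("file:///opt/jobs/uber-jar1.jar"), // bundles dependency v1
      new URL("file:///opt/jobs/uber-jar2.jar")  // bundles dependency v2
    )

    // One ClassLoader sees both jars as a single flat classpath.
    val loader = new URLClassLoader(jars, getClass.getClassLoader)

    // For a duplicated class name the first match wins, so this always loads
    // the copy from uber-jar1.jar, even when running uber-jar2's code.
    val clazz = loader.loadClass("com.example.SomeSharedDependency") // hypothetical class
    println(clazz.getProtectionDomain.getCodeSource.getLocation)
  }
}

If both uber JARs end up on the same JVM classpath, neither jar gets its own resolution scope; there is just one flat list of entries.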
So if you always intend to deploy to the same cluster, just bundle them together. If you want to deploy them separately, it is easier not to build two uber JARs, because their dependencies will overlap. You could, for example, build two uber JARs that contain only the dependencies (one per codebase) and make your code depend on them (so each job then has two JARs on its classpath: its dependency JAR and a thin JAR with its own code), or use whatever other strategy avoids the conflicts.
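For the "dependency JAR plus thin application JAR" split, sbt-assembly already has support. A sketch, assuming sbt-assembly 1.x (the plugin version the question uses):

// build.sbt sketch: build two jars per job instead of one uber JAR.

// Thin application jar: `sbt assembly` packages only this project's classes.
assembly / assemblyOption ~= {
  _.withIncludeScala(false)
   .withIncludeDependency(false)
}

// Dependencies-only jar: `sbt assemblyPackageDependency` packages only the
// library dependencies (the output gets a "-deps" suffix by default).

Each job is then deployed as its own pair of jars on the classpath: its thin application jar plus its dependency jar.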
Answered By - Mateusz Kubuszok