Issue
This question aims to better understand the Java/Scala classpath and the resolution of conflicting dependencies, by asking about an issue that was unexpected (to me):
I've packed two separate Scala codebases into two different uber-jars. I packed them as uber-jars because they have to run on the same machine but have conflicting dependencies between each other.
What did I expect? Each uber-jar resolves dependencies only from itself, hence no conflicts, no shading issues, etc.
What actually happened? Conflicting dependencies. One of the uber-jars ended up pointing to a different version of a dependency, the version that exists in the second uber-jar.
I expected each jar to always try to resolve dependencies from the closest classpath entry (I may be mixing up some terms here) before "looking around". This may be a naive expectation, and I would highly appreciate a clear explanation (and/or learning resources).
Details:
- The uber-jars were packaged using sbt-assembly:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.0.0")
- with the following options:
assembly / assemblyOption ~= {
  _.withIncludeScala(false)
}

assembly / assemblyMergeStrategy := {
  case "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister" => MergeStrategy.concat
  case "META-INF/services/com.fasterxml.jackson.databind.Module" => MergeStrategy.concat
  case "META-INF/services/com.fasterxml.jackson.core.JsonFactory" => MergeStrategy.concat
  case "META-INF/services/com.fasterxml.jackson.core.ObjectCodec" => MergeStrategy.concat
  case "META-INF/services/org.apache.spark.deploy.history.EventFilterBuilder" => MergeStrategy.concat
  case "META-INF/services/org.glassfish.jersey.internal.spi.AutoDiscoverable" => MergeStrategy.concat
  case PathList("org", "apache", "spark", "unused", xs @ _*) => MergeStrategy.discard
  case "UnusedStubClass.class" => MergeStrategy.discard
  case "module-info.class" => MergeStrategy.rename
  case "META-INF/MANIFEST.MF" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "NOTICE" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "LICENSE" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "pom.properties" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "pom.xml" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "git.properties" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "io.netty.versions.properties" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "DUMMY.SF" => MergeStrategy.discard
  case PathList(ps @ _*) if ps.last contains "DUMMY.DSA" => MergeStrategy.discard
  case "META-INF/DEPENDENCIES" => MergeStrategy.discard
  case x => MergeStrategy.deduplicate
}
The two uber-jars are Spark jobs that run one after the other on the same cluster.
When I ran them on the same cluster, uber-jar2 failed with dependency conflicts; when I ran them on different clusters, both worked perfectly fine.
Solution
A JAR is just a ZIP archive of directories containing .class files and application resources.
An uber JAR simply takes all of your dependencies (other JARs, extracted) together with your compilation output and puts everything into a single archive, so that whatever uses it doesn't have to fetch any other JARs.
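Because every uber JAR carries full copies of its dependencies, two uber JARs built from overlapping dependency trees will contain many identically named .class entries. A quick way to see that overlap (a sketch; the jar paths below are hypothetical):

import java.util.jar.JarFile

// Lists the .class entries of both uber-jars and prints the class files whose
// names appear in both archives (same class name, possibly different versions).
object DuplicateEntries {
  private def classEntries(path: String): Set[String] = {
    val jar = new JarFile(path)
    try {
      val names = Set.newBuilder[String]
      val entries = jar.entries()
      while (entries.hasMoreElements) {
        val name = entries.nextElement().getName
        if (name.endsWith(".class")) names += name
      }
      names.result()
    } finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    val inBoth = classEntries("/opt/jobs/uber-jar1.jar") // hypothetical paths
      .intersect(classEntries("/opt/jobs/uber-jar2.jar"))
    inBoth.toSeq.sorted.take(20).foreach(println)
  }
}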
If you build two uber JARs with different versions of the same dependency, then trying to load both of them in the same ClassLoader at once will cause issues, because there will be two .class files for the same class name, and only one of them, the first found on the classpath, will actually be loaded.
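A minimal sketch of that behaviour (jar paths and class name are hypothetical): a single URLClassLoader given both jars resolves every class name against the first matching entry in classpath order, so the copy bundled in the second jar is silently shadowed.

import java.net.{URL, URLClassLoader}

object FirstJarWinsDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical paths: both jars bundle the same dependency, in different versions.
    val jars = Array(
      new URL("file:///opt/jobs/uber-jar1.jar"), // bundles dependency v1
      new URL("file:///opt/jobs/uber-jar2.jar")  // bundles dependency v2
    )

    // One ClassLoader sees both jars as a single flat classpath.
    val loader = new URLClassLoader(jars, getClass.getClassLoader)

    // For a duplicated class name the first match wins, so this always loads
    // the copy from uber-jar1.jar, even when running uber-jar2's code.
    val clazz = loader.loadClass("com.example.SomeSharedDependency") // hypothetical class
    println(clazz.getProtectionDomain.getCodeSource.getLocation)
  }
}

If both uber JARs end up on the same JVM classpath, neither jar gets its own resolution scope; there is just one flat list of entries.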
So if you always intend to deploy to the same cluster, just bundle them together. If you want to deploy them separately, it is easier not to build two uber JARs, because their dependencies will overlap. You could, for example, build two uber JARs that contain only the dependencies (one per codebase) and make your code depend on them (so each job then has two JARs on its classpath: its dependency JAR and a thin JAR with its own code), or use whatever other strategy avoids the conflicts.
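For the "dependency JAR plus thin application JAR" split, sbt-assembly already has support. A sketch, assuming sbt-assembly 1.x (the plugin version the question uses):

// build.sbt sketch: build two jars per job instead of one uber JAR.

// Thin application jar: `sbt assembly` packages only this project's classes.
assembly / assemblyOption ~= {
  _.withIncludeScala(false)
   .withIncludeDependency(false)
}

// Dependencies-only jar: `sbt assemblyPackageDependency` packages only the
// library dependencies (the output gets a "-deps" suffix by default).

Each job is then deployed as its own pair of jars on the classpath: its thin application jar plus its dependency jar.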
Answered By - Mateusz Kubuszok