Issue
I apologize if this question is a duplicate.
Our current environment:
Java 17
Spring 5.3.18
Spring Batch 4.2.8
At a high level, our architectural intent is to physically separate the launcher threads from the execution threads for our Spring Batch processes, fanning heavy workload steps out across the available processors on the worker nodes. We have designed the partitioners and flows for this model of operation.
The expectation is that on the worker systems we can have a number of "step" beans floating loosely in the JVM, to be partitioned at the "master" JVM, propagated out via AMQ, and then picked up and executed asynchronously on the worker JVMs.
I have reviewed the documentation at https://docs.spring.io/spring-batch/docs/4.2.x/reference/html/spring-batch-integration.html#remote-partitioning . The example given (and indeed every example I have found to date on the internet) is written as if there is a single step being run remotely.
Today:
We are using XML bean configuration for the jobs because of some peculiarities with Spring and Java scoping. Ironically, in our case the XML bean definitions offered scoping options that were not available in the Java DSL.
The XML below is an excerpt from a working configuration with a single remote step bean.
On the master side, we have this PartitionHandler configuration:
<bean id="ecPartitionHandler" class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler">
<property name="stepName" value="as-step0002.slave"/>
<property name="jobExplorer" ref="jobExplorer"/>
<property name="messagingOperations" ref="amqMessagingTemplate"/>
</bean>
<int:poller default="true" task-executor="stepTaskExecutor" fixed-delay="1000" />
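For context, a minimal sketch of the outbound wiring that the amqMessagingTemplate reference above implies, assuming a JMS/ActiveMQ transport; the channel, connection factory, and destination names here are illustrative and not from our actual configuration:

<int:channel id="amqRequestChannel"/>

<bean id="amqMessagingTemplate"
      class="org.springframework.integration.core.MessagingTemplate">
    <property name="defaultChannel" ref="amqRequestChannel"/>
</bean>

<!-- Pushes StepExecutionRequest messages to the broker queue the workers listen on -->
<int-jms:outbound-channel-adapter channel="amqRequestChannel"
                                  connection-factory="amqConnectionFactory"
                                  destination-name="partition.requests"/>

Since jobExplorer is set on the partition handler and no reply channel is configured, the master polls the job repository for partition completion rather than aggregating reply messages.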
On the slave side, we have this configuration:
<bean id="stepExecutionRequestHandler"
class="org.springframework.batch.integration.partition.StepExecutionRequestHandler">
<property name="jobExplorer" ref="jobExplorer" />
<property name="stepLocator" ref="stepLocator" />
</bean>
<bean id="stepLocatorAmq"
class="org.springframework.batch.integration.partition.BeanFactoryStepLocator" />
<bean id="slavePartitionHandler" class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler">
<property name="stepName" value="as-step0002.slave"/>
<property name="gridSize" value="3"/>
<property name="messagingOperations" ref="stepMessagingTemplate"/>
</bean>
<bean id="amq-properties"
class="com.maxis.mxarchive.spring.InjectableProperties"
factory-method="getAmqProperties">
<constructor-arg ref="configPropertiesService" />
</bean>
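And a minimal sketch of the inbound wiring that feeds the stepExecutionRequestHandler, again with illustrative channel and destination names:

<int:channel id="stepExecutionRequestChannel"/>

<int-jms:message-driven-channel-adapter channel="stepExecutionRequestChannel"
                                        connection-factory="amqConnectionFactory"
                                        destination-name="partition.requests"/>

<!-- Hands each incoming StepExecutionRequest to the handler, which locates the
     step by name and runs it; the master polls the job repository for results,
     so the returned StepExecution can be discarded -->
<int:service-activator ref="stepExecutionRequestHandler"
                       method="handle"
                       input-channel="stepExecutionRequestChannel"
                       output-channel="nullChannel"/>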
Observation:
The originating master and the receiving slave message handlers both directly reference the specific step to be executed.
Question:
From a purely pragmatic perspective, does this mean that I can simply add more MessageChannelPartitionHandler bean pairs referencing the appropriate steps to ensure that spawned partitions are picked up and executed by the correct step beans on the worker systems?
Or do I need to plug in a flow with a decider to pick the appropriate step from the step's ExecutionContext?
Or should I implement a StepLocator bean?
Thank you,
Solution
The type of step executed on the worker side is completely arbitrary, so nothing prevents you from running a partitioned step on the worker side as well. This, in fact, allows you to implement second-level partitioning, in which each worker can further partition the partition that was assigned to it.
Now to answer your questions:
From a purely pragmatic perspective, does this mean that I can simply add more MessageChannelPartitionHandler bean pairs referencing the appropriate steps to ensure that spawned partitions are picked up and executed by the correct step beans on the worker systems?
Yes. As mentioned previously, the step on the worker side can itself be a partitioned step, so it would require its own Partitioner/PartitionHandler pair.
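For illustration, here is a minimal sketch of a worker-side partitioned step, using the step name from your configuration and a local TaskExecutorPartitionHandler; the workerPartitioner, workerLeafStep, and stepTaskExecutor beans are assumptions, not part of your excerpt:

<!-- The step the worker locates by name is itself a partitioned step -->
<batch:step id="as-step0002.slave">
    <batch:partition partitioner="workerPartitioner" handler="workerPartitionHandler"/>
</batch:step>

<!-- The second level of partitioning stays local to the worker JVM: the
     assigned partition is split again and executed on a thread pool -->
<bean id="workerPartitionHandler"
      class="org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler">
    <property name="step" ref="workerLeafStep"/>
    <property name="taskExecutor" ref="stepTaskExecutor"/>
    <property name="gridSize" value="4"/>
</bean>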
Or do I need to plug in a flow with a decider to pick the appropriate step from the step ExecutionContext?
There is nothing wrong with this approach, but I would personally not recommend it. It makes the implementation complex, because almost all components would have to be step-scoped to access the ExecutionContext and get/set the partitioning metadata (the step to execute, the partition definition, and so on) needed to do the work. Give it a try, and you will quickly see what I mean.
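For reference, a sketch of what such a flow could look like; every bean and step name here is hypothetical, and the decider and tasklets would each need step-scoped access to the ExecutionContext metadata, which is exactly where the complexity creeps in:

<!-- The single worker step wraps a flow whose decider routes to the real step -->
<batch:step id="as-step0002.slave">
    <batch:flow parent="workerFlow"/>
</batch:step>

<batch:flow id="workerFlow">
    <batch:step id="inspectPartition" next="routeByType">
        <batch:tasklet ref="inspectPartitionTasklet"/>
    </batch:step>
    <batch:decision id="routeByType" decider="stepTypeDecider">
        <batch:next on="TYPE_A" to="stepA"/>
        <batch:next on="TYPE_B" to="stepB"/>
    </batch:decision>
    <batch:step id="stepA">
        <batch:tasklet ref="taskletA"/>
    </batch:step>
    <batch:step id="stepB">
        <batch:tasklet ref="taskletB"/>
    </batch:step>
</batch:flow>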
Or should I implement a StepLocator bean?
That could be an option if the step bean is defined outside the application context bootstrapped on the worker side. Otherwise, the default BeanFactoryStepLocator can be used to locate the (partitioned) step defined in the worker context.
That said, I just wanted to share some personal experience from working towards the same goal: how to maximize resource utilization when running multiple jobs on different machines. Here is a non-exhaustive list of what worked well:
- Design job instances to be independent of each other (for example, a job instance per input file instead of a single job that processes all input files). This maximizes parallelism and enables fault tolerance.
- Combine remote partitioning/chunking with a multi-threaded step on the workers. This is similar to what you are trying to do; the only difference is using a multi-threaded step on the workers instead of a partitioned step (which I have never tried, by the way). A sketch of such a worker step follows below.
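A minimal sketch of that alternative, assuming the reader, processor, and writer beans are thread-safe (all names and the throttle limit here are illustrative):

<batch:step id="as-step0002.slave">
    <!-- The chunk-oriented worker step processes its whole partition with a
         pool of threads instead of partitioning a second time -->
    <batch:tasklet task-executor="stepTaskExecutor" throttle-limit="8">
        <batch:chunk reader="itemReader" processor="itemProcessor"
                     writer="itemWriter" commit-interval="100"/>
    </batch:tasklet>
</batch:step>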
I tried to summarize my experience on the matter in this post: Spring Batch on Kubernetes: Efficient batch processing at scale.
Answered By - Fadhel Mahmoud Ben Hassine
Answer Checked By - Pedro (JavaFixing Volunteer)