Issue
I created a Spring Batch job that reads orders from MongoDB and makes a REST call to upload them. However, the batch job completes even though not all records have been read by the MongoItemReader.
I maintain a boolean field batchProcessed on the Orders collection. The MongoItemReader reads records matching {batchProcessed: {$ne: true}}, since I need to run the batch job multiple times without processing the same documents again.
In my OrderWriter I set batchProcessed to true.
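The writer looks roughly like this (a minimal sketch, not my exact code; the upload endpoint, the RestTemplate usage, and the setBatchProcessed accessor are illustrative):

import java.util.List;
import org.springframework.batch.item.ItemWriter;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.web.client.RestTemplate;

public class OrderWriter implements ItemWriter<Order> {

    private final RestTemplate restTemplate;
    private final MongoTemplate mongoTemplate;

    public OrderWriter(RestTemplate restTemplate, MongoTemplate mongoTemplate) {
        this.restTemplate = restTemplate;
        this.mongoTemplate = mongoTemplate;
    }

    @Override
    public void write(List<? extends Order> orders) {
        for (Order order : orders) {
            // Hypothetical upload endpoint.
            restTemplate.postForEntity("http://example.com/orders", order, Void.class);
            // Flipping the flag that the reader's query filters on.
            order.setBatchProcessed(true);
            mongoTemplate.save(order);
        }
    }
}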
@Bean
@StepScope
public MongoItemReader<Order> orderReader() {
    MongoItemReader<Order> reader = new MongoItemReader<>();
    reader.setTemplate(mongoTemplate);
    HashMap<String, Sort.Direction> sortMap = new HashMap<>();
    sortMap.put("_id", Sort.Direction.ASC);
    reader.setSort(sortMap);
    reader.setTargetType(Order.class);
    reader.setQuery("{batchProcessed: {$ne: true}}");
    return reader;
}

@Bean
public Step uploadOrdersStep(OrderItemProcessor processor) {
    return stepBuilderFactory.get("step1").<Order, Order>chunk(1)
            .reader(orderReader()).processor(processor).writer(orderWriter).build();
}

@Bean
public Job orderUploadBatchJob(JobBuilderFactory factory, OrderItemProcessor processor) {
    return factory.get("uploadOrder").flow(uploadOrdersStep(processor)).end().build();
}
Solution
The MongoItemReader is a paging item reader. When items are read in pages and the items that the query returns are changed along the way (i.e. a field used in the query's "where" clause is modified), the paging logic is thrown off and some items can be skipped. There is a similar problem with the JPA paging item reader, explained in detail here: Spring batch jpaPagingItemReader why some rows are not read?
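To see the skipping effect in isolation, here is a self-contained simulation in plain Java (no Spring or MongoDB required); the Order class and the page size are invented for the demonstration:

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PagingSkipDemo {
    static class Order { boolean batchProcessed; }

    public static void main(String[] args) {
        int pageSize = 2;
        List<Order> collection = IntStream.range(0, 6)
                .mapToObj(i -> new Order())
                .collect(Collectors.toList());

        for (int page = 0; ; page++) {
            // Each page re-runs the query; only unprocessed documents match,
            // so flagged documents shift the remaining ones into earlier pages.
            List<Order> chunk = collection.stream()
                    .filter(o -> !o.batchProcessed)
                    .skip((long) page * pageSize)
                    .limit(pageSize)
                    .collect(Collectors.toList());
            if (chunk.isEmpty()) {
                break;
            }
            chunk.forEach(o -> o.batchProcessed = true); // the writer flags the chunk
        }

        long neverRead = collection.stream()
                .filter(o -> !o.batchProcessed)
                .count();
        System.out.println("Never read: " + neverRead); // prints 2, not 0
    }
}

With six matching documents and a page size of 2, page 0 reads documents 0 and 1; flagging them shifts the remaining four forward, so page 1 (skip 2) jumps straight to documents 4 and 5, and documents 2 and 3 are never read. The job then ends normally, which matches the behaviour you describe.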
Common techniques to work around this issue include using a cursor-based reader, using a staging table/collection, or using a partitioned step with one partition per page.
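As an illustration of the first option, here is a minimal sketch of a cursor-based reader built directly on MongoTemplate.stream(...). It assumes a Spring Data MongoDB version where stream(...) returns a CloseableIterator (2.x/3.x, which matches the StepBuilderFactory-era code above), and it is not restartable:

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamReader;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.BasicQuery;
import org.springframework.data.util.CloseableIterator;

public class CursorOrderReader implements ItemStreamReader<Order> {

    private final MongoTemplate mongoTemplate;
    private CloseableIterator<Order> cursor;

    public CursorOrderReader(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    @Override
    public void open(ExecutionContext executionContext) {
        // One cursor for the whole step: there is no page arithmetic
        // to invalidate when the writer flips batchProcessed on read items.
        cursor = mongoTemplate.stream(
                new BasicQuery("{batchProcessed: {$ne: true}}"), Order.class);
    }

    @Override
    public Order read() {
        return cursor.hasNext() ? cursor.next() : null; // null ends the step
    }

    @Override
    public void update(ExecutionContext executionContext) {
        // Not restartable in this sketch; a real reader would save progress here.
    }

    @Override
    public void close() {
        if (cursor != null) {
            cursor.close();
        }
    }
}

You would register this as the step's reader in place of orderReader(); since the cursor is opened once, the writer's updates no longer change which documents the reader visits.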
Answered By - Mahmoud Ben Hassine