Issue
I would appreciate some help designing my Spring Batch process :D I need to perform ETL by consuming a REST API and then storing some of its data. The process must run daily, and Spring Batch seems perfect for this since we already use the Spring Framework for a lot of things at my company. But I am struggling with how to design my job(s?), tasklets, etc.
Could you please help me work out the most appropriate way to do this?
Summary of what I need to do:
- Consume a summary list of all items
- Loop over those items to retrieve an HREF field
- Query each HREF
- Insert into the DB (only the data I need; 90% of the data is useless to me)
I am wondering how to translate those steps into the Spring Batch way. Should I create one tasklet plus one chunk job, where the tasklet fetches the main list and writes the HREFs to a local file, and the job reads from that file and writes to the DB? (It's only about 10k items, so a local file would be OK.) Or should I create a single tasklet where the reader both queries the summary and calls each individual endpoint? Which one would perform better? I don't need maximum performance; I'm quite new to Spring Batch and I'm wondering how to design the processing :)
Thanks !!
EDIT: I cannot use a simple list because the list is not at root level but in a "data" property at root level. Also, by "Query each HREF" I meant performing an API call using the HREF value, which is a link to the endpoint for a single item's data. I must query it because I need data that is not present in the first list returned by the API.
EDIT 2 : See comments on accepted answer for solution.
Solution
How to design my process - loop over a list + 1 query for each item - spring batch
You can create a chunk-oriented step as follows:
- An item reader that returns items from the list (ListItemReader might work)
- An item processor that enriches each item with the data behind its HREF field
- A JdbcBatchItemWriter to insert items in the DB
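As a rough illustration of how the three components listed above fit together, here is a minimal plain-Java sketch of the read, process, write loop. It deliberately does not use the Spring Batch classes: `SummaryReader`, `runStep`, `ItemSummary`, `ItemRow` and the in-memory `fakeApi` map are all hypothetical stand-ins invented for this example, and the loop is simplified compared to what a real chunk-oriented step does (no transactions, retries or restartability). In a real job, the reader role would be played by an ItemReader, the lambda by an ItemProcessor that calls the HREF endpoint over HTTP, and the consumer by a JdbcBatchItemWriter.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

class DrivingQuerySketch {

    // Hypothetical shape of one summary entry: an id plus the HREF of its detail endpoint.
    record ItemSummary(String id, String href) {}

    // Hypothetical row to insert: only the fields worth keeping.
    record ItemRow(String id, String name) {}

    // Reader role: returns one summary per call, then null when the list is exhausted.
    static class SummaryReader {
        private final Iterator<ItemSummary> iterator;

        SummaryReader(List<ItemSummary> summaries) {
            this.iterator = summaries.iterator();
        }

        ItemSummary read() {
            return iterator.hasNext() ? iterator.next() : null;
        }
    }

    // Simplified chunk loop: process each item as it is read, write every chunkSize rows.
    static void runStep(SummaryReader reader,
                        Function<ItemSummary, ItemRow> processor,
                        Consumer<List<ItemRow>> writer,
                        int chunkSize) {
        List<ItemRow> chunk = new ArrayList<>();
        ItemSummary summary;
        while ((summary = reader.read()) != null) {
            chunk.add(processor.apply(summary)); // one detail lookup per item
            if (chunk.size() == chunkSize) {
                writer.accept(List.copyOf(chunk));
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.accept(List.copyOf(chunk)); // flush the last, partial chunk
        }
    }

    public static void main(String[] args) {
        // Stand-in for the list found under the "data" property of the summary response.
        List<ItemSummary> summaries = List.of(
                new ItemSummary("1", "/items/1"),
                new ItemSummary("2", "/items/2"),
                new ItemSummary("3", "/items/3"));

        // Stand-in for the per-item endpoints; a real processor would do an HTTP call here.
        Map<String, Map<String, String>> fakeApi = Map.of(
                "/items/1", Map.of("name", "alpha", "unused", "x"),
                "/items/2", Map.of("name", "beta", "unused", "y"),
                "/items/3", Map.of("name", "gamma", "unused", "z"));

        // Stand-in for the database: each element is one batch insert.
        List<List<ItemRow>> database = new ArrayList<>();

        runStep(new SummaryReader(summaries),
                summary -> new ItemRow(summary.id(), fakeApi.get(summary.href()).get("name")),
                database::add,
                2);

        System.out.println(database);
    }
}
```

With three items and a chunk size of 2, the writer is invoked twice: once with two rows and once with the remaining row. Keeping the HTTP call in the processor means the reader only has to deal with the summary list, so no intermediate file is needed.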
This is a common pattern, documented in the Spring Batch reference documentation under "Driving Query Based ItemReaders". That said, this pattern works well with small/medium data sets, but not with large data sets, as it requires one or more queries for each item. The following threads might be helpful in that regard:
- doesn't driving query pattern cause N+1 problem?
- Spring batch: How to output list as output for RepositoryItemReader : see example in the answer.
- How to read multiple tables using Spring Batch
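To put a number on the per-item cost mentioned above, here is a small back-of-the-envelope sketch. The class and method names are made up for illustration, and the 100 ms latency per call is purely an assumption, not a measurement of any real API. It counts one call for the summary list plus one call per item, and multiplies by the assumed latency to get a sequential estimate.

```java
import java.time.Duration;

class DrivingQueryCost {

    // One request for the summary list, then one request per item HREF.
    static long totalCalls(long itemCount) {
        return 1 + itemCount;
    }

    // Rough sequential duration, given an ASSUMED average latency per call.
    static Duration sequentialEstimate(long itemCount, Duration assumedLatencyPerCall) {
        return assumedLatencyPerCall.multipliedBy(totalCalls(itemCount));
    }

    public static void main(String[] args) {
        long items = 10_000; // the "about 10k items" from the question
        Duration assumedLatency = Duration.ofMillis(100); // assumption for illustration only

        System.out.println("calls: " + totalCalls(items));
        System.out.println("estimated minutes: "
                + sequentialEstimate(items, assumedLatency).toMinutes());
    }
}
```

Under that assumption, 10,000 items mean 10,001 calls and roughly 16 to 17 minutes when run one after another, which is usually acceptable for a daily job. The same arithmetic shows why the pattern stops scaling once the item count grows by a few orders of magnitude.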
Answered By - Fadhel Mahmoud Ben Hassine
Answer Checked By - Willingham (JavaFixing Volunteer)