Issue
I am using Spring Boot and Spring Batch and JPA
to load data from One database into another. In a single batch Job I've created 10 steps
to run steps in sequence and each step reads almost 1 millions
records (I can't run in parallel, because data that I've doesn't load in parallel).
I've used GenerationType.IDENTITY
and look like because of this batch job taking hell lot of time. If I say to load 100 records
it takes 2 min
. The target Postgres database
has sequences
implemented by DBA and we must need to follow those, but its eating up all the performance.
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "EMP_ID", nullable = false, updatable = false, insertable = false)
private Long id;
How can I improve the performance of this batch job ?
I kept spring.jpa.properties.hibernate.jdbc.batch_size=1000
and chunkSize=1000
as well.
Solution
Hibernate cannot batch insert entities if the entity is using IDENTITY
to generate its ID (Also mentioned in the docs at here).
So you have to change to use SEQUENCE
to generate the ID. And according to this , choose to use "pooled" or "pooled-lo" algorithm to get the new ID from the sequence in order to further improve the performance by reducing the round trips to get the ID.
So the ID mapping looks like :
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator="emp_sequence")
@SequenceGenerator(name="emp_sequence", sequenceName = "emp_id_seq", allocationSize = 100)
private Long id;
And the hibernate settings :
spring.jpa.properties.hibernate.order_inserts = true
spring.jpa.properties.hibernate.order_updates = true
spring.jpa.properties.hibernate.jdbc.batch_size = 1000
spring.jpa.properties.hibernate.jdbc.batch_versioned_data = true
# For using "pool-lo" optimiser for generating ID when using JPA @SequenceGenerator
spring.jpa.properties.hibernate.id.optimizer.pooled.preferred = pooled-lo
Also , you have to make sure the corresponding ID sequence in PostreSQL is aligned with the configuration in @SequenceGenerator
:
alter sequence emp_id_seq increment by 100;
Another tip is to add reWriteBatchedInserts=true
in the JDBC connection string which will provides 2-3x performance improvement as said from the docs.
Answered By - Ken Chan
Answer Checked By - David Marino (JavaFixing Volunteer)