Issue
I recently switched my app to Spring Boot 2. I rely on Spring Data JPA to handle all transactions and I noticed a huge speed difference between this and my old configuration. Storing around 1000 elements was being done in around 6s and now it's taking over 25 seconds. I have seen SO posts about batching with Data JPA but none of these worked.
Let me show you the 2 configurations:
The entity (common to both) :
@Entity
@Table(name = "category")
public class CategoryDB implements Serializable
{
private static final long serialVersionUID = -7047292240228252349L;
@Id
@Column(name = "category_id", length = 24)
private String category_id;
@Column(name = "category_name", length = 50)
private String name;
@Column(name = "category_plural_name", length = 50)
private String pluralName;
@Column(name = "url_icon", length = 200)
private String url;
@Column(name = "parent_category", length = 24)
@JoinColumn(name = "parent_category", referencedColumnName = "category_id")
private String parentID;
//Getters & Setters
}
Old Repository (showing an insert only) :
@Override
public Set<String> insert(Set<CategoryDB> element)
{
Set<String> ids = new HashSet<>();
Transaction tx = session.beginTransaction();
for (CategoryDB category : element)
{
String id = (String) session.save(category);
ids.add(id);
}
tx.commit();
return ids;
}
Old Hibernate XML Config File:
<property name="show_sql">true</property>
<property name="format_sql">true</property>
<!-- connection information -->
<property name="hibernate.connection.driver_class">com.mysql.cj.jdbc.Driver</property>
<property name="hibernate.dialect">org.hibernate.dialect.MySQLDialect</property>
<!-- database pooling information -->
<property name="connection_provider_class">org.hibernate.connection.C3P0ConnectionProvider</property>
<property name="hibernate.c3p0.min_size">5</property>
<property name="hibernate.c3p0.max_size">100</property>
<property name="hibernate.c3p0.timeout">300</property>
<property name="hibernate.c3p0.max_statements">50</property>
<property name="hibernate.c3p0.idle_test_period">3000</property>
Old Statistics:
18949156 nanoseconds spent acquiring 2 JDBC connections;
5025322 nanoseconds spent releasing 2 JDBC connections;
33116643 nanoseconds spent preparing 942 JDBC statements;
3185229893 nanoseconds spent executing 942 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
3374152568 nanoseconds spent executing 1 flushes (flushing a total of 941 entities and 0 collections);
6485 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
New Repository:
@Repository
public interface CategoryRepository extends JpaRepository<CategoryDB,String>
{
@Query("SELECT cat.parentID FROM CategoryDB cat WHERE cat.category_id = :#{#category.category_id}")
String getParentID(@Param("category") CategoryDB category);
}
And I'm using the saveAll()
in my service.
New application.properties:
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.hikari.connection-timeout=6000
spring.datasource.hikari.maximum-pool-size=10
spring.jpa.properties.hibernate.show_sql=true
spring.jpa.properties.hibernate.format_sql=true
spring.jpa.properties.hibernate.generate_statistics = true
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQLDialect
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
New Statistics:
24543605 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
136919170 nanoseconds spent preparing 942 JDBC statements;
5457451561 nanoseconds spent executing 941 JDBC statements;
19985781508 nanoseconds spent executing 19 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
20256178886 nanoseconds spent executing 3 flushes (flushing a total of 2823 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
Probably , I'm misconfiguring something on behalf on Spring. This is a huge performance difference and I'm on a dead end. Any hints on what is going wrong here are very appreciated.
Solution
Let's merge the statistics so they can be easily compared.
Old rows are prefixed with o
, new ones with n
.
Rows with a count of 0 are ignored.
Nanoseconds measurements are formatted so that milliseconds can are before a .
o: 18 949156 nanoseconds spent acquiring 2 JDBC connections;
n: 24 543605 nanoseconds spent acquiring 1 JDBC connections;
o: 33 116643 nanoseconds spent preparing 942 JDBC statements;
n: 136 919170 nanoseconds spent preparing 942 JDBC statements;
o: 3185 229893 nanoseconds spent executing 942 JDBC statements;
n: 5457 451561 nanoseconds spent executing 941 JDBC statements; //loosing ~2sec
o: 0 nanoseconds spent executing 0 JDBC batches;
n: 19985 781508 nanoseconds spent executing 19 JDBC batches; // loosing ~20sec
o: 3374 152568 nanoseconds spent executing 1 flushes (flushing a total of 941 entities and 0 collections);
n: 20256 178886 nanoseconds spent executing 3 flushes (flushing a total of 2823 entities and 0 collections); // loosing ~20sec, processing 3 times the entities
o: 6485 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
n: 0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
The following seem to be the relevant points:
The new version has 19 batches which take 20sec which don't exist in the old version at all.
The new version has 3 flushes instead of 1, which take together 20 sec more or about 6 times as long. This is probably more or less the same extra time as the batches since they are most certainly part of these flushes.
Although batches are supposed to make things faster, there are reports where they make things slower, especially with MySql: Why Spring's jdbcTemplate.batchUpdate() so slow?
This brings us to a couple of things you can try/investigate:
- Disable batching, in order to test if you are actually suffering from some kind of slow batch problem.
- Use the linked SO post in order to speed up batching.
- log the SQL statements that actually get executed in order to find the difference. Since this will result in rather lengthy logs to manipulate, try extracting only the SQL statements in two files and comparing them with a diff tool.
- log flushes in order to get ideas why extra flushes are triggered.
- use breakpoints and a debugger or extra logging to find out what entities are getting flushed and why you have way more entities in the second variant.
All the proposals above operate on JPA.
But your statistics and question content suggest that you are doing simple inserts in a single or few tables.
Doing this with on JDBC, e.g. with a JdbcTemplate
might be more efficient and at least easier to understand.
Answered By - Jens Schauder
Answer Checked By - Mary Flores (JavaFixing Volunteer)