Thursday, May 12, 2022

[FIXED] Why is reading a JDBC ResultSet by position faster than by name and how much faster?

May 12, 2022 hibernate, hibernate-6.x, java, jdbc, performance

Issue

In announcing Hibernate 6, the Hibernate team claims that switching from read-by-name to read-by-position in the JDBC ResultSet gives them a performance benefit.

High-load performance testing showed that Hibernate’s approach of reading values from ResultSet by name to be its most limiting factor in scaling through-put.

Does that mean they are changing calls from getString(String columnLabel) to getString(int columnIndex)?

Why is this faster?

As ResultSet is an interface, doesn't the performance gain depend on the JDBC driver implementing it?

How big are the gains?

Solution

Speaking as a JDBC driver maintainer (and, I admit, making some sweeping generalizations which do not necessarily apply to all JDBC drivers), row values will usually be stored in an array or list because that most naturally matches the way the data is received from the database server.

As a result, retrieving values by index will be the simplest. It might be as simple as something like (ignoring some of the nastier details of implementing a JDBC driver):

public Object getObject(int index) throws SQLException {
    checkValidRow();
    checkValidIndex(index);
    return currentRow[index - 1];
}

This is about as fast as it gets.

On the other hand, looking up by column name is more work. Column names need to be treated case-insensitively, which has an additional cost whether you normalize to lower or upper case, or use a case-insensitive lookup such as a TreeMap with a case-insensitive comparator.

A simple implementation might be something like:

public Object getObject(String columnLabel) throws SQLException {
    return getObject(getIndexByLabel(columnLabel));
}

// Declares SQLException because both the map creation and the "not found" case can throw it
private int getIndexByLabel(String columnLabel) throws SQLException {
    Map<String, Integer> indexMap = createOrGetIndexMap();
    Integer columnIndex = indexMap.get(columnLabel.toLowerCase());
    if (columnIndex == null) {
        throw new SQLException("Column label " + columnLabel + " does not exist in the result set");
    }
    return columnIndex;
}

private Map<String, Integer> createOrGetIndexMap() throws SQLException {
    if (this.indexMap != null) {
        return this.indexMap;
    }
    ResultSetMetaData rsmd = getMetaData();
    Map<String, Integer> map = new HashMap<>(rsmd.getColumnCount());
    // reverse loop to ensure first occurrence of a column label is retained
    for (int idx = rsmd.getColumnCount(); idx > 0; idx--) {
        String label = rsmd.getColumnLabel(idx).toLowerCase();
        map.put(label, idx);
    }
    return this.indexMap = map;
}
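
The case-insensitive TreeMap lookup mentioned earlier avoids lower-casing every label. The following is only a minimal sketch and not code from any actual driver: the class name LabelIndex and its methods are made up for illustration, and the labels are passed in as a plain list instead of being read from ResultSetMetaData.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not taken from any real JDBC driver.
final class LabelIndex {
    // String.CASE_INSENSITIVE_ORDER makes the map compare keys ignoring case,
    // so no toLowerCase() call is needed on insert or on lookup.
    private final Map<String, Integer> indexByLabel =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    LabelIndex(List<String> columnLabels) {
        for (int i = 0; i < columnLabels.size(); i++) {
            // putIfAbsent keeps the first occurrence of a duplicate label;
            // JDBC column indexes are 1-based.
            indexByLabel.putIfAbsent(columnLabels.get(i), i + 1);
        }
    }

    // Returns the 1-based column index, or -1 if the label is unknown.
    int indexOf(String columnLabel) {
        Integer index = indexByLabel.get(columnLabel);
        return index == null ? -1 : index;
    }
}
```

Each lookup here costs a handful of case-insensitive string comparisons instead of a new lower-cased string plus a hash, so which variant is cheaper probably depends on the number of columns and the length of the labels.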

Depending on the API of the database and available statement metadata, it may require additional processing to determine the actual column labels of a query. Depending on the cost, this will likely only be determined when it is actually needed (when accessing column labels by name, or when retrieving result set metadata). In other words, the cost of createOrGetIndexMap() might be pretty high.

But even if that cost is negligible (e.g. the statement prepare metadata from the database server includes the column labels), the overhead of mapping the column label to an index and then retrieving by index is obviously higher than directly retrieving by index.

Drivers could even just loop over the result set metadata each time and use the first whose label matches; this might be cheaper than building and accessing the hash map for result sets with a small number of columns, but the cost is still higher than direct access by index.
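
A minimal sketch of that linear scan, assuming the labels are already available as an array (the class and method names are hypothetical and not taken from any driver):

```java
// Hypothetical helper, not taken from any real JDBC driver.
final class LinearLabelScan {
    private LinearLabelScan() {
    }

    // Returns the 1-based index of the first label that matches ignoring case,
    // or -1 if no label matches.
    static int findIndex(String[] columnLabels, String wanted) {
        for (int i = 0; i < columnLabels.length; i++) {
            if (columnLabels[i].equalsIgnoreCase(wanted)) {
                return i + 1;
            }
        }
        return -1;
    }
}
```

Scanning forward and returning on the first match gives the same "first occurrence wins" behaviour as the reverse loop used to fill the hash map.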

As I said, this is a sweeping generalization, but I would be surprised if this (lookup index by name, then retrieve by index) isn't how it works in the majority of JDBC drivers, which means that I expect that lookup by index will generally be quicker.

Taking a quick look at a number of drivers, this is the case for:

Firebird (Jaybird, disclosure: I maintain this driver)
MySQL (MySQL Connector/J)
PostgreSQL
Oracle
HSQLDB
SQL Server (Microsoft JDBC Driver for SQL Server)

I'm not aware of JDBC drivers where retrieval by column name would be equivalent in cost or even cheaper.
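
For application code that still wants to refer to columns by name, one way to avoid the per-row lookup is to resolve each label once with ResultSet.findColumn and then read by position. This is only a sketch under assumptions: the class name PositionalReader and the column labels "id" and "name" are made up, it is not what Hibernate itself does, and it assumes the driver allows findColumn to be called before the first next().

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical application-side helper; the labels "id" and "name" are assumptions.
final class PositionalReader {
    private PositionalReader() {
    }

    static List<String> readIdAndName(ResultSet rs) throws SQLException {
        // Resolve each label to its 1-based index once, before the loop.
        int idIndex = rs.findColumn("id");
        int nameIndex = rs.findColumn("name");

        List<String> result = new ArrayList<>();
        while (rs.next()) {
            // Inside the loop only the positional getters are used.
            result.add(rs.getString(idIndex) + ":" + rs.getString(nameIndex));
        }
        return result;
    }
}
```

The name-to-index cost is then paid once per result set instead of once per value read.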

Answered By - Mark Rotteveel
Answer Checked By - David Marino (JavaFixing Volunteer)

This answer was collected from Stack Overflow and tested by JavaFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
