Issue
I store files in my PostgreSQL database in a column of type bytea. Their size can potentially exceed the allocated Java heap space, so when I try to write those files to the file system I quickly run into out-of-memory errors.
I am using JDBC to perform the query and then extract the content as a binary stream.
This is a simplified version of my code:
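public File readContent(String contentId) throws SQLException, IOException {
    String sql = "SELECT content FROM table.entry WHERE id = ?";
    // try-with-resources closes the connection, statement, and result set even on failure
    try (Connection connection = jdbcTemplate.getDataSource().getConnection();
         PreparedStatement statement = connection.prepareStatement(sql)) {
        statement.setString(1, contentId);
        try (ResultSet resultSet = statement.executeQuery()) {
            if (!resultSet.next()) {
                throw new IllegalArgumentException("No entry with id " + contentId);
            }
            return writeToFileSystem(resultSet.getBinaryStream(1));
        }
    }
}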
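private File writeToFileSystem(InputStream inputStream) throws IOException {
    // Each file gets its own uniquely named directory below the temp folder
    File dir = Files.createDirectories(Paths.get(properties.getTempFolder(), UUID.randomUUID().toString())).toFile();
    File file = new File(dir, "content.zip");
    // Commons IO copies the stream in chunks and closes it afterwards
    FileUtils.copyInputStreamToFile(inputStream, file);
    return file;
}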
My expectation was that this would let me stream the data from the database into the file without ever loading it into memory entirely. This approach doesn't work, however: I still get an OutOfMemoryError as soon as the query is executed:
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.postgresql.core.PGStream.receiveTupleV3(PGStream.java:395)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2118)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:288)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:430)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:356)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:168)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:116)
at sun.reflect.GeneratedMethodAccessor201.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.tomcat.jdbc.pool.StatementFacade$StatementProxy.invoke(StatementFacade.java:114)
at com.sun.proxy.$Proxy149.executeQuery(Unknown Source)
at [...].ContentRepository.readContent(ContentRepository.java:111)
Is there any way I can stream the data from the database into a file without having to increase the JVM's available memory?
Solution
As per this mailing list discussion, you should not be using bytea for this use case:
There are two methods to store binary data in pg and they have different access methods and performance characteristics. Bytea data is expected to be shorter and is returned in whole with a ResultSet by the server. For larger data you want to use large objects which return a pointer (oid) to the actual data which you can then stream from the server at will.
This page describes some of the differences between the two and demonstrates using a pg specific api to access large objects, but getBlob/setBlob will work just fine.
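As a rough illustration of the getBlob route mentioned above, reading could look like the sketch below. It is not taken from the linked discussion. It assumes the content column has been changed to type oid so that it references a large object; the table name entry, the class name LargeObjectReader, and the helper copyToFile are hypothetical. Large objects have to be accessed inside a transaction, which is why auto-commit is switched off for the duration of the read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.sql.Blob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class LargeObjectReader {

    // Copies the stream to the target file chunk by chunk and returns the byte count.
    static long copyToFile(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Assumption: "entry.content" is an oid column pointing at a large object.
    static long readContent(Connection connection, String contentId, Path target)
            throws SQLException, IOException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false); // large objects must be read inside a transaction
        try (PreparedStatement statement =
                 connection.prepareStatement("SELECT content FROM entry WHERE id = ?")) {
            statement.setString(1, contentId);
            try (ResultSet resultSet = statement.executeQuery()) {
                if (!resultSet.next()) {
                    throw new SQLException("No entry with id " + contentId);
                }
                // Only the reference is fetched here; the data is pulled while copying.
                Blob blob = resultSet.getBlob(1);
                long written;
                try (InputStream in = blob.getBinaryStream()) {
                    written = copyToFile(in, target);
                } finally {
                    blob.free();
                }
                connection.commit();
                return written;
            }
        } catch (SQLException | IOException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Note that this sketch commits or rolls back on the connection it is given, so it should not be handed a connection that already has an open transaction of its own.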
See Chapter 7. Storing Binary Data, which shows example code, and Chapter 35. Large Objects, which goes into detail:
PostgreSQL has a large object facility, which provides stream-style access to user data that is stored in a special large-object structure. Streaming access is useful when working with data values that are too large to manipulate conveniently as a whole.
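Switching to large objects also changes how the data is stored in the first place. The following is a minimal sketch of the write side using setBlob with an InputStream. It assumes a hypothetical table entry with an oid column named content, and a PostgreSQL JDBC driver version that implements the stream-based setBlob overload; the class name LargeObjectWriter and the helper openSource are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class LargeObjectWriter {

    // Opens the file as a stream so its content is never held in memory as a whole.
    static InputStream openSource(Path source) throws IOException {
        return Files.newInputStream(source);
    }

    // Assumption: "entry.content" is an oid column; the driver creates the large object.
    static void storeContent(Connection connection, String contentId, Path source)
            throws SQLException, IOException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false); // large objects must be written inside a transaction
        try (InputStream in = openSource(source);
             PreparedStatement statement = connection.prepareStatement(
                     "INSERT INTO entry (id, content) VALUES (?, ?)")) {
            statement.setString(1, contentId);
            statement.setBlob(2, in); // streams the file into a new large object
            statement.executeUpdate();
            connection.commit();
        } catch (SQLException | IOException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

One thing to keep in mind with this design: deleting a row does not delete the large object it references, so the object has to be removed separately (for example with lo_unlink) to avoid orphaned data.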
Answered By - Karol Dowbecki
Answer Checked By - Terry (JavaFixing Volunteer)