Issue
I'm having difficulty using the CITEXT datatype in PostgreSQL using JPA and Hibernate. CITEXT is supposed to provide a case insensitive text datatype but when used with JPA/Hibernate it doesn't behave in a case insensitive manner. Has anyone else had this issue or know a way around it? I have seen some mention (but very, very little) about a JDBC issue, but that went back at least a year and wasn't very clear.
I have a 'nickname' column defined as citext in postgres 9.1. I just did a test to see if it could find a row using a named query as such:
create table test(
nickname citext
)
@NamedQuery(name = "Person.findByNickname",
query = "SELECT p
FROM Person p
WHERE p.nickname = :nickname")
Insert a nickname into the DB:
insert into test values('testNick')
Then run this code:
String nickname = "testNick";
Query q = em.createNamedQuery("Person.findByNickname");
q.setParameter("nickname", nickname);
if (q.getResultList().isEmpty()) {
return (false);
}
return (true);
This returns 'true' (i.e. there is already a 'testNick' in the database).
If I make this assignment
String nickname = "testnick"; //(lower case 'N')
and run it again it returns 'false'.
Since the column is CITEXT, it should return 'true' again. i.e. case insensitive text.
Using JPA and Hibernate. Anyone have any thoughts?
In the meantime I've changed the column back to varchar and created a functional index for lowercase. And I have to create a native query now to search using database functions. Would like to find out if there is a way I can not have to do this to maintain the database abstraction.
Regards.
Solution
citext
provides case-insensitive operators for use within the database, with other citext values.
What's happening
At a guess, your JPA implementation is explicitly specifying the type of the parameter as text
when it creates the parameterized statement. citext
doesn't define a citext = text
operator, so PostgreSQL casts the citext
to text
and uses the case-sensitive text = text
operator. Effectively, comparing citext
to text
is case sensitive.
Here's what I think is happening. Given the dummy data:
regress=# CREATE EXTENSION citext;
regress=# CREATE TABLE citest ( x citext );
regress=# INSERT INTO citest(x) VALUES ('FRED'), ('FrEd');
regress=# SELECT * FROM citest;
x
------
FRED
FrEd
(2 rows)
... a comparison of citext to an unknown string literal will be interpreted as citext=citext
and be done case-insensitively:
regress=# SELECT * FROM citest WHERE x = 'FRED';
x
------
FRED
FrEd
(2 rows)
... but a comparison between citext
and an explicitly text
typed literal will convert the citext
argument to text
using citext
's implicit cast to text, then do a text=text
case sensitive comparison:
regress=# SELECT * FROM citest WHERE x = 'FRED'::text;
x
------
FRED
(1 row)
Or rather, what Hibernate is doing will be closer to:
regress=# PREPARE blah(text) AS SELECT * FROM citest WHERE x = $1;
PREPARE
regress=# EXECUTE blah('FRED');
x
------
FRED
(1 row)
where the type is specified as text
when binding a parameter, since Hibernate "knows" that Strings are text
.
In other words, you need to get Hibernate to, via PgJDBC, explicitly specify the citext
datatype as the parameter type to your query, resulting in something like:
regress=# PREPARE blah(citext) AS SELECT * FROM citest WHERE x = $1;
PREPARE
regress=# EXECUTE blah('FRED');
x
------
FRED
FrEd
(2 rows)
Note the explicit citext
type parameter to the prepared statement. That will be ... interesting ... to do, especially since PgJDBC doesn't know anything about the citext
type. You'd have to write a custom data type handler for Hibernate that uses PgJDBC's setObject
; even then you'll have operator consistency issues between Java and Pg (see below).
IMO you'll be much better off using traditional case sensitive types and lower()
, ILIKE
, etc.
It's also possible that Hibernate is relying on what PgJDBC tells it about column case sensitivity. At least as of 9.2-devel PgJDBC doesn't know anything about the citext
type, so it'll always say "yup, that's case sensitive" when asked.
Tracing
It's hard to be sure that's what's happening without seeing the actual queries run by JPA. Try setting log_statement = 'all'
in postgresql.conf
. Then SIGHUP
the postmaster, use pg_ctl reload
, or restart Pg to cause the change to take effect.
Re-run your test and examine the logs. Test the queries you see in psql
to observe the results. If you're unsure what's going on, update your question with them. If you update also include your Hibernate version and your PgJDBC version.
It's also possible that Hibernate is relying on what PgJDBC tells it about column case sensitivity. At least as of 9.2-devel PgJDBC doesn't know anything about the citext
type, so it'll always say "yup, that's case sensitive" when asked.
Operator consistency difficulties
WARNING: The citext
type cannot affect how Hibernate works with the text once it's come out of the database. It won't have any effect on the String.equals
method, for example. You would need to tell Hibernate you want it to treat the text as case insensitive. Otherwise if you have a text
or varchar
primary/foreign key you can get situations where the Hibernate asks for the key "FRED"
, it gets "FrEd"
back, and is quite confused because the DB returned a key that isn't equal - according to Hibernate - to the one it was asked for. Similar oddities will occur if you include citext
-backed strings in equals
and hashCode
implementations in your entities.
Unfortunately JPA doesn't seem to specify annotation attributes in the @Column
mapping for whether the column is case sensitive or not. Java doesn't have the concept of a case-insensitive string data type anyway, so it wouldn't do tons of good even if JPA did specify it.
You'll probably avoid confusing Hibernate too badly so long as you don't use citext
for keys or include citext
values in equals
and hashCode
.
Answered By - Craig Ringer
Answer Checked By - Dawn Plyler (JavaFixing Volunteer)