Issue
I have a simple test servlet that should output a non-ASCII character (the right single quotation mark, ’). In Tomcat it works, but in Liberty I get junk. Is this a bug in Liberty, a config issue, or am I doing it wrong?
package test;

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class TestServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html;charset=UTF-8");
        response.setCharacterEncoding("UTF-8");
        try (PrintWriter out = response.getWriter()) {
            out.print("’");
            out.close();
        }
    }
}
And the web.xml:
<?xml version="1.0" encoding="UTF-8"?>
<web-app version="3.1"
         xmlns="http://xmlns.jcp.org/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd">
    <servlet>
        <servlet-name>TestServlet</servlet-name>
        <servlet-class>test.TestServlet</servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>TestServlet</servlet-name>
        <url-pattern>/TestServlet</url-pattern>
    </servlet-mapping>
</web-app>
From Tomcat the response is (courtesy of Fiddler):
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=UTF-8
Content-Length: 3
Date: Wed, 23 Jun 2021 23:40:07 GMT
’
The body hex is: E2, 80, 99 (which is correct UTF-8 for ’)
From Liberty it is:
HTTP/1.1 200 OK
X-Powered-By: Servlet/3.1
Content-Type: text/html;charset=UTF-8
Content-Length: 3
Content-Language: en-CA
Date: Wed, 23 Jun 2021 23:52:49 GMT
â€™
The hex for that content is: C3, A2, E2, 82, AC, E2, 84, A2
Dev tools (F12) matches Fiddler.
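The two hex dumps above are consistent with the body having been encoded twice: the correct UTF-8 bytes E2, 80, 99 read back as windows-1252 give the three characters â, € and ™, and encoding those as UTF-8 again gives exactly the 8 bytes Liberty sent. The following is a minimal standalone sketch of that arithmetic only. The class and method names are hypothetical, and the choice of windows-1252 as the intermediate charset is an assumption inferred from the bytes, not something confirmed about Liberty's internals.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reproduces the byte sequences quoted in the question.
final class DoubleEncodingDemo {

    // Formats bytes as comma-separated upper-case hex, matching the notation used in the question.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes as UTF-8, misreads those bytes as windows-1252 (an assumed charset),
    // then encodes the misread text as UTF-8 a second time.
    static byte[] doubleEncode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, Charset.forName("windows-1252"));
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String quote = "\u2019"; // right single quotation mark
        System.out.println(hex(quote.getBytes(StandardCharsets.UTF_8))); // E2, 80, 99
        System.out.println(hex(doubleEncode(quote))); // C3, A2, E2, 82, AC, E2, 84, A2
    }
}
```

The second line of output matches the Liberty body byte for byte, which supports the re-encoding suspicion but does not by itself say where in the server the second encoding happens.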
I've tried moving the calls
    response.setContentType("text/html;charset=UTF-8");
    response.setCharacterEncoding("UTF-8");
before and after getWriter() (the docs say they should come before getWriter()), with and without setCharacterEncoding, and with various content types, among other things.
The .java file itself is saved with UTF-8 encoding.
It's curious that the content length header says 3 bytes with either server, but with Liberty the actual content length is 8 bytes. As if the bytes have been re-encoded?
So, what is going on here?
UPDATE: Taking out the out.close() per @pmdinh's answer had an effect, but didn't fix it. This is the closest I could get to proper behaviour:
    response.setCharacterEncoding("UTF-8");
    try (PrintWriter out = response.getWriter()) {
        response.setContentType("text/html;charset=UTF-8");
        out.print("’1234");
    }
This encodes the character correctly, but the Content-Length is now 2 bytes short. The response is:
HTTP/1.1 200 OK
X-Powered-By: Servlet/3.1
Content-Type: text/html;charset=UTF-8
Content-Length: 5
Content-Language: en-CA
Date: Thu, 24 Jun 2021 17:50:55 GMT
’1234
Because the Content-Length is 2 bytes short, the browser shows only ’12.
Also note that the placement of setCharacterEncoding and setContentType matters: other combinations make the output even worse (incorrect encoding).
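The 2-byte gap can be checked by hand: the body ’1234 is 5 characters long but needs 7 bytes in UTF-8 (3 for the quotation mark plus 4 for the digits). The sketch below only shows that arithmetic. The class and method names are hypothetical, and the observation that the reported value of 5 equals the character count is a coincidence worth noting, not a confirmed description of how Liberty computes the header.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares character count with UTF-8 byte count.
final class ContentLengthDemo {

    // Number of UTF-16 code units in the string (what String.length() reports).
    static int charCount(String body) {
        return body.length();
    }

    // Number of bytes the body occupies once encoded as UTF-8,
    // which is what a correct Content-Length would have to be.
    static int utf8ByteCount(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String body = "\u20191234"; // right single quotation mark followed by "1234"
        System.out.println(charCount(body));     // 5
        System.out.println(utf8ByteCount(body)); // 7
    }
}
```

A client that trusts a Content-Length of 5 stops reading after the 3-byte quotation mark and the first two digits, which is the ’12 seen in the browser.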
Solution
Remove the explicit
    out.close();
call; the try-with-resources statement already closes the writer when the block exits. That should resolve the issue.
Ref: https://www.ibm.com/support/pages/apar/PM71666
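To see why the explicit call is redundant, the sketch below counts close() invocations on a writer used inside try-with-resources, with and without a manual out.close(). It uses a plain StringWriter rather than a servlet response, and the class and method names are hypothetical; it shows only the Java language behaviour, not what Liberty does when the writer is closed twice.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical demo class: counts how often close() runs on the writer.
final class CloseCountDemo {

    // PrintWriter subclass that records every call to close().
    static final class CountingWriter extends PrintWriter {
        int closeCalls = 0;

        CountingWriter(StringWriter target) {
            super(target);
        }

        @Override
        public void close() {
            closeCalls++;
            super.close();
        }
    }

    // Mirrors the question's code: explicit close() inside try-with-resources.
    static int closeCallsWithExplicitClose() {
        CountingWriter writer = new CountingWriter(new StringWriter());
        try (PrintWriter out = writer) {
            out.print("\u2019");
            out.close(); // redundant: the try block closes the writer again on exit
        }
        return writer.closeCalls;
    }

    // Mirrors the suggested fix: let try-with-resources do the closing.
    static int closeCallsWithoutExplicitClose() {
        CountingWriter writer = new CountingWriter(new StringWriter());
        try (PrintWriter out = writer) {
            out.print("\u2019");
        }
        return writer.closeCalls;
    }

    public static void main(String[] args) {
        System.out.println(closeCallsWithExplicitClose());    // 2
        System.out.println(closeCallsWithoutExplicitClose()); // 1
    }
}
```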
Answered By - pmdinh
Answer Checked By - Terry (JavaFixing Volunteer)