Issue
I have a problem in my Java webapp.
Here is the code in index.jsp:
<%@page contentType="text/html" pageEncoding="UTF-8" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<% request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");
%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>JSP Page</title>
</head>
<body>
<h1>Hello World!</h1>
<form action="index.jsp" method="get">
<input type="text" name="q"/>
</form>
Res: <%= request.getParameter("q") %>
</body>
</html>
When I wireshark a request, my browser sends this header:
GET /kjd/index.jsp?q=%C3%A9 HTTP/1.1\r\n
...
Accept-Charset: UTF-8,*\r\n
And the Tomcat server returns me this:
Content-Type: text/html;charset=UTF-8\r\n
But if I send "é" (%C3%A9 in UTF-8) in my form, "Ã©" is displayed instead.
What I understand is that the browser sends an "é" encoded with UTF-8 (the %C3%A9).
But the server interprets this as ISO-8859-1, so %C3 is decoded as Ã and %A9 as ©, and then it sends back the response encoded in UTF-8.
In the code, the requests should be decoded with UTF-8:
request.setCharacterEncoding("UTF-8");
But, if I send this url:
http://localhost:8080/kjd/index.jsp?q=%E9
the "%E9" is decoded with ISO-8859-1 and an "é" is displayed.
Why isn't this working? Why are requests decoded with ISO-8859-1?
I've tried it on Tomcat 6 and 7, and on Windows and Ubuntu.
Solution
The request.setCharacterEncoding("UTF-8"); call only sets the encoding of the request body (which is used by POST requests), not the encoding of the request URI (which is used by GET requests).
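The mismatch described in the question can be reproduced outside Tomcat. The following is a minimal sketch using only java.net.URLDecoder from the JDK; the class name QueryDecodingDemo and the decode helper are hypothetical, and this is not how Tomcat decodes the URI internally. It only shows what happens when the same percent-encoded bytes are read with each charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: decodes the same query value with two charsets.
class QueryDecodingDemo {

    // Percent-decodes 'raw' and interprets the resulting bytes in 'charset'.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%C3%A9"; // "é" as sent by the browser (UTF-8 bytes C3 A9)

        // Read as ISO-8859-1: each byte becomes its own character (U+00C3, U+00A9).
        String wrong = decode(raw, "ISO-8859-1");
        // Read as UTF-8: the two bytes form one character (U+00E9).
        String right = decode(raw, "UTF-8");

        System.out.println("ISO-8859-1: " + wrong + " (" + wrong.length() + " chars)");
        System.out.println("UTF-8:      " + right + " (" + right.length() + " chars)");
    }
}
```

The two-character result under ISO-8859-1 corresponds to the garbled output seen in the browser, while the UTF-8 decoding yields the single intended character.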
You need to set the URIEncoding attribute to UTF-8 in the <Connector> element of Tomcat's /conf/server.xml to get Tomcat to parse the request URI (and the query string) as UTF-8. In Tomcat 6 and 7 this indeed defaults to ISO-8859-1. See also the Tomcat HTTP Connector Documentation.
<Connector ... URIEncoding="UTF-8">
or, to ensure that the URI is parsed using the same encoding as the body [1]:
<Connector ... useBodyEncodingForURI="true">
[1] From Tomcat's documentation (emphasis mine):
This setting is present for compatibility with Tomcat 4.1.x, where the encoding specified in the contentType, or explicitly set using Request.setCharacterEncoding method was also used for the parameters from the URL. The default value is false.
Please get rid of those scriptlets in your JSP. The request.setCharacterEncoding("UTF-8"); call is made at the wrong moment: it would be too late once you properly use a servlet to process the request, because the encoding has to be set before the request parameters are first read. You'd rather use a filter for this. The response.setCharacterEncoding("UTF-8"); part is already done implicitly by pageEncoding="UTF-8" at the top of the JSP.
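A filter of the kind suggested above could look roughly like the following. This is a sketch under assumptions, not a definitive implementation: the class name CharacterEncodingFilter is hypothetical, it targets the javax.servlet API used by Tomcat 6 and 7 (so the servlet API must be on the classpath), and it still has to be mapped to a URL pattern in web.xml. As explained earlier, it only affects the request body, so it complements the URIEncoding setting rather than replacing it.

```java
import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Hypothetical filter: sets the request body encoding before any servlet
// or JSP gets a chance to read the request parameters.
public class CharacterEncodingFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // No configuration needed for this sketch.
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Only apply a default when the client did not declare an encoding.
        if (request.getCharacterEncoding() == null) {
            request.setCharacterEncoding("UTF-8");
        }
        // Continue with the rest of the chain (other filters, then the servlet/JSP).
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // Nothing to release.
    }
}
```

Because the filter runs before the target servlet or JSP, the encoding is in place by the time getParameter() is first called on a POST request.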
I also strongly recommend replacing the old-fashioned <%= request.getParameter("q") %> scriptlet with EL ${param.q}, or with JSTL XML escaping ${fn:escapeXml(param.q)} to prevent XSS attacks.
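To make the XSS point concrete, here is a minimal sketch of the kind of escaping that fn:escapeXml performs before a value is written into the page. The class HtmlEscapeSketch and its escapeXml method are hypothetical illustrations, not JSTL's actual implementation, and the exact entities JSTL emits may differ from the ones chosen here.

```java
// Hypothetical helper illustrating XML/HTML escaping of user input.
class HtmlEscapeSketch {

    // Replaces the five characters with special meaning in HTML/XML by entities.
    static String escapeXml(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&#034;"); break;
                case '\'': out.append("&#039;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A parameter value that would run as script if echoed unescaped.
        String q = "<script>alert('x')</script>";
        System.out.println(escapeXml(q));
    }
}
```

With the raw scriptlet, a value like the one in main would be echoed verbatim and executed by the browser; after escaping it is rendered as harmless text.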
Answered By - BalusC
Answer Checked By - Marilyn (JavaFixing Volunteer)