Issue
Hey, I'm working on a web application and have problems reading UTF-8 characters from txt files. I got UTF-8 working this way: UTF-8 web encoding (and it works fine except for the import). I tried a lot of things (especially from: read UTF-8 string literal java), but nothing works and I have no idea why.
The important code snippets:
import.jsp
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fi">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Import</title>
<link rel="stylesheet" href="//code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css">
<script src="https://code.jquery.com/jquery-1.12.4.js"></script>
<script src="https://code.jquery.com/ui/1.12.1/jquery-ui.js"></script>
<script src="script.js"></script>
<link rel="stylesheet" type="text/css"
media="screen and (min-device-width: 500px)" href="style.css" />
<link rel="stylesheet"
href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
</head>
<body>
<form>
<!-- show import data -->
</form>
<form id="importForm" action="${pageContext.request.contextPath}/ImportData" method="post" onsubmit="return importValidation();" enctype="multipart/form-data">
<input type="file" name="file" accept=".txt"/>
<input type="submit" value="Import">
</form>
</body>
</html>
ImportData Servlet:
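import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.annotation.MultipartConfig;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.Part;

@WebServlet("/ImportData")
@MultipartConfig
public class ImportData extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        Part filePart = request.getPart("file"); // Retrieves <input type="file" name="file">
        List<String> texts = new ArrayList<String>();
        // Decode the uploaded bytes as UTF-8; try-with-resources closes the reader.
        try (BufferedReader buf = new BufferedReader(new InputStreamReader(filePart.getInputStream(), StandardCharsets.UTF_8))) {
            String lineJustFetched;
            while ((lineJustFetched = buf.readLine()) != null) {
                // Each line holds tab-separated values.
                texts.addAll(Arrays.asList(lineJustFetched.split("\t")));
            }
        }
        System.out.println(texts);
        // create import data in backend and write it into db
        response.sendRedirect("import.jsp");
    }
}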
System details: Tomcat server 7 with Java 1.7
The printed output of texts shows a square for each non-ASCII character, and in the HTML inputs (and text) a � appears instead of the expected character.
So my question is: where and why do I lose the UTF-8 encoding?
Solution
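OK, I didn't look closely enough... The file is not UTF-8 encoded (it is ANSI encoded). With a UTF-8 encoded file this code works fine. A single byte such as 0xE4 (ä in Cp1252) is not a valid UTF-8 sequence on its own, so the UTF-8 decoder replaces it with U+FFFD (�).
To make it work for another encoding, you only have to change the InputStreamReader encoding so that it matches the file's actual encoding.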
For example, for Windows ANSI (Cp1252):
BufferedReader buf = new BufferedReader(new InputStreamReader(filePart.getInputStream(), "Cp1252"));
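If uploads can arrive in either encoding, one possible approach is to decode strictly as UTF-8 first and fall back to Cp1252 only when the bytes are not valid UTF-8. The following is a minimal sketch, not part of the original code: the class name EncodingFallback and the helper decode are hypothetical, and the fallback assumes that any non-UTF-8 file is windows-1252, which cannot be verified from the bytes alone.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingFallback {

    // Hypothetical helper: strict UTF-8 decoding with a windows-1252 fallback.
    static String decode(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of silently inserting U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Assumption: anything that is not valid UTF-8 is Windows ANSI.
            return new String(bytes, Charset.forName("windows-1252"));
        }
    }

    public static void main(String[] args) {
        // "äiti" as stored by a Windows ANSI editor: ä is the single byte 0xE4.
        byte[] ansi = {(byte) 0xE4, 'i', 't', 'i'};
        // The same word stored as UTF-8: ä becomes the two bytes 0xC3 0xA4.
        byte[] utf8 = "\u00E4iti".getBytes(StandardCharsets.UTF_8);

        // Lenient decoding (what InputStreamReader does) hides the mismatch
        // by substituting the replacement character.
        String lenient = new String(ansi, StandardCharsets.UTF_8);
        System.out.println(lenient.contains("\uFFFD"));

        // Both inputs yield the same text with the fallback helper.
        System.out.println(decode(ansi).equals(decode(utf8)));
    }
}
```

In the servlet, the uploaded bytes would have to be read fully from filePart.getInputStream() before calling such a helper, which is only reasonable for small files. Text that contains only ASCII decodes identically under both encodings, so the fallback matters only for characters such as ä or ö.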
Answered By - SaScH_MaN