Issue
I am writing some text to a file using a FileWriter object. I specify that the output should be UTF-8, but when I open the text file and go to Save As, I see that it is in ANSI encoding.
I should add that when the text contains characters outside the standard ASCII character set (e.g. Japanese characters), the file encoding is reported as UTF-8, but without them the encoding is reported as ANSI.
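import java.io.File;
import java.io.FileWriter;
import java.nio.charset.StandardCharsets;
// The FileWriter(File, Charset) constructor requires Java 11 or later.
File json_file = new File(path);
FileWriter json_file_output = new FileWriter(json_file, StandardCharsets.UTF_8);
json_file_output.write("SOME JSON TEXT HERE");
json_file_output.flush();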
I am not sure whether this is due to the Java code or to Notepad.
Thank you for the help.
Solution
Unicode is a superset of the US-ASCII character set,
UTF-8 is a superset of the 8-bit US-ASCII character encoding
There is no such thing as an ANSI encoding. See What is ANSI format? In Windows tools such as Notepad, “ANSI” is a misnomer for the system's default 8-bit code page (for example Windows-1252), which is itself a superset of US-ASCII.
For a file like yours, what is effectively meant is US-ASCII. And every 8-bit US-ASCII file is also a UTF-8 file. UTF-8 was designed this way on purpose, for compatibility: ASCII text written out as octets is already valid UTF-8.
US-ASCII is a 7-bit character set with only 128 characters, numbered 0-127. So when it is written using octets (8 bits), the first (high-order) bit of every octet is zero. See the Wikipedia page on UTF-8 encoding, and notice the role played by that first bit.
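To see this for yourself, here is a minimal sketch (the class name AsciiIsUtf8Demo and the helper isSevenBit are made up for illustration). It encodes the same ASCII-only string as US-ASCII and as UTF-8, compares the resulting bytes, and then checks the high-order bit for a string containing Japanese characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiIsUtf8Demo {

    // Hypothetical helper: true when every byte has its high-order bit clear (value 0..127).
    static boolean isSevenBit(byte[] bytes) {
        for (byte b : bytes) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String asciiOnly = "{\"name\": \"value\"}";
        byte[] asAscii = asciiOnly.getBytes(StandardCharsets.US_ASCII);
        byte[] asUtf8 = asciiOnly.getBytes(StandardCharsets.UTF_8);

        // For ASCII-only text the two encodings produce identical bytes.
        System.out.println(Arrays.equals(asAscii, asUtf8)); // prints true
        System.out.println(isSevenBit(asUtf8));             // prints true

        // \u65E5\u672C is the two-character Japanese word for "Japan".
        String withJapanese = "{\"name\": \"\u65E5\u672C\"}";
        byte[] japaneseUtf8 = withJapanese.getBytes(StandardCharsets.UTF_8);

        // Non-ASCII characters become multi-byte sequences with the high bit set.
        System.out.println(isSevenBit(japaneseUtf8));       // prints false
    }
}
```

In other words, the bytes your FileWriter produces for ASCII-only JSON are the same whichever of the two charsets you name, so nothing in the file can distinguish them.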
No file meta-data
Understand that both US-ASCII files and UTF-8 files (without a BOM) are just a bunch of bits, with no meta-data. The computer industry never managed to establish a standard for file system meta-data, unfortunately. So an app has to guess the content’s encoding, or the user must indicate the expected format.
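The one in-band hint a UTF-8 file can carry is that optional BOM. As a sketch only, and not something your code needs for correctness, the following writes U+FEFF before the text so the file starts with the bytes EF BB BF. The class name Utf8BomWriter and the method writeWithBom are hypothetical. Be aware that many JSON parsers do not expect a BOM, so this is a trade-off rather than a recommendation.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8BomWriter {

    // Hypothetical helper: writes the text as UTF-8, preceded by U+FEFF (the BOM).
    static void writeWithBom(Path target, String text) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write('\uFEFF'); // encoded in UTF-8 as the three bytes EF BB BF
            out.write(text);
        } // try-with-resources flushes and closes the writer
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("bom-demo", ".json");
        writeWithBom(target, "SOME JSON TEXT HERE");

        byte[] bytes = Files.readAllBytes(target);
        // Show the first three bytes of the file in hexadecimal.
        System.out.printf("%02X %02X %02X%n",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF); // prints EF BB BF
        Files.delete(target);
    }
}
```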
Your text editor is likely looking at the range of characters found in your file and then trying to be helpfully conservative by labeling the file with the narrowest encoding that fits. If it finds only US-ASCII characters, it labels the file as US-ASCII (and apparently misreports that as “ANSI”). As soon as you add characters with code points beyond the ASCII range, it labels the file as UTF-8.
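I do not know how your editor actually implements its detection, but the kind of guessing described above could look roughly like this sketch. The class EncodingGuesser and the method guessLabel are invented for illustration: report US-ASCII if every byte is 7-bit, otherwise report UTF-8 if the bytes decode cleanly as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingGuesser {

    // Hypothetical heuristic: pick the narrowest label that fits the bytes.
    static String guessLabel(byte[] content) {
        boolean allSevenBit = true;
        for (byte b : content) {
            if ((b & 0x80) != 0) { // high-order bit set, so not plain ASCII
                allSevenBit = false;
                break;
            }
        }
        if (allSevenBit) {
            return "US-ASCII";
        }
        try {
            // A strict decoder throws on byte sequences that are not valid UTF-8.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content));
            return "UTF-8";
        } catch (CharacterCodingException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        byte[] ascii = "SOME JSON TEXT HERE".getBytes(StandardCharsets.UTF_8);
        byte[] japanese = "\u65E5\u672C".getBytes(StandardCharsets.UTF_8);

        System.out.println(guessLabel(ascii));    // prints US-ASCII
        System.out.println(guessLabel(japanese)); // prints UTF-8
    }
}
```

Note that both inputs were written as UTF-8. The label differs only because of which characters happen to be in the file, which matches what you observed.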
For background info, such as the distinction between character set and character encoding, see: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Answered By - Basil Bourque
Answer Checked By - David Marino (JavaFixing Volunteer)