Issue
I have an input string like the one below.
String comment = "Good morning! \u2028\u2028I am looking to purchase a new Honda car as I\u2019m outgrowing my current car. I currently drive a Hyundai Accent and I was looking for something a little bit larger and more comfortable like the Honda Civic. May I know if you have any of the models currently in stock? Thank you! Warm regards Sandra";
I want to remove Unicode characters such as "\u2028" and "\u2019" if they are present in the comment. At runtime I don't know which extra characters will arrive, so what is the best way to handle this?
I tried the following, which removes the Unicode characters from the given string:
comment = comment.replaceAll("\\P{Print}", "");
What is the best way to detect whether such Unicode characters are present in the comment and, if they are, remove them, and otherwise pass the comment to the target system unchanged?
Can anyone please help me resolve this?
Solution
You can do this in sequential steps, as shown below:
public class Main {
    public static void main(final String[] args) {
        String comment = "Good morning! \u2028\u2028I am looking to purchase a new Honda car as I\u2019m outgrowing my current car. I currently drive a Hyundai Accent and I was looking for something a little bit larger and more comfortable like the Honda Civic. May I know if you have any of the models currently in stock? Thank you! Warm regards Sandra";
        // Remove all non-ASCII characters (this already strips U+2028 and U+2019)
        comment = comment.replaceAll("[^\\x00-\\x7F]", "");
        // Remove the ASCII control characters, except carriage return, line feed and tab
        comment = comment.replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "");
        // Remove anything left in Unicode category C (control, format, unassigned,
        // private use, surrogate); note that this also strips carriage return, line feed and tab
        comment = comment.replaceAll("\\p{C}", "");
        System.out.println(comment);
    }
}
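As a variation on the steps above, here is a minimal sketch that first checks whether the comment needs cleaning and returns it unchanged if not. When cleaning is needed, it maps a few common typographic characters to ASCII equivalents before stripping the rest, so that "I’m" becomes "I'm" rather than "Im". The class name CommentSanitizer, its method names, and the choice of characters to map are assumptions made for illustration; adjust them to whatever the target system actually accepts.

```java
import java.util.regex.Pattern;

// Hypothetical helper: class and method names are illustrative only.
final class CommentSanitizer {

    // Anything outside the 7-bit ASCII range.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]");

    // ASCII control characters except tab, line feed and carriage return.
    private static final Pattern ASCII_CONTROL =
            Pattern.compile("[\\p{Cntrl}&&[^\\r\\n\\t]]");

    /** Returns true if the text contains at least one non-ASCII character. */
    static boolean containsNonAscii(String text) {
        return NON_ASCII.matcher(text).find();
    }

    /** Returns the text unchanged when it is already clean, otherwise a cleaned copy. */
    static String sanitize(String text) {
        if (!containsNonAscii(text) && !ASCII_CONTROL.matcher(text).find()) {
            return text; // nothing to remove, pass through as-is
        }
        String cleaned = text
                .replaceAll("[\\u2018\\u2019]", "'")   // curly single quotes -> apostrophe
                .replaceAll("[\\u201C\\u201D]", "\"")  // curly double quotes -> straight quote
                .replaceAll("[\\u2028\\u2029]", " ");  // line/paragraph separators -> space
        // Drop whatever non-ASCII characters were not mapped above.
        cleaned = NON_ASCII.matcher(cleaned).replaceAll("");
        // Drop the remaining ASCII control characters, keeping \t, \n and \r.
        return ASCII_CONTROL.matcher(cleaned).replaceAll("");
    }

    public static void main(String[] args) {
        String comment = "Good morning! \u2028\u2028I am looking to purchase a new Honda car as I\u2019m outgrowing my current car.";
        System.out.println(sanitize(comment));
    }
}
```

Precompiling the patterns avoids recompiling the regular expressions on every call, which matters if many comments are processed. Each separator is replaced by its own space, so consecutive separators leave consecutive spaces; collapse them afterwards if the target system requires it.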
Answered By - Gaurav Jeswani
Answer Checked By - Pedro (JavaFixing Volunteer)