Issue
I have a string containing CSV lines. Some of its values contain CRLF
characters, marked [CRLF] in the example below.
NOTE: "Line 1:" and "Line 2:" aren't part of the CSV; they are only labels for the discussion.
Line 1:
foo1,bar1,"john[CRLF]
dose[CRLF]
blah[CRLF]
blah",harry,potter[CRLF]
Line 2:
foo2,bar2,john,dose,blah,blah,harry,potter[CRLF]
Whenever a value in a line contains a CRLF, the whole value appears between quotes, as shown by line 1. I'm looking for a way to get rid of those CRLFs when they appear between quotes.
I tried regular expressions such as:
data.replaceAll("(,\".*)([\r\n]+|[\n\r]+)(.*\",)", "$1 $3");
or just ([\r\n]+), \n+, etc., without success: the lines still appear as if no replacement had been made.
EDIT:
Solution
Found the solution here:
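import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "\"Test Line wo line break\", \"Test Line \nwith line break\"\n\"Test Line2 wo line break\", \"Test Line2 \nwith line break\"\n";
StringBuffer result = new StringBuffer();
// Match each double-quoted value (assumes no quotes inside a value)
Matcher m = Pattern.compile("\"[^\"]*\"").matcher(data);
while (m.find()) {
    // Remove line breaks inside the match; quoteReplacement keeps any '$' or '\' literal
    m.appendReplacement(result, Matcher.quoteReplacement(m.group().replaceAll("\\R+", "")));
}
m.appendTail(result);
System.out.println(result.toString());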
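As a rough sketch, the same loop can be applied to the CSV sample from the question, with real CRLF sequences in place of the [CRLF] markers. The class name `QuotedLineBreaks` and the helper `flattenQuoted` are hypothetical names chosen for illustration. Replacing each run of line breaks with a single space (rather than an empty string) is an assumption, borrowed from the "$1 $3" replacement in the question's attempt.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedLineBreaks {
    // Hypothetical helper: replaces every run of line breaks found inside a
    // double-quoted value with one space. Assumes values contain no quotes.
    static String flattenQuoted(String csv) {
        Matcher m = Pattern.compile("\"[^\"]*\"").matcher(csv);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // \R matches any line break sequence, including CRLF as one unit
            String cleaned = m.group().replaceAll("\\R+", " ");
            // quoteReplacement keeps '$' and '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(cleaned));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String csv = "foo1,bar1,\"john\r\ndose\r\nblah\r\nblah\",harry,potter\r\n"
                   + "foo2,bar2,john,dose,blah,blah,harry,potter\r\n";
        // The quoted value collapses onto one line; the CRLF that ends
        // each record is outside the quotes and is left untouched.
        System.out.print(flattenQuoted(csv));
    }
}
```

Note that the result has to be taken from the return value: Java strings are immutable, so calling `replaceAll` on a string without using what it returns leaves the original unchanged.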
Solution
With Java 9+, you can pass a replacement function to Matcher#replaceAll
and solve the problem with this code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// pattern that matches quoted strings, skipping over backslash-escaped quotes
Pattern p = Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");
String data1 = "\"Test Line wo line break\", \"Test Line \nwith line break\"\n\"Test Line2 wo line break\", \"Test Line2 \nwith line break\"\n";
// for each quoted string, remove all line breaks from the matched substring;
// quoteReplacement keeps any '\' or '$' in the match literal
String repl = p.matcher(data1).replaceAll(
    m -> Matcher.quoteReplacement(m.group().replaceAll("\\R+", ""))
);
System.out.println(repl);
Output:
"Test Line wo line break", "Test Line with line break"
"Test Line2 wo line break", "Test Line2 with line break"
Answered By - anubhava
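One caveat that applies to both approaches: the string handed to `appendReplacement`, or returned from the function given to `Matcher#replaceAll`, is treated as a replacement template, so `$` and `\` in the matched value are interpreted rather than copied. The sketch below is a minimal illustration, assuming Java 9+; the class name `QuoteReplacementDemo` and the helper `stripBreaks` are hypothetical. It wraps the cleaned value in `Matcher.quoteReplacement` so that such characters stay literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
    // Matches a double-quoted value; assumes values contain no quotes
    static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Hypothetical helper: removes line breaks inside quoted values
    static String stripBreaks(String data) {
        return QUOTED.matcher(data).replaceAll(
            // Without quoteReplacement, "$5" in the value below would be
            // read as a reference to capture group 5 and the call would fail.
            m -> Matcher.quoteReplacement(m.group().replaceAll("\\R+", "")));
    }

    public static void main(String[] args) {
        String data = "price,\"US$5 \r\nper unit\",ok\r\n";
        System.out.print(stripBreaks(data));
    }
}
```

With the quoting in place, the value comes back as "US$5 per unit" on a single line, and the record's terminating CRLF is preserved.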