Issue
There is a use case in which I have a long String which can contain many <img> tags.
I need to collect each entire image tag, from the start (<img src=") to the close (">), in a List.
I wrote a regex ("<img.*?\">" with the gm flags) for selecting these, but I don't know how to collect them all in a List.
For example:
final String regex = "<img.*?\\\">";
final String string = "Hello World <img src=\"https://dummyimage.com/300.png/09f/777\"> \nMy Name <img src=\"https://dummyimage.com/300.png/09f/ff2\"> Random Text\nHello\nHello Random <img src=\"https://dummyimage.com/300.png/09f/888\"> \nMy Name <img src=\"https://dummyimage.com/300.png/09f/2ff\">adaad\n";
final String replace = "";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(replace); // Here, how can I collect all the image tags in a list
Solution
Java 8 - Pattern.splitAsStream()
We can split the given string using so-called lookaheads and lookbehinds (for more information, check the reference provided below):
- (?<=.)(?=<) matches a position between a character of any kind and an opening angle bracket < (i.e. it captures an empty substring between any character and the beginning of a tag).
- (?<=>)(?=.) matches a position between a closing angle bracket > and any kind of character.
public static final Pattern ANGLE_BRACKETS =
Pattern.compile("(?<=.)(?=<)|(?<=>)(?=.)");
Using this Pattern, we generate a stream of substrings by splitting on the empty string at the border of each opening and closing angle bracket. We then keep only the strings that represent a valid image tag.
final String string = "Hello World <img src=\"https://dummyimage.com/300.png/09f/777\"> \nMy Name <img src=\"https://dummyimage.com/300.png/09f/ff2\"> Random Text\nHello\nHello Random <img src=\"https://dummyimage.com/300.png/09f/888\"> \nMy Name <img src=\"https://dummyimage.com/300.png/09f/2ff\">adaad\n";
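List<String> imageTags = ANGLE_BRACKETS.splitAsStream(string)
    .filter(str -> str.trim().matches("<img[^<]+>")) // verifying that a string is a valid image tag
    .collect(Collectors.toList()); // java.util.stream.Collectors; Stream.toList() needs Java 16+, String.strip() needs Java 11+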
imageTags.forEach(System.out::println);
- Information on Lookaheads and Lookbehinds
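To make the splitting step more concrete, here is a minimal sketch that prints the raw pieces before any filtering. The class name SplitDemo, the helper pieces(), and the short sample string are made up for illustration; only the pattern itself comes from the answer above.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitDemo {
    // Same expression as ANGLE_BRACKETS in the answer
    static final Pattern ANGLE_BRACKETS =
        Pattern.compile("(?<=.)(?=<)|(?<=>)(?=.)");

    // Hypothetical helper: returns every piece produced by the split, unfiltered
    static List<String> pieces(String input) {
        return ANGLE_BRACKETS.splitAsStream(input)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative input, not taken from the question
        String sample = "Hi <img src=\"a.png\"> there <b>x</b>";
        // Brackets make leading and trailing spaces visible
        pieces(sample).forEach(p -> System.out.println("[" + p + "]"));
    }
}
```

For this sample the split should yield six pieces: the text runs "Hi ", " there " and "x", plus the three tags, each as its own element. That is why the filter on <img[^<]+> is enough to isolate the image tags afterwards.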
Java 9 - Matcher.results()
In the regular expression, you need to exclude the opening angle bracket < (rather than rely on the quotation mark) to ensure that a captured substring contains only one tag:
public static final Pattern IMG_TAG = Pattern.compile("<img[^<]+>");
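One way to see why excluding < matters is to compare the question's pattern with the one above on an input where a tag does not end in a double quote. This is only a sketch: the class TagBoundaryDemo, the helper countMatches(), and the sample input with a single-quoted attribute are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagBoundaryDemo {
    // Hypothetical helper: counts how many times the regex matches in the input
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // First tag uses single quotes, so it never contains the sequence ">
        String input = "<img src='a.png'> text <img src=\"b.png\">";

        // Pattern from the question: runs on until the first "> it can find
        System.out.println(countMatches("<img.*?\">", input)); // prints 1

        // Pattern that stops before the next '<': one match per tag
        System.out.println(countMatches("<img[^<]+>", input)); // prints 2
    }
}
```

With the quote-based pattern, the single match spans both tags and the text between them, whereas the bracket-based pattern keeps each tag separate.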
Using the Java 9 method Matcher.results(), we can create a stream of MatchResult objects, which contain information about captured sequences in the given string. To obtain the matching substring, we can use MatchResult.group().
final String string = "Hello World <img src=\"https://dummyimage.com/300.png/09f/777\"> \nMy Name <img src=\"https://dummyimage.com/300.png/09f/ff2\"> Random Text\nHello\nHello Random <img src=\"https://dummyimage.com/300.png/09f/888\"> \nMy Name <img src=\"https://dummyimage.com/300.png/09f/2ff\">adaad\n";
List<String> imageTags = IMG_TAG.matcher(string).results() // Stream<MatchResult>
    .map(MatchResult::group) // Stream<String>
    .collect(Collectors.toList()); // java.util.stream.Collectors; Stream.toList() needs Java 16+
imageTags.forEach(System.out::println);
Output:
<img src="https://dummyimage.com/300.png/09f/777">
<img src="https://dummyimage.com/300.png/09f/ff2">
<img src="https://dummyimage.com/300.png/09f/888">
<img src="https://dummyimage.com/300.png/09f/2ff">
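If neither stream-based approach fits (for example, when a plain loop is preferred), the same matches can also be collected with the classic Matcher.find() loop. This is a different technique from the two shown above, sketched under the assumption that the pattern <img[^<]+> is used; the class ImgTagCollector and the method collect() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgTagCollector {
    static final Pattern IMG_TAG = Pattern.compile("<img[^<]+>");

    // Hypothetical helper: walks the input and stores every matched tag
    static List<String> collect(String html) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = IMG_TAG.matcher(html);
        while (matcher.find()) {
            tags.add(matcher.group()); // group() returns the whole matched substring
        }
        return tags;
    }

    public static void main(String[] args) {
        // Shortened version of the string from the question
        String string = "Hello World <img src=\"https://dummyimage.com/300.png/09f/777\"> \n"
            + "My Name <img src=\"https://dummyimage.com/300.png/09f/ff2\"> Random Text\n";
        collect(string).forEach(System.out::println);
    }
}
```

The loop needs nothing beyond java.util.regex and the collections API, so it also runs on Java 8.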
Answered By - Alexander Ivanchenko
Answer Checked By - Marie Seifert (JavaFixing Admin)