Issue
Why does the flatMap operation require a function which returns Stream instead of a function that returns a Collection? Any particular reason it forces the user to do the stream conversion manually?
Reading the source code example, I can see that this way the compatibility can be extended to arrays, but wouldn't an overload of flatMap achieve the same result?
// Java 8 source code example:
Stream<String> words = lines.flatMap(line -> Stream.of(line.split(" +")));
What are the use cases where it's better to have the conversion to a stream made explicit?
Example: why am I forced to do this
Map<String, List<String>> map = new HashMap<String, List<String>>();
List<String> flatList = map.entrySet().stream().flatMap(e -> e.getValue().stream()).collect(Collectors.toList());
instead of this?
Map<String, List<String>> map = new HashMap<String, List<String>>();
List<String> flatList = map.entrySet().stream().flatMap(Map.Entry::getValue).collect(Collectors.toList());
Solution
Why does the flatMap() operation require a function which returns Stream instead of a function that returns a Collection?
There are several reasons for that:
Firstly, a Stream is a means of iteration, i.e. it does not store data; its purpose is to iterate lazily over a source of data, which can be a String, an array, an IO stream, etc.
Secondly, Stream operations are divided into two groups: terminal operations, which are meant to produce the result and terminate the execution of the stream pipeline (i.e. it's not possible to apply any operation after a terminal one), and intermediate operations, which transform the stream. Intermediate operations are always lazy. A stream takes elements from the source one by one and processes them lazily, i.e. operations occur only when needed. Don't confuse a stream with a chain of nested for-loops; they act differently. Every intermediate operation produces a new stream.
Here's a quote from the API documentation:
Streams differ from collections in several ways:
No storage. A stream is not a data structure that stores elements; instead, it conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel, through a pipeline of computational operations.
Laziness-seeking. Many stream operations, such as filtering, mapping, or duplicate removal, can be implemented lazily, exposing opportunities for optimization. For example, "find the first String with three consecutive vowels" need not examine all the input strings. Stream operations are divided into intermediate (Stream-producing) operations and terminal (value- or side-effect-producing) operations. Intermediate operations are always lazy.
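The laziness described above can be observed with a small experiment. This is only an illustrative sketch: the class name LazyStreamDemo and its two helper methods are made up for this example and are not part of the original answer. It records which elements actually pass through peek().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class, not taken from the original answer.
class LazyStreamDemo {

    // Returns the elements that were pulled from the source
    // before findFirst() short-circuited the pipeline.
    static List<String> visitedBeforeFirstMatch() {
        List<String> visited = new ArrayList<>();
        Stream.of("a", "bb", "ccc", "dddd")
                .peek(visited::add)            // records each element as it flows through
                .filter(s -> s.length() > 1)   // intermediate operation: lazy
                .findFirst();                  // terminal operation: triggers execution
        return visited;
    }

    // Builds a pipeline but never applies a terminal operation,
    // so no element is ever pulled from the source.
    static List<String> visitedWithoutTerminalOp() {
        List<String> visited = new ArrayList<>();
        Stream<String> pipeline = Stream.of("a", "bb")
                .peek(visited::add)
                .filter(s -> s.length() > 1);  // only describes the work, runs nothing
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(visitedBeforeFirstMatch());   // [a, bb]
        System.out.println(visitedWithoutTerminalOp());  // []
    }
}
```

Only "a" and "bb" are visited in the first case: each element travels through the whole pipeline before the next one is requested, and the pipeline stops as soon as findFirst() has its answer.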
Since streams are internal iterators over a source of data, which can have a different nature (not necessarily a Collection), it's reasonable for flatMap() to expect data in a predictable, uniform shape: not an array, Collection, Iterable, etc., but another internal iterator, i.e. another Stream, so that it's obvious how to deal with it.
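For the example from the question, the explicit conversion costs very little when written as a method reference. A minimal sketch, assuming the map values are the only thing needed (the class and method names are hypothetical, chosen for illustration):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, not taken from the original answer.
class FlattenMapValuesDemo {

    // Flattens all value lists of the map into a single list.
    static List<String> flattenValues(Map<String, List<String>> map) {
        return map.values().stream()     // stream over the value lists directly
                .flatMap(List::stream)   // each List<String> becomes a Stream<String>
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // LinkedHashMap is used only to make the iteration order predictable.
        Map<String, List<String>> map = new LinkedHashMap<>();
        map.put("x", Arrays.asList("a", "b"));
        map.put("y", Arrays.asList("c"));
        System.out.println(flattenValues(map)); // [a, b, c]
    }
}
```

Here flatMap(List::stream) is barely longer than the flatMap(Map.Entry::getValue) form the question wished for, and it makes clear which kind of source is being turned into a stream.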
Any alternative that you can come up with would be less intuitive. If flatMap() were implemented so that it expected a function producing a Collection, how would you deal with strings, arrays, IO streams, or various implementations of Iterable? Dumping the data into a Collection first is not an option, since that would mean copying every element eagerly and giving up laziness. The same issue would arise if we imagine that flatMap() required an Iterable: how would we produce an Iterable from a String? Streams are designed to be versatile.
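To illustrate that versatility, here is a rough sketch of how differently shaped sources can each be adapted to a Stream inside flatMap(). The class FlatMapSourcesDemo and its method names are assumptions made for this example; the library calls used (Arrays.stream, String.chars, StreamSupport.stream) are standard JDK methods.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical demo class, not taken from the original answer.
class FlatMapSourcesDemo {

    // Array source: each line is split into an array of words.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" +")))
                .collect(Collectors.toList());
    }

    // String source: each string is viewed as a stream of its characters.
    static List<Character> chars(List<String> strings) {
        return strings.stream()
                .flatMap(s -> s.chars().mapToObj(c -> Character.valueOf((char) c)))
                .collect(Collectors.toList());
    }

    // Iterable source: works for any Iterable, not only for a Collection.
    static <T> List<T> fromIterables(List<? extends Iterable<T>> sources) {
        return sources.stream()
                .flatMap(it -> StreamSupport.stream(it.spliterator(), false))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words(Arrays.asList("a b", "c  d")));  // [a, b, c, d]
        System.out.println(chars(Arrays.asList("ab", "c")));      // [a, b, c]
        System.out.println(fromIterables(
                Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)))); // [1, 2, 3]
    }
}
```

In each case the caller decides how the source becomes a Stream, and none of the sources has to be copied into an intermediate Collection first.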
I suspect that your judgement regarding flatMap() is biased because you are not accustomed to it. Once you embrace the idea that a Stream is an internal iterator, the fact that the operation for flattening data expects a function producing another iterator will seem more intuitive.
Answered By - Alexander Ivanchenko
Answer Checked By - Robin (JavaFixing Admin)