Issue
I don't know exactly how Streams work internally, but I have always wondered why Stream#findAny() exists when there is Stream#findFirst(). Find first suggests that streams keep the order of the array/Collection/Iterator from which they are created. So why use any? If it is irrelevant which element you are getting, it may as well be the first one. From what I'm thinking, findFirst should always execute in constant time.
The Javadoc of Stream#findAny() states:
The behavior of this operation is explicitly nondeterministic; it is free to select any element in the stream. This is to allow for maximal performance in parallel operations; the cost is that multiple invocations on the same source may not return the same result. (If a stable result is desired, use findFirst() instead.)
But even then, isn't the order of elements known? The provided data structure stays the same.
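One way to test the constant-time intuition is to count how many elements the predicate sees before findFirst returns. This is only an illustrative sketch: the class FindFirstCost and the helper inspectedBeforeMatch are hypothetical names, and the exact counts reflect how a sequential OpenJDK pipeline short-circuits one element at a time, not something the Javadoc promises.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FindFirstCost {
    // Hypothetical helper: counts how many elements the filter inspects before findFirst() returns.
    static int inspectedBeforeMatch(List<Integer> list, int threshold) {
        AtomicInteger inspected = new AtomicInteger();
        Optional<Integer> first = list.stream()
                .filter(num -> {
                    inspected.incrementAndGet(); // every element the predicate sees
                    return num > threshold;
                })
                .findFirst();
        System.out.println("result: " + first.orElse(null));
        return inspected.get();
    }

    public static void main(String[] args) {
        List<Integer> list = IntStream.rangeClosed(1, 1_000_000)
                .boxed()
                .collect(Collectors.toList());
        // On OpenJDK: 500001 elements inspected when the first match is in the middle...
        System.out.println("inspected: " + inspectedBeforeMatch(list, 500_000));
        // ...but only 1 when the very first element already matches.
        System.out.println("inspected: " + inspectedBeforeMatch(list, 0));
    }
}
```

So the cost of findFirst depends on where the first match sits, not just on the fact that an encounter order exists.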
Solution
Stream#findAny is used when we're looking for an element without paying any attention to the encounter order.
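Whether a stream has an encounter order at all depends on its source. As a small sketch (EncounterOrder and hasEncounterOrder are hypothetical names), checking the ORDERED characteristic of the source spliterator shows that a List-backed stream is ordered while a HashSet-backed one is not. Per the findFirst Javadoc, when a stream has no encounter order, findFirst may also return any element.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;

public class EncounterOrder {
    // Hypothetical helper: reports whether a stream over `source` has a defined encounter order.
    static boolean hasEncounterOrder(Collection<Integer> source) {
        return source.stream().spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
        Set<Integer> set = new HashSet<>(list);
        System.out.println("List-backed stream ordered:    " + hasEncounterOrder(list)); // true
        System.out.println("HashSet-backed stream ordered: " + hasEncounterOrder(set));  // false
    }
}
```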
A comment from Lino at the time of the answer correctly summarizes the difference:
Take a stream of two elements and process them to get a result. The first takes 400 years to complete, the second 1 minute. findFirst will wait the whole 400 years to return the first element's result, while findAny will return the second result after 1 minute. See the difference? Depending on the circumstances of what exactly you're doing, you may just want the fastest result and not care about the order.
Consider this code for findFirst:
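import java.util.Arrays;
import java.util.List;
import java.util.Optional;

List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
Optional<Integer> result = list
        .stream()
        .filter(num -> num < 4)
        .findFirst(); // .findAny();
System.out.println(result.get()); // findFirst always prints 1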
Here, if you swap in findAny, it operates like findFirst, though this is still not guaranteed. On my machine, after 10-odd runs, the output is always 1.
Now let's consider a parallel stream, which is what findAny is designed for:
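import java.util.Arrays;
import java.util.List;
import java.util.Optional;

List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
Optional<Integer> result = list
        .stream()
        .parallel()
        .filter(num -> num < 4)
        .findAny();
System.out.println(result.get()); // any of 1, 2 or 3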
Now the output on my machine is 3, but it could be any of 1, 2, or 3.
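To see the difference over many runs rather than one, a sketch like the following (FindAnyRuns and distinctResults are hypothetical names) collects every distinct result each variant produced. Only the findFirst result is guaranteed; the sets printed for findAny depend on the machine, the JDK version, and thread scheduling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Stream;

public class FindAnyRuns {
    // Hypothetical helper: applies `op` to a fresh stream `runs` times and collects every distinct result.
    static Set<Integer> distinctResults(List<Integer> list,
                                        Function<Stream<Integer>, Integer> op,
                                        int runs) {
        Set<Integer> seen = new TreeSet<>();
        for (int i = 0; i < runs; i++) {
            seen.add(op.apply(list.stream()));
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
        int runs = 1_000;

        // Sequential findAny: usually [1] in practice, but the spec does not promise it.
        System.out.println("sequential findAny: "
                + distinctResults(list, s -> s.filter(n -> n < 4).findAny().get(), runs));
        // Parallel findAny: any subset of [1, 2, 3] may show up.
        System.out.println("parallel findAny:   "
                + distinctResults(list, s -> s.parallel().filter(n -> n < 4).findAny().get(), runs));
        // Parallel findFirst: the list is ordered, so this is always [1].
        System.out.println("parallel findFirst: "
                + distinctResults(list, s -> s.parallel().filter(n -> n < 4).findFirst().get(), runs));
    }
}
```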
Let's do some benchmarking with inputs favoring findAny (i.e. the predicate only becomes true after the middle of the stream).
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

List<Integer> list = IntStream.rangeClosed(1, 1000000).boxed().collect(Collectors.toList());

// Sequential findFirst: the first match is element 500001.
long findFirstStartTime = System.currentTimeMillis();
for (int i = 0; i < 10000; i++) {
    list.stream().filter(num -> num > 500000).findFirst();
}
long findFirstEndTime = System.currentTimeMillis();

// Parallel findAny: any element greater than 500000 may be returned.
long findAnyStartTime = System.currentTimeMillis();
for (int i = 0; i < 10000; i++) {
    list.stream().parallel().filter(num -> num > 500000).findAny();
}
long findAnyEndTime = System.currentTimeMillis();

System.out.println("findFirst time taken: " + (findFirstEndTime - findFirstStartTime));
System.out.println("findAny time taken: " + (findAnyEndTime - findAnyStartTime));
findFirst will go sequentially till the middle of the stream, and findAny will go as the almighty wants, returning whichever match a parallel worker happens to reach first. The results (in milliseconds) are staggering:
findFirst time taken: 29324
findAny time taken: 623
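The loop above times a sequential findFirst against a parallel findAny, so part of the gap may come from parallelism itself. A sketch that keeps both pipelines parallel, so that only the terminal operation differs, could look like this. The class FindParallelComparison and its run helper are hypothetical names, there is no warm-up, and single-shot wall-clock timing is only a rough indication; I have no numbers to report for it.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FindParallelComparison {
    // Hypothetical helper: runs a parallel pipeline `iterations` times and returns the last match found.
    static int run(List<Integer> list, boolean useFindAny, int iterations) {
        int found = 0;
        for (int i = 0; i < iterations; i++) {
            found = useFindAny
                    ? list.parallelStream().filter(num -> num > 500_000).findAny().get()
                    : list.parallelStream().filter(num -> num > 500_000).findFirst().get();
        }
        return found;
    }

    public static void main(String[] args) {
        List<Integer> list = IntStream.rangeClosed(1, 1_000_000).boxed().collect(Collectors.toList());
        int iterations = 1_000;

        long t0 = System.nanoTime();
        int first = run(list, false, iterations); // always 500001: first match in encounter order
        long t1 = System.nanoTime();
        int any = run(list, true, iterations);    // some element greater than 500000
        long t2 = System.nanoTime();

        System.out.println("parallel findFirst: " + (t1 - t0) / 1_000_000 + " ms, result " + first);
        System.out.println("parallel findAny:   " + (t2 - t1) / 1_000_000 + " ms, result " + any);
    }
}
```

Because findFirst runs first here, findAny benefits from any JIT warm-up; swapping the two calls is a cheap sanity check.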
Using JMH benchmarking with these parameters:
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)

Benchmark          Mode  Cnt        Score        Error  Units
FindAny.findAny    avgt   50   191700.823 ± 251565.289  ns/op
FindAny.findFirst  avgt   50  4157585.786 ± 355005.501  ns/op
Answered By - Harshal Parekh
Answer Checked By - Clifford M. (JavaFixing Volunteer)