The three methods map, filter, and
reduce are the cornerstone of functional programming.
Usually, our data pipelines consist of one or more intermediate operations, transforming (aka mapping) and/or filtering elements, and a terminal operation to gather the data again (aka reducing).
With just these three, we can do a lot, so it’s worth knowing them intimately. But they have some close relatives that can be useful, too.
This article assumes Java 9.
Method signatures and visibility modifiers are shortened for readability.
Stream#map(Function<T, R> mapper) is an intermediate stream operation that transforms each element.
That’s the gist;
map is pretty straightforward to use.
But there are specialized
map methods, depending on the types involved.
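A quick sketch of map in action, using an arbitrary list of words:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

List<String> upper = Stream.of("apple", "banana", "cherry")
                           .map(String::toUpperCase)
                           .collect(Collectors.toList());
// upper: [APPLE, BANANA, CHERRY]
```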
Stream#flatMap(Function<T, Stream<R>> mapper) is the often-misunderstood sibling of map.
Sometimes the mapping function will return an arbitrary number of results, wrapped in another type, like a List<String>.
Most likely, we want to work on the list’s content, not the list itself.
With flatMap, we can map the
Stream<List<String>> to a Stream<String>:
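A minimal sketch of flattening nested lists, with placeholder content:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

Stream<List<String>> nested = Stream.of(List.of("a", "b"), List.of("c"));

// flatMap replaces each List<String> with a Stream of its elements
List<String> flat = nested.flatMap(List::stream)
                          .collect(Collectors.toList());
// flat: [a, b, c]
```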
Value-type map / flatMap
We could rely on auto-boxing, but we can’t deny that there’s an added overhead. The JDK includes specialized Stream types to improve dealing with value types:
If our mapping function returns one of the related value types, we could use the corresponding
mapTo...(mapper) or flatMapTo...(mapper) to create a value-type-based Stream:
IntStream mapToInt(ToIntFunction<T> mapper)
LongStream mapToLong(ToLongFunction<T> mapper)
DoubleStream mapToDouble(ToDoubleFunction<T> mapper)
IntStream flatMapToInt(Function<T, IntStream> mapper)
LongStream flatMapToLong(Function<T, LongStream> mapper)
DoubleStream flatMapToDouble(Function<T, DoubleStream> mapper)
This way, we can get a real long[] array, without intermediate boxing:
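For example, mapping Strings to their lengths (the input values are illustrative):

```java
import java.util.stream.Stream;

// mapToLong yields a LongStream, whose toArray() returns long[] directly
long[] lengths = Stream.of("map", "filter", "reduce")
                       .mapToLong(String::length)
                       .toArray();
// lengths: [3, 6, 6]
```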
As mentioned before,
map is an intermediate operation.
Many other languages use it to perform actions on all elements, discarding any return value.
We can use
map just like that too, but there’s a better way.
By utilizing the terminal operation
Stream#forEach(Consumer<T> action), we apply the consumer to every element of the stream:
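A simple sketch, printing each element:

```java
import java.util.stream.Stream;

Stream.of("hello", "world")
      .forEach(System.out::println);
// prints:
// hello
// world
```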
Stream#filter(Predicate<T> predicate) is used for, you guessed it, filtering elements.
If the predicate returns
true, the element travels further down the stream:
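For instance, dropping empty Strings (the input is made up for illustration):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

List<String> nonEmpty = Stream.of("a", "", "b", "", "c")
                              .filter(s -> !s.isEmpty())
                              .collect(Collectors.toList());
// nonEmpty: [a, b, c]
```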
If we use a variable for the predicate, it’s easily negatable using Predicate#negate().
Java 11 even provides us with the
static <T> Predicate<T> not(Predicate<T> target) method, so we can use it with a lambda:
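On Java 11+, this might look as follows:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Predicate.not negates a method reference or lambda in place
List<String> nonEmpty = Stream.of("a", "", "b")
                              .filter(Predicate.not(String::isEmpty))
                              .collect(Collectors.toList());
// nonEmpty: [a, b]
```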
Not all of us are already on Java 11. But we can replicate it in a helper class:
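One possible helper (class and method names are illustrative, not from the JDK):

```java
import java.util.function.Predicate;

final class StreamHelper {

    private StreamHelper() {
        // no instantiation
    }

    // mirrors Java 11's Predicate.not for earlier Java versions
    static <T> Predicate<T> not(Predicate<T> predicate) {
        return predicate.negate();
    }
}
```

With a static import, a call site like filter(not(String::isEmpty)) reads almost like the Java 11 original.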
The helper class could also be imported statically (import static), so we can omit the class name at the call site.
takeWhile / dropWhile
Stream#takeWhile(Predicate<T> predicate) takes elements as long as the predicate holds, and Stream#dropWhile(Predicate<T> predicate) discards elements until the predicate fails for the first time. Both are short-circuiting stream operations, not processing all elements of a stream if not necessary.
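On an ordered stream, a sketch might look like this:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

List<Integer> taken = Stream.of(1, 2, 3, 4, 1)
                            .takeWhile(i -> i < 3)
                            .collect(Collectors.toList());
// taken: [1, 2] — stops at the first non-matching element

List<Integer> dropped = Stream.of(1, 2, 3, 4, 1)
                              .dropWhile(i -> i < 3)
                              .collect(Collectors.toList());
// dropped: [3, 4, 1] — drops until the first non-matching element
```

Note that the trailing 1 in the input is not taken, and not dropped, even though it matches the predicate: only the leading run of matches counts.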
As long as a stream is ordered, these methods work as intended. In the case of unordered streams, they can easily become non-deterministic. If not all elements match the predicate, the returned elements are arbitrary:
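A sketch of the unordered case; the exact result may differ between runs, only the matching of the predicate is guaranteed:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// an unordered parallel source: takeWhile may return any subset
// of the matching elements, including none at all
List<Integer> arbitrary = IntStream.range(0, 1_000)
                                   .boxed()
                                   .parallel()
                                   .unordered()
                                   .takeWhile(i -> i % 2 == 0)
                                   .collect(Collectors.toList());
```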
The reason is simple: Because it’s not clear in which order the predicate encounters the elements, the result can’t be deterministic.
A common use case is summing up values:
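For example, with an identity value of 0 and Integer::sum as the accumulator:

```java
import java.util.stream.Stream;

int sum = Stream.of(1, 2, 3, 4)
                .reduce(0, Integer::sum);
// sum: 10
```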
There are two additional
reduce variants available:
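Simplified, like the other signatures in this article, they look like this:

```java
Optional<T> reduce(BinaryOperator<T> accumulator)

<U> U reduce(U identity,
             BiFunction<U, T, U> accumulator,
             BinaryOperator<U> combiner)
```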
The first one doesn’t require an initial value.
As a consequence, the stream might be empty, leaving nothing to accumulate, hence the
Optional<T> as return type.
The second one is used for parallel streams. The accumulation can be parallelized, and the multiple results are combined.
Some common reduce use cases are already available to us, depending on the stream type:
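For example, IntStream ships with ready-made reductions:

```java
import java.util.stream.IntStream;

int sum = IntStream.rangeClosed(1, 4).sum();            // 10
int max = IntStream.rangeClosed(1, 4).max().getAsInt(); // 4
long count = IntStream.rangeClosed(1, 4).count();       // 4
```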
To better understand
reduce operations, let’s make a naive implementation ourselves:
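A naive, sequential sketch (method and parameter names chosen for illustration):

```java
import java.util.Collection;
import java.util.function.BinaryOperator;

// folds the elements left-to-right, starting from the identity value
static <T> T reduce(Collection<T> elements,
                    T identity,
                    BinaryOperator<T> accumulator) {
    T result = identity;
    for (T element : elements) {
        result = accumulator.apply(result, element);
    }
    return result;
}
```

Calling reduce(List.of(1, 2, 3), 0, Integer::sum) then mirrors the Stream-based version from before.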
Collectors are thematically related to
reduce by aggregating elements of a stream. We can achieve similar results with both, but the difference between them is more subtle.
A reduce operation creates a new value by combining two values in an immutable way. Collectors, however, use mutable accumulation objects.
String concatenation with both:
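A sketch of the two variants:

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

// reduce: every accumulation step creates a brand-new String
String viaReduce = Stream.of("a", "b", "c")
                         .reduce("", String::concat);
// viaReduce: "abc"

// collector: a single mutable accumulation object under the hood
String viaCollector = Stream.of("a", "b", "c")
                            .collect(Collectors.joining());
// viaCollector: "abc"
```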
The reduce version creates many
String objects because it can only work in an immutable way. But the collector can leverage a mutable accumulation object to reduce instantiations.
Which one we should prefer depends on our requirements, considering the actual intended purpose, performance considerations, etc. If we’re dealing with immutable value types, a typical reduction should be used. But if we need to accumulate into a mutable data structure, a collector might make more sense.
It’s always a good idea to know the most important tools in our (functional) toolbox.
map applies a transformation to an element.
filter lets only elements matching a predicate travel further down the stream.
reduce accumulates all elements to a single value, by using immutable values.
This tweet summarizes it perfectly: