The three methods map, filter, and
reduce are the cornerstone of functional programming.
Usually, our data pipelines consist of one or more intermediate operations, transforming (aka mapping) and/or filtering elements, and a terminal operation to gather the data again (aka reducing).
With just these three, we can do a lot, so it’s worth knowing them intimately. But they have some close relatives that can be useful, too.
This article assumes Java 9.
Method signatures and visibility modifiers are shortened for readability.
Stream#map(Function<T, R> mapper) is an intermediate stream operation that transforms each element.
It applies its argument, a
Function<T, R>, to each element and returns a Stream<R>.
That’s the gist;
map is pretty straightforward to use.
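For instance, mapping each String element to its length. The class and sample data here are illustrative, not from a real codebase:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MapExample {

    // map transforms each element; here String -> its length (Integer)
    public static List<Integer> lengths(List<String> words) {
        return words.stream()
                    .map(String::length)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lengths(List.of("hello", "map", "stream")));
        // prints [5, 3, 6]
    }
}
```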
But there are specialized
map functions depending on the type.
Stream#flatMap(Function<T, Stream<R>> mapper) is the often-misunderstood sibling of map.
Sometimes the mapping function will return an arbitrary number of results, wrapped in another type, like a List<String>.
Most likely, we want to work on the list's content, not the list itself.
With flatMap, we can map the
Stream<List<String>> to a Stream<String>:
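A minimal sketch of flattening nested lists (class name and data are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapExample {

    // flatMap turns a Stream<List<String>> into a Stream<String>
    // by replacing each List with a Stream of its elements
    public static List<String> flatten(List<List<String>> nested) {
        return nested.stream()
                     .flatMap(List::stream)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> nested = List.of(List.of("a", "b"), List.of("c"));
        System.out.println(flatten(nested)); // prints [a, b, c]
    }
}
```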
In the case of
Optional, the flatMap method is used to flatten the
Optional back to its content:
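A sketch with a hypothetical User type whose email may be absent; flatMap unwraps the inner Optional instead of producing a nested Optional<Optional<String>>:

```java
import java.util.Optional;

public class OptionalFlatMapExample {

    // Hypothetical domain type: a user that may or may not have an email
    public static class User {
        private final String email; // may be null

        public User(String email) { this.email = email; }

        public Optional<String> email() { return Optional.ofNullable(email); }
    }

    // flatMap flattens Optional<User> -> Optional<String> in one step
    public static Optional<String> emailOf(Optional<User> user) {
        return user.flatMap(User::email);
    }

    public static void main(String[] args) {
        System.out.println(emailOf(Optional.of(new User("jane@example.com"))));
        // prints Optional[jane@example.com]
        System.out.println(emailOf(Optional.empty()));
        // prints Optional.empty
    }
}
```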
Actually, the implementation of
flatMap is even doing less than
map, by omitting to repackage the mapper's returned value into a new Stream.
Value-type map / flatMap
Until Project Valhalla arrives with generic specialization, handling value types with generics is always a special case.
We could rely on auto-boxing, but we can’t deny that there’s an added overhead. The JDK includes specialized Stream types to improve dealing with value types:
If our mapping function returns one of the related value types, we can use the corresponding
mapTo...(mapper) or flatMapTo...(mapper) methods to create a value-type-based Stream:
IntStream mapToInt(ToIntFunction<T> mapper)
LongStream mapToLong(ToLongFunction<T> mapper)
DoubleStream mapToDouble(ToDoubleFunction<T> mapper)
IntStream flatMapToInt(Function<T, IntStream> mapper)
LongStream flatMapToLong(Function<T, LongStream> mapper)
DoubleStream flatMapToDouble(Function<T, DoubleStream> mapper)
This way, we can get a real array of
long, without intermediate boxing:
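For example (with illustrative data), collecting String lengths into a primitive long[] via a LongStream:

```java
import java.util.Arrays;
import java.util.stream.Stream;

public class MapToLongExample {

    // mapToLong yields a LongStream, so toArray() returns a primitive
    // long[] instead of a boxed Long[]
    public static long[] lengths(String... words) {
        return Stream.of(words)
                     .mapToLong(String::length)
                     .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lengths("no", "boxing", "here")));
        // prints [2, 6, 4]
    }
}
```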
As mentioned before,
map is an intermediate operation.
Many other languages use it to perform actions on all elements, discarding any return value.
We can use
map just like that too, but there’s a better way.
By utilizing the terminal operation
Stream#forEach(Consumer<T>), we apply the consumer to every element of the stream:
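A small sketch; the helper method and the list used as a sink only exist to make the side effect observable:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class ForEachExample {

    // forEach is a terminal operation: it consumes each element
    // purely for its side effect and returns nothing
    public static List<String> collectViaForEach(Stream<String> stream) {
        List<String> sink = new ArrayList<>();
        stream.forEach(sink::add);
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(collectViaForEach(Stream.of("map", "filter", "reduce")));
        // prints [map, filter, reduce]
    }
}
```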
Stream#filter(Predicate<T> predicate) is used for, you guessed it, filtering elements.
If the predicate returns
true, the elements will travel further down the stream:
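For instance, keeping only the even numbers (illustrative data):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FilterExample {

    // only elements for which the predicate returns true pass through
    public static List<Integer> evens(List<Integer> numbers) {
        return numbers.stream()
                      .filter(n -> n % 2 == 0)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evens(List.of(1, 2, 3, 4, 5, 6)));
        // prints [2, 4, 6]
    }
}
```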
If we use a variable for the predicate, it's easily negatable using Predicate#negate().
Java 11 even provides us with the
static <T> Predicate<T> not(Predicate<T> target) method, so we can use it with a lambda:
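For example, dropping blank strings by negating a method reference in place (sample data is illustrative; requires Java 11 for both Predicate.not and String#isBlank):

```java
import static java.util.function.Predicate.not;

import java.util.List;
import java.util.stream.Collectors;

public class NotExample {

    // Predicate.not(...) negates a lambda or method reference inline,
    // without first storing it in a Predicate variable
    public static List<String> nonBlank(List<String> input) {
        return input.stream()
                    .filter(not(String::isBlank))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(nonBlank(List.of("a", " ", "b")));
        // prints [a, b]
    }
}
```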
Not all of us are already on Java 11. But we can replicate it in a helper class:
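One possible replication, delegating to Predicate#negate (class and method names are our own choice, not from the JDK):

```java
import java.util.function.Predicate;

// A minimal stand-in for Java 11's Predicate.not, usable on Java 9/10
public class StreamPredicates {

    public static <T> Predicate<T> not(Predicate<T> target) {
        return target.negate();
    }

    public static void main(String[] args) {
        Predicate<String> isEmpty = String::isEmpty;
        System.out.println(not(isEmpty).test("hello"));
        // prints true
    }
}
```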
The helper class could also be
import static, so we can omit
takeWhile / dropWhile
The two methods takeWhile and
dropWhile are close relatives to filter.
Their names are pretty self-explanatory.
They are short-circuiting stream operations, not processing all elements of a stream if not necessary.
If the predicate returns
false, the rest of the stream is discarded (takeWhile), or everything before that point is discarded (dropWhile).
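On an ordered stream, a sketch of both (illustrative data; requires Java 9):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WhileExample {

    // takeWhile: keep elements until the predicate first fails,
    // then discard the rest of the stream
    public static List<Integer> taken() {
        return Stream.of(1, 2, 3, 4, 1, 2)
                     .takeWhile(n -> n < 4)
                     .collect(Collectors.toList());
    }

    // dropWhile: discard elements until the predicate first fails,
    // then keep the rest of the stream
    public static List<Integer> dropped() {
        return Stream.of(1, 2, 3, 4, 1, 2)
                     .dropWhile(n -> n < 4)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(taken());   // prints [1, 2, 3]
        System.out.println(dropped()); // prints [4, 1, 2]
    }
}
```

Note that the trailing 1 and 2 survive dropWhile even though they match the predicate: once the predicate fails, no further elements are tested.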
As long as a stream is ordered, these methods work as intended. In the case of unordered streams, they can easily become non-deterministic. If not all elements match the predicate, the returned elements are arbitrary:
The reason is simple: Because it’s not clear in which order the predicate encounters the elements, the result can’t be deterministic.
Due to the ordered nature of these methods, using them in parallel streams is quite expensive, impacting overall performance. Usually, a sequential stream is a better choice for takeWhile and dropWhile.
The reduce method, also known as fold in functional programming lingo, accumulates the elements of the stream with a
BinaryOperator<T> and reduces them to a single value:
A common use case is summing up values:
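A minimal sketch, summing integers with an identity value of 0 (data is illustrative):

```java
import java.util.stream.Stream;

public class ReduceSumExample {

    // reduce combines elements pairwise: (((0 + 1) + 2) + 3) + 4
    public static int sum(Stream<Integer> numbers) {
        return numbers.reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sum(Stream.of(1, 2, 3, 4)));
        // prints 10
    }
}
```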
There are two additional
reduce variants available:
Optional<T> reduce(BinaryOperator<T> accumulator);
<U> U reduce(U initialValue, BiFunction<U, T, U> accumulator, BinaryOperator<U> combiner);
The first one doesn’t require an initial value.
As a consequence, we might not find matching elements to accumulate, hence the
Optional<T> as return type.
The second one is used for parallel streams. The accumulation can be parallelized, and the multiple results are combined.
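A sketch of the three-argument variant, summing String lengths in a parallel stream (names and data are illustrative). Because the accumulator produces an Integer from an Integer and a String, the combiner is needed to merge the partial sums of the parallel chunks:

```java
import java.util.List;

public class ReduceCombinerExample {

    public static int totalLength(List<String> words) {
        return words.parallelStream()
                    .reduce(0,
                            (length, word) -> length + word.length(), // accumulator: (U, T) -> U
                            Integer::sum);                            // combiner: merges partial results
    }

    public static void main(String[] args) {
        System.out.println(totalLength(List.of("map", "filter", "reduce")));
        // prints 15
    }
}
```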
Some specialized reduce use cases are already available to us, depending on the stream type; IntStream, for example, provides sum(), min(), and max().
To better understand
reduce operations, let’s make a naive implementation ourselves:
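One possible naive version, sequential only and ignoring the parallel machinery of the real implementation:

```java
import java.util.List;
import java.util.function.BinaryOperator;

public class NaiveReduce {

    // Start from the identity value and fold each element into the
    // running result with the accumulator, left to right
    public static <T> T reduce(List<T> elements, T identity, BinaryOperator<T> accumulator) {
        T result = identity;
        for (T element : elements) {
            result = accumulator.apply(result, element);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(reduce(List.of(1, 2, 3, 4), 0, Integer::sum));
        // prints 10
    }
}
```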
Collectors are thematically related to
reduce by aggregating elements of a stream. We can achieve similar results with both, but the difference between them is more subtle.
The reduce operation creates a new value by combining two values in an immutable way. Collectors, however, use mutable accumulation objects.
String concatenation with both:
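Both approaches side by side (illustrative data); Collectors.joining accumulates into a mutable StringBuilder under the hood:

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConcatExample {

    // reduce: every combining step creates a brand-new immutable String
    public static String viaReduce(Stream<String> parts) {
        return parts.reduce("", String::concat);
    }

    // collector: joining appends into one mutable accumulation object
    public static String viaCollector(Stream<String> parts) {
        return parts.collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(viaReduce(Stream.of("a", "b", "c")));    // prints abc
        System.out.println(viaCollector(Stream.of("a", "b", "c"))); // prints abc
    }
}
```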
The reduce version creates many
String objects because it can only work in an immutable way. But the collector can leverage a mutable accumulation object to reduce instantiations.
Which one we should prefer depends on our requirements, considering the actual intended purpose, performance considerations, etc. If we’re dealing with immutable value types, a typical reduction should be used. But if we need to accumulate into a mutable data structure, a collector might make more sense.
It’s always a good idea to know the most important tools in our (functional) toolbox.
map applies a transformation to an element.
filter keeps only elements matching a predicate.
reduce accumulates all elements to a single value, by using immutable values.