Functional Programming With Java: map, filter, reduce
The concepts of map, filter, and reduce are cornerstones of functional programming.
Usually, our data pipelines consist of one or more intermediate operations, transforming (aka mapping) and/or filtering elements, and a terminal operation to gather the data again (aka reducing).
With just these three, we can do a lot, so it’s worth knowing them intimately. But they have some close relatives that can be useful, too.
This article assumes Java 9.
Method signatures and visibility modifiers are shortened for readability.
map
Stream#map(Function<T, R> mapper) is an intermediate stream operation that transforms each element.
It applies its argument, a Function<T, R>, to every element and returns a Stream<R>.
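As a minimal sketch of map, the following turns a Stream<String> into a Stream<Integer> by mapping each word to its length. The class and method names (MapExample, lengths) are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class MapExample {

    // Maps each String to its length: Stream<String> becomes Stream<Integer>.
    static List<Integer> lengths(List<String> words) {
        return words.stream()
                    .map(String::length)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lengths(List.of("map", "filter", "reduce"))); // [3, 6, 6]
    }
}
```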
That’s the gist; map is pretty straightforward to use.
But there are specialized map functions depending on the type.
flatMap
Stream#flatMap(Function<T, Stream<R>> mapper) is the often-misunderstood sibling of map.
Sometimes the mapping function will return an arbitrary number of results, wrapped in another type, like java.util.List.
Most likely, we want to work on the list’s content, not the list itself.
By using flatMap, we can map the Stream<List<String>> to a Stream<String>.
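A small sketch of the difference, assuming a hypothetical mapper (words) that splits a sentence into a List<String>. With map alone we end up with nested lists; adding flatMap(List::stream) flattens them into a single stream of words.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapExample {

    // Hypothetical mapper: one input element yields several results, wrapped in a List.
    static List<String> words(String sentence) {
        return Arrays.asList(sentence.split(" "));
    }

    // map alone keeps the wrapper: the stream is a Stream<List<String>>.
    static List<List<String>> nested(List<String> sentences) {
        return sentences.stream()
                        .map(FlatMapExample::words)
                        .collect(Collectors.toList());
    }

    // flatMap unwraps each List into the surrounding stream: Stream<String>.
    static List<String> flat(List<String> sentences) {
        return sentences.stream()
                        .map(FlatMapExample::words)
                        .flatMap(List::stream)
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("map and filter", "then reduce");
        System.out.println(nested(sentences)); // [[map, and, filter], [then, reduce]]
        System.out.println(flat(sentences));   // [map, and, filter, then, reduce]
    }
}
```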
Optional#flatMap
In the case of java.util.Optional<T>, the flatMap method is used when the mapper itself returns an Optional: instead of a nested Optional<Optional<U>>, we get a flat Optional<U>.
Actually, the implementation of flatMap does even less than map: it doesn’t repackage the mapper’s returned value into a new Optional.
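To illustrate, here is a sketch with a hypothetical lookup method (findNickname) that itself returns an Optional. Using map wraps that result a second time, while flatMap returns the mapper’s Optional as it is.

```java
import java.util.Optional;

class OptionalFlatMapExample {

    // Hypothetical lookup that may or may not find a value.
    static Optional<String> findNickname(String name) {
        return "Benjamin".equals(name) ? Optional.of("Ben") : Optional.empty();
    }

    // map wraps the mapper's Optional in another Optional.
    static Optional<Optional<String>> withMap(Optional<String> name) {
        return name.map(OptionalFlatMapExample::findNickname);
    }

    // flatMap hands back the mapper's Optional without rewrapping it.
    static Optional<String> withFlatMap(Optional<String> name) {
        return name.flatMap(OptionalFlatMapExample::findNickname);
    }

    public static void main(String[] args) {
        System.out.println(withMap(Optional.of("Benjamin")));     // Optional[Optional[Ben]]
        System.out.println(withFlatMap(Optional.of("Benjamin"))); // Optional[Ben]
    }
}
```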
Value-type map / flatMap
Until Project Valhalla with generic specialization arrives, handling primitive value types with generics is always a special case.
We could rely on auto-boxing, but we can’t deny that there’s an added overhead. The JDK includes specialized Stream types, IntStream, LongStream, and DoubleStream, to improve dealing with value types.
If our mapping function returns one of the related value types, we can use the corresponding mapTo...(mapper) / flatMapTo...(mapper) to create a value-type-based Stream:
IntStream mapToInt(ToIntFunction<T> mapper)
LongStream mapToLong(ToLongFunction<T> mapper)
DoubleStream mapToDouble(ToDoubleFunction<T> mapper)
IntStream flatMapToInt(Function<T, IntStream> mapper)
LongStream flatMapToLong(Function<T, LongStream> mapper)
DoubleStream flatMapToDouble(Function<T, DoubleStream> mapper)
This way, we can get a real array of long, without intermediate boxing.
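A short sketch of this, with made-up names: mapToLong produces a LongStream, so toArray() yields a long[] instead of a Long[].

```java
import java.util.Arrays;
import java.util.List;

class PrimitiveStreamExample {

    // mapToLong returns a LongStream; its toArray() gives us a primitive long[].
    static long[] lengths(List<String> words) {
        return words.stream()
                    .mapToLong(String::length)
                    .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lengths(List.of("map", "filter")))); // [3, 6]
    }
}
```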
forEach
As mentioned before, map is an intermediate operation.
Many other languages also use map to perform an action on every element, discarding any return value.
We could use map like that in Java, too, but because map is lazy, nothing would run until a terminal operation is attached. There’s a better way.
By utilizing the terminal operation Stream#forEach(Consumer<T>), we apply the consumer to every element of the stream.
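A minimal sketch: the hypothetical helper visit passes every element of a list to a Consumer<String> via forEach. Printing is just one possible action.

```java
import java.util.List;
import java.util.function.Consumer;

class ForEachExample {

    // forEach is terminal: it consumes the stream and returns nothing.
    static void visit(List<String> words, Consumer<String> action) {
        words.stream().forEach(action);
    }

    public static void main(String[] args) {
        visit(List.of("map", "filter", "reduce"), System.out::println);
    }
}
```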
filter
Stream#filter(Predicate<T> predicate) is used for, you guessed it, filtering elements.
If the predicate returns true, the element travels further down the stream.
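For example, a sketch that keeps only the even numbers of a list (class and method names are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

class FilterExample {

    // Only elements for which the predicate returns true stay in the stream.
    static List<Integer> evens(List<Integer> numbers) {
        return numbers.stream()
                      .filter(n -> n % 2 == 0)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evens(List.of(1, 2, 3, 4, 5, 6))); // [2, 4, 6]
    }
}
```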
If we use a variable for the predicate, it’s easily negatable using Predicate<T>#negate().
Java 11 even provides us with the static <T> Predicate<T> not(Predicate<T> target) method, so we can negate a lambda or method reference directly.
Not all of us are already on Java 11. But we can replicate it in a helper class, here called StreamHelpers.
The helper method could also be imported with import static, so we can omit the StreamHelpers prefix.
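Both approaches could look roughly like this. StreamHelpers is the helper class named in the text; its body is an assumption about how such a replacement for Java 11’s Predicate.not might be written, and NegateExample is a made-up name.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class StreamHelpers {

    // Assumed stand-in for Java 11's Predicate.not: delegates to negate().
    static <T> Predicate<T> not(Predicate<T> target) {
        return target.negate();
    }
}

class NegateExample {

    // Variant 1: hold the predicate in a variable and negate it.
    static List<String> nonEmptyWithVariable(List<String> values) {
        Predicate<String> isEmpty = String::isEmpty;
        return values.stream()
                     .filter(isEmpty.negate())
                     .collect(Collectors.toList());
    }

    // Variant 2: negate a method reference inline via the helper.
    static List<String> nonEmptyWithHelper(List<String> values) {
        return values.stream()
                     .filter(StreamHelpers.not(String::isEmpty))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(nonEmptyWithHelper(List.of("a", "", "b"))); // [a, b]
    }
}
```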
takeWhile / dropWhile
The two methods, takeWhile and dropWhile, are close relatives of filter.
Their names are pretty self-explanatory.
Both are stateful intermediate operations, and takeWhile is also short-circuiting: it stops processing elements as soon as the outcome is decided.
Once the predicate returns false for the first time, the rest of the stream is discarded (takeWhile), or everything before that element is discarded and the rest passes through unchecked (dropWhile).
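A sketch on an ordered stream, using illustrative names. Note how the trailing 1 is treated: takeWhile never reaches it, and dropWhile lets it pass because the predicate is no longer consulted after the first mismatch.

```java
import java.util.List;
import java.util.stream.Collectors;

class TakeDropExample {

    // Keeps elements until the predicate fails for the first time.
    static List<Integer> takeSmall(List<Integer> numbers) {
        return numbers.stream()
                      .takeWhile(n -> n < 3)
                      .collect(Collectors.toList());
    }

    // Skips elements until the predicate fails for the first time, then keeps the rest.
    static List<Integer> dropSmall(List<Integer> numbers) {
        return numbers.stream()
                      .dropWhile(n -> n < 3)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 1);
        System.out.println(takeSmall(numbers)); // [1, 2]
        System.out.println(dropSmall(numbers)); // [3, 4, 1]
    }
}
```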
Unordered streams
As long as a stream is ordered, these methods work as intended. In the case of unordered streams, they can easily become non-deterministic. If some, but not all, elements match the predicate, the returned elements are arbitrary.
The reason is simple: Because it’s not clear in which order the predicate encounters the elements, the result can’t be deterministic.
Parallel streams
Due to the ordered nature of the methods, using them in parallel streams is quite expensive, impacting overall performance. Usually, a sequential stream is a better choice for takeWhile or dropWhile.
reduce
The reduce method, also known as fold in functional programming lingo, accumulates the elements of the stream with a BinaryOperator<T>, starting from an initial value, and reduces them to a single value:
T reduce(T identity, BinaryOperator<T> accumulator)
A common use case is summing up values:
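As a sketch, summing a list of integers with 0 as the initial value and Integer::sum as the accumulator (names are illustrative):

```java
import java.util.List;

class SumExample {

    // Starts at 0 and adds every element to the running total.
    static int sum(List<Integer> numbers) {
        return numbers.stream()
                      .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 3, 4))); // 10
    }
}
```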
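There are two additional reduce variants available:
Optional<T> reduce(BinaryOperator<T> accumulator)
U reduce(U identity, BiFunction<U, T, U> accumulator, BinaryOperator<U> combiner)
The first one doesn’t require an initial value.
As a consequence, an empty stream leaves nothing to accumulate, hence the Optional<T> as return type.
The second one can produce a result type that differs from the element type and is suited for parallel streams: the accumulation can be parallelized, and the partial results are merged by the combiner.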
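A sketch of both variants, with made-up method names. The first finds a maximum without an initial value; the second sums word lengths on a parallel stream, where Integer::sum serves as the combiner for partial results.

```java
import java.util.List;
import java.util.Optional;

class ReduceVariantsExample {

    // No initial value: an empty stream yields an empty Optional.
    static Optional<Integer> max(List<Integer> numbers) {
        return numbers.stream()
                      .reduce(Integer::max);
    }

    // Identity, accumulator, and combiner: the result type (Integer) differs
    // from the element type (String), and partial sums are merged by the combiner.
    static int totalLength(List<String> words) {
        return words.parallelStream()
                    .reduce(0, (sum, word) -> sum + word.length(), Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(max(List.of(3, 7, 5)));                 // Optional[7]
        System.out.println(totalLength(List.of("map", "filter"))); // 9
    }
}
```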
count/sum/min/max
Common reduce use cases are already available to us, depending on the stream type.
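On an IntStream, for instance, these reductions could be used as in the following sketch (wrapper names are illustrative). Note that min and max return an OptionalInt, since the stream might be empty.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

class BuiltInReductionsExample {

    static long count(int... values) {
        return IntStream.of(values).count();
    }

    static int sum(int... values) {
        return IntStream.of(values).sum();
    }

    // Empty for an empty stream, hence OptionalInt.
    static OptionalInt min(int... values) {
        return IntStream.of(values).min();
    }

    static OptionalInt max(int... values) {
        return IntStream.of(values).max();
    }

    public static void main(String[] args) {
        System.out.println(count(4, 8, 15)); // 3
        System.out.println(sum(4, 8, 15));   // 27
        System.out.println(min(4, 8, 15));   // OptionalInt[4]
        System.out.println(max(4, 8, 15));   // OptionalInt[15]
    }
}
```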
To better understand reduce operations, let’s make a naive implementation ourselves.
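One possible naive version is a plain loop over an Iterable. This is a sketch for understanding only, not how the JDK implements Stream#reduce, and the name NaiveReduce is made up.

```java
import java.util.List;
import java.util.function.BinaryOperator;

class NaiveReduce {

    // Sequential fold: start with the initial value and combine it
    // with each element in turn, carrying the intermediate result along.
    static <T> T reduce(Iterable<T> elements, T initialValue, BinaryOperator<T> accumulator) {
        T result = initialValue;
        for (T element : elements) {
            result = accumulator.apply(result, element);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(reduce(List.of(1, 2, 3), 0, Integer::sum)); // 6
    }
}
```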
Collectors
Collectors are thematically related to reduce by aggregating elements of a stream. We can achieve similar results with both, but there is a subtle difference between them.
A reduce operation creates a new value by combining two values in an immutable way. Collectors, however, use mutable accumulation objects.
Let’s implement String concatenation with both.
The reduce version creates many String objects because it can only work in an immutable way. But the collector can leverage a mutable accumulation object to reduce instantiations.
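The two approaches could be sketched as follows, with illustrative names. The reduce variant creates a new String with every concat call, while Collectors.joining() appends to a single mutable builder internally.

```java
import java.util.List;
import java.util.stream.Collectors;

class ConcatExample {

    // Immutable reduction: each step produces a brand-new String.
    static String withReduce(List<String> parts) {
        return parts.stream()
                    .reduce("", String::concat);
    }

    // Mutable reduction: the collector accumulates into one builder.
    static String withCollector(List<String> parts) {
        return parts.stream()
                    .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c");
        System.out.println(withReduce(parts));    // abc
        System.out.println(withCollector(parts)); // abc
    }
}
```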
Which one we should prefer depends on our requirements, considering the actual intended purpose, performance considerations, etc. If we’re dealing with immutable value types, a typical reduction should be used. But if we need to accumulate into a mutable data structure, a collector might make more sense.
Conclusion
It’s always a good idea to know the most important tools in our (functional) toolbox.
map applies a transformation to each element.
filter keeps only the elements matching a Predicate<T>.
reduce accumulates all elements into a single value by combining immutable values.