Java 8 gave us the Stream API, a lazy, sequential data pipeline built from functional blocks.
A stream isn’t a data structure, and it doesn’t modify the elements of its source directly. It’s just a dumb pipe that provides the scaffolding to operate on, which is what makes it a smart pipe.
The basic concept behind streams is simple: we have a data source, perform zero or more intermediate operations, and get a result.
The parts of a stream can be separated into three groups:
- Obtaining the stream (source)
- Doing the work (intermediate operations)
- Getting a result (terminal operation)
Obtaining the stream
The first step is obtaining a stream. Many data structures of the JDK already support providing a stream:
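The sketch below shows a few common JDK sources of streams: collections, arrays, individual values, numeric ranges, map views, and strings. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class StreamSources {

    // Any Collection (List, Set, Queue, ...) provides stream()
    static Stream<String> fromCollection(List<String> list) {
        return list.stream();
    }

    // Arrays are streamed with Arrays.stream(...); an int[] yields an IntStream
    static IntStream fromArray(int[] values) {
        return Arrays.stream(values);
    }

    // Individual values can be wrapped with Stream.of(...)
    static Stream<String> fromValues() {
        return Stream.of("a", "b", "c");
    }

    // Numeric ranges; the upper bound of range(...) is exclusive
    static IntStream fromRange() {
        return IntStream.range(0, 5);
    }

    // A Map has no stream() itself, but its entrySet(), keySet(), and values() views do
    static Stream<Map.Entry<String, Integer>> fromMap(Map<String, Integer> map) {
        return map.entrySet().stream();
    }

    // String.chars() streams the characters as int values
    static IntStream fromString(String text) {
        return text.chars();
    }
}
```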
Doing the work
The java.util.stream.Stream interface provides a lot of different intermediate operations, such as these mapping operations:
map(Function<? super T, ? extends R> mapper)
mapToInt(ToIntFunction<? super T> mapper)
mapToLong(ToLongFunction<? super T> mapper)
mapToDouble(ToDoubleFunction<? super T> mapper)
flatMap(Function<? super T, ? extends Stream<? extends R>> mapper)
flatMapToInt(Function<? super T, ? extends IntStream> mapper)
flatMapToLong(Function<? super T, ? extends LongStream> mapper)
flatMapToDouble(Function<? super T, ? extends DoubleStream> mapper)
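Here is a minimal sketch of map, mapToInt, and flatMap. It assumes a plain list of strings as input, and the class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MappingOperations {

    // map: transforms each element into exactly one new element
    static List<Integer> lengths(List<String> words) {
        return words.stream()
                .map(String::length)
                .collect(Collectors.toList());
    }

    // mapToInt: maps to a primitive IntStream, avoiding boxing
    static int totalLength(List<String> words) {
        return words.stream()
                .mapToInt(String::length)
                .sum();
    }

    // flatMap: maps each element to a stream and flattens the results into one stream
    static List<String> allWords(List<String> sentences) {
        return sentences.stream()
                .flatMap(sentence -> Arrays.stream(sentence.split(" ")))
                .collect(Collectors.toList());
    }
}
```

Because mapToInt returns an IntStream, sum() is available directly without boxing each length.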
Getting a result
Performing operations on the stream elements is great. But at some point, we want to get a result back from our data pipeline.
Terminal operations trigger the lazy pipeline to do the actual work, and they don’t return a new stream.
Aggregate to new collection/array
R collect(Collector<? super T, A, R> collector)
R collect(Supplier<R> supplier, BiConsumer<R, ? super T> accumulator, BiConsumer<R, R> combiner)
A[] toArray(IntFunction<A[]> generator)
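This short sketch covers the three collecting variants. The word lists and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CollectExamples {

    // collect(Collector): the Collectors class provides ready-made collectors
    static Map<Integer, List<String>> groupByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length));
    }

    // collect(supplier, accumulator, combiner): building the container by hand
    static ArrayList<String> toArrayList(List<String> words) {
        return words.stream()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    // toArray(IntFunction<A[]>): the generator creates an array of the right type and size
    static String[] toStringArray(List<String> words) {
        return words.stream().toArray(String[]::new);
    }
}
```

The supplier/accumulator/combiner form is mostly useful when no ready-made Collector fits. In practice, the combiner only comes into play when a parallel stream merges partial results.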
Reduce to a single value
T reduce(T identity, BinaryOperator<T> accumulator)
Optional<T> reduce(BinaryOperator<T> accumulator)
U reduce(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner)
Optional<T> min(Comparator<? super T> comparator)
Optional<T> max(Comparator<? super T> comparator)
boolean allMatch(Predicate<? super T> predicate)
boolean anyMatch(Predicate<? super T> predicate)
boolean noneMatch(Predicate<? super T> predicate)
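The following sketch shows the reducing and matching operations. It assumes small lists of numbers and words as input, and the method names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class ReduceExamples {

    // reduce(identity, accumulator): the identity is the result for an empty stream
    static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    // reduce(accumulator): no identity, so the result is wrapped in an Optional
    static Optional<Integer> product(List<Integer> numbers) {
        return numbers.stream().reduce((a, b) -> a * b);
    }

    // reduce(identity, accumulator, combiner): the result type differs from the element type
    static int totalLength(List<String> words) {
        return words.stream().reduce(0, (acc, word) -> acc + word.length(), Integer::sum);
    }

    // max(Comparator): returns an Optional, since the stream might be empty
    static Optional<String> longest(List<String> words) {
        return words.stream().max(Comparator.comparingInt(String::length));
    }

    // anyMatch stops as soon as one element matches
    static boolean hasNegative(List<Integer> numbers) {
        return numbers.stream().anyMatch(n -> n < 0);
    }
}
```

The variants without an identity value return an Optional because an empty stream has no result to offer.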
Streams aren’t just glorified loops. Sure, we can express any stream with a loop — and most loops with streams. But this doesn’t mean they’re equal or one is always better than the other.
The most significant advantage of streams over loops is laziness. Until we call a terminal operation on a stream, no work is done. We can build up our processing pipeline over time and only run it at the exact time we want it to.
It’s not just the building of the pipeline that’s lazy. Most intermediate operations are lazy, too: elements are only consumed as they’re needed.
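To make the laziness visible, this sketch counts how often a filter predicate runs. The counter is an AtomicInteger supplied by the caller, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazinessDemo {

    // Builds the pipeline only; the filter lambda is not evaluated here
    static Stream<Integer> evenNumbers(AtomicInteger filterCalls) {
        return Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
                .filter(n -> {
                    filterCalls.incrementAndGet(); // track each evaluation
                    return n % 2 == 0;
                });
    }

    public static void main(String[] args) {
        AtomicInteger filterCalls = new AtomicInteger();
        Stream<Integer> evens = evenNumbers(filterCalls);
        System.out.println("after building: " + filterCalls.get());

        // The terminal operation starts the work; limit(2) short-circuits it
        List<Integer> firstTwo = evens.limit(2).collect(Collectors.toList());
        System.out.println(firstTwo + " after " + filterCalls.get() + " filter calls");
    }
}
```

Building the pipeline evaluates nothing. Once the terminal operation runs, limit(2) stops processing after the second even number, so only the elements 1 to 4 are ever tested.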
Even though Java allows the building of stateful lambdas, we should always strive to design them to be stateless. Any state can have severe impacts on safety and performance and might introduce unintended side effects.
Thanks to being (mostly) stateless, streams can optimize themselves quite efficiently. Stateless intermediate operations can be fused into a combined consumer, redundant operations might be removed, and some pipeline paths might be short-circuited.
The JVM will optimize traditional loops, too. But streams are an easier target due to their multi-operation design and mostly stateless nature.
Being just a dumb pipeline, streams can’t be reused. But they don’t change the original data source — we can always create another stream from the source.
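This small sketch covers both halves of that statement. The method names are illustrative.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

class StreamReuse {

    // Returns true if using the same stream a second time fails
    static boolean secondUseFails() {
        Stream<String> stream = List.of("a", "b").stream();
        stream.count(); // the first terminal operation consumes the stream
        try {
            stream.count(); // a second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true; // "stream has already been operated upon or closed"
        }
    }

    // A Supplier creates a fresh stream from the unchanged source on every call
    static long countTwice(List<String> source) {
        Supplier<Stream<String>> streams = source::stream;
        return streams.get().count() + streams.get().count();
    }
}
```

Wrapping the source in a Supplier is one common way to hand out fresh streams on demand.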
Streams are often easier to read and comprehend.
Consider a simple data-processing task written once as a stream pipeline and once as an equivalent traditional loop.
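As an illustration, take a pipeline that keeps names longer than three characters, upper-cases them, and sorts them. The same logic is then written as a loop. The task and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class StreamVsLoop {

    // Stream version: filter, transform, sort, collect
    static List<String> withStream(List<String> names) {
        return names.stream()
                .filter(name -> name.length() > 3)
                .map(String::toUpperCase)
                .sorted()
                .collect(Collectors.toList());
    }

    // Equivalent loop version with a temporary list and explicit iteration
    static List<String> withLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() > 3) {
                result.add(name.toUpperCase());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```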
We have a shorter code block, clearer operations, no loop boilerplate, and no extra temporary variables, all packaged in a fluent API. This way, our code reflects the what, and we no longer need to care about the actual iteration process, the how.
Concurrency is hard to do right and easy to do wrong.
Streams support parallel execution via the fork/join framework and remove much of the overhead of doing it ourselves.
A stream can be parallelized by calling the intermediate operation parallel() and turned back to sequential by calling sequential(). The most recent of these calls applies to the entire pipeline.
But not every stream pipeline is a good match for parallel processing.
The source must be big enough and the operations costly enough to justify the overhead of multiple threads. Context switches are expensive. We shouldn’t parallelize a stream just because we can.
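Here is a minimal sketch of a parallel pipeline, using a large numeric range as the workload. Whether the parallel version is actually faster depends on the hardware and the data size, so it's worth measuring before committing to it.

```java
import java.util.stream.LongStream;

class ParallelSum {

    // Sums 1..n; parallel() lets the common fork/join pool split the range
    static long sumParallel(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .sum();
    }

    static long sumSequential(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    // The last parallel()/sequential() call decides the mode of the whole pipeline
    static boolean isParallelAfterSequential() {
        return LongStream.range(0, 10)
                .parallel()
                .sequential()
                .isParallel();
    }
}
```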
Just like with functional interfaces, streams have specialized classes for dealing with primitives to avoid autoboxing/unboxing: IntStream, LongStream, and DoubleStream.
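This sketch shows typical IntStream usage. The inputs and method names are hypothetical.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.OptionalDouble;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PrimitiveStreams {

    // IntStream offers numeric terminal operations without boxing every element
    static OptionalDouble averageLength(List<String> words) {
        return words.stream()
                .mapToInt(String::length)
                .average();
    }

    // summaryStatistics() computes count, sum, min, average, and max in one pass
    static IntSummaryStatistics stats(int... values) {
        return IntStream.of(values).summaryStatistics();
    }

    // boxed() converts back to a Stream<Integer> when objects are required
    static List<Integer> squares(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }
}
```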
Best Practices and Caveats
Lambdas can be simple one-liners or huge code blocks if wrapped in curly braces. To retain the simplicity and conciseness, we should restrict ourselves to these two use cases for operations:
- One-line expressions, such as .filter(album -> album.getYear() > 4)
- Method references
By using method references, we can have more complex operations, reuse operational logic, and even unit test it more easily.
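The filter logic can move into a named method and be referenced from the pipeline. This sketch assumes a minimal, hypothetical Album class with a getYear() accessor like the one in the lambda above. The predicate name isReleasedAfter2000 is also made up.

```java
import java.util.List;
import java.util.stream.Collectors;

class AlbumFilters {

    // Hypothetical minimal album type, only for illustration
    static final class Album {
        private final String title;
        private final int year;

        Album(String title, int year) {
            this.title = title;
            this.year = year;
        }

        String getTitle() { return title; }
        int getYear() { return year; }
    }

    // Hypothetical named predicate: reusable and unit-testable on its own
    static boolean isReleasedAfter2000(Album album) {
        return album.getYear() > 2000;
    }

    static List<String> recentTitles(List<Album> albums) {
        return albums.stream()
                .filter(AlbumFilters::isReleasedAfter2000)
                .map(Album::getTitle)
                .collect(Collectors.toList());
    }
}
```

isReleasedAfter2000 can now be reused in other pipelines and tested in isolation, without building a stream at all.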
Not only simplicity and conciseness are affected by using method references. There are also implications on the bytecode level.
The bytecode for a lambda and a method reference differs slightly, with the method reference generating less. The compiler moves a lambda’s body into an additional private synthetic method, which the function object created at runtime then calls. A method reference can point to the existing method directly.
Also, by using method references, we lose the visual noise of the lambda:
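For instance, with a hypothetical list of words, the two versions below behave the same. The method reference simply drops the redundant parameter.

```java
import java.util.List;
import java.util.stream.Collectors;

class LambdaVsMethodReference {

    // Lambda: names the parameter only to pass it on
    static List<String> upperWithLambda(List<String> words) {
        return words.stream()
                .map(word -> word.toUpperCase())
                .collect(Collectors.toList());
    }

    // Method reference: same behavior, less noise
    static List<String> upperWithMethodReference(List<String> words) {
        return words.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}
```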
Cast and Type Checks
Don’t forget that Class<T> is an object, too, providing many helpful methods:
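For example, Class::isInstance and Class::cast can replace instanceof checks and explicit casts inside a pipeline. The input list here is made up.

```java
import java.util.List;
import java.util.stream.Collectors;

class TypeFiltering {

    // String.class::isInstance replaces "obj instanceof String" in the filter,
    // and String.class::cast replaces the explicit "(String) obj" cast in map
    static List<String> onlyStrings(List<Object> values) {
        return values.stream()
                .filter(String.class::isInstance)
                .map(String.class::cast)
                .collect(Collectors.toList());
    }
}
```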
Return a value or check for null
Intermediate operations should either return a value or handle null in the next operation.
Adding a simple .filter(Objects::nonNull) might be enough to prevent NullPointerExceptions.
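This sketch uses a hypothetical name-to-age map, where Map.get returns null for unknown names.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class NullFiltering {

    // ages::get yields null for unknown names; filter(Objects::nonNull)
    // removes them before the next operation dereferences the value
    static List<Integer> agesNextYear(List<String> names, Map<String, Integer> ages) {
        return names.stream()
                .map(ages::get)
                .filter(Objects::nonNull)
                .map(age -> age + 1) // safe: no null reaches this step
                .collect(Collectors.toList());
    }
}
```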
By putting each pipeline step into a new line, we can improve readability:
It also makes it easier to set breakpoints at the correct pipeline step.
Not every iteration is a stream
As mentioned before, we shouldn’t replace every loop. Just because code iterates doesn’t make it a valid target for stream-based processing. Often, a traditional loop is a better choice than forEach(...) on a stream.
We can access variables from outside a lambda in intermediate operations, as long as they are in scope and effectively final.
This means they aren’t changed after initialization, but they don’t need an explicit final modifier.
And by simply assigning the value to a new variable, we can make it effectively final:
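Here is a minimal sketch of a reassigned variable being copied into an effectively final one. The names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class EffectivelyFinal {

    static List<String> withPrefix(List<String> words, boolean shout) {
        String prefix = "> ";
        if (shout) {
            prefix = ">>> "; // reassigned, so 'prefix' is NOT effectively final
        }
        // Using 'prefix' inside the lambda would not compile.
        // A new variable that is never reassigned is effectively final:
        String effectivelyFinalPrefix = prefix;
        return words.stream()
                .map(word -> effectivelyFinalPrefix + word)
                .collect(Collectors.toList());
    }
}
```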
Sometimes this restriction seems cumbersome. We could work around it by changing the state of an object referenced by an effectively final variable, since only the reference itself can’t change. But doing so undermines the concept of immutability and introduces unintended side effects.
Streams and exceptions are a subject that warrants its own article(s), but I’ll try to summarize it here.
This code won’t compile:
By refactoring the className conversion into a dedicated method, we can retain the simplicity of the stream:
We still need to handle possible null values, but the checked exception isn’t visible in the stream code.
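The original code isn't reproduced here, so this sketch assumes the conversion is Class.forName(className), which throws the checked ClassNotFoundException. The helper toClassOrNull is hypothetical.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class CheckedExceptions {

    // Hypothetical helper: handles the checked ClassNotFoundException
    // thrown by Class.forName and returns null for unknown class names
    static Class<?> toClassOrNull(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    static List<Class<?>> loadClasses(List<String> classNames) {
        return classNames.stream()
                .map(CheckedExceptions::toClassOrNull)
                .filter(Objects::nonNull) // the null values still need handling
                .collect(Collectors.toList());
    }
}
```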
Another solution for dealing with checked exceptions is wrapping the intermediate operations in consumers, functions, etc. that catch the checked exceptions and rethrow them as unchecked ones. But, in my opinion, that’s more of an ugly hack than a valid solution.
If an operation throws a checked exception, we should refactor it to a method and handle its exception accordingly.
Even if we handle all checked exceptions, our streams can still blow up thanks to unchecked exceptions.
There’s no one-size-fits-all solution for preventing exceptions, just as there isn’t one for any other code. Developer discipline can greatly reduce the risk: use small, well-defined operations with enough checks and validation.
Streams can be debugged like any other fluent call. If we have a single operation per line, a breakpoint will stop accordingly. But the synthetic methods and runtime-generated classes behind lambdas can result in a really confusing stack trace.
During development, we could also use the intermediate operation peek(Consumer<? super T> action) to intercept an element.
The operation is mainly for debugging purposes and shouldn’t be used in the stream’s final form.
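This sketch uses peek() as a debugging hook. The observer parameter is a hypothetical addition so the effect can be checked. In practice it's often just a print statement.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

class PeekDebugging {

    static List<String> longWordsUpperCase(List<String> words, Consumer<String> observer) {
        return words.stream()
                .filter(word -> word.length() > 2)
                .peek(observer) // debugging hook: sees every element that passed the filter
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // During development, the observer is typically just a print statement
        longWordsUpperCase(List.of("a", "abc", "de", "fghi"),
                word -> System.out.println("after filter: " + word));
    }
}
```

Since Java 9, count() may skip the pipeline entirely if it can compute the result directly from the source. In that case, peek() isn't called at all.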
IntelliJ IDEA provides a visual debugger for streams.
Order of operations
Think of a simple stream:
This code will run map five times, sorted eight times, filter five times, and forEach two times. That’s a total of 20 operations to output two values.
If we reorder the pipeline parts, we can reduce the total operations count significantly without changing the actual outcome:
By filtering first, we restrict the other operations to a minimum: filter runs five times, map two times, sorted once, and forEach two times. That saves us 10 operations in total.
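The original example isn't shown here, so this sketch uses made-up data: five integers, two of them even. It counts how often map, filter, and the sort comparator run in each order. Exact comparison counts depend on the sorting algorithm and the input, so they won't necessarily match the numbers above.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class OperationOrder {

    // Counters for how often each operation runs
    static final class Counts {
        final AtomicInteger map = new AtomicInteger();
        final AtomicInteger sort = new AtomicInteger(); // comparator calls
        final AtomicInteger filter = new AtomicInteger();

        int total() {
            return map.get() + sort.get() + filter.get();
        }
    }

    static final List<Integer> SOURCE = List.of(5, 2, 9, 4, 7);

    // map -> sorted -> filter: every element is mapped and sorted before filtering
    static List<Integer> mapSortFilter(Counts counts) {
        return SOURCE.stream()
                .map(n -> { counts.map.incrementAndGet(); return n + 100; })
                .sorted((a, b) -> { counts.sort.incrementAndGet(); return Integer.compare(a, b); })
                .filter(n -> { counts.filter.incrementAndGet(); return n % 2 == 0; })
                .collect(Collectors.toList());
    }

    // filter -> map -> sorted: only the two even elements reach map and sorted
    static List<Integer> filterMapSort(Counts counts) {
        return SOURCE.stream()
                .filter(n -> { counts.filter.incrementAndGet(); return n % 2 == 0; })
                .map(n -> { counts.map.incrementAndGet(); return n + 100; })
                .sorted((a, b) -> { counts.sort.incrementAndGet(); return Integer.compare(a, b); })
                .collect(Collectors.toList());
    }
}
```

Adding 100 keeps each number's parity, so both orders return the same result. Only the amount of work differs.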
Java 9 Enhancements
Java 9 brought four additions to streams:
dropWhile(Predicate<? super T> predicate)
Drops elements until the predicate returns false for the first time; that element and all remaining ones are kept.
takeWhile(Predicate<? super T> predicate)
Takes elements until the predicate returns false for the first time; that element and all remaining ones are discarded.
Stream<T> ofNullable(T t)
Returns a single-element stream containing t if it isn’t null; otherwise, an empty stream.
iterate(T seed, Predicate<? super T> hasNext, UnaryOperator<T> next)
Generates a stream that ends once hasNext returns false, the stream equivalent of a classic for loop: Stream.iterate(0, i -> i < 10, i -> i + 1) produces the same values as for (int i = 0; i < 10; i++).
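This short sketch exercises the four Java 9 additions with made-up input values.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class Java9StreamAdditions {

    // takeWhile: keeps elements until the predicate first returns false
    static List<Integer> takeWhileSmall(List<Integer> numbers) {
        return numbers.stream().takeWhile(n -> n < 3).collect(Collectors.toList());
    }

    // dropWhile: discards elements until the predicate first returns false, keeps the rest
    static List<Integer> dropWhileSmall(List<Integer> numbers) {
        return numbers.stream().dropWhile(n -> n < 3).collect(Collectors.toList());
    }

    // ofNullable: zero or one element, depending on whether the value is null
    static long countNullable(String value) {
        return Stream.ofNullable(value).count();
    }

    // iterate(seed, hasNext, next): the stream version of for (int i = 0; i < 10; i++)
    static List<Integer> zeroToNine() {
        return Stream.iterate(0, i -> i < 10, i -> i + 1).collect(Collectors.toList());
    }
}
```

With an input of 1, 2, 3, 1, the trailing 1 shows how these differ from filter. takeWhile stops at the first non-matching element. dropWhile keeps everything from that element on, including later elements that would match again.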