Java Stream Collectors Explained

2020-01-02 · 5 min

With Java 8 came one of the greatest additions to Java: the Stream API.

It made processing a stream of data very convenient by allowing us to chain operations together, lazily, and perform the actual data processing by ending a fluent call with a terminal operation.

java.util.stream.Stream provides two overloads of the terminal operation collect(...), both of which perform a mutable reduction:

A mutable reduction operation accumulates input elements into a mutable result container, such as a Collection or StringBuilder, as it processes the elements in the stream.
- Oracle
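To make the two overloads concrete, here is a minimal sketch that builds the same List twice. The class and method names are only illustrative and not part of any API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CollectOverloads {

    // Overload 1: collect(supplier, accumulator, combiner)
    static List<String> withThreeFunctions() {
        return Stream.of("a", "b", "c")
                     .collect(ArrayList::new,      // creates the result container
                              ArrayList::add,      // folds one element into it
                              ArrayList::addAll);  // merges two partial containers
    }

    // Overload 2: collect(Collector), here with a predefined Collector
    static List<String> withCollector() {
        return Stream.of("a", "b", "c")
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(withThreeFunctions()); // [a, b, c]
        System.out.println(withCollector());      // [a, b, c]
    }
}
```

The second overload bundles the same building blocks into a single reusable Collector object, which is what the rest of this article is about.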

Batteries Included

Java 8 ships with 37 static factory methods for predefined Collectors in the class java.util.stream.Collectors, which can roughly be divided into three groups:

Reducing/summarizing into a single value or collection type
Everything from joining Strings with joining() to creating new Collections with toSet() to even leveraging new features like summaries of numeric streams with summarizingInt(...) — and much more.
Grouping
Three overloads of groupingBy(...) and another three of groupingByConcurrent(...) for concurrent processing.
Partitioning
Two partitioningBy(...) methods are available.
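
As a rough illustration, the following sketch uses one predefined Collector from each of the three groups. The class name and the sample data are made up for this example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BuiltInCollectors {

    // Sample data, chosen only for illustration.
    static final List<String> WORDS = Arrays.asList("apple", "avocado", "banana", "cherry");

    // Reducing/summarizing: join all elements into a single String.
    static String joined() {
        return WORDS.stream().collect(Collectors.joining(", "));
    }

    // Grouping: bucket the words by their first character.
    static Map<Character, List<String>> byFirstLetter() {
        return WORDS.stream().collect(Collectors.groupingBy(word -> word.charAt(0)));
    }

    // Partitioning: split the words into exactly two buckets using a predicate.
    static Map<Boolean, List<String>> byLength() {
        return WORDS.stream().collect(Collectors.partitioningBy(word -> word.length() > 5));
    }

    public static void main(String[] args) {
        System.out.println(joined());        // apple, avocado, banana, cherry
        System.out.println(byFirstLetter());
        System.out.println(byLength());      // {false=[apple], true=[avocado, banana, cherry]}
    }
}
```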

And the best thing: We’re not restricted to the provided Collectors. If we need some more unique handling, we can always create our own.

Collector<T, A, R>

If you have ever looked at the source code of the Stream API, you will have found a lot of generics and a lot of code that isn't easy to read or comprehend.

This is a consequence of Java's design: functional programming features couldn't easily be added without changing the core of the language. The designers still managed to add these features without compromising backward compatibility, at the cost of some code that looks intimidating, at least at first glance.

Every Collector must implement the interface Collector<T, A, R>:

java

interface Collector<T, A, R> {

    // Supplier that creates a new mutable result container.
    Supplier<A> supplier();

    // Function that folds a value into the result container. 
    BiConsumer<A, T> accumulator();

    // Merges two partial results.
    BinaryOperator<A> combiner();

    // Perform the final transformation from the intermediate accumulation type
    // "A" to the final result type "R".
    Function<A, R> finisher();

    // Returns a Set of Collector.Characteristics describing
    // this Collector's characteristics.
    Set<Characteristics> characteristics();

    // <Multiple default methods omitted>
}

Let’s dissect the interface a little bit to understand better what’s going on.

Generic types

The interface has three generic type parameters:

T: the type of input elements to the reduction operation.
A: the mutable accumulation type of the reduction operation, i.e. the type of the accumulator object that holds partial results during the collection process.
R: the result type of the reduction operation, i.e. the actual return type of the collection process.

Methods

The methods make more sense knowing what the generic types represent:

supplier()
Provides a Supplier<A> used for creating new instances of accumulator objects.
accumulator()
The core of the Collector: returns a BiConsumer<A, T> responsible for accumulating stream elements of type T into an accumulator object.
combiner()
When a Stream is processed in parallel, the Collector might create multiple accumulator objects. The combiner returns a BinaryOperator<A> that merges two of these partial results into one.
finisher()
Finishes the collection process by transforming an accumulator object into the return type R.
characteristics()
Describes the characteristics of the Collector.
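
To see how these methods work together, the sketch below drives a Collector by hand over two chunks of data, roughly as a parallel stream might do internally. This is a simplified illustration and not how the Stream API is actually implemented; the helper name collectInTwoChunks is hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CollectorByHand {

    // Hypothetical helper: applies supplier, accumulator, combiner and finisher manually.
    static <T, A, R> R collectInTwoChunks(Collector<T, A, R> collector,
                                          List<? extends T> left,
                                          List<? extends T> right) {
        // supplier(): one fresh accumulator object per chunk
        A leftContainer = collector.supplier().get();
        A rightContainer = collector.supplier().get();

        // accumulator(): fold every element into the container of its chunk
        for (T element : left) {
            collector.accumulator().accept(leftContainer, element);
        }
        for (T element : right) {
            collector.accumulator().accept(rightContainer, element);
        }

        // combiner(): merge the two partial results
        A merged = collector.combiner().apply(leftContainer, rightContainer);

        // finisher(): transform the accumulator object into the result type R
        return collector.finisher().apply(merged);
    }

    public static void main(String[] args) {
        String result = collectInTwoChunks(Collectors.joining("-"),
                                           Arrays.asList("a", "b"),
                                           Arrays.asList("c", "d"));
        System.out.println(result); // a-b-c-d
    }
}
```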

Collector characteristics

The characteristics of a Collector can be used to optimize the implementation of the reduction operation. Any combination of these three characteristics is possible:

Collector.Characteristics.CONCURRENT
Indicates that the Collector is concurrent: the accumulator function may be called concurrently from multiple threads on the same accumulator object.
Collector.Characteristics.IDENTITY_FINISH
Indicates that the finisher function is the identity function, so the accumulator object can be cast directly to the result type.
Collector.Characteristics.UNORDERED
Indicates that the collection operation doesn't commit to preserving the encounter order of the input elements.
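
As a small sketch of how characteristics are declared, the following hypothetical toHashSet() Collector uses the Collector.of(...) overload without a finisher, which adds IDENTITY_FINISH on its own, and declares UNORDERED explicitly. The class and method names are assumptions made for this example:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Stream;

class CharacteristicsDemo {

    // No finisher is passed, so Collector.of(...) adds IDENTITY_FINISH automatically.
    // UNORDERED is declared because a HashSet doesn't keep the encounter order anyway.
    static <T> Collector<T, Set<T>, Set<T>> toHashSet() {
        return Collector.of(HashSet::new,
                            Set::add,
                            (left, right) -> { left.addAll(right); return left; },
                            Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        Collector<String, Set<String>, Set<String>> collector = toHashSet();

        // Contains both UNORDERED and IDENTITY_FINISH.
        System.out.println(collector.characteristics());

        Set<String> letters = Stream.of("b", "a", "b").collect(collector);
        System.out.println(letters.size()); // 2
    }
}
```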

Example: Joining Strings

Java already provides a Collector for joining Strings with a delimiter, but it makes for a good example to implement ourselves:

java

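import java.util.Collections;
import java.util.Set;
import java.util.StringJoiner;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class Joinector implements Collector<CharSequence, StringJoiner, String> {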

    private final CharSequence delimiter;

    public Joinector(CharSequence delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public Supplier<StringJoiner> supplier() {
        // The accumulator object creation.
        return () -> new StringJoiner(this.delimiter);
    }

    @Override
    public BiConsumer<StringJoiner, CharSequence> accumulator() {
        // How to add new stream elements to the accumulator object.
        return StringJoiner::add;
    }

    @Override
    public BinaryOperator<StringJoiner> combiner() {
        // How to merge different accumulator objects together.
        return StringJoiner::merge;
    }

    @Override
    public Function<StringJoiner, String> finisher() {
        // How to extract the final result.
        return StringJoiner::toString;
    }

    @Override
    public Set<Characteristics> characteristics() {
        // Special characteristics of our Collector.
        return Collections.emptySet();
    }
}

Simple enough — but it’s still a lot of code for very little functionality.

The interface Collector provides the static method of(...) to create a Collector in a more functional way, helping us to reduce the need for an extra class:

java

public static Collector<CharSequence, StringJoiner, String> joinector(CharSequence delimiter) {
    return Collector.of(() -> new StringJoiner(delimiter), // supplier
                        StringJoiner::add,                 // accumulator
                        StringJoiner::merge,               // combiner
                        StringJoiner::toString);           // finisher
}

Now we can gather our custom Collector factory methods in an interface or a non-instantiable class, as Java did with java.util.stream.Collectors, for more straightforward usage.
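
One possible shape for such a holder is sketched below. The class name MoreCollectors is hypothetical; the joinector(...) factory is the one from above, used here with both a sequential and a parallel stream:

```java
import java.util.StringJoiner;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical utility class, modeled after java.util.stream.Collectors.
final class MoreCollectors {

    private MoreCollectors() {
        // Non-instantiable: only static factory methods live here.
    }

    static Collector<CharSequence, StringJoiner, String> joinector(CharSequence delimiter) {
        return Collector.of(() -> new StringJoiner(delimiter), // supplier
                            StringJoiner::add,                 // accumulator
                            StringJoiner::merge,               // combiner
                            StringJoiner::toString);           // finisher
    }

    public static void main(String[] args) {
        String sequential = Stream.of("a", "b", "c")
                                  .collect(joinector(", "));
        System.out.println(sequential); // a, b, c

        // In a parallel stream the combiner merges the partial StringJoiners.
        String parallel = Stream.of("a", "b", "c")
                                .parallel()
                                .collect(joinector(", "));
        System.out.println(parallel);   // a, b, c
    }
}
```

With a static import of the factory method, the call site reads just like one using the predefined Collectors.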

What About reduce(…)?

Instead of a Collector, we could also use Stream#reduce(...) to achieve similar results. The difference between the two is subtle: a reduce operation is an immutable reduction that creates a new value every time it combines two values.

A collect operation, however, works with accumulator objects in a mutable way and uses a finisher to obtain the final result.

Which one you should prefer depends on your requirements, such as the intended purpose and performance considerations.
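
The following sketch contrasts the two approaches by concatenating Strings. The class and method names are made up for illustration, and no performance claims are implied beyond the difference in allocation behavior noted in the comments:

```java
import java.util.Arrays;
import java.util.List;

class ReduceVersusCollect {

    // Immutable reduction: every step combines two Strings into a new String.
    static String withReduce(List<String> words) {
        return words.stream()
                    .reduce("", (left, right) -> left + right);
    }

    // Mutable reduction: elements are appended to a StringBuilder,
    // and the final toString() call plays the role of the finisher.
    static String withCollect(List<String> words) {
        return words.stream()
                    .collect(StringBuilder::new,
                             StringBuilder::append,
                             StringBuilder::append)
                    .toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c");
        System.out.println(withReduce(words));  // abc
        System.out.println(withCollect(words)); // abc
    }
}
```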

Conclusion

Creating a custom Collector isn't complicated once you understand the general concepts behind it.

By combining our custom Collector creator methods the way Java did, we can use and share our Collectors throughout our projects.

Resources

java.util.stream package (Oracle)
java.util.stream.Collectors (Oracle)
Processing Data with Java SE 8 Streams (Oracle)

#java #functional
