Java Spliterator Explained

2021-03-15 · 7 min

Java has multiple types for traversing elements. My last article showed how java.util.Iterator<T> and java.util.ListIterator<T> can be used to traverse data structures like Collections.

The general concept of iterators exists since Java 1.2, but in Java 8, it got a new relative: java.util.Spliterator<T>.

Table of Contents

The Spliterator

Besides traversing sequences of data, like an Iterator<T>, a Spliterator<T> can also partition it:

Iterator + Splitting => Spliterator

The trySplit() method allows it to partition off some elements of the sequence as another Spliterator<T>.

This particular advantage over Iterator makes it the core component of the Stream API. By splitting up data into apt sub-sequences, it allows parallel processing.

Characteristics

Not every data sequence can be just split into multiple parts.

By being an enabler of parallel processing, the actual processor must know more about the data it’s getting from the Spliterator<T>. For example, if a data sequence must be ordered, a Stream needs to process it differently from an unordered collection.

To provide as much information about the “spliterated” data sequence, a Spliterator<T> can have certain characteristics:

CONCURRENT: Is the underlying element source thread-safe without external synchronization?
DISTINCT: Are all elements unique and x.equals(y) == false for any pair?
IMMUTABLE: Are no changes to the source, like add/remove/replace, allowed during traversal?
NONNULL: Are all elements guaranteed != null?
ORDERED: Is the element source ordered?
SIZED: Is the estimated size before traversal the actual size?
SORTED: Are the elements sorted?
SUBSIZED: Are split-up parts of the source still SIZED?

The characteristics are represented by int, so to provide multiple values, we need to use bitwise or |.

Here are the characteristics of Spliterators created by commonly used collection types:

java

// ArrayList<T>

public int characteristics() {
    return Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED;
}

// PriorityQueue<T>

public int characteristics() {
    return Spliterator.SIZED | Spliterator.SUBSIZED | Spliterator.NONNULL;
}

// HashSet<T>

public int characteristics() {
    return (fence < 0 || est == map.size ? Spliterator.SIZED : 0) |
                Spliterator.DISTINCT;
}

The HashSet<T> example also shows that the characteristics don’t have to be fixed. They can use the surrounding context to make an informed decision.

The Interface

java.util.Spliterator<T> has only 8 methods.

And if we don’t count default methods, the number goes down to 4:

java

// CHARACTERISTICS

int characteristics();

default boolean hasCharacteristics(int characteristics)

// SIZE

long estimateSize();

default long getExactSizeIfKnown();

// COMPARATOR

default Comparator<? super T> getComparator();

// TRAVERSING

boolean tryAdvance(Consumer<? super T> action);

default void forEachRemaining(Consumer<? super T> action);

// SPLITTING

Spliterator<T> trySplit();

With this article being more of a high-level introduction, I won’t go into detail. The methods have quite descriptive names, and they are documented well.

Primitive Types

With Project Valhalla still being in the distant future, handling primitive types with generics is a no-go. So specialized types for primitives are currently our only options, just like specialized Streams.

Spliterator<T> 3 specialized types:

Other primitive types aren’t directly supported because no specialized Consumer types exist besides these 3. We could create our own by utilizing Spliterator.OfPrimitive, but it would need additional functional interfaces not readily available in the JDK.

Create a Spliterator

The simplest way is using the retro-fitted spliterator() method of java.util.Collection, it creates a rudimentary Spliterator<T> without any characteristics. But all collection types I’ve encountered in the JDK provide a more specialized Spliterator<T> with optimized characteristics.

If we need to create a Spliterator<T> for a custom data sequence, we don’t necessarily have to implement the interface ourselves. Instead, we can use the convenience methods of java.util.Spliterators. Just like other types with only static convenience methods, it provides a wide range of options:

java

// Spliterator<T>

<T> Spliterator<T> emptySpliterator()

<T> Spliterator<T> spliterator(Collection<? extends T> c,
                               int characteristics)

<T> Spliterator<T> spliterator(Object[] array,
                               int additionalCharacteristics)

<T> Spliterator<T> spliterator(Object[] array,
                               int fromIndex,
                               int toIndex,
                               int additionalCharacteristics)

<T> Spliterator<T> spliterator(Iterator<? extends T> iterator,
                               long size,
                               int characteristics)

<T> Spliterator<T> spliteratorUnknownSize(Iterator<? extends T> iterator,
                                          int characteristics)

<T> Iterator<T> iterator(Spliterator<? extends T> spliterator)


// Spliterator.OfDouble

Spliterator.OfDouble emptyDoubleSpliterator()

Spliterator.OfDouble spliterator(double[] array,
                                 int additionalCharacteristics)

Spliterator.OfDouble spliterator(double[] array,
                                 int fromIndex,
                                 int toIndex,
                                 int additionalCharacteristics)

PrimitiveIterator.OfDouble iterator(Spliterator.OfDouble spliterator)

Spliterator.OfDouble spliterator(PrimitiveIterator.OfDouble iterator,
                                 long size,
                                 int characteristics)

Spliterator.OfDouble spliteratorUnknownSize(PrimitiveIterator.OfDouble iterator,
                                            int characteristics)

// Spliterator.OfInt

Spliterator.OfInt emptyIntSpliterator()

Spliterator.OfInt spliterator(int[] array,
                              int additionalCharacteristics)

Spliterator.OfInt spliterator(int[] array,
                              int fromIndex,
                              int toIndex,
                              int additionalCharacteristics)

PrimitiveIterator.OfInt iterator(Spliterator.OfInt spliterator)

Spliterator.OfInt spliterator(PrimitiveIterator.OfInt iterator,
                              long size,
                              int characteristics)

Spliterator.OfInt spliteratorUnknownSize(PrimitiveIterator.OfInt iterator,
                                         int characteristics)

// Spliterator.OfLong

Spliterator.OfLong emptyLongSpliterator()

Spliterator.OfLong spliterator(long[] array,
                               int additionalCharacteristics)

Spliterator.OfLong spliterator(long[] array,
                               int fromIndex,
                               int toIndex,
                               int additionalCharacteristics)

PrimitiveIterator.OfLong iterator(Spliterator.OfLong spliterator)

Spliterator.OfLong spliterator(PrimitiveIterator.OfLong iterator,
                               long size,
                               int characteristics)

Spliterator.OfLong spliteratorUnknownSize(PrimitiveIterator.OfLong iterator,
                                          int characteristics)

Especially interesting is the <T> Spliterator<T> spliterator(Iterator<? extends T> iterator, long size, int characteristics) method. With its help, every iterable type is usable as a Stream<T>, even if it doesn’t provide its own Spliterator<T>:

java

class MyCustomDataStructure implements Iterable<MyCustomData> {
    // ...

    int size();
}

var data = new MyCustomDataStructure();

var characteristics = Spliterator.ORDERED
                    | Spliterator.SIZED
                    | Spliterator.SUBSIZED;

var spliterator = Spliterators.spliterator(data,
                                           data.size(),
                                            characteristics);

var asStream = StreamSupport.stream(spliterator,
                                    false);

The utility class java.util.Arrays also provides static convenience methods for creating a Spliterator<T>, but they mostly just wrap a call to Spliterators:

java

<T> Spliterator<T> spliterator(T[] array);

<T> Spliterator<T> spliterator(T[] array,
                               int startInclusive,
                               int endExclusive);
    
Spliterator.OfInt spliterator(int[] array);
   
Spliterator.OfInt spliterator(int[] array,
                              int startInclusive,
                              int endExclusive);
   
Spliterator.OfLong spliterator(long[] array);
    
Spliterator.OfLong spliterator(long[] array,
                               int startInclusive,
                               int endExclusive);

Spliterator.OfDouble spliterator(double[] array);

Spliterator.OfDouble spliterator(double[] array,
                                 int startInclusive,
                                 int endExclusive);

Parallelism

Parallel data processing was also a complex task. The Java Collection Framework types are either not thread-safe, or they provide synchronized wrappers to mitigate. This approach’s problem is thread contention: a conflict over access to a shared data structure between threads, leading to slowdowns while a thread is waiting for another one to finish its work.

Instead of just traversing a single item at a time, a Spliterator<T> can split up a portion of its elements to allow parallel processing, not provide the parallel behaviour itself.

We can make use of this in Streams by calling Collection#parallelStream() or the intermediate Stream operation parallel().

If we look at the actual code, there’s not much of a difference regarding the Spliterator<T>:

java

// Excerpt of java.util.Collection

default Stream<E> stream() {
    return StreamSupport.stream(spliterator(), false);
}

default Stream<E> parallelStream() {
    return StreamSupport.stream(spliterator(), true);
}

The Spliterator<T> facilitates parallelism, but it’s the stream that has to do the actual work.

When to use parallel streams is out of the scope of this article.

The quintessence is that using a parallel stream won’t automagically lead to better performance. It highly depends on the type and amount of data you process, your execution environment, and more.

Here’s a Medium article about possible pitfalls with parallel streams:

Be Careful With Java Parallel Streams

Iterator vs. Spliterator

Even though both Spliterator<T> and Iterator<T> share the concept of iteration, they differ in their scope of application.

Iterator<T> is a universal, non-parallel way of iteration available since Java 1.2. It’s deeply ingrained in the Java Collection Framework. Most likely, many of us have some form of hands-on experience with Iterator<T> or the related java.lang.Iterable<T>, for concurrent modification, or due to the enhanced for-loop.

Spliterator<T> can be used for iterating both the Java Collection Framework types and is the core of Stream API. It also enables parallel traversing if the underlying data allows it. Even though it’s such an essential part of the Stream API, it’s more of a “background” type. We only use it through convenience methods or use already existing types in the JDK.

Resources

#java

Support Me on Ko-fi