Java has multiple types for traversing elements of a source.
My last article showed how
java.util.ListIterator<T> can be used to traverse data structures like Collections.
The concept of iterators is supported since Java 1.2, but got a new relative,
java.util.Spliterator<T>, in Java 8.
Besides traversing sequences of data, like an
Spliterator<T> can also partition it:
Iterator + Splitting => Spliterator
trySplit() method allows it to partition off some elements of the sequence as another
This particular advantage over Iterator makes it the core component of the Stream API. By splitting up data into apt sub-sequences, it allows parallel processing.
Not every data sequence can be just split into multiple parts.
By being an enabler of parallel processing, the actual processor must know more about the data it’s getting from the
For example, if a data sequence must be ordered, a Stream needs to process it differently from an unordered collection.
To provide as much information about the “spliterated” data sequence, a
Spliterator<T> can have certain characteristics:
Is the underlying element source thread-safe without external synchronization?
Are all elements unique and
x.equals(y) == falsefor any pair?
Are no changes to the source, like add/remove/replace, allowed during traversal?
Are all elements guaranteed
Is the element source ordered?
Is the estimated size before traversal the actual size?
Are the elements sorted?
The characteristics are represented by
int, so to provide multiple values, we need to use bitwise or
Here are the characteristics of Spliterators created by commonly used collection types:
HashSet<T> example also shows that the characteristics don’t have to be fixed. They can use the surrounding context to make an informed decision.
java.util.Spliterator<T> has only 8 methods.
And if we don’t count
default methods, the number goes down to 4:
With this article being more of a high-level introduction, I won’t go into detail. The methods have quite descriptive names, and they are documented well.
With Project Valhalla still being in the distant future, handling primitive types with generics is a no-go. So specialized types for primitives are currently our only options, just like specialized Streams.
Spliterator<T> 3 specialized types:
Other primitive types aren’t directly supported because no specialized
Consumer types exist besides these 3.
We could create our own by utilizing
Spliterator.OfPrimitive, but it would need additional functional interfaces not readily available in the JDK.
Create a Spliterator
The simplest way is using the retro-fitted
spliterator() method of
java.util.Collection, it creates a rudimentary
Spliterator<T> without any characteristics.
But all collection types I’ve encountered in the JDK provide a more specialized
Spliterator<T> with optimized characteristics.
If we need to create a
Spliterator<T> for a custom data sequence, we don’t necessarily have to implement the interface ourselves.
Instead, we can use the convenience methods of
Just like other types with only
static convenience methods, it provides a wide range of options:
Especially interesting is the
<T> Spliterator<T> spliterator(Iterator<? extends T> iterator, long size, int characteristics) method.
With its help, every iterable type is usable as a
Stream<T>, even if it doesn’t provide its own
The utility class
java.util.Arrays also provides
static convenience methods for creating a
Spliterator<T>, but they mostly just wrap a call to
Parallel data processing was also a complex task. The Java Collection Framework types are either not thread-safe, or they provide
synchronized wrappers to mitigate.
This approach’s problem is thread contention: a conflict over access to a shared data structure between threads, leading to slowdowns while a thread is waiting for another one to finish its work.
Instead of just traversing a single item at a time, a
Spliterator<T> can split up a portion of its elements to allow parallel processing, not provide the parallel behaviour itself.
We can make use of this in Streams by calling
Collection#parallelStream() or the intermediate Stream operation parallel().
If we look at the actual code, there’s not much of a difference regarding the
Spliterator<T> facilitates parallelism, but it’s the stream that has to do the actual work.
When to use parallel streams is out of the scope of this article.
The quintessence is that using a parallel stream won’t automagically lead to better performance. It highly depends on the type and amount of data you process, your execution environment, and more.
Here’s a Medium article about possible pitfalls with parallel streams:
Iterator vs. Spliterator
Even though both
Iterator<T> share the concept of iteration, they differ in their scope of application.
Iterator<T> is a universal, non-parallel way of iteration available since Java 1.2.
It’s deeply ingrained in the Java Collection Framework.
Most likely, many of us have some form of hands-on experience with
Iterator<T> or the related
java.lang.Iterable<T>, for concurrent modification, or due to the enhanced for-loop.
Spliterator<T> can be used for iterating both the Java Collection Framework types and is the core of Stream API.
It also enables parallel traversing if the underlying data allows it.
Even though it’s such an essential part of the Stream API, it’s more of a “background” type.
We only use it through convenience methods or use already existing types in the JDK.
- Spliterator (Oracle)
- Java Tutorial Parallelism (Oracle)
- Java Stream API (Oracle)
- Java Collection Framework (Oracle)
- Iterator (Oracle)
- How to Iterate with Java (BDD)