Remaining issues with Java Collections
Introduction
At the time of writing, Java 25 has been available for over a month, showcasing the platform’s continuous evolution. However, some performance challenges with collections remain.
Let’s take a look at what these issues are and how to mitigate them.
ArrayList<E>
When you initialize an ArrayList, it will create an array (Object[]) as backing structure to store the elements[1].
The array is used for every operation: indexOf, contains, get etc…
Using contains(Object o)
Now here’s the first problem.
An array only knows two things about the elements it stores: their order and their respective indexes. This means that for each operation, the work can only be done using these two pieces of information.
Let’s look at the implementation of contains[2]:
public boolean contains(Object o) {
return indexOf(o) >= 0;
}
public int indexOf(Object o) {
return indexOfRange(o, 0, size);
}
int indexOfRange(Object o, int start, int end) {
Object[] es = elementData;
if (o == null) {
for (int i = start; i < end; i++) {
if (es[i] == null) {
return i;
}
}
} else {
for (int i = start; i < end; i++) {
if (o.equals(es[i])) {
return i;
}
}
}
return -1;
}
In indexOfRange, we can see that the lookup is done by iterating over the array and checking for each element if it matches with the given Object o.
That gives an O(n) complexity, which means that the time consumed during a contains will be proportional to the size of the array.
It’s negligible if the collection is small, but what if it’s huge?
Using containsAll(Collection<?> c)
The same logic applies to containsAll(Collection<?> c) but with a O(n*m) time complexity since every element of the given collection has to be checked against the elements of the other one.
In addition, IntelliJ issues a warning about this.
public boolean check(List<String> list, Collection<String> collection) {
// O(n*m) complexity
return list.containsAll(collection);
}
Solutions
As we saw, contains and containsAll implementations are not suitable for every collection size.
A better solution is to leverage a more suitable data structure: HashSet.
It is based on a hash table to provide consistent average-time performance for basic operations such as adding, deleting, and searching.
Using a HashSet
Let’s see the implementation[3]:
transient HashMap<E,Object> map;
public boolean contains(Object o) {
return map.containsKey(o);
}
HashSet uses its underlying HashMap to store the object as a key, which in most cases results in O(1) complexity.
So, if we have a large collection and/or a collection that is frequently looked up (contains, remove), HashSet is definitely a more efficient choice.
Converting the Collection
It is not always possible (or desirable) to replace all ArrayLists with HashSets.
In such cases, creating a temporary HashSet is often an excellent compromise.
Set<String> lookup = new HashSet<>(list);
if (lookup.contains(x)) {
LOG.info("Found");
}
Utility method
public static <T> boolean fastContains(Collection<T> collection, T element) {
if (collection == null || collection.isEmpty()) {
return false;
}
if (collection.size() < 10) {
return collection.contains(element);
}
return new HashSet<>(collection).contains(element);
}
List<String> list = List.of("elem", "element", "el");
if (CollectionUtils.fastContains(list, "element")) {
LOG.info("Found");
}
Utility method (Java 8+)
public static <T> Predicate<T> fastContains(Collection<T> collection) {
if (collection == null || collection.isEmpty()) return t -> false;
if (collection.size() < 10) return collection::contains;
Set<T> set = new HashSet<>(collection);
return set::contains;
}
List<String> list = List.of("elem", "element", "el");
Predicate<String> lookup = CollectionUtils.fastContains(list);
if (lookup.test("el")) {
LOG.info("Found");
}
Using containsAll(Collection<?> c)
Same as before, converting the collection into a HashSet is the way to go:
public boolean check(List<String> list, Collection<String> collection) {
// O(n+m) complexity
return new HashSet<>(list).containsAll(collection);
}
stream()
collection.stream()
.filter(s -> s.length() > 1)
.map(String::toUpperCase)
.sorted()
.distinct()
.toList();
When we call stream()[4] on a collection, certain objects are initialized to prepare the execution of the pipeline: ReferencePipeline, Spliterator etc…
The problem is that this happens regardless of whether the given collection is empty or not.
In other words we initialize objects that we are not sure will be used.
Heinz Kabutz wrote an excellent article on the matter.
Solutions
Adding a guard
The most straighforward solution is to simply adding a check on the emptiness of a collection:
if (collection != null && !collection.isEmpty()) {
List<String> formattedList = collection.stream()
.filter(s -> s.length() > 1)
.map(String::toUpperCase)
.toList();
}
Using Java 8
It can also be done in more functional way:
List<String> formattedList = Optional.ofNullable(collection)
.filter(c -> !c.isEmpty())
.map(coll -> .stream()
.filter(s -> s.length() > 1)
.map(String::toUpperCase)
.toList())
.orElseGet(Collections::emptyList);
Wrapping up
To sum things up:
- Choose the right structure for the job: Using a
HashSetor converting anArrayListis highly beneficial when the collection is large and/or requires frequent lookups, offering a significant performance boost. - Initialize only when you are certain to use it: As we’ve seen with Stream pipelines, even an empty collection can trigger the creation of multiple intermediate objects before being discarded.
References
- ArrayList.java#L139: elementData
- ArrayList.java#L275: contains(Object o)
- HashSet.java#L213: contains(Object o)
- Collection.java#L747: stream()
Demo
A showcase of the concepts illustrated in this post is available here: collections-issues-benchmark