The Problem


I recently needed to group a set of strings by regex. This isn’t a hard problem, but I wanted Java 8’s streams to do the heavy lifting. It is easily solved with Stream’s collect method and Collectors.groupingBy.

So, given the following data…

// requires java.util.ArrayList and java.util.List
List<String> vals = new ArrayList<>();
vals.add("dog");
vals.add("cat");
vals.add("fish");
vals.add("elephant");

I want to group this data by word length, separating out the short words (three or four letters). I don’t have to use regex for this, but it will illustrate the point.

The first thing to do is to define what I want.

I want the above list of words to be categorized into a few groups…

{
  3letters : [ 'dog', 'cat' ],
  4letters : [ 'fish' ]
}

But what about “elephant”? I don’t want to lose that, so let’s put that in a ‘default’ group.

{
  3letters : [ 'dog', 'cat' ],
  4letters : [ 'fish' ],
  default : [ 'elephant' ]
}
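
For comparison, here is a minimal sketch of the same grouping without regex, keyed on word length alone. The class name LengthGrouping and the helper keyFor are hypothetical names made up for this illustration; only Collectors.groupingBy comes from the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class LengthGrouping {
    // Hypothetical helper: picks a group name from the word's length only.
    static String keyFor(String word) {
        switch (word.length()) {
            case 3: return "3letters";
            case 4: return "4letters";
            default: return "default";
        }
    }

    // groupingBy calls keyFor on each element and collects elements with the same key into a list.
    static Map<String, List<String>> group(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(LengthGrouping::keyFor));
    }

    public static void main(String[] args) {
        List<String> vals = Arrays.asList("dog", "cat", "fish", "elephant");
        System.out.println(group(vals));
    }
}
```

This works, but every new rule means editing the switch. Driving the key from a table of regexes keeps the rules as data.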

So let’s build the grouping function. A grouping function is a function that accepts a value and returns a ‘key’ for the value. I’d like to categorize easily via regex, so I want to define my group -> regex rules in the following way.

3letters -> .{3}
4letters -> .{4}
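
To see why these short patterns are enough, it helps to remember that String.matches only succeeds when the whole string matches, so .{3} means “exactly three characters” rather than “contains three characters”. A small sketch (MatchesDemo and inGroup are hypothetical names used only here):

```java
class MatchesDemo {
    // Hypothetical helper: true when the entire word matches the rule's regex.
    static boolean inGroup(String word, String regex) {
        return word.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(inGroup("dog", ".{3}"));      // true
        System.out.println(inGroup("fish", ".{3}"));     // false: four characters, not three
        System.out.println(inGroup("fish", ".{4}"));     // true
        System.out.println(inGroup("elephant", ".{4}")); // false: no partial match
    }
}
```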

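Here’s the function that produces the appropriate key based on a regex match. The function accepts a value via apply, finds the first entry in groupNameToRegex whose regex matches the value’s toString(), and returns that entry’s key. If nothing matches, it returns "default". Note that a HashMap has no guaranteed iteration order, so “first” is only well defined when at most one rule matches a given value, as is the case here.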

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class RegexBasedGroupingFunction<T> implements Function<T, String> {
    // group name -> regex that a value must fully match to land in that group
    final Map<String, String> groupNameToRegex = new HashMap<>();

    public RegexBasedGroupingFunction(Map<String, String> groupNameToRegex) {
        // defensive copy; a null map simply means "no rules"
        if (groupNameToRegex != null) {
            this.groupNameToRegex.putAll(groupNameToRegex);
        }
    }

    @Override
    public String apply(T t) {
        return groupNameToRegex.entrySet()
                .stream()
                .filter(entry -> t.toString().matches(entry.getValue()))
                .map(Map.Entry::getKey)
                .findFirst()
                .orElse("default"); // no rule matched
    }
}
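
If the rules can overlap, the order in which they are tried starts to matter. One possible variant, sketched below under that assumption, stores the rules in a LinkedHashMap so they are checked in insertion order, and compiles each regex once instead of on every call. OrderedRegexGroupingFunction is a hypothetical name, not part of the code above, and the ordering guarantee only holds if the caller passes in an ordered map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical variant: rules are tried in the iteration order of the map passed in.
class OrderedRegexGroupingFunction<T> implements Function<T, String> {
    private final Map<String, Pattern> groupNameToPattern = new LinkedHashMap<>();

    OrderedRegexGroupingFunction(Map<String, String> groupNameToRegex) {
        if (groupNameToRegex != null) {
            // Compile each regex once up front; LinkedHashMap keeps the caller's order.
            groupNameToRegex.forEach(
                    (name, regex) -> groupNameToPattern.put(name, Pattern.compile(regex)));
        }
    }

    @Override
    public String apply(T t) {
        String text = String.valueOf(t);
        return groupNameToPattern.entrySet().stream()
                .filter(entry -> entry.getValue().matcher(text).matches())
                .map(Map.Entry::getKey)
                .findFirst()
                .orElse("default");
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("short", ".{1,3}"); // tried first
        rules.put("any", ".*");       // catch-all, tried second
        OrderedRegexGroupingFunction<String> f = new OrderedRegexGroupingFunction<>(rules);
        System.out.println(f.apply("dog"));      // short
        System.out.println(f.apply("elephant")); // any
    }
}
```

Here "dog" matches both rules, and the insertion order is what decides that it lands in "short".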

So, now all that’s left is to apply that function across the stream.

Map<String, String> groupToRegex = new HashMap<>();
groupToRegex.put("3letters", ".{3}");
groupToRegex.put("4letters", ".{4}");

// the diamond operator avoids a raw type and the unchecked warning that comes with it
RegexBasedGroupingFunction<String> f = new RegexBasedGroupingFunction<>(groupToRegex);

System.out.println(vals.stream().collect(Collectors.groupingBy(f)));

The result is:

{default=[elephant], 3letters=[dog, cat], 4letters=[fish]}
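
The key order in that output is whatever the resulting map happens to produce, since the one-argument groupingBy makes no promise about the map type it returns. If a predictable order matters, the three-argument overload of Collectors.groupingBy accepts a map factory. Below is a minimal sketch using a TreeMap; SortedGrouping and keyFor are hypothetical names, and the classifier is inlined so the example stands alone.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SortedGrouping {
    // Hypothetical stand-in for the regex-based classifier, inlined for brevity.
    static String keyFor(String word) {
        if (word.matches(".{3}")) return "3letters";
        if (word.matches(".{4}")) return "4letters";
        return "default";
    }

    // TreeMap::new asks groupingBy to collect into a map sorted by key.
    static Map<String, List<String>> group(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(SortedGrouping::keyFor, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> vals = Arrays.asList("dog", "cat", "fish", "elephant");
        // prints {3letters=[dog, cat], 4letters=[fish], default=[elephant]}
        System.out.println(group(vals));
    }
}
```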
