How to use regex groups in Java with ease

When processing text in Java, regular expressions are a useful tool for searching for a pattern or extracting a group of data. With regular expressions it is possible to identify data within a larger body of other data. When a regular expression contains a number of groups, or while it is still being developed, it helps if you don't have to know the position of each group within the expression. For this purpose Java (since version 7) offers a feature called named capturing groups.

Accessing groups the default way

By default, the groups defined within a regex are accessed by their index. The index is determined by the position of the group's opening parenthesis in the regex: the first group gets index 1, the second group index 2, and so on. Index 0 is reserved for the entire match.
The example below shows how groups can be accessed without naming them. Since the regex contains four groups, the index positions 1-4 are used.

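import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexedGroupsExample {
    public static void main(String[] args) {
        String text = "anythingWithPatternXXXfirst.second;any.thingXXXSomethingThatLooksLikeRandom";
        Pattern pattern = Pattern.compile("XXX(.*)\\.(.*);(.*)\\.(.*)XXX");
        Matcher matcher = pattern.matcher(text);
        // group(...) throws an IllegalStateException if find() did not succeed
        if (matcher.find()) {
            // group 0 is the whole match; the capturing groups start at index 1
            String first = matcher.group(1);
            String second = matcher.group(2);
            String third = matcher.group(3);
            String last = matcher.group(4);

            // prints "first:second:any:thing"
            System.out.println(first + ":" + second + ":" + third + ":" + last);
        }
    }
}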
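The numbering rule can be checked directly. The following sketch collects every group of the first match, including index 0, using a hypothetical helper `allGroups` in a hypothetical class `GroupNumbering`. Note that `Matcher.groupCount()` does not count group 0, and that nested groups are numbered by the position of their opening parenthesis.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {

    // Returns every group of the first match, index 0 (the whole match) included.
    // Returns an empty array if the regex does not match at all.
    static String[] allGroups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return new String[0];
        }
        // groupCount() does not include group 0, hence the + 1
        String[] groups = new String[matcher.groupCount() + 1];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            groups[i] = matcher.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String text = "anythingWithPatternXXXfirst.second;any.thingXXXSomethingThatLooksLikeRandom";
        String[] groups = allGroups("XXX(.*)\\.(.*);(.*)\\.(.*)XXX", text);
        for (int i = 0; i < groups.length; i++) {
            System.out.println(i + " -> " + groups[i]);
        }

        // Nested groups: group 1 is the outer "(a(b))", group 2 the inner "(b)",
        // group 3 is "(c)", counted by the position of each opening parenthesis
        String[] nested = allGroups("(a(b))(c)", "abc");
        System.out.println(String.join(",", nested));
    }
}
```

For the text from the example above, index 0 holds the whole match `XXXfirst.second;any.thingXXX`, followed by `first`, `second`, `any` and `thing` for the indices 1 to 4. The nested example yields `abc`, `ab`, `b` and `c`.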

Using named capturing groups

The second, and much more convenient, way is to use named capturing groups. Compared to the first example, each group in the regular expression is extended by a marker that gives it a name, using the syntax (?<name>...). The group can then be accessed by its name instead of its index.
Especially with complex regular expressions it is a clear advantage to access groups by name rather than by index: adding, removing or reordering a group shifts the indices of the groups that follow it, while their names stay the same. That makes the regular expression much easier to modify and maintain.

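import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupsExample {
    public static void main(String[] args) {
        String text = "anythingWithPatternXXXfirst.second;any.thingXXXSomethingThatLooksLikeRandom";
        // (?<name>...) assigns a name to each capturing group
        Pattern pattern = Pattern.compile("XXX(?<first>.*)\\.(?<second>.*);(?<third>.*)\\.(?<last>.*)XXX");
        Matcher matcher = pattern.matcher(text);
        // group(...) throws an IllegalStateException if find() did not succeed
        if (matcher.find()) {
            String first = matcher.group("first");
            String second = matcher.group("second");
            String third = matcher.group("third");
            String last = matcher.group("last");

            // prints "first:second:any:thing"
            System.out.println(first + ":" + second + ":" + third + ":" + last);
        }
    }
}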
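Named groups are still numbered from left to right as well, so for the regex above `matcher.group("last")` and `matcher.group(4)` return the same value. The difference shows up once the regex changes. The sketch below assumes a hypothetical modification that adds a group named `prefix` in front of the existing ones; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedVersusIndexed {

    // Returns the value of the named group "last", or null if the regex does not match.
    static String lastByName(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group("last") : null;
    }

    // Returns the value of group number 4, or null if the regex does not match.
    static String fourthByIndex(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(4) : null;
    }

    public static void main(String[] args) {
        String text = "anythingWithPatternXXXfirst.second;any.thingXXXSomethingThatLooksLikeRandom";

        // The regex from the example: "last" is also group number 4
        String original = "XXX(?<first>.*)\\.(?<second>.*);(?<third>.*)\\.(?<last>.*)XXX";

        // Hypothetical change: a new group "prefix" in front of the others,
        // so every existing group moves one index to the right
        String extended = "(?<prefix>Pattern)" + original;

        System.out.println(lastByName(original, text));
        System.out.println(fourthByIndex(original, text));
        System.out.println(lastByName(extended, text));
        System.out.println(fourthByIndex(extended, text));
    }
}
```

With the original regex both calls return `thing`. After the change, `group(4)` silently returns `any`, because every existing group moved one index to the right, while `group("last")` still returns `thing`.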

Accessing groups by name isn't magic at all, and there is rarely a good reason to access them by index instead. All it takes is giving each group a name within the regular expression. Hopefully this helps you write, maintain and read regular expressions in Java with more ease.
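Providing a name is simple, but Java does restrict what a name may look like: according to the `java.util.regex.Pattern` documentation, a group name must start with a letter and may only contain the ASCII letters `a`-`z`, `A`-`Z` and the digits `0`-`9`. Names must also be unique within one pattern. The following sketch, with a hypothetical helper `isValidGroupName`, checks a few candidate names by compiling them and catching the `PatternSyntaxException`.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class GroupNameRules {

    // Returns true if Pattern.compile accepts the given name in "(?<name>.*)".
    static boolean isValidGroupName(String name) {
        try {
            Pattern.compile("(?<" + name + ">.*)");
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"first", "last2", "firstName", "first_name", "2nd"};
        for (String name : candidates) {
            System.out.println(name + " -> " + isValidGroupName(name));
        }
    }
}
```

Underscores are a common surprise here: `first_name` is rejected, while camel case such as `firstName` is accepted. A name starting with a digit, like `2nd`, is rejected as well.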
