.Net — Regex pattern for lazy balanced group matching — Rahul Singla
Rahul.t('String to be translated.');
This again was a pretty good candidate for PowerShell; the only real challenge was to reliably locate every invocation of this method in a file. I was aware of the balancing group extension to regular expressions available in .NET, and I decided to try it for once instead of the traditional alternative of searching the string in a loop.
Fortunately, MSDN itself provides a pretty good example of a balancing group definition, as follows:
string pattern = "^[^<>]*" +
                 "(" +
                 "((?'Open'<)[^<>]*)+" +
                 "((?'Close-Open'>)[^<>]*)+" +
                 ")*" +
                 "(?(Open)(?!))$";
According to MSDN itself, the above example:
demonstrates using a balancing group definition to match left and right angle brackets (<>) in an input string. The capture collections of the Open and Close groups in the example are used like a stack to track matching pairs of angle brackets: each captured left angle bracket is pushed into the capture collection of the Open group; each captured right angle bracket is pushed into the capture collection of the Close group; and the balancing group definition ensures there is a matching right angle bracket for each left angle bracket.
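To see the stack behaviour described in that quote in action, here is a small sketch that runs the MSDN pattern against balanced and unbalanced inputs. The class name BalancedBrackets, the helper IsBalanced and the sample inputs are my own, chosen purely for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedBrackets
{
    // The MSDN pattern quoted above, unchanged.
    private const string Pattern =
        "^[^<>]*" +
        "(" +
        "((?'Open'<)[^<>]*)+" +          // each '<' is pushed onto the Open stack
        "((?'Close-Open'>)[^<>]*)+" +    // each '>' pops one entry off the Open stack
        ")*" +
        "(?(Open)(?!))$";                // fail if anything is still left on Open

    // Hypothetical helper: true when every '<' has a matching '>'.
    public static bool IsBalanced(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    public static void Main()
    {
        Console.WriteLine(IsBalanced("<abc><mno<xyz>>")); // True
        Console.WriteLine(IsBalanced("<abc><mno<xyz>"));  // False: one '<' is never closed
        Console.WriteLine(IsBalanced("<abc>>"));          // False: a '>' with nothing to pop
    }
}
```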
This was pretty good, and I adapted it as follows to match balanced opening and closing round brackets instead:
string pattern = @"(" +
                 @"((?'Open'\()[^\(\)]*)+" +
                 @"((?'Close-Open'\))[^\(\)]*)+" +
                 @")*" +
                 @"(?(Open)(?!))$";
However, it looked like a greedy match to me, and it turned out to be one when I tried it. A greedy match, as you might know, starts at the first matching character and continues as far as it can while the overall match still succeeds. This meant that a single match started at the first Rahul.t method call and extended through the closing round bracket of the last Rahul.t method call, which is clearly undesirable.
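The greedy behaviour is easy to reproduce. The sketch below runs the adapted pattern against a made-up input containing two calls; the class GreedyDemo, the helper FirstMatch and the input string are assumptions for illustration. The pattern is written with verbatim strings (@"...") so that C# accepts the backslashes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // The adapted (greedy) pattern for round brackets, including the trailing '$'.
    private const string Pattern =
        @"(" +
        @"((?'Open'\()[^\(\)]*)+" +
        @"((?'Close-Open'\))[^\(\)]*)+" +
        @")*" +
        @"(?(Open)(?!))$";

    // Hypothetical helper: returns the text of the first match, or "" if none.
    public static string FirstMatch(string input)
    {
        return Regex.Match(input, Pattern).Value;
    }

    public static void Main()
    {
        string input = "Rahul.t('One'); Rahul.t('Two');";

        // One match swallows both calls instead of reporting them separately:
        // ('One'); Rahul.t('Two');
        Console.WriteLine(FirstMatch(input));
    }
}
```

Because the greedy [^\(\)]* after the first closing bracket happily consumes "; Rahul.t", and the trailing $ insists on reaching the end of the input, the two calls end up inside one match.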
It took me some time to figure out a lazy pattern for matching balanced round brackets that identifies each Rahul.t method call individually. The successful pattern turned out to be the following:
string pattern = @"Rahul\.t(" +
                 @"((?<Open>\()[^\(\)]*)+" +
                 @"((?<Close-Open>\))[^\(\)]*?)+" +
                 @")+?" +
                 @"(?(Open)(?!))";
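Compared to the pattern above, mine drops the trailing “$”, so a match is no longer forced to reach the end of the string, and it starts with the literal Rahul.t so that each match begins at a method call. It also adds a “?” in two places, after the “[^\(\)]*” that follows each closing round bracket and after the “+” that closes the outer group, forcing a lazy match that consumes as few characters as possible while looking for the balancing closing brackets.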
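As a rough sketch of how the lazy pattern can be used to list every call in a piece of source text, consider the following. The class TranslationCalls, the helper Find and the sample input are hypothetical; I have also escaped the dot in Rahul.t and used verbatim strings, which are small adjustments of my own so that the snippet compiles and matches the name literally.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TranslationCalls
{
    // Lazy balanced-bracket pattern; each match is one complete Rahul.t(...) call.
    private const string Pattern =
        @"Rahul\.t(" +
        @"((?<Open>\()[^\(\)]*)+" +          // push every '(' onto Open
        @"((?<Close-Open>\))[^\(\)]*?)+" +   // pop on ')', then take as little as possible
        @")+?" +                             // repeat the group as few times as possible
        @"(?(Open)(?!))";                    // succeed only once Open is empty

    // Hypothetical helper: returns the text of every Rahul.t(...) call found.
    public static List<string> Find(string source)
    {
        var calls = new List<string>();
        foreach (Match m in Regex.Matches(source, Pattern))
        {
            calls.Add(m.Value);
        }
        return calls;
    }

    public static void Main()
    {
        string source =
            "var a = Rahul.t('Save'); var b = Rahul.t('Delete (permanently)'); log(a, b);";

        foreach (string call in Find(source))
        {
            Console.WriteLine(call);
        }
        // Prints:
        // Rahul.t('Save')
        // Rahul.t('Delete (permanently)')
    }
}
```

One caveat worth keeping in mind: the pattern only counts round brackets and knows nothing about string quoting, so an unbalanced bracket inside a string literal, such as Rahul.t('a)'), would end the match too early.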
Originally published at https://www.rahulsingla.com on March 16, 2018.