-
Notifications
You must be signed in to change notification settings - Fork 215
bug/231/fix regexp restore paren handling #244
bug/231/fix regexp restore paren handling #244
Conversation
lm = regExpCache.lastMatch.replace(esc, '\\$&'), | ||
reg = new List(); | ||
lastMatch = regExpCache.lastMatch.replace(esc, '\\$&'), | ||
exprStr = ''; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not strictly necessary, but I made a couple of changes for readability. One is removing the reg
variable and building up exprStr
from the start of the function. I think this makes it a little clearer that we're processing lastMatch
and adding the processed parts to exprStr
bit by bit.
} | ||
|
||
// Push it to the reg and chop lm to make sure further groups come after | ||
arrPush.call(reg, lm.slice(0, lm.indexOf('(') + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First significant problem with the previous implementation. lm.indexOf('(')
is looking for the paren we added on line 170, but it could also potentially catch escaped parens that occur earlier in the string.
|
||
// Shorten the regex by replacing each part of the expression with a match | ||
// for a string of that exact length. This is safe for the type of | ||
// expressions generated above, because the expression matches the whole | ||
// match string, so we know each group and each segment between capturing | ||
// groups can be matched by its length alone. | ||
exprStr = exprStr.replace(/(\\\(|\\\)|[^()])+/g, (match) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second significant problem with the code. This second operation on exprStr
takes a series of characters other than unescaped parentheses and replaces it with [\s\S]{length}
. This allows our RegEx to work with much longer input strings. However, the prior version of this regex does not handle _un_escaped parentheses that come immediately after escaped backslashes. The new version is much more complicated, but accounts for sequences of escaped and unescaped backslashes and parentheses.
Looking at it now, I think the whole thing will be much more maintainable if we stop using a RegEx here and instead iterate over the characters in exprStr
and check for escaped characters and match groups character-by-character. Happy to work on that tomorrow under this PR or a separate one post-merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@caridy can I suggest we merge this as-is and circle back on the removal of this regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@caridy: Just had a play around with implementing this bit of functionality via iteration rather than the existing regex-based approach. You can see the changes here. The iteration-based approach didn't turn out to be as clean as I was hoping it would be, so I'd propose we leave the regex-based approach in place.
…storation issues
74edeca
to
d150c95
Compare
d150c95
to
f91377b
Compare
…xRestore Previously this expression contains a fragment similar to /aa*/ - have simplified this to /a+/.
@mtlewis let me know when this is ready for review. |
@caridy hey! Its ready from my perspective 👍 |
thanks @mtlewis, this is looking good! |
Fixes #231.
Comments in line with the code explain the related issues. The test that I added is failing in master, passing with the changes in this branch (though I haven't included the updated build in the PR).