Skip to content
This repository was archived by the owner on Apr 14, 2021. It is now read-only.

bug/231/fix regexp restore paren handling #244

Merged

Conversation

mtlewis
Copy link
Contributor

@mtlewis mtlewis commented Oct 20, 2016

Fixes #231.

Comments in line with the code explain the related issues. The test that I added is failing in master, passing with the changes in this branch (though I haven't included the updated build in the PR).

lm = regExpCache.lastMatch.replace(esc, '\\$&'),
reg = new List();
lastMatch = regExpCache.lastMatch.replace(esc, '\\$&'),
exprStr = '';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly necessary, but I made a couple of changes for readability. One is removing the reg variable and building up exprStr from the start of the function. I think this makes it a little clearer that we're processing lastMatch and adding the processed parts to exprStr bit by bit.

}

// Push it to the reg and chop lm to make sure further groups come after
arrPush.call(reg, lm.slice(0, lm.indexOf('(') + 1));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First significant problem with the previous implementation. lm.indexOf('(') is looking for the paren we added on line 170, but it could also potentially catch escaped parens that occur earlier in the string.


// Shorten the regex by replacing each part of the expression with a match
// for a string of that exact length. This is safe for the type of
// expressions generated above, because the expression matches the whole
// match string, so we know each group and each segment between capturing
// groups can be matched by its length alone.
exprStr = exprStr.replace(/(\\\(|\\\)|[^()])+/g, (match) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second significant problem with the code. This second operation on exprStr takes a series of characters other than unescaped parentheses and replaces it with [\s\S]{length}. This allows our RegEx to work with much longer input strings. However, the prior version of this regex does not handle _un_escaped parentheses that come immediately after escaped backslashes. The new version is much more complicated, but accounts for sequences of escaped and unescaped backslashes and parentheses.

Looking at it now, I think the whole thing will be much more maintainable if we stop using a RegEx here and instead iterate over the characters in exprStr and check for escaped characters and match groups character-by-character. Happy to work on that tomorrow under this PR or a separate one post-merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caridy can I suggest we merge this as-is and circle back on the removal of this regex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caridy: Just had a play around with implementing this bit of functionality via iteration rather than the existing regex-based approach. You can see the changes here. The iteration-based approach didn't turn out to be as clean as I was hoping it would be, so I'd propose we leave the regex-based approach in place.

@mtlewis mtlewis force-pushed the bug/231/fix-regexp-restore-paren-handling branch 2 times, most recently from 74edeca to d150c95 Compare October 21, 2016 09:44
@mtlewis mtlewis force-pushed the bug/231/fix-regexp-restore-paren-handling branch from d150c95 to f91377b Compare October 21, 2016 13:06
…xRestore

Previously this expression contains a fragment similar to /aa*/ - have simplified this to /a+/.
@caridy
Copy link
Collaborator

caridy commented Nov 11, 2016

@mtlewis let me know when this is ready for review.

@mtlewis
Copy link
Contributor Author

mtlewis commented Nov 11, 2016

@caridy hey! Its ready from my perspective 👍

@caridy caridy merged commit c3ad4d4 into andyearnshaw:master Nov 16, 2016
@caridy
Copy link
Collaborator

caridy commented Nov 16, 2016

thanks @mtlewis, this is looking good!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants