fix: prevent apostrophes in docstrings from breaking parsing #695
Merged
boyter merged 1 commit into boyter:master on Apr 13, 2026
In `docStringState`, the string trie was used to detect the end of a docstring. Since the trie matches any string delimiter (including `'` and `"`), an apostrophe inside a `"""` docstring would be incorrectly matched, causing the state machine to exit the docstring prematurely.

Replace the trie match with `checkForMatchSingle`, which checks specifically for the `endString` that started the current docstring.

Fixes boyter#246
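The difference between the two matching strategies described above can be illustrated with a small standalone sketch. It does not use scc's code: `delimiters`, `endByAnyDelimiter`, and `endByOpeningDelimiter` are hypothetical names, and a plain slice stands in for the trie.

```go
package main

import (
	"fmt"
	"strings"
)

// delimiters is a hypothetical stand-in for the string delimiters a trie
// might hold for Python. It is not scc's actual data structure.
var delimiters = []string{`"""`, `'''`, `"`, `'`}

// endByAnyDelimiter mimics the old behaviour: the docstring is considered
// finished at the first offset where ANY known delimiter starts.
func endByAnyDelimiter(body string) int {
	for i := range body {
		for _, d := range delimiters {
			if strings.HasPrefix(body[i:], d) {
				return i
			}
		}
	}
	return -1
}

// endByOpeningDelimiter mimics the fixed behaviour: only the delimiter that
// opened the docstring (endString) is allowed to close it.
func endByOpeningDelimiter(body, endString string) int {
	return strings.Index(body, endString)
}

func main() {
	// Text that follows an opening """ in a Python file.
	body := "It's a docstring\n\"\"\"\nx = 1\n"

	fmt.Println(endByAnyDelimiter(body))            // 2: stops at the apostrophe
	fmt.Println(endByOpeningDelimiter(body, `"""`)) // 17: stops at the closing """
}
```

With the any-delimiter check the docstring appears to end inside the word "It's", so the rest of the docstring is scanned as if it were code.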
Pull request overview
This PR fixes Python docstring parsing so that apostrophes (') inside triple-quoted docstrings (""" / ''') no longer cause the parser to prematurely exit docstring state, which previously led to miscounting docstring lines as code.
Changes:
- Update `docStringState` to detect docstring termination by matching only the active docstring’s `endString` delimiter rather than matching any string delimiter.
- Adjust the docstring regression test that previously captured the buggy behavior (`TestCountStatsIssue123`).
- Add a new regression test for issue #246 to ensure apostrophes inside docstrings don’t break parsing.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `processor/workers.go` | Fixes docstring end detection by matching only the initiating delimiter (`endString`) instead of any string delimiter. |
| `processor/workers_regression_test.go` | Updates prior regression expectations and adds a new test to prevent reintroducing the apostrophe/docstring bug. |
Owner
Neat! Thank you! This is one of those issues I have been meaning to fix for a while.
Fixes #246
Problem: `docStringState` used `stringTrie.Match()` to detect the end of a docstring. The trie matches any string delimiter, so an apostrophe `'` inside a `"""` docstring was incorrectly matched as a string boundary, causing the state machine to exit the docstring prematurely.

Before: Lines inside the docstring were counted as code.

After: Docstring lines are correctly counted as comments.

Fix: Replace `stringTrie.Match()` with `checkForMatchSingle()`, which checks specifically for the `endString` that started the current docstring (e.g. `"""`), rather than matching any string delimiter.

Validation: All existing tests pass (`go test ./...`). Updated `TestCountStatsIssue123` expectations (which previously encoded the buggy behavior) and added `TestCountStatsIssue246` regression test.

I explicitly license this contribution under the MIT licence.
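To make the before/after line counts concrete, here is a toy line classifier. It is a simplified sketch and not scc's state machine: `countLines` and its `strict` flag are hypothetical, it only understands `"""` docstrings, and it assumes the input ends with a newline.

```go
package main

import (
	"fmt"
	"strings"
)

// countLines is a toy classifier (NOT scc's implementation). A line counts as
// code if any non-blank byte on it is seen outside a docstring; otherwise it
// counts as a comment if it touched a docstring.
// strict=true:  only """ can close a docstring (the fixed behaviour).
// strict=false: any ' or " also closes it (the old behaviour).
func countLines(src string, strict bool) (code, comment int) {
	const delim = `"""`
	inDoc, sawCode, sawDoc := false, false, false
	for i := 0; i < len(src); i++ {
		c := src[i]
		switch {
		case c == '\n':
			// Classify the line that just ended, then reset the flags.
			if sawCode {
				code++
			} else if sawDoc {
				comment++
			}
			sawCode, sawDoc = false, false
		case strings.HasPrefix(src[i:], delim):
			// A triple quote opens or closes the docstring.
			inDoc = !inDoc
			sawDoc = true
			i += len(delim) - 1
		case inDoc && !strict && (c == '\'' || c == '"'):
			// Old behaviour: any quote character ends the docstring.
			inDoc = false
		case inDoc:
			sawDoc = true
		case c != ' ' && c != '\t':
			sawCode = true
		}
	}
	return code, comment
}

func main() {
	src := "\"\"\"Don't panic.\nIt's fine.\n\"\"\"\nx = 1\n"

	code, comment := countLines(src, false)
	fmt.Printf("before: code=%d comment=%d\n", code, comment) // before: code=2 comment=2

	code, comment = countLines(src, true)
	fmt.Printf("after:  code=%d comment=%d\n", code, comment) // after:  code=1 comment=3
}
```

In the non-strict run the apostrophe in "Don't" ends the docstring early, so the first two docstring lines are counted as code and the real assignment on the last line is swallowed by a docstring that never closes.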