Skip to content

Fix GitConfigParser not removing quotes from values #2035

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 7, 2025

Conversation

betaboon
Copy link
Contributor

@betaboon betaboon commented Jun 6, 2025

closes #1923

@betaboon betaboon force-pushed the fix-configparser-quotes branch from d233b4c to 67468a2 Compare June 6, 2025 20:47
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great, thank you!

@Byron Byron merged commit 4dd5d45 into gitpython-developers:main Jun 7, 2025
26 checks passed
Copy link
Member

@EliahKagan EliahKagan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is an improvement and is better than what we had before, but I also worry that it may break some existing code that uses GitPython, because it leaves various other shortcomings intact while also making their presence harder to detect.

The contents of a quoted value are not, in general, intended to be interpreted literally. For example, if programs are handling double-quoted strings themselves and handling \ sequences in them, then removing the quotes here without handling \ sequences will break that: those programs will go from working correctly to working incorrectly when they upgrade to the next version of GitPython.

That doesn't mean this change isn't worthwhile, but it makes me think we should go further in supporting the syntax of Git configuration files. I also wonder if we should really be handling the single-line and multi-line cases separately as we are, since, if I understand correctly, they are parsed in a unified way by Git and not as separate cases.
 
The \n, \t, various other \ sequences, treatment of unrecognized \ sequences as errors, and various other differences between what we have here and what Git does can be seen in the parse_value function in Git.

It's possible that I'm missing something. GitConfigParser is probably the part of the GitPython code that I have done the least with. If I open a PR to make changes here before receiving feedback, then I'll wait for a review before merging it.

EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request Jun 7, 2025
gitpython-developers#2035 fixed issue gitpython-developers#1923 by removing separate double quotation marks
appearing on a single-line configuration variable when parsing a
a configuration file. However, it also stripped leading and trailing
whitespace from the string obtained by removing the quotes.

This adds a test case of a plausible scenario where such whitespace
needs to be preserved and where a user would almost certainly expect
it to preserve: setting a value like `# ` for `core.commentString`,
in order to be able to easily create commit messages like this one,
that contain a line that begins with a literal `#`, while still
letting `#` in the more common case that it is followed by a space
be interpreted as a comment.

The effect of  `git config --local core.commentString '# '` is to
add a `commentString = "# "` line in the `[core]` section of
`.git/config`. The changes in gitpython-developers#2035 allow us to correctly parse
more quoted strings than before, and almost allow us to parse this,
but not quite, because of the `strip()` operation that turns `# `
into `#`.
EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request Jun 7, 2025
At least in a single line, whitespace in a double-quoted value in a
configuration file, like `name = " abc def "`, would presumably be
intended. This removes the `strip()` call that is applied to text
`ConfigParser` obtained by removing the double quotes around it.

This slightly refines the changes in gitpython-developers#2035 by dropping the `strip()`
call while continuing to remove opening and closing double quotes.
EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request Jun 7, 2025
gitpython-developers#2035 fixed issue gitpython-developers#1923 by removing separate double quotation marks
appearing on a single-line configuration variable when parsing a
configuration file. However, it also stripped leading and trailing
whitespace from the string obtained by removing the quotes.

This adds a test case of a plausible scenario where such whitespace
needs to be preserved and where a user would almost certainly expect
it to preserve: setting a value like `# ` for `core.commentString`,
in order to be able to easily create commit messages like this one,
that contain a line that begins with a literal `#`, while still
letting `#` in the more common case that it is followed by a space
be interpreted as a comment.

The effect of  `git config --local core.commentString '# '` is to
add a `commentString = "# "` line in the `[core]` section of
`.git/config`. The changes in gitpython-developers#2035 allow us to correctly parse
more quoted strings than before, and almost allow us to parse this,
but not quite, because of the `strip()` operation that turns `# `
into `#`.
EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request Jun 7, 2025
At least in a single line, whitespace in a double-quoted value in a
configuration file, like `name = " abc def "`, would presumably be
intended. This removes the `strip()` call that is applied to text
`ConfigParser` obtained by removing the double quotes around it.

This slightly refines the changes in gitpython-developers#2035 by dropping the `strip()`
call while continuing to remove opening and closing double quotes.
@Byron
Copy link
Member

Byron commented Jun 7, 2025

@betaboon Maybe you could provide us with a follow-up PR that handles the escape sequences which may be present inside of quoted values, to more correctly capture this.

After all, the GitPython Git configuration parser is just an adapted INI parser, and that is inherently unable to capture all the subtleties, but we can try harder where we can - this is clearly one such opportunity.

For anything serious, it's better to use gitoxide which fully captures Git configuration files.

EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request Jun 8, 2025
The ConfigParser has supported this for a long time, but it is now
done redundantly since gitpython-developers#2035. This adds a test for it, both to make
clearer that it is intended to work and to allow verifying that it
continues to hold once the legacy special-casing for it is removed.
EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request Jun 8, 2025
The ConfigParser has supported this for a long time, but it is now
done redundantly since gitpython-developers#2035. This adds a test for it, both to make
clearer that it is intended to work and to allow verifying that it
continues to hold once the legacy special-casing for it is removed.
EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request Jun 8, 2025
Because `""` is a special case of `"..."` as parsed since gitpython-developers#2035.
EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request Jun 8, 2025
This is for single line quoting in the ConfigParser.

This leaves the changes in gitpython-developers#2035 (as adjusted in gitpython-developers#2036) intact for
the cases where it addressed gitpython-developers#1923: when the `...` in `"..."`
(appearing in the value position on a single `{name} = {value}"`
line) has no occurrences of `\` or `"`, quote removal is enough.

But when `\` or `"` does appear, this suppresses quote removal.
This is with the idea that, while it would be better to interpret
such lines as Git does, we do not yet do that, so it is preferable
to return the same results we have in the past (which some programs
may already be handling themselves).

This should make the test introduced in the preceding commit pass.
But it will be even better to support more syntax, at least
well-formed escapes. As noted in the test, both the test and the
code under test can be adjusted for that.

(See comments in gitpython-developers#2035 for context.)
EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request Jun 8, 2025
Because literal `""` is a special case of `"..."` as parsed
since gitpython-developers#2035.
EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request Jun 8, 2025
This is for single line quoting in the ConfigParser.

This leaves the changes in gitpython-developers#2035 (as adjusted in gitpython-developers#2036) intact for
the cases where it addressed gitpython-developers#1923: when the `...` in `"..."`
(appearing in the value position on a single `{name} = {value}"`
line) has no occurrences of `\` or `"`, quote removal is enough.

But when `\` or `"` does appear, this suppresses quote removal.
This is with the idea that, while it would be better to interpret
such lines as Git does, we do not yet do that, so it is preferable
to return the same results we have in the past (which some programs
may already be handling themselves).

This should make the test introduced in the preceding commit pass.
But it will be even better to support more syntax, at least
well-formed escapes. As noted in the test, both the test and the
code under test can be adjusted for that.

(See comments in gitpython-developers#2035 for context.)
EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request Jun 8, 2025
This is for single line quoting in the ConfigParser.

This leaves the changes in gitpython-developers#2035 (as adjusted in gitpython-developers#2036) intact for
the cases where it addressed gitpython-developers#1923: when the `...` in `"..."`
(appearing in the value position on a single `{name} = {value}"`
line) has no occurrences of `\` or `"`, quote removal is enough.

But when `\` or `"` does appear, this suppresses quote removal.
This is with the idea that, while it would be better to interpret
such lines as Git does, we do not yet do that, so it is preferable
to return the same results we have in the past (which some programs
may already be handling themselves).

This should make the test introduced in the preceding commit pass.
But it will be even better to support more syntax, at least
well-formed escapes. As noted in the test, both the test and the
code under test can be adjusted for that.

(See comments in gitpython-developers#2035 for context.)
@betaboon
Copy link
Contributor Author

betaboon commented Jun 8, 2025

@EliahKagan beat me to it. thanks a bunch.

@EliahKagan
Copy link
Member

EliahKagan commented Jun 8, 2025

I actually have not implemented handling of escape sequences. If you mean #2048, that detects if the string may contain escape sequences (or if it may have a closing quotation mark before the one at the end of the line). If so, avoids removing the outer quotes at all. Otherwise, it allows them to be removed.

For example, if we merge #2048, then:

  • "John Smith" becomes John Smith, as it did since this PR (#2035) fixed #1923. That outcome is fully correct already, and #2048 avoids changing it.
  • "John\tSmith" stays as "John\tSmith", rather than becoming John\tSmith as it has since this PR (#2035). But the better effect would be for it to become John Smith (i.e. John followed by a tab character followed by Smith, and with the quotes removed).

While that should be enough to avoid breaking existing code as described in #2035 (review), it is not as good as actually translating the escape sequences as suggested in #2035 (comment).

The five recognized escape sequences for double-quoted values in Git configuration files are described in the last paragraph of the "Syntax" section of git-config(1):

The following escape sequences (beside \" and \\) are recognized: \n for newline character (NL), \t for horizontal tabulation (HT, TAB) and \b for backspace (BS). Other char escape sequences (including octal escape sequences) are invalid.

You are certainly under no obligation! However, if you are interested in adding handling for escape sequences, then I think you could still do so--either after, or instead of, #2048--and hopefully nothing I have done would get in the way. (Actually, I think #2048 might make it easier to proceed with adding translation of escape sequences, or other changes.)

@betaboon
Copy link
Contributor Author

betaboon commented Jun 8, 2025

i took a look at #2048. that seems like a good way forward.
if i find the time to attempt resolving escape sequences, i think it makes sense to do it after #2048.

tbh: i never anticipated my (seemingly trivial) change from #2035 to start this rabbit hole. I'm glad you caught this.

just for clarity: if after #2048 you feel like going ahead for the escape sequences, go ahead. i really can't promise i find the time to do so any time soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

GitConfigParser misparses quotes in options
3 participants