irc: unescape hexadecimal sequences in ISUPPORT parameter values #2429

SnoopJ · 2023-03-22T17:46:59Z

Description

This changeset handles hexadecimal escape sequences in ISUPPORT parameter values, implementing #2195

Checklist

I have read CONTRIBUTING.md
I can and do license this contribution under the EFLv2
No issues are reported by make qa (runs make quality and make test)
I have tested the functionality of the things this change touches

SnoopJ · 2023-03-22T17:50:22Z

The specification says also that:

If the server has not advertised a CHARSET parameter, it MUST not use
such sequences with a value outside those permitted by the above ABNF
grammar, with the exception of "\x20";

which I believe is not currently checked by Sopel. Not sure if we want to worry about that, but if so, it should probably be part of this changeset.

dgw · 2023-03-22T18:03:18Z

Refreshing to have such a small patch to review after you know what 😅 (and thank you @SnoopJ for helping with that one).

The specification says also that:

If the server has not advertised a CHARSET parameter, it MUST not use
such sequences with a value outside those permitted by the above ABNF
grammar, with the exception of "\x20";

which I believe is not currently checked by Sopel. Not sure if we want to worry about that, but if so, it should probably be part of this changeset.

I wouldn't worry about servers violating the MUST; we can "be liberal in what you accept" and decode the escapes anyway, assuming UTF-8 in the absence of a specified CHARSET.

You could add a test case where something that looks like an escape but isn't valid (what you called "pathological counter-examples" on IRC) gets handled. More importantly, the following behavior should be validated in tests as well:

Characters in multibyte encodings such as UTF-8 should be
sent as a series of \x sequences.

If we're gonna do this, it had better not mojibake the encoded values 😉

SnoopJ · 2023-03-22T20:06:18Z

Characters in multibyte encodings such as UTF-8 should be
sent as a series of \x sequences.

This 'should' confuses me a bit. I think what it's saying is that if the server advertises a CHARSET, the meaning of a sequence of \xHH escapes is different than what this changeset is currently doing.

To give a concrete example, let's say we receive the bytes NETWORK=b'\\xc5\\x81ibera' from the server, and that CHARSET=UTF-8. The way I understand this, the correct decoding of the parameter value involves first replacing \\xc5\\x81 with those octets, then decoding the parameter with UTF-8. This changeset will decode as ("NETWORK", "Å\x81ibera") instead of the correct decoding of ("NETWORK", "Łibera"), if I'm not mistaken.

Since this part of Sopel has already decoded received bytes to a string, doing this as described is tricky. I'll mull it over and maybe do a little sopelunking to see how to accomodate this part of the specification.

SnoopJ · 2023-03-22T23:35:39Z

After discussion with @Exirel, it's come to light that the Modern IRC specification and the IETF specification have slightly different opinions about escape sequences, with Modern IRC being more strict, limiting escapes to only \x20, \x5C, \x3D.

I've updated this PR to follow the Modern IRC perspective, which is not only simpler to implement, but avoids tangling with the complication of the antiquated/removed CHARSET field, and the ambiguity of bytes sent in a multi-byte encoding.

Exirel · 2023-03-22T23:37:56Z

Note from IRC: we have a different understanding of the spec, and we agree that it is a bit confusing. Following these two things led me to believe that we don't need to un-escape everything and only a subset of characters:

Latest version of the escaping function in inspircd
Modern IRC event 005 specification

As a result, we discussed it with @SnoopJ and we came to the conclusion that we are going to un-escape only the few bytes that need to.

Edit: @SnoopJ beat me to it!

sopel/irc/isupport.py

test/irc/test_irc_isupport.py

sopel/irc/isupport.py

dgw

Everyone else already did the good nitpicks, so this was all that was left for me to find. 🤣

sopel/irc/isupport.py

RPL_ISUPPORT may include escape sequences for octets confusible with the message grammar. This changeset adds support for this protocol feature by adding logic to unescape these sequences. Co-authored-by: dgw <[email protected]> Co-authored-by: Florian Strzelecki <[email protected]> Co-authored-by: Rusty Bower <[email protected]>

SnoopJ · 2023-03-23T03:37:12Z

Squashed and rebased with a summary in the commit message. Another ~~one~~8.0 bites the dust?

dgw

🚀

dgw added the Core/IRC Protocol Handling label Mar 22, 2023

dgw added this to the 8.0.0 milestone Mar 22, 2023

SnoopJ requested a review from a team March 22, 2023 23:36

Exirel requested changes Mar 22, 2023

View reviewed changes

sopel/irc/isupport.py Outdated Show resolved Hide resolved

sopel/irc/isupport.py Outdated Show resolved Hide resolved

test/irc/test_irc_isupport.py Outdated Show resolved Hide resolved

test/irc/test_irc_isupport.py Show resolved Hide resolved

Exirel approved these changes Mar 23, 2023

View reviewed changes

RustyBower requested changes Mar 23, 2023

View reviewed changes

sopel/irc/isupport.py Outdated Show resolved Hide resolved

SnoopJ force-pushed the feature/gh2195_support-hexadecimal-escape branch from 754471d to fa45f08 Compare March 23, 2023 00:34

RustyBower approved these changes Mar 23, 2023

View reviewed changes

dgw reviewed Mar 23, 2023

View reviewed changes

sopel/irc/isupport.py Outdated Show resolved Hide resolved

SnoopJ force-pushed the feature/gh2195_support-hexadecimal-escape branch from 10854c2 to 77132fc Compare March 23, 2023 03:36

dgw approved these changes Mar 23, 2023

View reviewed changes

Exirel linked an issue Mar 23, 2023 that may be closed by this pull request

isupport: hexadecimal escape sequences #2195

Closed

dgw merged commit f100238 into sopel-irc:master Mar 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

irc: unescape hexadecimal sequences in ISUPPORT parameter values #2429

irc: unescape hexadecimal sequences in ISUPPORT parameter values #2429

Uh oh!

SnoopJ commented Mar 22, 2023

Uh oh!

SnoopJ commented Mar 22, 2023

Uh oh!

dgw commented Mar 22, 2023

Uh oh!

SnoopJ commented Mar 22, 2023

Uh oh!

SnoopJ commented Mar 22, 2023

Uh oh!

Exirel commented Mar 22, 2023 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dgw left a comment

Uh oh!

Uh oh!

SnoopJ commented Mar 23, 2023

Uh oh!

dgw left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

irc: unescape hexadecimal sequences in ISUPPORT parameter values #2429

irc: unescape hexadecimal sequences in ISUPPORT parameter values #2429

Uh oh!

Conversation

SnoopJ commented Mar 22, 2023

Description

Checklist

Uh oh!

SnoopJ commented Mar 22, 2023

Uh oh!

dgw commented Mar 22, 2023

Uh oh!

SnoopJ commented Mar 22, 2023

Uh oh!

SnoopJ commented Mar 22, 2023

Uh oh!

Exirel commented Mar 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dgw left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

SnoopJ commented Mar 23, 2023

Uh oh!

dgw left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Exirel commented Mar 22, 2023 •

edited

Loading