[pylint] Implement missing-maxsplit-arg (PLC0207) #17454

Merged
merged 23 commits into from
May 28, 2025

Conversation

vjurczenia
Contributor

Summary

Implements Pylint's use-maxsplit-arg as missing-maxsplit-arg (PLC0207)
https://pylint.readthedocs.io/en/latest/user_guide/messages/convention/use-maxsplit-arg.html

Emitted when accessing only the first or last element of str.split().
The first and last element can be accessed by using str.split(sep, maxsplit=1)[0] or str.rsplit(sep, maxsplit=1)[-1] instead.

This is part of #970
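
As a rough illustration of the pattern the rule targets, the sketch below compares the flagged form with the suggested one. The `path` value is made up for the example.

```python
path = "a/b/c/d"

# Flagged pattern: the whole string is split, but only one element is used.
first_full = path.split("/")[0]
last_full = path.split("/")[-1]

# Suggested form: stop after the first separator, counted from the relevant end.
first = path.split("/", maxsplit=1)[0]
last = path.rsplit("/", maxsplit=1)[-1]

assert first == first_full == "a"
assert last == last_full == "d"

# Note that split (rather than rsplit) with maxsplit=1 does not give the last element:
not_last = path.split("/", maxsplit=1)[-1]
assert not_last == "b/c/d"
```

This is why the suggestion pairs index `0` with `str.split` and index `-1` with `str.rsplit`.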

Test Plan

cargo test

Additionally compared Ruff output to Pylint:

pylint --disable=all --enable=use-maxsplit-arg crates/ruff_linter/resources/test/fixtures/pylint/missing_maxsplit_arg.py

cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/pylint/missing_maxsplit_arg.py --no-cache --select PLC0207

@Skylion007
Contributor

Skylion007 commented Apr 18, 2025

Does the original Pylint rule handle slicing too (i.e., calling split but only using the first n elements)? If not, that would be a great generalization of this rule, maybe one that can be made Ruff-specific.

@vjurczenia
Contributor Author

No, it only checks for indexes -1 and 0, but making it more general is a very neat idea.

Having looked into the Pylint source just now, I also realize that this PR is missing the following functionality:

  • checking the value of a variable used as the index
  • checking whether a variable used as the index might be mutated at runtime

I'm happy to add this functionality to stay aligned with Pylint and would appreciate maintainer input on whether or not to implement the generalized rule.
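
The slicing generalization suggested above is not part of this PR. As a hedged sketch of what it would rely on (the `row` and `n` values are hypothetical), taking the first or last `n` elements only needs `maxsplit=n`:

```python
row = "id,name,email,notes"
n = 2

# First n elements: a full split and a bounded split agree.
head_full = row.split(",")[:n]
head_bounded = row.split(",", maxsplit=n)[:n]
assert head_full == head_bounded == ["id", "name"]

# Last n elements: the bounded form has to split from the right.
tail_full = row.split(",")[-n:]
tail_bounded = row.rsplit(",", maxsplit=n)[-n:]
assert tail_full == tail_bounded == ["email", "notes"]
```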

@ntBre ntBre added the rule Implementing or modifying a lint rule label Apr 18, 2025
@ntBre
Contributor

ntBre commented Apr 18, 2025

I think making the rule more general (both by handling slices and by handling variables) sounds nice, but it's also okay to add a simpler form of a rule first and extend it in follow-up PRs.

@ntBre ntBre added the preview Related to preview mode features label Apr 18, 2025
Contributor

github-actions bot commented Apr 18, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+47 -0 violations, +0 -0 fixes in 14 projects; 41 projects unchanged)

PlasmaPy/PlasmaPy (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ noxfile.py:445:16: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

apache/airflow (+9 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL

+ dev/backport/update_backport_status.py:78:21: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ dev/breeze/src/airflow_breeze/commands/common_options.py:442:26: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ dev/breeze/src/airflow_breeze/commands/release_management_commands.py:2625:50: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ dev/breeze/src/airflow_breeze/prepare_providers/provider_documentation.py:658:34: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ dev/breeze/src/airflow_breeze/utils/run_utils.py:118:25: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ dev/breeze/src/airflow_breeze/utils/version_utils.py:44:16: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ providers/amazon/tests/system/amazon/aws/example_bedrock.py:66:24: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ providers/amazon/tests/system/amazon/aws/example_glue.py:76:12: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ scripts/in_container/install_airflow_and_providers.py:32:21: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

apache/superset (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL

+ tests/common/query_context_generator.py:214:29: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ tests/common/query_context_generator.py:255:22: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

bokeh/bokeh (+3 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL

+ src/bokeh/io/notebook.py:663:26: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ src/bokeh/util/token.py:142:41: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ src/bokeh/util/token.py:155:41: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

freedomofpress/securedrop (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ securedrop/store.py:345:49: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

ibis-project/ibis (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ ibis/examples/gen_registry.py:125:52: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

langchain-ai/langchain (+9 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ libs/core/langchain_core/_api/deprecation.py:362:17: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ libs/core/langchain_core/_api/deprecation.py:362:56: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ libs/core/langchain_core/_api/deprecation.py:370:17: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ libs/core/langchain_core/_api/deprecation.py:371:16: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ libs/core/langchain_core/_api/deprecation.py:475:24: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ libs/core/langchain_core/_api/deprecation.py:494:27: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ libs/core/langchain_core/runnables/graph_mermaid.py:157:24: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ libs/core/langchain_core/utils/mustache.py:85:19: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ libs/core/langchain_core/utils/utils.py:140:32: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

pandas-dev/pandas (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ pandas/compat/_optional.py:166:14: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

pypa/cibuildwheel (+3 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ bin/update_pythons.py:150:22: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ cibuildwheel/platforms/macos.py:153:31: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ cibuildwheel/selector.py:78:26: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

rotki/rotki (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ rotkehlchen/assets/converters.py:417:17: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

zulip/zulip (+11 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL

+ zerver/lib/send_email.py:287:16: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ zerver/lib/send_email.py:429:21: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ zerver/lib/upload/local.py:67:17: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ zerver/lib/upload/s3.py:176:25: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ zerver/lib/webhooks/common.py:242:26: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ zerver/lib/webhooks/common.py:269:16: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ zerver/middleware.py:228:49: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ zerver/views/upload.py:374:16: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ zerver/webhooks/gitlab/view.py:46:36: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ zerver/webhooks/stripe/view.py:356:39: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ zerver/webhooks/wekan/view.py:15:12: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

wntrblm/nox (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ tests/test_sessions.py:1199:44: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

pytest-dev/pytest (+3 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ src/_pytest/config/findpaths.py:160:16: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ src/_pytest/terminal.py:1007:40: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`
+ src/_pytest/terminal.py:459:41: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

astropy/astropy (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ astropy/io/ascii/latex.py:443:16: PLC0207 Accessing only the first or last element of `str.split()` without setting `maxsplit=1`

Changes by rule (1 rules affected)

| code | total | + violation | - violation | + fix | - fix |
| --- | --- | --- | --- | --- | --- |
| PLC0207 | 47 | 47 | 0 | 0 | 0 |

Comment on lines 102 to 113
if keywords.iter().any(|kw| {
    kw.arg.as_deref() == Some("maxsplit")
        && matches!(
            kw.value,
            ast::Expr::NumberLiteral(ast::ExprNumberLiteral {
                value: ast::Number::Int(ast::Int::ONE),
                ..
            })
        )
}) {
    return;
}
Contributor

The ecosystem check turned up some results like this:

ext = file_path.rsplit(".", 2)[-1].lower()

The maxsplit argument doesn't have to be passed as a keyword. We have a find_argument_value helper that should handle this and also extract the value for your comparison to 1:

/// Return the value for the argument with the given name or at the given position, or `None` if no such
/// argument exists. Used to retrieve argument values that can be provided _either_ as keyword or
/// positional arguments.
pub fn find_argument_value(&self, name: &str, position: usize) -> Option<&Expr> {
    self.find_argument(name, position).map(ArgOrKeyword::value)
}
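
To make the keyword-or-positional lookup concrete, here is a rough Python analogue of the helper quoted above, written against the standard `ast` module. It is not Ruff's implementation; the function names and the `maxsplit_of` wrapper are assumptions made for this sketch.

```python
import ast
from typing import Optional


def find_argument_value(call: ast.Call, name: str, position: int) -> Optional[ast.expr]:
    """Return the argument passed by keyword `name` or at `position`, else None."""
    for keyword in call.keywords:
        if keyword.arg == name:
            return keyword.value
    # A starred argument (*args) makes later positions unknowable statically.
    leading = call.args[: position + 1]
    if len(leading) == position + 1 and not any(
        isinstance(arg, ast.Starred) for arg in leading
    ):
        return call.args[position]
    return None


def maxsplit_of(source: str):
    """Parse a single call expression and return its literal maxsplit, if any."""
    call = ast.parse(source, mode="eval").body
    assert isinstance(call, ast.Call)
    value = find_argument_value(call, "maxsplit", 1)
    return None if value is None else ast.literal_eval(value)


print(maxsplit_of('file_path.rsplit(".", 2)'))          # 2 (positional)
print(maxsplit_of('file_path.rsplit(".", maxsplit=1)'))  # 1 (keyword)
print(maxsplit_of('file_path.rsplit(".")'))              # None
```

A single lookup of this kind covers both call styles, which is what lets the keyword and positional checks be combined.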

Contributor

Oh I see, you're doing the positional comparison below, and this is a true positive because the maxsplit value also needs to be 1. In any case, this should help to combine the keyword and positional checks!

Contributor Author

It's way neater with find_argument_value, so thanks for that!

However, this ecosystem check failure had me re-evaluate the test included, so I compared it to the Pylint one and noticed that maxsplit=2 should be a passing case. The relevant Pylint source is here.
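
As a minimal sketch of the behavior described here (an index of `0` or `-1` on a `split`/`rsplit` call is reported only when no `maxsplit` is passed at all), the check can be approximated in Python with the standard `ast` module. This is an illustration assuming Python 3.9+, not the Rust implementation in this PR; `find_missing_maxsplit` and `SAMPLE` are hypothetical names.

```python
import ast


def find_missing_maxsplit(source: str) -> list[int]:
    """Return line numbers of `.split(...)[0]` / `[-1]` calls that pass no maxsplit."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Subscript):
            continue
        call = node.value
        if not (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Attribute)
            and call.func.attr in {"split", "rsplit"}
        ):
            continue
        try:
            index = ast.literal_eval(node.slice)
        except (ValueError, TypeError):
            continue  # non-literal index (variable, slice, ...): not handled here
        if not isinstance(index, int) or index not in (0, -1):
            continue
        # maxsplit may be the second positional argument or a keyword.
        has_maxsplit = len(call.args) >= 2 or any(
            keyword.arg == "maxsplit" for keyword in call.keywords
        )
        if not has_maxsplit:
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = '''\
a = s.split(",")[0]
b = s.rsplit(",")[-1]
c = s.split(",", 1)[0]
d = s.rsplit(".", 2)[-1]
e = s.split(",", maxsplit=1)[0]
f = s.split(",")[1]
'''

print(find_missing_maxsplit(SAMPLE))  # [1, 2]
```

Line 4 of the sample mirrors the `rsplit(".", 2)[-1]` ecosystem hit and is treated as passing because a `maxsplit` value is present.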

Following that discovery, I've updated the test cases to more closely match the Pylint ones, which left me with a few that require additional implementation. These will require identifying:

  • when split is called on slices of strings
  • kwargs set via unpacked dicts
  • if index is set via variable
  • if the index variable is modified in a loop (that includes the split call)
  • if split is called on a static class member that happens to be a string (I'm uncertain about this one, but think it should be possible.)
  • if split is called on a method that returns a string (I'm uncertain that this one is possible.)

As before, any advice is appreciated.
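
For readers unfamiliar with these call shapes, a few of the cases listed above look roughly like this in Python. The names (`Settings`, `text`, and so on) are hypothetical and are not taken from the PR's test fixture.

```python
class Settings:
    # A class attribute that happens to be a string.
    module_path = "pkg.module.name"


text = "1,2,3"

# split called on a slice of a string
from_slice = text[::-1].split(",")[0]

# keyword arguments supplied by unpacking a dict
from_unpacked = text.split(**{"sep": ","})[0]

# split called on a class attribute holding a string
from_attribute = Settings.module_path.split(".")[-1]

print(from_slice, from_unpacked, from_attribute)  # 3 1 name
```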

Contributor Author

> if split is called on a static class member that happens to be a string

I've just managed to figure this one out, and it's given me a bit of a confidence boost on the rest of them, so I've converted this PR to a Draft until they're done.

Contributor

Awesome, thanks for the update and for your work on this. Ping me again if you end up needing help. I'm also happy merging a simpler first draft of the rule that we can expand later, if these end up being trickier than you want to deal with!

If it comes to that, let's just include tests for the unhandled cases with clear TODO comments for future work.

Contributor Author

Hi @ntBre, hope all is well. Sorry for the delay in getting back to this.

As you can see from my most recent commits, I've managed to get through a few of the cases I had listed. However, I think I've hit a bit of a wall with the rest. The remaining ones are:

  • if index is set via a variable
  • if the index variable is modified in a loop (that includes the split call)
  • if split is called on a method that returns a string
  • if kwargs are set via an unpacked dict variable

I've documented these as TODO in the test, and am happy to go ahead with merging this simpler first draft if you are.
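
The four remaining cases are harder because the relevant value is only known by following data flow. A hedged sketch of what each looks like (all names are hypothetical, and this is not the fixture from the PR):

```python
class Record:
    def render(self) -> str:
        # A method that returns a string.
        return "1,2,3"


text = "1,2,3"

# 1. Index set via a variable.
idx = 0
first = text.split(",")[idx]

# 2. Index variable modified in a loop that contains the split call.
seen = []
i = 0
for _ in range(3):
    seen.append(text.split(",")[i])
    i += 1
# Here the full split is needed: adding maxsplit=1 would change the result.
changed = text.split(",", maxsplit=1)[1]

# 3. split called on the result of a method that returns a string.
last = Record().render().split(",")[-1]

# 4. Keyword arguments supplied through an unpacked dict variable.
kwargs = {"sep": ",", "maxsplit": 1}
head = text.split(**kwargs)[0]

print(first, seen, changed, last, head)  # 1 ['1', '2', '3'] 2,3 3 1
```

Case 2 shows why a checker has to be careful with variable indexes: the same expression is correct without `maxsplit` once the index moves past `0`.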

@vjurczenia vjurczenia marked this pull request as draft April 23, 2025 00:56
@vjurczenia vjurczenia marked this pull request as ready for review May 12, 2025 21:22
Contributor

@ntBre ntBre left a comment

Thanks, this is looking great! I think this would be a good place to finish this PR and leave other extensions as follow-ups, as you said. It could be fun to add an auto-fix for this rule, as another follow-up, if you're interested :)
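
No auto-fix is included in this PR. Purely as an illustration of what such a fix might do, the sketch below rewrites the simple literal-index cases with Python's `ast` module (Python 3.9+ for `ast.unparse`). The class and function names are assumptions, and `ast.unparse` discards comments and formatting, so a real fix would need to edit the source text instead.

```python
import ast


class AddMaxsplit(ast.NodeTransformer):
    """Add maxsplit=1 to `.split(...)[0]` / `[-1]` calls that lack it."""

    def visit_Subscript(self, node: ast.Subscript) -> ast.AST:
        self.generic_visit(node)
        call = node.value
        if not (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Attribute)
            and call.func.attr in {"split", "rsplit"}
        ):
            return node
        try:
            index = ast.literal_eval(node.slice)
        except (ValueError, TypeError):
            return node  # non-literal index: leave untouched
        if not isinstance(index, int) or index not in (0, -1):
            return node
        # Skip calls where maxsplit is set or cannot be determined statically.
        skip = (
            len(call.args) >= 2
            or any(isinstance(arg, ast.Starred) for arg in call.args)
            or any(kw.arg in (None, "maxsplit") for kw in call.keywords)
        )
        if skip:
            return node
        # Index 0 needs a split from the left, index -1 a split from the right.
        call.func.attr = "split" if index == 0 else "rsplit"
        call.keywords.append(ast.keyword(arg="maxsplit", value=ast.Constant(value=1)))
        return node


def add_maxsplit(source: str) -> str:
    tree = AddMaxsplit().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


SOURCE = 's = "a.b.c"\nx = s.split(".")[-1]\ny = s.rsplit(".")[0]\nz = s.split(".", 2)[0]\n'
print(add_maxsplit(SOURCE))
```

In this sketch the method name is switched between `split` and `rsplit` to match the index, so the rewritten expression evaluates to the same element as the original.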

Contributor

@ntBre ntBre left a comment

Thank you, this looks great!

I had two minor nits but this is otherwise good to go.

Contributor

@ntBre ntBre left a comment

Thanks again, this is great!

@ntBre
Contributor

ntBre commented May 28, 2025

I merged main to pick up our Rust version updates and then ran cargo fmt to make CI happy.

@ntBre ntBre changed the title [pylint] Implement missing-maxsplit-arg (PLC0207) [pylint] Implement missing-maxsplit-arg (PLC0207) May 28, 2025
@ntBre ntBre enabled auto-merge (squash) May 28, 2025 20:44
@ntBre ntBre merged commit 27743ef into astral-sh:main May 28, 2025
32 checks passed
Labels
preview Related to preview mode features rule Implementing or modifying a lint rule
3 participants