-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[pylint
] Implement missing-maxsplit-arg
(PLC0207
)
#17454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
ntBre
merged 23 commits into
astral-sh:main
from
vjurczenia:vjurczenia/missing_maxsplit_arg
May 28, 2025
Merged
Changes from 21 commits
Commits
Show all changes
23 commits
Select commit
Hold shift + click to select a range
8248b80
`[pylint]` Implement `missing-maxsplit-arg` (`PLC0207`)
vjurczenia 38a9c81
Move rule to preview group
vjurczenia 347e9b9
Use `find_argument_value` to check maxsplit value
vjurczenia 5bec68b
Refactor to import required items from ast module instead of whole mo…
vjurczenia 3595079
cargo fmt
vjurczenia e6e2bfe
Update tests
vjurczenia 1f34119
Remove check of maxsplit value (aligned with Pylint)
vjurczenia df38ac6
Add functionality to check class members of type string
vjurczenia 9705675
Remove unnecessary assignments from test
vjurczenia 1d3502c
Add functionality to check sliced strings
vjurczenia 8967bc8
Update snapshot
vjurczenia a61c622
Enhance sliced strings check to include chained slices
vjurczenia 7342bbc
Add tests for class attribute named split (from Pylint)
vjurczenia d1bb448
Add functionality to check kwargs set via an unpacked dict literal
vjurczenia dfbde3e
Remove incorrect comments from test file
vjurczenia 5053354
Add remaining test cases
vjurczenia b67bc21
Action comments
vjurczenia 77cd518
Remove incorrect hyphen from comment
vjurczenia 378de24
Add TODO test comments
vjurczenia 5997aa7
Action comments
vjurczenia f7f6fbb
Merge branch 'main' into vjurczenia/missing_maxsplit_arg
ntBre 98a0f50
cargo fmt
ntBre ab261cb
tidy up after merge
ntBre File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
184 changes: 184 additions & 0 deletions
184
crates/ruff_linter/resources/test/fixtures/pylint/missing_maxsplit_arg.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,184 @@ | ||
SEQ = "1,2,3" | ||
|
||
class Foo(str): | ||
class_str = "1,2,3" | ||
|
||
def split(self, sep=None, maxsplit=-1) -> list[str]: | ||
return super().split(sep, maxsplit) | ||
|
||
class Bar(): | ||
split = "1,2,3" | ||
|
||
# Errors | ||
## Test split called directly on string literal | ||
"1,2,3".split(",")[0] # [missing-maxsplit-arg] | ||
"1,2,3".split(",")[-1] # [missing-maxsplit-arg] | ||
"1,2,3".rsplit(",")[0] # [missing-maxsplit-arg] | ||
"1,2,3".rsplit(",")[-1] # [missing-maxsplit-arg] | ||
|
||
## Test split called on string variable | ||
SEQ.split(",")[0] # [missing-maxsplit-arg] | ||
SEQ.split(",")[-1] # [missing-maxsplit-arg] | ||
SEQ.rsplit(",")[0] # [missing-maxsplit-arg] | ||
SEQ.rsplit(",")[-1] # [missing-maxsplit-arg] | ||
|
||
## Test split called on class attribute | ||
Foo.class_str.split(",")[0] # [missing-maxsplit-arg] | ||
Foo.class_str.split(",")[-1] # [missing-maxsplit-arg] | ||
Foo.class_str.rsplit(",")[0] # [missing-maxsplit-arg] | ||
Foo.class_str.rsplit(",")[-1] # [missing-maxsplit-arg] | ||
|
||
## Test split called on sliced string | ||
"1,2,3"[::-1].split(",")[0] # [missing-maxsplit-arg] | ||
"1,2,3"[::-1][::-1].split(",")[0] # [missing-maxsplit-arg] | ||
SEQ[:3].split(",")[0] # [missing-maxsplit-arg] | ||
Foo.class_str[1:3].split(",")[-1] # [missing-maxsplit-arg] | ||
"1,2,3"[::-1].rsplit(",")[0] # [missing-maxsplit-arg] | ||
SEQ[:3].rsplit(",")[0] # [missing-maxsplit-arg] | ||
Foo.class_str[1:3].rsplit(",")[-1] # [missing-maxsplit-arg] | ||
|
||
## Test sep given as named argument | ||
"1,2,3".split(sep=",")[0] # [missing-maxsplit-arg] | ||
"1,2,3".split(sep=",")[-1] # [missing-maxsplit-arg] | ||
"1,2,3".rsplit(sep=",")[0] # [missing-maxsplit-arg] | ||
"1,2,3".rsplit(sep=",")[-1] # [missing-maxsplit-arg] | ||
|
||
## Special cases | ||
"1,2,3".split("\n")[0] # [missing-maxsplit-arg] | ||
"1,2,3".split("split")[-1] # [missing-maxsplit-arg] | ||
"1,2,3".rsplit("rsplit")[0] # [missing-maxsplit-arg] | ||
|
||
## Test class attribute named split | ||
Bar.split.split(",")[0] # [missing-maxsplit-arg] | ||
Bar.split.split(",")[-1] # [missing-maxsplit-arg] | ||
Bar.split.rsplit(",")[0] # [missing-maxsplit-arg] | ||
Bar.split.rsplit(",")[-1] # [missing-maxsplit-arg] | ||
|
||
## Test unpacked dict literal kwargs | ||
"1,2,3".split(**{"sep": ","})[0] # [missing-maxsplit-arg] | ||
|
||
|
||
# OK | ||
## Test not accessing the first or last element | ||
### Test split called directly on string literal | ||
"1,2,3".split(",")[1] | ||
"1,2,3".split(",")[-2] | ||
"1,2,3".rsplit(",")[1] | ||
"1,2,3".rsplit(",")[-2] | ||
|
||
### Test split called on string variable | ||
SEQ.split(",")[1] | ||
SEQ.split(",")[-2] | ||
SEQ.rsplit(",")[1] | ||
SEQ.rsplit(",")[-2] | ||
|
||
### Test split called on class attribute | ||
Foo.class_str.split(",")[1] | ||
Foo.class_str.split(",")[-2] | ||
Foo.class_str.rsplit(",")[1] | ||
Foo.class_str.rsplit(",")[-2] | ||
|
||
### Test split called on sliced string | ||
"1,2,3"[::-1].split(",")[1] | ||
SEQ[:3].split(",")[1] | ||
Foo.class_str[1:3].split(",")[-2] | ||
"1,2,3"[::-1].rsplit(",")[1] | ||
SEQ[:3].rsplit(",")[1] | ||
Foo.class_str[1:3].rsplit(",")[-2] | ||
|
||
### Test sep given as named argument | ||
"1,2,3".split(sep=",")[1] | ||
"1,2,3".split(sep=",")[-2] | ||
"1,2,3".rsplit(sep=",")[1] | ||
"1,2,3".rsplit(sep=",")[-2] | ||
|
||
## Test varying maxsplit argument | ||
### str.split() tests | ||
"1,2,3".split(sep=",", maxsplit=1)[-1] | ||
"1,2,3".split(sep=",", maxsplit=1)[0] | ||
"1,2,3".split(sep=",", maxsplit=2)[-1] | ||
"1,2,3".split(sep=",", maxsplit=2)[0] | ||
"1,2,3".split(sep=",", maxsplit=2)[1] | ||
|
||
### str.rsplit() tests | ||
"1,2,3".rsplit(sep=",", maxsplit=1)[-1] | ||
"1,2,3".rsplit(sep=",", maxsplit=1)[0] | ||
"1,2,3".rsplit(sep=",", maxsplit=2)[-1] | ||
"1,2,3".rsplit(sep=",", maxsplit=2)[0] | ||
"1,2,3".rsplit(sep=",", maxsplit=2)[1] | ||
|
||
## Test user-defined split | ||
Foo("1,2,3").split(",")[0] | ||
Foo("1,2,3").split(",")[-1] | ||
Foo("1,2,3").rsplit(",")[0] | ||
Foo("1,2,3").rsplit(",")[-1] | ||
|
||
## Test split called on sliced list | ||
["1", "2", "3"][::-1].split(",")[0] | ||
|
||
## Test class attribute named split | ||
Bar.split[0] | ||
Bar.split[-1] | ||
Bar.split[0] | ||
Bar.split[-1] | ||
|
||
## Test unpacked dict literal kwargs | ||
"1,2,3".split(",", **{"maxsplit": 1})[0] | ||
"1,2,3".split(**{"sep": ",", "maxsplit": 1})[0] | ||
|
||
|
||
# TODO | ||
|
||
## Test variable split result index | ||
## TODO: These require the ability to resolve a variable name to a value | ||
# Errors | ||
result_index = 0 | ||
"1,2,3".split(",")[result_index] # TODO: [missing-maxsplit-arg] | ||
result_index = -1 | ||
"1,2,3".split(",")[result_index] # TODO: [missing-maxsplit-arg] | ||
# OK | ||
result_index = 1 | ||
"1,2,3".split(",")[result_index] | ||
result_index = -2 | ||
"1,2,3".split(",")[result_index] | ||
|
||
|
||
## Test split result index modified in loop | ||
## TODO: These require the ability to recognize being in a loop where: | ||
## - the result of split called on a string is indexed by a variable | ||
## - the variable index above is modified | ||
# OK | ||
result_index = 0 | ||
for j in range(3): | ||
print(SEQ.split(",")[result_index]) | ||
result_index = result_index + 1 | ||
|
||
|
||
## Test accessor | ||
## TODO: These require the ability to get the return type of a method | ||
## (possibly via `typing::is_string`) | ||
class Baz(): | ||
def __init__(self): | ||
self.my_str = "1,2,3" | ||
|
||
def get_string(self) -> str: | ||
return self.my_str | ||
|
||
# Errors | ||
Baz().get_string().split(",")[0] # TODO: [missing-maxsplit-arg] | ||
Baz().get_string().split(",")[-1] # TODO: [missing-maxsplit-arg] | ||
# OK | ||
Baz().get_string().split(",")[1] | ||
Baz().get_string().split(",")[-2] | ||
|
||
|
||
## Test unpacked dict instance kwargs | ||
## TODO: These require the ability to resolve a dict variable name to a value | ||
# Errors | ||
kwargs_without_maxsplit = {"seq": ","} | ||
"1,2,3".split(**kwargs_without_maxsplit)[0] # TODO: [missing-maxsplit-arg] | ||
# OK | ||
kwargs_with_maxsplit = {"maxsplit": 1} | ||
"1,2,3".split(",", **kwargs_with_maxsplit)[0] # TODO: false positive | ||
kwargs_with_maxsplit = {"sep": ",", "maxsplit": 1} | ||
"1,2,3".split(**kwargs_with_maxsplit)[0] # TODO: false positive |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
134 changes: 134 additions & 0 deletions
134
crates/ruff_linter/src/rules/pylint/rules/missing_maxsplit_arg.rs
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
use ruff_diagnostics::{Diagnostic, Violation}; | ||
use ruff_macros::{derive_message_formats, ViolationMetadata}; | ||
use ruff_python_ast::{ | ||
DictItem, Expr, ExprAttribute, ExprCall, ExprDict, ExprNumberLiteral, ExprStringLiteral, | ||
ExprSubscript, ExprUnaryOp, Keyword, Number, UnaryOp, | ||
}; | ||
use ruff_python_semantic::{analyze::typing, SemanticModel}; | ||
use ruff_text_size::Ranged; | ||
|
||
use crate::checkers::ast::Checker; | ||
|
||
/// ## What it does | ||
/// Checks for access to the first or last element of `str.split()` without | ||
/// `maxsplit=1` | ||
/// | ||
/// ## Why is this bad? | ||
/// Calling `str.split()` without `maxsplit` set splits on every delimiter in the | ||
/// string. When accessing only the first or last element of the result, it | ||
/// would be more efficient to only split once. | ||
/// | ||
/// ## Example | ||
/// ```python | ||
/// url = "www.example.com" | ||
/// prefix = url.split(".")[0] | ||
/// ``` | ||
/// | ||
/// Use instead: | ||
/// ```python | ||
/// url = "www.example.com" | ||
/// prefix = url.split(".", maxsplit=1)[0] | ||
/// ``` | ||
|
||
#[derive(ViolationMetadata)] | ||
pub(crate) struct MissingMaxsplitArg; | ||
|
||
impl Violation for MissingMaxsplitArg { | ||
#[derive_message_formats] | ||
fn message(&self) -> String { | ||
"Accessing only the first or last element of `str.split()` without setting `maxsplit=1`" | ||
.to_string() | ||
} | ||
} | ||
|
||
fn is_string(expr: &Expr, semantic: &SemanticModel) -> bool { | ||
if let Expr::Name(name) = expr { | ||
semantic | ||
.only_binding(name) | ||
.is_some_and(|binding_id| typing::is_string(semantic.binding(binding_id), semantic)) | ||
} else if let Some(binding_id) = semantic.lookup_attribute(expr) { | ||
typing::is_string(semantic.binding(binding_id), semantic) | ||
} else { | ||
expr.is_string_literal_expr() | ||
} | ||
} | ||
|
||
/// PLC0207 | ||
pub(crate) fn missing_maxsplit_arg(checker: &Checker, value: &Expr, slice: &Expr, expr: &Expr) { | ||
// Check the sliced expression is a function | ||
let Expr::Call(ExprCall { | ||
func, arguments, .. | ||
}) = value | ||
else { | ||
return; | ||
}; | ||
|
||
// Check the slice index is either 0 or -1 (first or last value) | ||
let index = match slice { | ||
Expr::NumberLiteral(ExprNumberLiteral { | ||
value: Number::Int(number_value), | ||
.. | ||
}) => number_value.as_i64(), | ||
Expr::UnaryOp(ExprUnaryOp { | ||
op: UnaryOp::USub, | ||
operand, | ||
.. | ||
}) => match operand.as_ref() { | ||
Expr::NumberLiteral(ExprNumberLiteral { | ||
value: Number::Int(number_value), | ||
.. | ||
}) => number_value.as_i64().map(|number| -number), | ||
_ => return, | ||
}, | ||
_ => return, | ||
}; | ||
|
||
if !matches!(index, Some(0 | -1)) { | ||
return; | ||
} | ||
|
||
let Expr::Attribute(ExprAttribute { attr, value, .. }) = func.as_ref() else { | ||
return; | ||
}; | ||
|
||
// Check the function is "split" or "rsplit" | ||
let attr = attr.as_str(); | ||
if !matches!(attr, "split" | "rsplit") { | ||
return; | ||
} | ||
|
||
let mut target_instance = value; | ||
// a subscripted value could technically be subscripted further ad infinitum, so we | ||
// recurse into the subscript expressions until we find the value being subscripted | ||
while let Expr::Subscript(ExprSubscript { value, .. }) = target_instance.as_ref() { | ||
target_instance = value; | ||
} | ||
|
||
// Check the function is called on a string | ||
if !is_string(target_instance, checker.semantic()) { | ||
return; | ||
} | ||
|
||
// Check the function does not have maxsplit set | ||
if arguments.find_argument_value("maxsplit", 1).is_some() { | ||
return; | ||
} | ||
|
||
// Check maxsplit kwarg not set via unpacked dict literal | ||
for keyword in &*arguments.keywords { | ||
let Keyword { value, .. } = keyword; | ||
|
||
if let Expr::Dict(ExprDict { items, .. }) = value { | ||
for item in items { | ||
let DictItem { key, .. } = item; | ||
if let Some(Expr::StringLiteral(ExprStringLiteral { value, .. })) = key { | ||
if value.to_str() == "maxsplit" { | ||
return; | ||
} | ||
} | ||
} | ||
} | ||
} | ||
|
||
checker.report_diagnostic(Diagnostic::new(MissingMaxsplitArg, expr.range())); | ||
} |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.