Description
A while ago, I got the following review feedback from @jreback:
- Replace instances of pd. with direct imports (from review TST/CLN: series.duplicated; parametrisation; fix warning #21899 (comment)):
  don’t import pd, directly import instead
- Replace direct use of assert_..._equal with tm.assert_..._equal (from review TST/CLN: series.duplicated; parametrisation; fix warning #21899 (comment)):
  use tm; don’t import assert_series_equal
I kept applying these rules until #22730, where @jreback then said:
Before we do any more of this
[quote of the above]
I would like a lint rule to see what is what (and basically list the offenders as exclusions for now).
We have been changing things like this for a long time and I think we sometimes change them back (e.g. to use the imports of assert_* rather than tm., and to use pd.* rather than the direct imports).
and further:
let's see which way is more common, then have a lint rule to enforce it, rather than ad-hoc conversions (which has been the policy up till now).
Opening this issue for someone to write such a lint rule, because I wouldn't know where to start.
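As a possible starting point, here is a minimal sketch of such a check built on Python's standard ast module. It is not an existing pandas tool: the function name find_import_offenders is hypothetical, and the two rules it encodes (no importing names from pandas.util.testing, no plain import of pandas) are simply the review comments quoted above, which the thread had not yet agreed on.

```python
import ast


def find_import_offenders(source, filename="<string>"):
    """Return sorted (lineno, message) pairs for imports that break the assumed rules.

    Assumed rule 1: use tm.<name> rather than importing from pandas.util.testing.
    Assumed rule 2: import names directly from pandas rather than importing pandas.
    """
    offenders = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.ImportFrom) and node.module == "pandas.util.testing":
            names = ", ".join(alias.name for alias in node.names)
            offenders.append((node.lineno, f"use tm.<name> instead of importing {names}"))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                # Only the top-level package is flagged; "pandas.util.testing" is not.
                if alias.name == "pandas":
                    offenders.append((node.lineno, "import names directly from pandas"))
    return sorted(offenders)
```

Running this over every file in pandas/tests and printing the offenders would give the "list the offenders as exclusions" view requested above. Flipping either rule only means changing the two conditions.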
Activity
pambot commented on Oct 8, 2018
@h-vetinari I'll give this a shot
h-vetinari commented on Oct 8, 2018
@pambot Great!
pambot commented on Oct 8, 2018
@jreback Pinging you because it was your idea to add the linting rule - I'm happy to do it, but what's the best way to get started? I looked around, and all I can find is ci/lint.sh (but that's probably just for CI) and a suggestion on StackOverflow to modify a tox.ini file (but it doesn't look like Tox is supported currently)
h-vetinari commented on Oct 8, 2018
@pambot
It was not my idea but @jreback's. I opened this issue to record that review feedback for someone else, because I don't know how to tackle setting up custom linting rules...
Any tips for @pambot, @TomAugspurger @datapythonista, maybe?
TomAugspurger commented on Oct 8, 2018
You may want to wait till #22854 and #22863 are in before working on this too much, since those are changing up the linters. I would also wait for some consensus on what we want these import rules to look like.
Finally, I think @alimcmaster1 has some plans for a larger import order refactor using isort. That will be a large change that'll cause conflicts in a lot of outstanding PRs. Depending on the size of the changes for this issue, it may be good to do those two PRs close together, so that people only have to fix conflicts once.
TomAugspurger commented on Oct 8, 2018
On this issue in particular, I don't have strong preferences.
- tm.assert* over assert*
- pd. rather than directly importing top-level things in the test files
pambot commented on Oct 8, 2018
Sounds good, I'm cool with waiting.
h-vetinari commented on Oct 8, 2018
@TomAugspurger
Thanks for the info/feedback.
Re:
there might be good reasons for having different rules between pandas/core and pandas/tests, but in the former, it's almost exclusively(?) direct imports.
TomAugspurger commented on Oct 8, 2018
jreback commented on Oct 8, 2018
just to be clear
this is really aimed at the test files
we may extend this to the code itself at a later point
Title changed from "CLN: internal import rules" to "CLN: Use consistent imports in tests"
SaturnFromTitan commented on Oct 27, 2019
I just had a quick look at this. Using the following grep command with varying search texts gives the number of files where this pattern is used:
grep -R -l --include="*.py" <search_text> pandas/tests/ | wc -l
- search_text = "from pandas.util.testing import ": 118
- search_text = "import pandas.util.testing as tm": 324
- search_text = "from pandas import ": 367
- search_text = "import pandas as pd": 308
For the test utils, the tm. import seems to be pretty dominant - so that decision should be straightforward.
In the pd. case, I'd vote for direct imports because
1.) It's already the dominant variation (migrating 60 additional files is quite some effort)
2.) It's in line with @jreback's earlier review comments
3.) According to @TomAugspurger's comment above, this is the way it is mostly done in the rest of the project as well
Wdyt?
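For anyone who wants to repeat the count without grep, the same tally can be sketched in Python. The helper names count_files_with and tally are made up for this example, and the match is a literal substring test, whereas grep treats "." as a wildcard, so the numbers could differ slightly from the ones above.

```python
from pathlib import Path

# Search strings taken from the grep runs in this comment.
SEARCH_TEXTS = [
    "from pandas.util.testing import ",
    "import pandas.util.testing as tm",
    "from pandas import ",
    "import pandas as pd",
]


def count_files_with(search_text, root):
    """Count the .py files under `root` whose text contains `search_text`."""
    return sum(
        1
        for path in Path(root).rglob("*.py")
        if path.is_file()
        and search_text in path.read_text(encoding="utf-8", errors="ignore")
    )


def tally(root):
    """Map every search text to the number of files that contain it."""
    return {text: count_files_with(text, root) for text in SEARCH_TEXTS}
```

Calling tally("pandas/tests") from the root of a checkout would return one count per search text, mirroring the four grep runs.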
jreback commented on Oct 27, 2019
i would be +1 in applying those as lint rules (have to fix the current first of course)
prefer
WillAyd commented on Oct 28, 2019
The more I've been reading up on it I would prefer import pandas as pd to from pandas import ... if only because the latter seems more prone to circular imports; this probably doesn't matter much for tests but I think it sets a better standard throughout the code base
Guido himself recommends against using from <module> import ...:
https://docs.python.org/3/faq/programming.html?highlight=circular%20import#how-can-i-have-modules-that-mutually-import-each-other
h-vetinari commented on Oct 28, 2019
That may be true for the top level modules, but it is not true anymore when submodules are targeted directly, i.e. the following are not equivalent:
1. import pandas as pd and then using pd.Series
2. from pandas.core.series import Series and then using Series
Number 1. cannot be used in any code that defines something that is itself exposed through the toplevel module. Number 2. can be imported even there (except within pandas.core.series itself, obviously).
For this reason, it makes sense to stick with option 2. IMO, because it can be used consistently in more situations (and especially within the core itself).
WillAyd commented on Oct 28, 2019
I don't believe that is true. read_json for instance is exposed in the top level and imports from there as well. Even pandas.core.series does an import pandas as pd
I didn't realize this was one of the options, but I am -1 on that. Tests should generally be importing objects from the same locations that an end user would
h-vetinari commented on Oct 28, 2019
Interesting. That goes against my experience but python may have gotten smarter about this with doing some lazy importing. For example, pd is only used once in pandas.core.series in apply, but pd still cannot be used in the class constructor, for example (just tested this on master).
There's probably a misunderstanding here. For tests, both styles are fine. But since you were specifically aiming at "a better standard throughout code base", I responded that the core has different restrictions. And if a uniform standard is desired, then tests should import from pandas ... (without submodules though).
WillAyd commented on Oct 28, 2019
Ah OK - sorry yea I think we misunderstood each other on the last post. I was specifically talking about tests.
Expanding on other areas in the code base though, from ... imports still will cause more circular dependencies than aliasing. You can see this locally with two modules categorical and dtype
If you uncomment and try to do from categorical import Categorical this will fail (at least on Python 3.7.4). I think we run into this issue a lot in core.
FWIW I'm not an expert here so maybe there's something simple I'm overlooking, but generally just prefer to go with what Guido suggests on language elements :-)
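The two modules from this comment are not reproduced in the thread, so the following is only a stand-in sketch of the same effect. The module names (cat_from, dtype_from, cat_alias, dtype_alias) and the helper try_import are invented for the illustration. It writes two pairs of mutually importing modules to a temporary directory: one pair uses from ... import, the other uses plain import and looks names up only inside methods.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Pair 1 (hypothetical): each module pulls a name out of the other at import time.
FROM_STYLE = {
    "cat_from.py": "from dtype_from import CategoricalDtype\n\nclass Categorical:\n    pass\n",
    "dtype_from.py": "from cat_from import Categorical\n\nclass CategoricalDtype:\n    pass\n",
}

# Pair 2 (hypothetical): each module imports the other module object and
# resolves the attribute lazily, when a method is actually called.
ALIAS_STYLE = {
    "cat_alias.py": (
        "import dtype_alias\n\nclass Categorical:\n"
        "    def dtype(self):\n        return dtype_alias.CategoricalDtype()\n"
    ),
    "dtype_alias.py": (
        "import cat_alias\n\nclass CategoricalDtype:\n"
        "    def make(self):\n        return cat_alias.Categorical()\n"
    ),
}


def try_import(files, entry_point):
    """Write `files` to a temp dir and return True if `entry_point` imports cleanly."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in files.items():
            Path(tmp, name).write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(entry_point)
            return True
        except ImportError:
            return False
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            for name in files:
                sys.modules.pop(Path(name).stem, None)
```

With these definitions, try_import(FROM_STYLE, "cat_from") fails because cat_from is only partially initialised when dtype_from asks for Categorical, while try_import(ALIAS_STYLE, "cat_alias") succeeds because nothing is looked up until both modules have finished loading.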
h-vetinari commented on Oct 28, 2019
That's kind of what left me confused with your original comment - first you say you'd prefer import pandas as pd and then you quote Guido as supporting evidence to use from pandas import ... :)
WillAyd commented on Oct 28, 2019
There was a typo in my original comment that I've since fixed, but may have confused us both. Quoting directly from the article:
SaturnFromTitan commented on Oct 29, 2019
I think @WillAyd's reasoning is sound and convinced me. It seems like using import pandas as pd has some real advantages, whereas all arguments for from pandas import ... seem to be about personal preference.
As there's no clear preferred version in the current test suite I'd rather convert everything to import pandas as pd.
If no one has further objections, I'd already start writing a follow-up ticket for the import pandas.util.testing as tm imports. By the time this is resolved, we will hopefully have come to a conclusion on the other case ✌️
jreback commented on Oct 29, 2019
let's hold off on this one
using import pandas.util.testing as tm is fine though
mroeschke commented on Apr 14, 2020
Looks like #33203 is the more up to date discussion on this topic. Closing in favor of that thread