Skip to content

CLN: Use consistent imports in tests #23018

@h-vetinari

Description

@h-vetinari
Contributor

A while ago, I got the review from @jreback that:

I kept applying these rules until #22730, where @jreback then said:

Before we do any more of this
[quote of the above]
I would like a lint rule to see what is what (and basically list the offenders as exclusions for now).
We have been changing things like this for a long time and I think we somtimes change them back (e.g. to use the imports of assert_*, rather than tm., and to use the pd.* rather than the direct imports.

and further:

let's see which way is more common, then have a lint rule to enforce it, rather than ad-hoc conversions (which has been the policy up till now).

Opening this issue for someone to write such a lint rule, because I wouldn't know where to start.

Activity

pambot

pambot commented on Oct 8, 2018

@pambot
Contributor

@h-vetinari I'll give this a shot

h-vetinari

h-vetinari commented on Oct 8, 2018

@h-vetinari
ContributorAuthor

@pambot Great!

pambot

pambot commented on Oct 8, 2018

@pambot
Contributor

@jreback Pinging you because it was your idea to add the linting rule - I'm happy to do it, but what's the best way to get get started? I looked around, and all I can find is ci/lint.sh (but that's probably just for CI) and a suggestion on StackOverflow to modify a tox.ini file (but it doesn't look like Tox is supported currently)

h-vetinari

h-vetinari commented on Oct 8, 2018

@h-vetinari
ContributorAuthor

@pambot
It was not my idea but @jreback's. I opened this issue to record that review feedback for someone else, because I don't know how to tackle setting up custom linting rules...

Any tips for @pambot, @TomAugspurger @datapythonista, maybe?

TomAugspurger

TomAugspurger commented on Oct 8, 2018

@TomAugspurger
Contributor

You may want to wait till #22854 and #22863 are in before working on this too much, since those are changing up the linters. I would also wait for some consensus on what we want these import rules to look like.

Finally, I think @alimcmaster1 has some plans for a larger import order refactor using isort. That will be a large change that'll cause conflicts in a lot of outstanding PRs. Depending on the size of the changes for this issue, it may be good to do those two PRs close together, so that people only have to fix conflicts once.

TomAugspurger

TomAugspurger commented on Oct 8, 2018

@TomAugspurger
Contributor

On this issue in particular, I don't have strong preferences.

  • I prefer tm.assert* over assert*
  • I slightly prefer pd. rather than directly importing top-level things in the test files
pambot

pambot commented on Oct 8, 2018

@pambot
Contributor

Sounds good, I'm cool with waiting.

h-vetinari

h-vetinari commented on Oct 8, 2018

@h-vetinari
ContributorAuthor

@TomAugspurger

Thanks for the infos/feedback.

Re:

I slightly prefer pd. rather than directly importing top-level things in the test files

there might be good reasons for having different rules between pandas/core and pandas/tests, but in the former, it's almost exclusively(?) direct imports.

TomAugspurger

TomAugspurger commented on Oct 8, 2018

@TomAugspurger
Contributor
jreback

jreback commented on Oct 8, 2018

@jreback
Contributor

just to be clea
this is really aimed at the test files
we may extend this to the code itself at a later point

changed the title [-]CLN: internal import rules[/-] [+]CLN: Use consistent imports in tests[/+] on Oct 9, 2018
added
Testingpandas testing functions or related to the test suite
Code StyleCode style, linting, code_checks
on Oct 9, 2018
SaturnFromTitan

SaturnFromTitan commented on Oct 27, 2019

@SaturnFromTitan
Contributor

I just had a quick look at this. Using the following grep command with varying search texts gives the number of files where this pattern is used:
grep -R -l --include="*.py" <search_text> pandas/tests/ | wc -l

search_text = "from pandas.util.testing import ": 118
search_text = "import pandas.util.testing as tm": 324
search_text = "from pandas import ": 367
search_text = "import pandas as pd": 308

For the test utils, the tm. import seems to be pretty dominant - so that decision should be straight forward.

In the .pd case, I'd vote for direct imports because
1.) It's already the dominant variation (migrating 60 additional files is quite some effort)
2.) It's in line with @jreback's earlier review comments
3.) According to @TomAugspurger's comment above, this is the way it is mostly done in the rest of the project as well

Wdyt?

jreback

jreback commented on Oct 27, 2019

@jreback
Contributor

i would be +1 in applying those as lint rules (have to fix the current first of course)

prefer

search_text = "import pandas.util.testing as tm": 324
search_text = "from pandas import ": 367

WillAyd

WillAyd commented on Oct 28, 2019

@WillAyd
Member

The more I've been reading up on it I would prefer import pandas as pd to from pandas import ... if only because the latter seems more prone to circular imports; this probably doesn't matter much for tests but I think sets a better standard throughout code base

Guido himself recommends against using from <module> import ... :

https://docs.python.org/3/faq/programming.html?highlight=circular%20import#how-can-i-have-modules-that-mutually-import-each-other

h-vetinari

h-vetinari commented on Oct 28, 2019

@h-vetinari
ContributorAuthor

@WillAyd [my bold]: The more I've been reading up on it I would prefer import pandas as pd to from pandas import ... if only because the latter seems more prone to circular imports; this probably doesn't matter much for tests but I think sets a better standard throughout code base

That may be true for the top level modules, but it is not true anymore when submodules are targeted directly, i.e. the following are not equivalent:

  1. import pandas as pd and then using pd.Series
  2. from pandas.core.series import Series and then using Series

Number 1. cannot be used in any code that defines something that is itself exposed through the toplevel module. Number 2. can be imported even there (except within pandas.core.series itself, obviously).

For this reason, it makes sense to stick with option 2. IMO, because it can be used consistently in more situations (and especially within the core itself).

WillAyd

WillAyd commented on Oct 28, 2019

@WillAyd
Member

Number 1. cannot be used in any code that defines something that is itself exposed through the toplevel module.

I don't believe that is true. read_json for instance is exposed in the top level and imports from there as well. Even pandas.core.series does a import pandas as pd

For this reason, it makes sense to stick with option 2. IMO, because it can be used consistently in more situations (and especially within the core itself).

I didn't realize this was one of the options, but I am -1 on that. Tests should generally be importing objects from the same locations that an end user would

h-vetinari

h-vetinari commented on Oct 28, 2019

@h-vetinari
ContributorAuthor

@WillAyd: I don't believe that is true. read_json for instance is exposed in the top level and imports from there as well. Even pandas.core.series does a import pandas as pd

Interesting. That goes against my experience but python may have gotten smarter about this with doing some lazy importing. For example, pd is only used once in pandas.core.series in apply, but pd still cannot be used in the class constructor, for example (just tested this on master).

@WillAyd: I didn't realize this was one of the options, but I am -1 on that. Tests should generally be importing objects from the same locations that an end user would

There's probably a misunderstanding here. For tests, both styles are fine. But since you were specifically aiming at "a better standard throughout code base", I responded that the core has different restrictions. And if a uniform standard is desired, then tests should import from pandas ... (without submodules though).

WillAyd

WillAyd commented on Oct 28, 2019

@WillAyd
Member

Ah OK - sorry yea I think we misunderstood each other on the last post. I was specifically talking about tests.

Expanding on other areas in the code base though, from ... imports still will cause more circular dependencies than aliasing. You can see this locally with two modules categorical and dtype

# categorical.py
import dtype as dt

class Categorical:
    pass

# dtype.py
import categorical as cat
#from categorical import Categorical

class Dtype:

    def __init__(self):
        self.known_types = [cat.Categorical]


if __name__ == "__main__":
    print("Hello world")

If you uncomment and try to do from categorical import Categorical this will fail (at least on Python 3.7.4). I think we run into this issue a lot in core.

FWIW I'm not an expert here so maybe simple I'm overlooking, but generally just prefer to go with what Guido suggests on language elements :-)

h-vetinari

h-vetinari commented on Oct 28, 2019

@h-vetinari
ContributorAuthor

FWIW I'm not an expert here so maybe simple I'm overlooking, but generally just prefer to go with what Guido suggests on language elements :-)

That's kind of what left me confused with your original comment - first you say you'd prefer import pandas as pd and then you quote Guido as supporting evidence to use from pandas import ... :)

WillAyd

WillAyd commented on Oct 28, 2019

@WillAyd
Member

There was a typo in my original comment that I've since fixed, but may have confused us both. Quoting directly from the article:

Guido van Rossum recommends avoiding all uses of from <module> import ...

SaturnFromTitan

SaturnFromTitan commented on Oct 29, 2019

@SaturnFromTitan
Contributor

I think @WillAyd's reasoning is sound and convinced me. It seems like using import pandas as pd has some real advantages, whereas all arguments for from pandas import ... seem to be about personal preference.

As there's no clear prefered version in the current test suite I'd rather convert everything to import pandas as pd.

If no one has further objections, I'd already start writing a follow-up ticket for the import pandas.util.testing as tm imports. By the time this is resolved, we hopefully came to a conclusion on the other case ✌️

jreback

jreback commented on Oct 29, 2019

@jreback
Contributor

As there's no clear prefered version in the current test suite I'd rather convert everything to import pandas as pd.

let's hold off on this one

use the import pandas.util.testing as tm is fine though

mroeschke

mroeschke commented on Apr 14, 2020

@mroeschke
Member

Looks like #33203 is the more up to date discussion on this topic. Closing favor of that thread

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    CleanCode StyleCode style, linting, code_checksTestingpandas testing functions or related to the test suite

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @WillAyd@jreback@TomAugspurger@pambot@datapythonista

        Issue actions

          CLN: Use consistent imports in tests · Issue #23018 · pandas-dev/pandas