
Proposal: Shorter default Series/DataFrame repr when truncated #27000

Closed
Description

@jorisvandenbossche
Member

There have been previous attempts to reduce the length of the Series/DataFrame repr (pandas.options.display.max_rows), e.g. #20514. Related pandas-dev email: https://mail.python.org/pipermail/pandas-dev/2018-March/000732.html

In that discussion, I once made the following proposal to introduce two thresholds:

  • We have 2 thresholds instead of 1 (the current 'max_rows'): a number of
    rows to show in a truncated repr, and a max number of rows to show
    without truncating
  • For 'big' dataframes, we show a truncated repr, and I would go even
    lower than 20 and only show the first/last 5 rows (i.e. effectively a
    max_rows of 10)
  • For 'small' dataframes, we show the full dataframe without truncating, up
    to the second threshold (the max number of rows to show without truncating).

We would still need to define those two thresholds. But for example, using the current max_rows of 60: we could show a full repr up to 60 rows, and once the number of rows exceeds 60, show only 10 (the first/last 5).
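A minimal sketch of the two-threshold rule in plain Python, assuming the numbers above (full repr up to 60 rows, otherwise 10 rows split between head and tail). The function `rows_to_show` and its parameters are hypothetical; it only illustrates which row positions would be displayed and is not pandas' actual implementation.

```python
from typing import List


def rows_to_show(n_rows: int, max_rows: int = 60, min_rows: int = 10) -> List[int]:
    """Return the row positions a repr would display (hypothetical helper).

    - n_rows <= max_rows: 'small' object, show every row.
    - n_rows > max_rows: 'big' object, show only min_rows rows,
      half taken from the head and half from the tail.
    """
    if n_rows <= max_rows:
        return list(range(n_rows))
    # Never show more rows in the truncated repr than the untruncated limit.
    shown = min(min_rows, max_rows)
    half = shown // 2
    head = list(range(half))
    tail = list(range(n_rows - half, n_rows))
    return head + tail


print(len(rows_to_show(60)))  # 60: shown in full
print(rows_to_show(61))       # [0, 1, 2, 3, 4, 56, 57, 58, 59, 60]
print(len(rows_to_show(1000, max_rows=20, min_rows=20)))  # 20
```

With both thresholds set to the same value (20 in the last call), every object longer than 20 rows gets the same 20-row repr, so the length no longer jumps at the threshold.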

You can then still set both thresholds to the same number (like 20, as in the linked PR above) to avoid this variable behaviour.

This is actually similar to what numpy arrays do (but with a bigger threshold: e.g. np.random.randn(1000) shows all 1000 elements, while np.random.randn(1001) shows only the first/last 3).
And it is also very similar to what R tibbles do: they have "print_min" and "print_max" options with exactly this behaviour, only with lower defaults (10 and 20, respectively):

options(tibble.print_max = n, tibble.print_min = m): if there are more than
n rows, print only the first m rows. Use options(tibble.print_max = Inf)
to always show all rows.
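For comparison, the NumPy behaviour mentioned above is governed by its `threshold` (default 1000) and `edgeitems` (default 3) print options. A small check, assuming default print options and using `np.arange` instead of `np.random.randn` so the output is deterministic:

```python
import numpy as np

# Default print options: summarize arrays with more than `threshold`
# elements, keeping `edgeitems` values at each edge.
opts = np.get_printoptions()
print(opts["threshold"], opts["edgeitems"])  # 1000 3

full = repr(np.arange(1000))   # 1000 elements: printed in full
short = repr(np.arange(1001))  # 1001 elements: summarized with "..."
print("..." in full, "..." in short)  # False True
```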

Activity

simonjayhawkins commented on Jun 23, 2019

Member

so this would add two display options pandas.options.display.min_rows and pandas.options.display.min_columns

and two arguments, min_rows and min_cols, to to_string and to_html(notebook=True)?

personally, since this proposal is display related and not so relevant to IO, I would prefer not to see the additional arguments to to_string and to_html

jorisvandenbossche commented on Jun 23, 2019

Member, Author

I would personally only start with min_rows (we could always add the columns one later if there is demand for it).

And personally, I am also fine with adding it only as a general display option for now, and not necessarily to to_string / to_html.
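For reference, this is how the general display option behaves in released pandas: pandas 0.25.0 added `display.min_rows` (default 10) next to the existing `display.max_rows` (default 60). A small check, assuming pandas 0.25.0 or later; `shown_lines` is a hypothetical helper that just counts the lines of a repr, and both options are set explicitly so the result does not depend on local configuration.

```python
import pandas as pd


def shown_lines(obj) -> int:
    # Hypothetical helper: number of lines in the object's text repr.
    return len(repr(obj).splitlines())


with pd.option_context("display.max_rows", 60, "display.min_rows", 10):
    n_small = shown_lines(pd.Series(range(60)))  # 61: all 60 rows + dtype footer
    n_big = shown_lines(pd.Series(range(61)))    # 12: 5 head rows, '..', 5 tail rows, footer

# Setting both thresholds to the same value gives a fixed-size truncated repr.
with pd.option_context("display.max_rows", 20, "display.min_rows", 20):
    n_equal = shown_lines(pd.Series(range(1000)))  # 22: 10 + '..' + 10 + footer

print(n_small, n_big, n_equal)  # 61 12 22
```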

simonjayhawkins commented on Jun 23, 2019

Member

+1 in that case.

TomAugspurger commented on Jun 23, 2019

Contributor

+1 from me as well.

added this to the 0.25.0 milestone on Jun 28, 2019
jorisvandenbossche commented on Jul 3, 2019

Member, Author

cc @pandas-dev/pandas-core we might still include this in 0.25.0. Any concerns about the above proposal?

shoyer commented on Jul 3, 2019

Member

+1 sounds great to me

toobaz commented on Jul 3, 2019

Member

+1 for me too

topper-123 commented on Jul 3, 2019

Contributor

I'm +1.


Metadata

Assignees

No one assigned

Labels

Output-Formatting (__repr__ of pandas objects, to_string)

Participants

@jreback, @jorisvandenbossche, @shoyer, @toobaz, @TomAugspurger
