New DataFrame feature: listify() and unlistify()

While building up a DataFrame in several steps, I found it difficult to add a new "perpendicular" column, i.e. a column that adds another dimension to already existing columns. To solve this problem I got the idea that this may be done in two steps:
1. Add a new column whose values at each cell is a python list of the values the new column takes on.
2. Unlistify the column thereby creating a new row for each element in the above lists.

I.e. I propose two new DataFrame methods, listify() and unlistify().

`listify(df, column)`: Takes as input a dataframe and the name of a column. It will do a groupby of the df for all columns except column and generate a single row where the values in the column cell is a list of the column `column` values.

`unlistify(df, column)`: Takes as input a dataframe and the name of a column. It will iterate over the values of the contents of `column` for each row and generate a new row for each value.

The functions may be expanded to support multiple columns. `listify()` may e.g. support a post processing function, that will be applied on the list.

The following python code illustrates these two functions. But obviously the functionality may be implemented more efficienctly on the C-level.

``` python
#!/usr/bin/python

import pandas as pd

def listify(df, column):
  matches = [i for i,n in enumerate(df.columns)
             if n==column]

  if len(matches)==0:
    raise Exception('Failed to find column named ' + column +'!')
  if len(matches)>1:
    raise Exception('More than one column named ' + column +'!')

  old_index = df.index
  col_idx = matches[0] + len(old_index.shape) # Since we will reset the index

  column_names = list(df.index.names) + list(df.columns)
  gb_cols = [c for c in column_names
             if c!= column]

  # Helper function to generate the squashed dataframe
  def fnc(d):
    row = list(d.values[0])
    return pd.DataFrame([row[:col_idx]
                         + [[v[col_idx] for v in list(d.values)]]
                         + row[col_idx+1:]])

  return (df
          .reset_index()
          .groupby(gb_cols)
          .apply(fnc)
          .rename(columns = lambda i : column_names[i])
          .set_index(old_index.names)
          )

def unlistify(df, column):
  matches = [i for i,n in enumerate(df.columns)
             if n==column]

  if len(matches)==0:
    raise Exception('Failed to find column named ' + column +'!')
  if len(matches)>1:
    raise Exception('More than one column named ' + column +'!')

  col_idx = matches[0]

  # Helper function to expand and repeat the column col_idx
  def fnc(d):
    row = list(d.values[0])
    bef = row[:col_idx]
    aft = row[col_idx+1:]
    col = row[col_idx]
    z = [bef + [c] + aft for c in col]
    return pd.DataFrame(z)

  col_idx += len(df.index.shape) # Since we will push reset the index
  index_names = list(df.index.names)
  column_names = list(index_names) + list(df.columns)
  return (df
          .reset_index()
          .groupby(level=0,as_index=0)
          .apply(fnc)
          .rename(columns = lambda i :column_names[i])
          .set_index(index_names)
          )


# Examples of how to listify and unlistify a column.

df = pd.DataFrame([[11,range(5),10],
                   [22,range(3),20]],
                   columns = ['A','B','C']).set_index('C')
print 'org'
print df
print '--'
df = unlistify(df,'B')
print 'unlistify(df,B)'
print df
print '--'
df = listify(df,'B')
print 'listify(df,B)'
print df
```

The corresponding output:

```
org
     A                B
C                      
10  11  [0, 1, 2, 3, 4]
20  22        [0, 1, 2]
--
unlistify(df,B)
     A  B
C        
10  11  0
10  11  1
10  11  2
10  11  3
10  11  4
20  22  0
20  22  1
20  22  2
--
listify(df,B)
     A                B
C                      
10  11  [0, 1, 2, 3, 4]
20  22        [0, 1, 2]
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GitHub Sponsors

Uh oh!

New DataFrame feature: listify() and unlistify() #10511

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

New DataFrame feature: listify() and unlistify() #10511

Description

Activity

shoyer commented on Jul 5, 2015

dov commented on Jul 5, 2015

shoyer commented on Jul 5, 2015

shoyer commented on Jul 5, 2015

dov commented on Jul 5, 2015

jreback commented on Jul 5, 2015

dov commented on Jul 5, 2015

shoyer commented on Jul 6, 2015

libbkmz commented on Apr 22, 2016

jreback commented on Apr 22, 2016

libbkmz commented on Apr 22, 2016

jreback commented on Apr 22, 2016

NirantK commented on Jan 8, 2018

chaxor commented on Jan 17, 2018

jreback commented on Jan 17, 2018

dmarinav commented on Jan 31, 2018

jreback commented on Jan 31, 2018

TomAugspurger commented on Feb 1, 2018

summerela commented on Jun 15, 2018

orenovadia commented on Jun 12, 2019

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions