
Pandas DataFrame is not JSON serializable #7689

Description

s-celles (Contributor)

Hello,

I'm trying to build a RESTful API with Flask-RESTful which will return a pandas DataFrame as JSON data.

from flask import Flask
from flask.ext import restful
from flask.ext.restful import Resource, Api
import pandas as pd
import click
import logging

app = Flask(__name__)
api = restful.Api(app)

class DataWebservice(Resource):
    def get(self, size):
        logging.info("get %d" % size)
        # get DB conn
        # df = pd.read_sql(...) # USE LIMIT
        df = pd.DataFrame({"col1": [1]*size, "col2": [2]*size})
        #return(df.to_json())
        return(df)

api.add_resource(DataWebservice, '/api/v1/solar/df/get/<int:size>')

@click.command()
@click.option('--host', default='127.0.0.1', \
    help="host (127.0.0.1 or 0.0.0.0 to accept all ip)")
@click.option('--debug/--no-debug', default=False, help="debug mode")
def main(debug, host):
    app.run(host=host, debug=debug)

if __name__ == '__main__':
    main()

I start the server with

$ python server.py --debug

and query it with

$ curl http://127.0.0.1:5000/api/v1/solar/df/get/10

but I get the following error

TypeError:    col1  col2
0     1     2
1     1     2
...
8     1     2
9     1     2

[10 rows x 2 columns] is not JSON serializable

So it seems that a pandas DataFrame is not JSON serializable.

I tried this in IPython:

import json
import pandas as pd

size = 10
df = pd.DataFrame({"col1": [1]*size, "col2": [2]*size})
json.dumps(df)

It raises the same error.
I'm aware that DataFrame has a to_json() method,
but it doesn't help me much because my server then returns escaped strings such as

"{\"col1\":{\"0\":1,\"1\":1,\"2\":1,\"3\":1,\"4\":1,\"5\":1,\"6\":1,\"7\":1,\"8\":1,\"9\":1},\"col2\":{\"0\":2,\"1\":2,\"2\":2,\"3\":2,\"4\":2,\"5\":2,\"6\":2,\"7\":2,\"8\":2,\"9\":2}}"

see flask-restful/flask-restful#269

Kind regards

Activity

jreback (Contributor) commented on Jul 8, 2014

What are you asking here? pandas provides a read/write API to JSON: http://pandas.pydata.org/pandas-docs/stable/io.html#json, and its output should be deserializable by foreign JSON parsers. You must use .to_json(), as a DataFrame has many custom types.
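
For instance, a minimal round trip with that API (a sketch; the example frame is made up):

import io
import pandas as pd

df = pd.DataFrame({"col1": [1, 1], "col2": [2, 2]})

s = df.to_json()                    # '{"col1":{"0":1,"1":1},"col2":{"0":2,"1":2}}'
df2 = pd.read_json(io.StringIO(s))  # parses the JSON string back into a DataFrame

# json.dumps(df) raises TypeError because the standard encoder
# does not know how to handle a DataFrame.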

s-celles (Contributor, Author) commented on Jul 8, 2014

Hello,

Thanks, I know about the to_json() method.
I thought a pandas DataFrame could inherit from another class to become directly "JSON serializable",
so that json.dumps(df) would return exactly the same result as df.to_json().
Then in this Flask view we could return the DataFrame directly (in fact jsonify(df)) instead of doing:

    from flask import Response

    resp = Response(response=df.to_json(),
                    status=200,
                    mimetype="application/json")
    return resp

But maybe I'm wrong and there is no way for json.dumps(df) to return a JSON string.
I thought that what I'm asking for is more syntactic sugar than a major improvement.

Kind regards

jreback (Contributor) commented on Jul 8, 2014

This is the typical way to extend the default json encoder

In [9]: class JSONEncoder(json.JSONEncoder):
   ...:     def default(self, obj):
   ...:         if hasattr(obj,'to_json'):
   ...:             return obj.to_json()
   ...:         return json.JSONEncoder.default(self, obj)
   ...:     

In [10]: json.dumps(df, cls=JSONEncoder)
Out[10]: '"{\\"col1\\":{\\"0\\":1,\\"1\\":1,\\"2\\":1,\\"3\\":1,\\"4\\":1,\\"5\\":1,\\"6\\":1,\\"7\\":1,\\"8\\":1,\\"9\\":1},\\"col2\\":{\\"0\\":2,\\"1\\":2,\\"2\\":2,\\"3\\":2,\\"4\\":2,\\"5\\":2,\\"6\\":2,\\"7\\":2,\\"8\\":2,\\"9\\":2}}"'

s-celles (Contributor, Author) commented on Jul 8, 2014

So you think it's better to define our own JSONEncoder. The problem is that I don't think jsonify supports a cls argument for the encoder... but that's another problem.

Moreover I noticed

In [20]: df.to_json()
Out[20]: '{"col1":{"0":1,"1":1,"2":1,"3":1,"4":1,"5":1,"6":1,"7":1,"8":1,"9":1},"col2":{"0":2,"1":2,"2":2,"3":2,"4":2,"5":2,"6":2,"7":2,"8":2,"9":2}}'

In [21]: json.dumps(df, cls=JSONEncoder)
Out[21]: '"{\\"col1\\":{\\"0\\":1,\\"1\\":1,\\"2\\":1,\\"3\\":1,\\"4\\":1,\\"5\\":1,\\"6\\":1,\\"7\\":1,\\"8\\":1,\\"9\\":1},\\"col2\\":{\\"0\\":2,\\"1\\":2,\\"2\\":2,\\"3\\":2,\\"4\\":2,\\"5\\":2,\\"6\\":2,\\"7\\":2,\\"8\\":2,\\"9\\":2}}"'

it's like a string inside a string...

What I want to encode is in fact

        d = {"success":1 , "return": df}

I know I can do

        d = "{\"success\":1, \"return\":%s}" % df.to_json()

but I feel that's not the right way of doing it.

jreback (Contributor) commented on Jul 8, 2014

HTH

There are lots of options for to_json(); see the docs: http://pandas.pydata.org/pandas-docs/stable/io.html#json

I'm not really sure what you are doing.

Closing this as it's out of scope for pandas.

s-celles (Contributor, Author) commented on Jul 8, 2014

Please try this:

d = {"success":1 , "return": df}

ser=json.dumps(d, cls=JSONEncoder)

unser=json.loads(ser)

type(unser["return"])

It returns unicode, whereas I would expect a dict.

jreback (Contributor) commented on Jul 8, 2014

You need to teach json how to do this by defining an object hook: http://www.yilmazhuseyin.com/blog/dev/advanced_json_manipulation_with_python/

In [13]: pd.read_json(json.loads(ser)['return'])
Out[13]: 
   col1  col2
0     1     2
1     1     2
2     1     2
3     1     2
4     1     2
5     1     2
6     1     2
7     1     2
8     1     2
9     1     2
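
A sketch of that idea: tag DataFrames under a marker key of your own choosing when encoding, and rebuild them in an object_hook when decoding (the "__dataframe__" key below is just an example, not a pandas convention):

import json
import pandas as pd

class DataFrameEncoder(json.JSONEncoder):
    def default(self, obj):
        # Wrap DataFrames under a marker key so the decoder can spot them.
        if isinstance(obj, pd.DataFrame):
            return {"__dataframe__": obj.to_dict()}
        return json.JSONEncoder.default(self, obj)

def dataframe_hook(d):
    # Called for every decoded JSON object; rebuild tagged frames.
    if "__dataframe__" in d:
        return pd.DataFrame(d["__dataframe__"])
    return d

df = pd.DataFrame({"col1": [1, 1], "col2": [2, 2]})
ser = json.dumps({"success": 1, "return": df}, cls=DataFrameEncoder)
unser = json.loads(ser, object_hook=dataframe_hook)
# unser["return"] is a DataFrame again (index keys come back as strings)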

s-celles (Contributor, Author) commented on Jul 8, 2014

Thanks for this tutorial, but there is a difference between the two serialized versions:

In [74]: d = {"success":1 , "return": df}

In [75]: d
Out[75]:
{'return':    col1  col2
 0     1     2
 1     1     2
 2     1     2
 3     1     2
 4     1     2
 5     1     2
 6     1     2
 7     1     2
 8     1     2
 9     1     2

 [10 rows x 2 columns], 'success': 1}

In [76]: dat_json = "{\"success\":1, \"return\":%s}" % df.to_json()

In [77]: dat_json
Out[77]: '{"success":1, "return":{"col1":{"0":1,"1":1,"2":1,"3":1,"4":1,"5":1,"6":1,"7":1,"8":1,"9":1},"col2":{"0":2,"1":2,"2":2,"3":2,"4":2,"5":2,"6":2,"7":2,"8":2,"9":2}}}'

In [78]: dat_json2 = json.dumps(d, cls=JSONEncoder)

In [79]: dat_json2
Out[79]: '{"return": "{\\"col1\\":{\\"0\\":1,\\"1\\":1,\\"2\\":1,\\"3\\":1,\\"4\\":1,\\"5\\":1,\\"6\\":1,\\"7\\":1,\\"8\\":1,\\"9\\":1},\\"col2\\":{\\"0\\":2,\\"1\\":2,\\"2\\":2,\\"3\\":2,\\"4\\":2,\\"5\\":2,\\"6\\":2,\\"7\\":2,\\"8\\":2,\\"9\\":2}}", "success": 1}'

Is there a clean solution with a custom encoder for objects with a to_json method (like DataFrame) to output correct JSON (without extra quotes)?
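
The extra quotes appear because default() returns the string produced by to_json(), and json.dumps then encodes that string a second time as a JSON string value. One possible workaround (a sketch, not an official pandas API) is to parse that string back before returning it, so the frame is embedded as a nested object:

import json
import pandas as pd

class JSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if hasattr(obj, 'to_json'):
            # Re-parse so the frame is embedded as an object, not a re-escaped string.
            return json.loads(obj.to_json())
        return json.JSONEncoder.default(self, obj)

df = pd.DataFrame({"col1": [1, 1], "col2": [2, 2]})
json.dumps({"success": 1, "return": df}, cls=JSONEncoder)
# roughly: '{"success": 1, "return": {"col1": {"0": 1, "1": 1}, "col2": {"0": 2, "1": 2}}}'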

jreback (Contributor) commented on Jul 8, 2014

This is maybe a question for SO; I don't use custom encoders at all. It seems a whole lot simpler for you to simply call df.to_json(), which returns a string, no? Then, since you know the structure, just pd.read_json() when you need to.

s-celles (Contributor, Author) commented on Jul 8, 2014

I understand your reply... but imagine you have several DataFrames to output in the same JSON message... doing things this way is not very clear. I think it is much clearer to have a dict structure that can contain several DataFrames (and other data) and then serialize it.

About deserialization... yes, that's not a problem... I know the structure and where the DataFrames are.

jreback (Contributor) commented on Jul 8, 2014

OK, as I said, you may want to do custom encoding/decoding like I showed above. You need to write that. A DataFrame can be turned into JSON via to_json and read back by read_json; when to do that is up to you.

s-celles (Contributor, Author) commented on Jul 8, 2014

It works much better with this custom encoder (using the to_dict method):

class JSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if hasattr(obj,'to_dict'):
            return obj.to_dict()
        return json.JSONEncoder.default(self, obj)

In: size = 10
In: df = pd.DataFrame({"col1": [1]*size, "col2": [2]*size})
In: d = {"success": 1, "return": df}
In: ser = json.dumps(d, cls=JSONEncoder)
In: ser
Out: '{"return": {"col2": {"0": 2, "1": 2, "2": 2, "3": 2, "4": 2, "5": 2, "6": 2, "7": 2, "8": 2, "9": 2}, "col1": {"0": 1, "1": 1, "2": 1, "3": 1, "4": 1, "5": 1, "6": 1, "7": 1, "8": 1, "9": 1}}, "success": 1}'
In: unser = json.loads(ser)

In: print(unser)
Out: {u'return': {u'col2': {u'1': 2, u'0': 2, u'3': 2, u'2': 2, u'5': 2, u'4': 2, u'7': 2, u'6': 2, u'9': 2, u'8': 2}, u'col1': {u'1': 1, u'0': 1, u'3': 1, u'2': 1, u'5': 1, u'4': 1, u'7': 1, u'6': 1, u'9': 1, u'8': 1}}, u'success': 1}

In: type(unser)
Out: dict

pd.DataFrame(unser['return'])

There are no extra quotes.

Thanks

s-celles (Contributor, Author) commented on Jul 8, 2014

The problem is that, in fact, I can't use the 'orient' parameter, which is very convenient to reduce message size. Maybe a to_object(orient='...') could be a good idea, and it could be called by to_json.
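
A sketch along those lines, where the encoder simply forwards the desired orient to to_dict (the 'records' choice below is just an example):

import json
import pandas as pd

class JSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, pd.DataFrame):
            # Pick whichever orient keeps the message small for your data.
            return obj.to_dict(orient='records')
        return json.JSONEncoder.default(self, obj)

df = pd.DataFrame({"col1": [1, 1], "col2": [2, 2]})
json.dumps({"success": 1, "return": df}, cls=JSONEncoder)
# '{"success": 1, "return": [{"col1": 1, "col2": 2}, {"col1": 1, "col2": 2}]}'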

wegry commented on Oct 24, 2016

@scls19fr this is obviously an old issue, but seeing as I stumbled upon it: the easiest way to nest a DataFrame in a larger JSON blob is to use

demo = {
    'key': df.to_dict(orient='records')
}

json.dumps(demo)
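
A possible round trip with that approach (a sketch; the 'key' name is just an example):

import json
import pandas as pd

df = pd.DataFrame({"col1": [1, 1], "col2": [2, 2]})
demo = {'key': df.to_dict(orient='records')}

ser = json.dumps(demo)                      # plain JSON, no escaped inner string
df2 = pd.DataFrame(json.loads(ser)['key'])  # rebuilds an equivalent DataFrame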

s-celles (Contributor, Author) commented on Oct 25, 2016

Thanks @wegry. I'm aware of #8486, which closes #7840.

Abir0802 commented on Oct 13, 2017

You can use make_response from Flask, e.g.

resp = make_response(df.to_json(orient="records"))

and then simply return it.
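
In a full view that could look roughly like this (a sketch; the route and the explicit Content-Type header are assumptions, not part of the comment above):

from flask import Flask, make_response
import pandas as pd

app = Flask(__name__)

@app.route('/api/v1/df')
def get_df():
    df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
    resp = make_response(df.to_json(orient="records"))
    resp.headers['Content-Type'] = 'application/json'
    return resp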

shwetasit commented on Oct 11, 2018

This saved my life. Thank you!

arpitr07 commented on Mar 31, 2020

Thanks, Abir0802. Was stuck here for a while.

xuancong84 commented on Apr 21, 2020

Well, there are too many Python-specific data types which JSON cannot serialize/deserialize. Take a look at the following nested data structure and you will immediately realize how many things JSON cannot handle:

[1, 3.4, 1.1+2.1j, np.nan, None, True, False, b'ab12', 'abc', int, float,
 pd.Series(), pd.DataFrame(), pd.DataFrame, type(pd.DataFrame), ['a', 1],
 {
  'a':1,
  'b':2,
  print:max,
  pd:np,
  type:0,
  int:1,
  0:pd.DataFrame(np.random.randint(0,256,[4,4]),
                 columns=['index a1', 'index a2', 'b', 'c'],
                 index=pd.date_range('2020-01-01', '2020-01-04')).set_index(['index a1', 'index a2'], append=True),
  1:pd.Series([1, 2.5, 3+1j, np.nan, 'abc'], index=pd.date_range('2020-01-01', '2020-01-05', tz='Asia/Singapore')),
  2:np.array([[1, 2.5, 'a'], [1+.5j, np.nan, 'b']]),
  3:np.matrix([[1, 2.5], [1+.5j, np.nan]])
 },
 {1, 3.4, 1+2j, np.nan, True, False, None, int, 'aa', os, sys, pd.concat}]

I have recently developed a comprehensive utility, pandas-serializer, which can serialize/deserialize almost everything (in fact, everything shown above). My utility does not depend on JSON at all; it uses native Python eval/repr/str to serialize and deserialize. You are welcome to try it, see what cannot be identically deserialized, and report it to me. Thanks! :-)

kopeclu2 commented on Nov 17, 2021

@scls19fr this is obviously an old issue, but seeing as I stumbled upon it: the easiest way to nest a DataFrame in a larger JSON blob is to use

demo = {
    'key': df.to_dict(orient='records')
}

json.dumps(demo)

That is the solution 👍 💯
