Skip to content

Unable to load excel (xlsx) file in pandas #588

Closed
@do-me

Description

@do-me

Checklist

  • I added a descriptive title
    I searched for other issues and couldn't find a solution or duplication
    I already searched in Google and didn't find any good information or help

What happened?

I tried to load an excel file from my server (not locally as in #287) with pandas and pyodide's functions but it doesn't work. I know that for pyodide.open_url(url: str) → _io.StringIO

The download of binary files is not supported.

See here. (I'm using the function with success for loading simple csv files)
Pyodide's docs claim that I should look into other functions for binary files (like xlsx).

I set up a playground to quickly test it here, with test.xlsx served under /downloads/test.xlsx.

I'm coming as far as retrieving the bytes from the response object.

from pyodide.http import pyfetch
import asyncio
import pandas as pd 
import openpyxl
from io import BytesIO

response = await pyfetch(url="https://geo.rocks/downloads/test.xlsx", method="GET")
bytes_response = response.bytes()
BytesIO(bytes_response)

But it gives me

JsException(PythonError: Traceback (most recent call last): File "/lib/python3.10/asyncio/futures.py", line 201, in result raise self._exception File "/lib/python3.10/asyncio/tasks.py", line 232, in __step result = coro.send(None) File "/lib/python3.10/site-packages/_pyodide/_base.py", line 500, in eval_code_async await CodeRunner( File "/lib/python3.10/site-packages/_pyodide/_base.py", line 353, in run_async await coroutine File "", line 10, in TypeError: a bytes-like object is required, not 'coroutine' )

Afterwards I wanted to read it into a pandas df but I think I'm lacking a tiny piece in the puzzle here. Also the error seems strange as it indeed return the bytes if you leave out the BytesIO function.

bytes_response gives

b'PK\x03\x04\x14\x00\x06\x00\x08\x00\x00\x00!\x00A7\x82\xcfn\x01\x00\x00\x04\x05\x00\x00\x13\x00\x08\x02[Content_Types].xml \xa2\x04\x02(\xa0\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\ ...

Any clue how to load the response into a df?

What browsers are you seeing the problem on? (if applicable)

Firefox, Chrome, Safari, Microsoft Edge

Console info

No response

Additional Context

No response

Activity

JeffersGlass

JeffersGlass commented on Jul 8, 2022

@JeffersGlass
Contributor

You'll need to await the reponse.bytes() call - it returns a promise that resolves to the actual bytes of the response, per the fetchResponse API.

from pyodide.http import pyfetch
import asyncio
import pandas as pd 
import openpyxl
from io import BytesIO

response = await pyfetch(url="/your/URL/goes/here", method="GET")
bytes_response = await response.bytes()
df = pd.read_excel(BytesIO(bytes_response))
print(df.head())

Result:
image

The loaded data is odd (I think due to columns A, B, and C being empty in the xlsx file, at least as it downloaded for me), but the data is at least loading.

do-me

do-me commented on Jul 8, 2022

@do-me
Author

Perfect, it works, thanks a lot! So in the end I was simply missing another await.
Indeed, my test file was slightly malformatted as the columns 1-3 are empty.
image

Added some CSS and changed the test file on my blog, so anyone feel free to test here with this (slightly modified) code

from pyodide.http import pyfetch
import asyncio
import pandas as pd 
import openpyxl
from io import BytesIO

response = await pyfetch(url="/downloads/test.xlsx", method="GET")
bytes_response = await response.bytes()
df = pd.read_excel(BytesIO(bytes_response))
df

printing a nice df:

image

Closing as solved.

moved this to Closed in PyScript OSSon Apr 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status

    Closed

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @JeffersGlass@do-me

        Issue actions

          Unable to load excel (xlsx) file in pandas · Issue #588 · pyscript/pyscript