Improve documentation on how to handle binary and non-binary files (local/remote, up-/download) #595
Comments
AFAIK we don't have a PyScript-specific way to do it, so currently the best way is to use Pyodide. This Stack Overflow answer shows a possible solution. Yes, we should provide a more straightforward way of doing it.

Thanks, I'll look into it later. Meanwhile, I might have found a different cross-browser solution for downloading blobs; I will test it later and report back here. Once I get this running, I'll document everything and set up minimal examples for every variant.
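For reference, a blob-based download from the virtual file system can be sketched as follows. This is a minimal sketch, not an official PyScript API: it assumes a Pyodide runtime in which `js.Blob`, `js.URL`, `js.Object` and `js.document` are importable and `pyodide.ffi.to_js` converts a `bytes` buffer into a JS `Uint8Array`. The names `read_vfs_bytes` and `download_bytes` are hypothetical.

```python
"""Download a file from Pyodide's virtual file system via a Blob object URL (sketch)."""
from pathlib import Path

try:
    # Only available inside a Pyodide/PyScript page.
    from js import Blob, URL, Object, document
    from pyodide.ffi import to_js
except ImportError:  # plain CPython: the browser part is unavailable
    Blob = None


def read_vfs_bytes(path: str) -> bytes:
    """Read a file written to the (virtual) file system as raw bytes."""
    return Path(path).read_bytes()


def download_bytes(data: bytes, filename: str, mime: str = "application/octet-stream") -> None:
    """Hypothetical helper: hand `data` to the browser as a file download."""
    if Blob is None:
        raise RuntimeError("download_bytes() needs a browser (Pyodide) runtime")
    # Assumption: to_js converts the list to an Array and the bytes to a Uint8Array.
    parts = to_js([data])
    options = to_js({"type": mime}, dict_converter=Object.fromEntries)
    blob = Blob.new(parts, options)
    url = URL.createObjectURL(blob)
    # Temporary <a download> element that triggers the save dialog / download.
    link = document.createElement("a")
    link.href = url
    link.download = filename
    link.click()
    # Free the object URL; some code defers this (e.g. with setTimeout) to be safe.
    URL.revokeObjectURL(url)
```

Usage inside PyScript would be along the lines of `download_bytes(read_vfs_bytes("test.xlsx"), "test.xlsx")`. Compared with a base64 data URL, the Blob route avoids encoding the whole file into a string.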
Just sharing some WIP in case anyone needs it ASAP.

Loading a remote Excel file. Requires a …

```python
from pyodide.http import pyfetch
import asyncio
import pandas as pd
import openpyxl  # Excel engine used by pd.read_excel for .xlsx files
from io import BytesIO
import sys
from js import alert, console, document, Object, window
# On Pyodide >= 0.21 these two helpers live in pyodide.ffi instead
from pyodide import create_proxy, to_js


async def load_df():
    response = await pyfetch(url="/downloads/test.xlsx", method="GET")
    bytes_response = await response.bytes()  # binary file: read raw bytes
    df = pd.read_excel(BytesIO(bytes_response))
    content = df.to_csv()  # returns a string when no path is given
    return content


async def file_save(event):
    try:
        options = {
            "startIn": "downloads",
            "suggestedName": "test_123456.csv"
        }
        fileHandle = await window.showSaveFilePicker(Object.fromEntries(to_js(options)))
    except Exception as e:
        # e.g. the user cancelled the picker or the browser lacks the API
        console.log('Exception: ' + str(e))
        return
    content = await load_df()
    file = await fileHandle.createWritable()
    await file.write(content)
    await file.close()
    return


def setup_button():
    # Create a Python proxy for the callback function
    file_save_proxy = create_proxy(file_save)
    # Set the listener to the callback
    document.getElementById("download").addEventListener("click", file_save_proxy, False)


setup_button()
```

I'm working on …
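`showSaveFilePicker` belongs to the File System Access API, which not every browser implements, so a snippet like the one above may want a fallback. A minimal sketch, assuming that in Pyodide `hasattr(window, ...)` returns `False` for a property the browser does not define; `picker_options` and `save_text` are hypothetical names, and the fallback swaps in a plain Blob-plus-anchor download.

```python
"""Save text via showSaveFilePicker when available, else fall back to a Blob download (sketch)."""

try:
    from js import Blob, URL, Object, document, window
    from pyodide.ffi import to_js
except ImportError:  # outside the browser these names are unavailable
    window = None


def picker_options(suggested_name: str) -> dict:
    """Options for showSaveFilePicker, using the same keys as the snippet above."""
    return {"startIn": "downloads", "suggestedName": suggested_name}


async def save_text(content: str, filename: str, mime: str = "text/csv") -> None:
    """Hypothetical helper: let the user pick a location, or download directly."""
    if window is None:
        raise RuntimeError("save_text() needs a browser (Pyodide) runtime")
    if hasattr(window, "showSaveFilePicker"):
        # File System Access API path. If the user cancels, the picker rejects
        # and the resulting error propagates to the caller.
        handle = await window.showSaveFilePicker(
            to_js(picker_options(filename), dict_converter=Object.fromEntries)
        )
        writable = await handle.createWritable()
        await writable.write(content)
        await writable.close()
        return
    # Fallback: wrap the string in a Blob and click a temporary link.
    blob = Blob.new(to_js([content]), to_js({"type": mime}, dict_converter=Object.fromEntries))
    url = URL.createObjectURL(blob)
    link = document.createElement("a")
    link.href = url
    link.download = filename
    link.click()
    URL.revokeObjectURL(url)
```

The picker lets the user choose the target folder; the fallback always lands in the browser's default download location.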
Not sure if this is closely related, but we created a WordPress plugin around PyScript, and most of the examples work on the site. However, it always throws this error when we try to read a remote CSV file in pandas, or even just read a remote URL. Is this related to encoding? The weird thing is that the other examples, including the matplotlib one, work.
Hi @hellozeyu, this is not related; your URL is simply wrong. You're trying to read a CSV from the GitHub landing page.
Ah sorry, I thought this was pandas-related. You cannot use the urllib or requests packages in PyScript; you need to use the Pyodide alternatives (e.g. `pyodide.http.pyfetch` or `pyodide.http.open_url`) instead. See this example.
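A minimal sketch of the Pyodide route for text data, assuming `pyodide.http.pyfetch` (whose response object offers `ok`, `status`, and the `string()` / `bytes()` coroutines) and a CSV served from the same origin or with CORS enabled. `fetch_text` and `parse_csv` are hypothetical names; parsing uses the standard `csv` module so it also runs outside the browser (with pandas, `pd.read_csv(io.StringIO(text))` would be the equivalent).

```python
from __future__ import annotations

import csv
import io

try:
    from pyodide.http import pyfetch
except ImportError:  # plain CPython: no browser fetch available
    pyfetch = None


async def fetch_text(url: str) -> str:
    """Fetch a text resource (e.g. CSV) with pyfetch instead of urllib/requests."""
    if pyfetch is None:
        raise RuntimeError("fetch_text() needs a Pyodide runtime")
    response = await pyfetch(url, method="GET")
    if not response.ok:
        raise OSError(f"GET {url} failed with status {response.status}")
    # Text formats: decode to str. Binary formats (xlsx, zip) would use `await response.bytes()`.
    return await response.string()


def parse_csv(text: str) -> list[dict]:
    """Parse CSV text into a list of row dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))
```

In a PyScript page this would be used as `rows = parse_csv(await fetch_text(url))`, where `url` must point at the raw file, not at an HTML page that displays it.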
Got it. It works for me. Thanks!
@do-me
Hi @marimeireles! I might have found a way to solve the last missing piece (binary downloads like Excel files) via octet streams and DOM manipulation. I haven't found the time to test it properly yet, but as soon as I succeed, I'll come back here! So technically the issue is not yet 100% solved, I'd say. Great idea for the snippet-style docs; I think that really suits the spirit of PyScript!
Alright! :)
I finally found the time to test binary downloads from the virtual file system. I wrote a simple function that takes care of everything and saves a pandas Excel export to the local file system:

```python
from pyodide.http import pyfetch
import asyncio
import pandas as pd
import openpyxl  # Excel engine for read_excel / to_excel
from io import BytesIO
import base64
from js import document


def pandas_excel_export(df, filename):
    # save to the virtual file system
    df.to_excel(filename + ".xlsx")
    # binary xlsx to a base64-encoded downloadable string
    with open(filename + ".xlsx", "rb") as f:
        data = f.read()
    base64_encoded = base64.b64encode(data).decode("UTF-8")
    octet_string = "data:application/octet-stream;base64,"
    download_string = octet_string + base64_encoded
    # create a temporary helper DOM element, click it (download) and remove it
    element = document.createElement("a")
    element.setAttribute("href", download_string)
    element.setAttribute("download", filename + ".xlsx")
    element.click()
    element.remove()


# import (top-level await is allowed in PyScript)
response = await pyfetch("/downloads/test.xlsx", method="GET")
bytes_response = await response.bytes()
# read from bytes
df = pd.read_excel(BytesIO(bytes_response))
# manipulate
df["d"] = df["a"] + df["b"]
# export
pandas_excel_export(df, "test")
```

Working example here. Coming back to the original purpose of this issue, I think we have everything we need to improve the documentation! What do you think about a dedicated `File Handling` section in the docs under Getting Started? Or do you think it belongs in the How-to section? I am preparing a dedicated blog post in the spirit of the original issue description (local/remote, import/export, non-binary/binary data) that could serve as a basis for further discussion.
I am closing this for the following reasons:
Checklist
What is the issue/comment/problem?
There are a few issues around here concerned with file handling (#588, #558, #463, #151 amongst others).
It would be nice to have a dedicated section in the docs with the recommended way of doing things for binary and non-binary files.
Summed up:
- Local
- Remote
- Download file from browser from remote (that should hopefully be impossible)

Due to the different nature of binary and non-binary files (e.g. Excel or, more generally, zip files), it would be very useful to include this distinction, as otherwise one stumbles across missing `await`s or similar. I think most of the above points are already described somewhere, but I'm missing an example of how to conveniently access the virtual file system in order to download something locally.
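The binary/non-binary distinction mostly comes down to which read call gets awaited. A minimal sketch for local uploads, assuming a browser `File` object (e.g. from an `<input type="file">` element) whose `text()` and `arrayBuffer()` promises can be awaited from Pyodide, plus the `to_bytes()` method Pyodide provides on proxied `ArrayBuffer`s. The extension set and the names `is_binary_format` / `read_upload` are illustrative assumptions.

```python
from pathlib import PurePath

# Illustrative only: extensions treated as binary (Excel workbooks, archives, images).
BINARY_EXTENSIONS = {".xlsx", ".xls", ".zip", ".parquet", ".png", ".jpg"}


def is_binary_format(filename: str) -> bool:
    """Decide from the file extension whether the content should be read as bytes."""
    return PurePath(filename).suffix.lower() in BINARY_EXTENSIONS


async def read_upload(file):
    """Hypothetical helper: read a browser File as bytes or str depending on its type."""
    if is_binary_format(file.name):
        buffer = await file.arrayBuffer()  # proxy of a JS ArrayBuffer
        return buffer.to_bytes()           # copy into Python bytes
    return await file.text()               # the browser decodes this as UTF-8
```

Binary results go through `io.BytesIO` (e.g. `pd.read_excel(io.BytesIO(data))`), while text results go through `io.StringIO`; forgetting either `await` leaves a pending promise instead of the content.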
Let's consider this:
That's the (currently) easiest way of loading binary files. If I call `df.to_excel("test_output.xlsx")` and `df.to_csv("test_output.csv")`, pandas will save the output to the virtual file system. What's the best way of automatically starting the download from the browser to the local machine once pandas is done saving to the virtual file system, or could this step even be skipped in some way? Do we need to use some JS proxy or JS buffer for the hooks, or would you simply use some Pyodide function for this?
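One way to skip the virtual file system step is to let pandas write into memory: `DataFrame.to_excel` accepts a file-like object such as `io.BytesIO`, and `DataFrame.to_csv()` returns a string when no path is given. A minimal sketch; `export_to_bytes` and `to_data_url` are hypothetical helpers, and the resulting bytes would still need to be handed to the browser (e.g. via the Blob or data-URL approaches discussed in this thread).

```python
import base64
import io
from typing import Callable

# Standard MIME type for .xlsx workbooks.
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def export_to_bytes(write: Callable[[io.BytesIO], object]) -> bytes:
    """Run a writer against an in-memory buffer and return what it wrote."""
    buffer = io.BytesIO()
    write(buffer)
    return buffer.getvalue()


def to_data_url(data: bytes, mime: str = XLSX_MIME) -> str:
    """Encode bytes as a base64 data URL usable as an <a href=...> download target."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")
```

With pandas and openpyxl available, usage would look like `data = export_to_bytes(lambda buf: df.to_excel(buf, index=False, engine="openpyxl"))` followed by setting `to_data_url(data)` as the `href` of a temporary `<a download="test_output.xlsx">` element. Using the real xlsx MIME type instead of `application/octet-stream` lets the browser and OS recognize the file type.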