Improve documentation on how to handle binary and non-binary files (local/remote, up-/download) #595

do-me · 2022-07-12T07:05:30Z

Checklist

I added a descriptive title
I searched for other issues and couldn't find a duplication
I already searched in Google and didn't find any good information or help

What is the issue/comment/problem?

There are a few issues around here concerned with file handling (#588, #558, #463, #151 amongst others).
It would be nice to have a dedicated section in the docs with the recommended way of doing things for binary and non-binary files.
Summed up:

Local

Load local file to browser (covered here or here)
Download file from browser to local (two examples here with file picker, but non-binary data only)

Remote

Load remote file to browser (covered in Unable to load excel (xlsx) file in pandas #588)
~~Download file from browser from remote~~ (that should hopefully be impossible)

Because binary and non-binary files behave differently (e.g. Excel files or, more generally, zip files), it would be very useful to include this distinction in the docs. Otherwise one stumbles across missing awaits or similar problems.
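Here is a minimal sketch of that distinction. It assumes the payload has already been fetched as raw bytes (for example with `await response.bytes()`). Binary formats such as .xlsx, which is a zip container, are wrapped in `io.BytesIO` unchanged. Text formats such as CSV are decoded first and wrapped in `io.StringIO`. The helper names `as_file_like` and `looks_like_zip` and the UTF-8 default are illustrative assumptions, not PyScript or Pyodide APIs.

```python
import io
import zipfile


def as_file_like(payload: bytes, binary: bool, encoding: str = "utf-8"):
    """Wrap fetched bytes in a file-like object a reader such as pandas accepts.

    Hypothetical helper: binary formats (xlsx, zip, ...) stay as bytes,
    text formats (csv, json, ...) are decoded to str first.
    """
    if binary:
        return io.BytesIO(payload)
    return io.StringIO(payload.decode(encoding))


def looks_like_zip(payload: bytes) -> bool:
    """.xlsx files are zip archives, so zipfile can recognise them."""
    return zipfile.is_zipfile(io.BytesIO(payload))


# Usage: a CSV payload becomes a text stream, raw bytes stay a byte stream.
csv_stream = as_file_like(b"a,b\n1,2\n", binary=False)
binary_stream = as_file_like(b"\x00\x01", binary=True)
```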

I think most of the above points are already described somewhere but I'm missing an example of how to conveniently access the virtual file system in order to download something locally.

Let's consider this:

from pyodide.http import pyfetch
import asyncio
import pandas as pd 
import openpyxl
from io import BytesIO

response = await pyfetch(url="/downloads/test.xlsx", method="GET")
bytes_response = await response.bytes()
df = pd.read_excel(BytesIO(bytes_response))
df

That's the (currently) easiest way of loading binary files. If I call df.to_excel("test_output.xlsx") and df.to_csv("test_output.csv") pandas will save the output to the virtual file system.

What's the best way of automatically starting the download from the browser to local when pandas is done saving to the virtual file system or could this even be skipped in some way? Do we need to use some js proxy, js buffer for the hooks or would you simply use some pyodide function for this?
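One way to skip the virtual file system is to serialize into an in-memory buffer and hand the resulting bytes straight to the browser. pandas writers such as `DataFrame.to_excel` also accept a `BytesIO` buffer instead of a file name. To keep this sketch dependency-free, it uses the standard-library `csv` and `zipfile` modules instead. The helper names `csv_bytes` and `zip_bytes` are hypothetical.

```python
import csv
import io
import zipfile


def csv_bytes(rows, encoding="utf-8") -> bytes:
    """Serialize rows to CSV entirely in memory (text -> str -> bytes)."""
    text_buffer = io.StringIO()
    csv.writer(text_buffer).writerows(rows)
    return text_buffer.getvalue().encode(encoding)


def zip_bytes(members: dict) -> bytes:
    """Build a zip archive (the container format behind .xlsx) entirely in memory."""
    binary_buffer = io.BytesIO()
    with zipfile.ZipFile(binary_buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return binary_buffer.getvalue()


payload = csv_bytes([["a", "b"], [1, 2]])   # bytes for a .csv download
archive = zip_bytes({"data.csv": payload})  # bytes for a .zip download
```

You can then pass either `bytes` object to a browser download step, such as a file picker or a temporary `<a download>` element, without writing it to the virtual file system first.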


marimeireles · 2022-07-20T21:38:48Z

I'm not sure if this issue was discussed last week when I wasn't around.
But maybe @antocuni has an opinion on this? Or should I ping Fabio here?
Thanks!
And thanks @do-me for opening the issue =)

antocuni · 2022-07-22T07:33:53Z

> What's the best way of automatically starting the download from the browser to local when pandas is done saving to the virtual file system or could this even be skipped in some way?

AFAIK we don't have a PyScript-specific way to do it, so currently the best way is to use Pyodide. This Stack Overflow answer shows a possible solution:
https://stackoverflow.com/questions/64669355/how-to-copy-download-file-created-in-pyodide-in-browser

Yes, we should provide a more straightforward way of doing it.
Yes, we should definitely improve the docs :).

do-me · 2022-07-22T07:44:10Z

Thanks, later I'll look into it. Meanwhile I might have found a different cross-browser solution for downloading blobs, will test later and update here. Once I get this running, I'll document everything and set up minimal examples for every variant.

do-me · 2022-07-22T15:21:52Z

Just sharing some WIP in case anyone needs it asap. It loads a remote Excel (.xlsx) file, reads it into a pandas DataFrame and downloads it as .csv using the file picker solution.

Requires a download HTML button on the page, e.g. <button id="download">Download</button>

from pyodide.http import pyfetch
import pandas as pd
import openpyxl  # engine pandas uses to read .xlsx
from io import BytesIO
from js import console, document, Object, window
from pyodide.ffi import create_proxy, to_js  # older Pyodide: from pyodide import create_proxy, to_js

async def load_df():
    response = await pyfetch(url="/downloads/test.xlsx", method="GET")
    bytes_response = await response.bytes()
    df = pd.read_excel(BytesIO(bytes_response))
    content = df.to_csv()  # returns a string when no file name is given
    return content

async def file_save(event):
    try:
        options = {
            "startIn": "downloads",
            "suggestedName": "test_123456.csv",
        }
        # showSaveFilePicker expects a plain JS object, not a Python dict
        fileHandle = await window.showSaveFilePicker(Object.fromEntries(to_js(options)))
    except Exception as e:
        # e.g. the user cancelled the dialog or the browser lacks the API
        console.log("Exception: " + str(e))
        return

    content = await load_df()

    file = await fileHandle.createWritable()
    await file.write(content)
    await file.close()

def setup_button():
    # Create a Python proxy for the callback function so JS can keep calling it
    file_save_proxy = create_proxy(file_save)

    # Set the listener to the callback
    document.getElementById("download").addEventListener("click", file_save_proxy, False)

setup_button()

I'm working on
a) cross-browser functionality as file picker isn't working in Firefox and
b) blob (= e.g. xlsx files) downloads.
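For the cross-browser part, one option is to feature-detect the File System Access API. When it is missing, as in Firefox, you fall back to a temporary `<a download>` element. The sketch below only encodes that decision, so it runs in plain CPython with stand-in objects. It assumes that `hasattr` on Pyodide's `js.window` proxy returns `False` for properties the browser does not define. The helper name `pick_download_strategy` is hypothetical.

```python
from types import SimpleNamespace


def pick_download_strategy(window) -> str:
    """Decide how to trigger a browser download (hypothetical helper).

    Prefer the File System Access API when the browser exposes it;
    otherwise fall back to a temporary <a download> element.
    """
    if hasattr(window, "showSaveFilePicker"):
        return "file-picker"
    return "anchor-download"


# Stand-ins for js.window in a browser with and without the file picker API.
chromium_like = SimpleNamespace(showSaveFilePicker=lambda options=None: None)
firefox_like = SimpleNamespace()

# In PyScript (assumption, not exercised here):
#   from js import window
#   strategy = pick_download_strategy(window)
```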

hellozeyu · 2022-07-26T16:10:59Z

Not sure if this is closely related, but we created a WordPress plugin around PyScript, and most of the examples work on the site. However, it always throws this error when we try to read a remote csv file in pandas, or even just read a remote URL. Is this related to encoding? The weird thing is that the other examples, including the matplotlib one, work.

do-me · 2022-07-27T06:24:01Z

Hi @hellozeyu, this is not related: your URL is simply wrong. You're trying to read a csv from the GitHub landing page https://github.com/. Insert the real link (the raw csv file, not the repo) and it should work. E.g. this one.

do-me · 2022-07-27T06:26:08Z

Ah sorry, I thought this was pandas-related. You cannot use the urllib or requests packages in PyScript. Use the Pyodide alternatives instead. See this example.

hellozeyu · 2022-07-27T14:12:23Z

Got it. It works for me. Thanks!

marimeireles · 2022-09-12T13:18:33Z

@do-me
Do you think the solution linked in this issue fits your use case? #756
Also, did you already find a solution? I think the last time we talked you said you had something almost working?
Let me know if you need help, we can sync =)
I think it'd be really cool to have docs on this, written in a "how to" style. Basically, some working code snippets plus a short explanation of why they work the way they do would be perfect.
Jeff Glass contributed something like this last week: https://docs.pyscript.net/latest/howtos/passing-objects.html
The one about output could be much shorter though.

do-me · 2022-09-12T18:01:13Z

Hi @marimeireles!
Thanks for coming back to this issue.
I'm not at home this week but I'll have a look at the new docs next week. Looks promising!

I might have found a way for the last missing piece (binary downloads like excel) via octet streams and DOM manipulation. I didn't find the time yet to test properly, but as soon as I succeed, I'll come back here! So technically the issue is not yet 100% solved I'd say.

Great idea for the snippet-style docs - I think that really suits the spirit of pyscript!

marimeireles · 2022-09-15T10:14:58Z

Alright! :)
I'm around just ping me.

do-me · 2022-10-03T12:40:42Z

I finally found the time for testing binary downloads from the virtual file system. I wrote a simple function that takes care of everything and saves a pandas excel export to the local file system:

from pyodide.http import pyfetch
import pandas as pd
import openpyxl  # engine pandas uses for .xlsx
from io import BytesIO
import base64
from js import document

def pandas_excel_export(df, filename):
    path = filename + ".xlsx"
    # save to the virtual file system
    df.to_excel(path)

    # binary xlsx -> base64-encoded data URL the browser can download
    with open(path, "rb") as f:
        data = f.read()
    base64_encoded = base64.b64encode(data).decode("utf-8")
    download_string = "data:application/octet-stream;base64," + base64_encoded

    # create a temporary <a download> element, click it (starts the download), then remove it
    element = document.createElement("a")
    element.setAttribute("href", download_string)
    element.setAttribute("download", path)
    element.click()
    element.remove()

# import
response = await pyfetch("/downloads/test.xlsx", method="GET")
bytes_response = await response.bytes()

# read from bytes
df = pd.read_excel(BytesIO(bytes_response))

# manipulate
df["d"] = df["a"] + df["b"]

# export
pandas_excel_export(df, "test")

Working example here.
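The `pandas_excel_export` function above always uses the generic `application/octet-stream` type. A slightly more general sketch builds the data URL from any bytes and picks a MIME type from the file extension. The mapping and the helper name `to_data_url` are illustrative assumptions.

```python
import base64
import os

# Illustrative extension -> MIME type mapping; unknown types fall back to a generic octet stream.
MIME_TYPES = {
    ".csv": "text/csv",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def to_data_url(data: bytes, filename: str) -> str:
    """Encode raw bytes as a base64 data URL for an <a href=... download=...> element."""
    extension = os.path.splitext(filename)[1].lower()
    mime = MIME_TYPES.get(extension, "application/octet-stream")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Inside `pandas_excel_export`, you could then build `download_string` as `to_data_url(data, filename + ".xlsx")`. Base64 inflates the payload by roughly a third, so for large files a `Blob` plus `URL.createObjectURL` may be the better choice.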

Coming back to the original purpose of this issue, I think we have everything we need to improve the documentation!

What do you think about a dedicated `File Handling` section in the docs under Getting Started? Or do you think it belongs in the How-to section?

I am preparing a dedicated blog post in the spirit of the original issue description (local/remote & import/export & non-binary/binary data) that could serve as a base for further discussion.

WebReflection · 2024-05-06T14:00:09Z

I am closing this for the following reasons:

- we now offer `fetch(...).bytearray()` to solve the conversion issue
- we have documented how to write, read, upload, and download files with the latest PyScript
- binary vs. non-binary is still a matter of `open(..., 'rb')` vs. `open(..., 'r')`, so I hope we have covered it all
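To illustrate the `'rb'` vs. `'r'` point: reading a file in text mode yields a decoded `str`, while binary mode yields the exact `bytes` to offer for download. In Pyodide these calls operate on the in-browser virtual file system. The sketch uses a temporary directory so it also runs in CPython. The helper name `roundtrip` is hypothetical.

```python
import os
import tempfile


def roundtrip(path: str) -> tuple[str, bytes]:
    """Write a small CSV, then read it back in text mode and in binary mode."""
    # newline="" disables newline translation, so the bytes on disk match the text exactly
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("a,b\n1,2\n")
    with open(path, "r", encoding="utf-8", newline="") as f:
        as_text = f.read()   # str: decoded characters
    with open(path, "rb") as f:
        as_bytes = f.read()  # bytes: raw file content, what a download needs
    return as_text, as_bytes


with tempfile.TemporaryDirectory() as tmp:
    text, raw = roundtrip(os.path.join(tmp, "test_output.csv"))
```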

do-me added the needs-triage label (Issue needs triage) Jul 12, 2022

pyscript-bot added this to PyScript OSS Jul 12, 2022

marimeireles added the tag: docs label (Related to the documentation) and removed the needs-triage label Jul 20, 2022

marimeireles mentioned this issue Sep 8, 2022: Add docs for opening/saving files #756 (Closed, 3 tasks)

marimeireles self-assigned this Sep 15, 2022

do-me mentioned this issue Sep 19, 2022: awesome-pyscript listing review pyscript/pyscript-collective#28 (Merged)

This was referenced Oct 4, 2022: embed openRefine #331 (Closed); Directory recognition problem #338 (Closed)

marimeireles added the backlog label (issue has been triaged but has not been earmarked for any upcoming release) Oct 4, 2022

marimeireles mentioned this issue Oct 4, 2022: How to load different files and manage your environment on the "How to session" #583 (Closed)

tedpatrick moved this to Next in PyScript OSS Apr 3, 2023

WebReflection closed this as completed May 6, 2024
