
BUG: data access issue for the DL tutorial #254

Closed
@bsipocz

Description

@bsipocz
Member

The CI cron run hit data access issues with the DL notebook. A series of restarts fixed most of the jobs, but not all. In any case, this should not be a problem when only a handful of runners are trying to grab the same data (i.e. the problem will be far worse with a full room of workshop attendees).

So I'm opening this issue as a reminder and, if this turns out not to be a one-off problem, as a prompt to do something about it.

Activity

added labels bug (Something isn't working) and infrastructure (Issues relevant to infrastructure, rather than content) on May 19, 2025
rossbar

rossbar commented on May 19, 2025

@rossbar
Collaborator

Yeah, looks like 429 errors, i.e. server request limits. Right now the data is hosted on GitHub (in a personal repo of mine 😱), which was our "temporary" solution to the 503 errors we got when pinging the server on which the data was originally hosted.
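For illustration only, here is a minimal sketch of how a download helper could back off and retry when it sees a 429 or 503. The function name `fetch_with_backoff` and its parameters are hypothetical and not part of the tutorial. Retrying only softens rate limits; it does not replace moving the data to more sustainable hosting.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, retries=4, base_delay=1.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the bytes at `url`, retrying on HTTP 429/503.

    Hypothetical helper: the delay doubles after every failed attempt
    (base_delay, 2 * base_delay, 4 * base_delay, ...).
    """
    for attempt in range(retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Only rate-limit / temporarily-unavailable responses are retried;
            # anything else, or the final failed attempt, is re-raised.
            if err.code not in (429, 503) or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `opener` and `sleep` arguments exist only so the helper can be exercised without network access or real waiting.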

I would be very surprised if the MNIST data weren't already hosted somewhere publicly and more sustainably, so we should investigate + switch to that!

melissawm

melissawm commented on May 19, 2025

@melissawm
Member

Total overcomplication, but if we can't find an alternative, torchvision packages a version of MNIST: https://docs.pytorch.org/vision/stable/generated/torchvision.datasets.MNIST.html

bsipocz

bsipocz commented on May 19, 2025

@bsipocz
Member, Author

If I'm reading it correctly, torchvision is still just grabbing the data from these two addresses:


    mirrors = [
        "https://ossci-datasets.s3.amazonaws.com/mnist/",
        "http://yann.lecun.com/exdb/mnist/",
    ]
bsipocz

bsipocz commented on May 19, 2025

@bsipocz
Member, Author

(But at least the S3 one should be resilient to many simultaneous downloads, so I would be +1 for swapping over the URIs. However, I would rather not add PyTorch as a dependency just to make use of this function.)
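A rough sketch of what swapping over the URIs without PyTorch could look like, using only the standard library. It tries the two mirrors quoted above in order and falls back to the next one on failure. The helper name `download_from_mirrors` is hypothetical, and the file names are an assumption (the usual MNIST archive names) that should be checked against the mirror before relying on them.

```python
import urllib.request

# Mirrors as quoted above; S3 goes first because it should cope better
# with many simultaneous downloads.
MIRRORS = [
    "https://ossci-datasets.s3.amazonaws.com/mnist/",
    "http://yann.lecun.com/exdb/mnist/",
]

# Assumption: the conventional MNIST archive names; verify before use.
FILENAMES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]


def download_from_mirrors(filename, mirrors=MIRRORS,
                          opener=urllib.request.urlopen):
    """Return the raw bytes of `filename` from the first mirror that responds."""
    errors = []
    for base in mirrors:
        url = base + filename
        try:
            with opener(url) as response:
                return response.read()
        except OSError as err:
            # urllib's URLError and HTTPError are both subclasses of OSError.
            errors.append(f"{url}: {err}")
    raise RuntimeError("all mirrors failed:\n" + "\n".join(errors))
```

Calling `download_from_mirrors(name)` for each entry in `FILENAMES` would then return the gzipped archives as bytes, ready to be written to a local cache so that repeated notebook runs do not hit the server again.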


        BUG: data access issue for the DL tutorial · Issue #254 · numpy/numpy-tutorials