Host each crate on its own subdomain and allow user JS #1853

Open
@jsha

In #167 there is some discussion of what to do about crates providing their own JS. There are two risks suggested: cryptocurrency mining via JS and installing a ServiceWorker that could serve incorrect documentation for other crates. As far as I know the current plan of record is to prevent this using the Content-Security-Policy header to allowlist certain JS, and initial steps were taken in #1333, which implements CSP for crate pages only, with rustdoc pages left as a future exercise.

I propose that we abandon as impractical the plan to implement CSP for rustdoc pages. Instead, we should explicitly allow bring-your-own-JS, and we should plan on a separate subdomain per crate. This aligns docs.rs security boundaries (crates' documentation should not be able to affect each other) with the web's natural security boundaries. Specifically, the same-origin policy is the foundation of web security: scripts on foo.example.com cannot affect bar.example.com (absent a specific opt-in from bar.example.com, and with other caveats).
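As a rough illustration of the boundary described above, here is a minimal Python sketch of an origin comparison (scheme, host, port). The helper names `origin` and `same_origin` and the per-crate hostnames in the usage note are hypothetical; browsers implement the real check, with more caveats than this covers.

```python
from urllib.parse import urlsplit

# Default ports, so that https://docs.rs and https://docs.rs:443 compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme, -1)
    return (scheme, host, port)


def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin only if scheme, host, and port all match."""
    return origin(a) == origin(b)
```

Under this sketch, `https://docs.rs/ureq/latest/ureq/` and `https://docs.rs/serde/latest/serde/` share an origin because only the path differs, while the hypothetical `https://ureq.docs.rs/` and `https://serde.docs.rs/` do not.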

Allowing KaTeX and other useful libraries

With each crate on its own subdomain, we can unreservedly allow crates to include whatever JS they want. This resolves a long-standing uncertainty about what is/will be allowed on docs.rs, particularly as regards the popular KaTeX library used to render LaTeX inline on web pages.

Aligning docs.rs with rustdoc

Allowing crates to bring their own JS (and styles, and even fonts) aligns docs.rs with rustdoc's philosophy: rustdoc has a variety of flags that allow adding arbitrary HTML (including script tags). Also, rustdoc implements Markdown, which is defined to allow arbitrary HTML. Since docs.rs relies so heavily on rustdoc, it would be challenging to enforce a security boundary that rustdoc does not participate in enforcing. To further underscore that: rustdoc has no systematic XSS defense in its HTML generation.

Also, since docs.rs hosts all historic versions of a crate as they were documented at the time, docs.rs needs to deal with output from many historical versions of rustdoc. So even to the extent rustdoc is updated to participate in enforcing this security boundary, we would face the problem of what to do with old documentation, and what to do with recent documentation that was emitted by a buggy version of rustdoc.

Crates control their own execution environment

With build.rs, crates can do a lot to modify their environment at build time. For instance the xss-probe build.rs takes the simple expedient of writing a .html and a .js file into the docs/ directory before rustdoc runs. Even if we blocked that behavior (for instance, by clearing the doc directory at some strategic moment), there are other potential tricks: overwriting the rustdoc binary, setting PATH or LD_LIBRARY_PATH, or other unknown shenanigans. To make a defensible security boundary of "thou shalt not write unauthorized files during doc builds," we would have to invent and enforce a lot of other security boundaries that are not even currently considered boundaries in the Rust ecosystem.

Script-nonce won't work for rustdoc output

For templated output from the docs.rs web server, we can use script-nonce, and inject the nonce at the known places where we are generating an inline script or a <script src=...> tag. But we can't inject nonces into rustdoc HTML because we don't know the known-good places. We could parse the HTML and inject the nonce on all script tags, but of course that would defeat the purpose since we would also inject the nonce on malicious script tags.
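To make the distinction concrete, here is a hedged Python sketch (docs.rs itself is not written this way; all function names are hypothetical). The first three helpers show nonce use for templated output, where the server knows which tags it emitted. The last one shows the blanket-injection approach and why it blesses attacker-supplied tags too.

```python
import secrets
from html import escape


def new_nonce() -> str:
    """Generate a fresh, unguessable nonce; one per HTTP response."""
    return secrets.token_urlsafe(16)


def csp_header(nonce: str) -> str:
    """Policy value allowing only scripts that carry this response's nonce."""
    return f"default-src 'none'; script-src 'nonce-{nonce}'"


def script_tag(src: str, nonce: str) -> str:
    """Emit a script tag at a known-good place in a server-side template."""
    return (
        f'<script nonce="{escape(nonce, quote=True)}" '
        f'src="{escape(src, quote=True)}"></script>'
    )


def inject_nonce_everywhere(html: str, nonce: str) -> str:
    """Naive rewrite of opaque HTML: stamps the nonce on every script tag,
    trusted or not, which is exactly what defeats the purpose."""
    return html.replace("<script", f'<script nonce="{nonce}"')
```

For example, running `inject_nonce_everywhere` over a page containing one legitimate `<script src=...>` and one injected inline script leaves both carrying a valid nonce, so the policy no longer distinguishes them.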

Allowlisting scripts also won't work for rustdoc output

We could allowlist the shared files (mainXXX.js, storageXXX.js), but crate-specific JS is a problem. As one example, each crate has a source-filesXXX.js that lists all the files for the source view sidebar (e.g. https://docs.rs/ureq/latest/source-files-20220709-1.64.0-nightly-6dba4ed21.js). That file is under control of the crate author (see "Crates control their own execution environment" above). So allowlisting it would pierce the security boundary we are trying to defend.

DNS and TLS wildcards

Having a separate subdomain for each crate does not require that we configure separate DNS and certificates for 75k+ crates. Instead, we should set up a wildcard DNS entry (*.docs.rs) that points all subdomains to the same set of IP addresses. And we can get a wildcard certificate to match. Then routing requests in docs.rs would just require looking at the hostname as well as the path.
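A minimal Python sketch of that routing step follows, assuming the crate name is the single label in front of docs.rs. The function names and the exact hostname scheme are assumptions for illustration, not a settled design; questions such as how crate names map onto valid hostname labels are left open here.

```python
BASE_DOMAIN = "docs.rs"


def crate_from_host(host_header: str, base: str = BASE_DOMAIN):
    """Map a Host header to a crate name, or None for the bare base domain.

    Assumes hostnames of the form <crate>.docs.rs. A wildcard certificate
    for *.docs.rs only covers a single extra label, so deeper names are
    rejected. IPv6 literals are not handled in this sketch.
    """
    host = host_header.strip().lower()
    if ":" in host:
        host = host.rsplit(":", 1)[0]  # drop an explicit port
    host = host.rstrip(".")  # tolerate a trailing dot
    if host == base:
        return None
    suffix = "." + base
    if not host.endswith(suffix):
        raise ValueError(f"unexpected host: {host_header!r}")
    label = host[: -len(suffix)]
    if not label or "." in label:
        raise ValueError(f"expected exactly one label before {base}: {host_header!r}")
    return label


def route(host_header: str, path: str) -> tuple:
    """Route on hostname as well as path: returns (crate or None, path)."""
    return (crate_from_host(host_header), path)
```

With this sketch, `route("ureq.docs.rs", "/latest/ureq/")` yields `("ureq", "/latest/ureq/")`, while a request to the bare `docs.rs` host yields `None` for the crate and can fall through to the existing path-based routing.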

We could also continue doing nothing for a while

In general it's always a good idea to compartmentalize different users' content from each other. For instance, GitHub Pages uses *.github.io and Read the Docs uses *.readthedocs.io. However, since there is no authentication on docs.rs and no cookies, the issues we're facing are not particularly serious and we can continue to postpone a systematic fix.

We can disable ServiceWorkers via the CSP worker-src directive, without blocking scripts in general.
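As a sketch of what that could look like, the hypothetical Python helper below adds `worker-src 'none'` to a Content-Security-Policy value while leaving any other directives alone. If the policy contains no `script-src` or `default-src`, ordinary scripts remain unrestricted.

```python
def with_workers_disabled(policy: str = "") -> str:
    """Return a CSP value that forbids workers (including ServiceWorkers).

    Any existing worker-src directive is replaced; every other directive
    is passed through unchanged.
    """
    directives = [d.strip() for d in policy.split(";") if d.strip()]
    kept = [d for d in directives if d.split()[0].lower() != "worker-src"]
    kept.append("worker-src 'none'")
    return "; ".join(kept)
```

The result would be sent as the value of the `Content-Security-Policy` response header on rustdoc pages.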

Cryptocurrency mining via JS is annoying, but its yields are so tiny that it takes a massive amount of visitor traffic to be worthwhile. I don't know what the current state of the problem is, but I suspect you would need to distribute your JS either via an ad network or via a large number of compromised websites to make it pay. And it's pretty noisy. If someone started using the documentation of a popular crate to mine cryptocurrency, it would be spotted quickly and the docs.rs team could take it down and take any necessary followup actions. This seems like a purely hypothetical problem at this point.

Even if we don't decide to move forward with per-crate subdomains, I think it's very worthwhile to make the decision now that crates are allowed to embed JavaScript, and they will continue to be allowed to do so. The status quo creates unnecessary uncertainty for crate authors, and stumbling blocks for docs.rs developers.

Why the existing approach causes problems

Some issue threads where CSP came up as causing trouble (presumably the combination of default-src 'none' and script-src 'nonce-XYZabc123'):

#1387
#302
#1552
#1255
#568

One last cute thing

If each crate has its own subdomain, each crate can have its own favicon logo, so you can better identify different crates' docs in your tabs! ❤️

Labels: E-hard (Effort: This will require a lot of work), S-needs-design (Status: There's a problem here, but no obvious solution; or the solution raises other questions)
