Skip to content

Host site URL without index.html is intercepted by service worker #24

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
eliot-akira opened this issue Sep 28, 2022 · 11 comments
Closed

Comments

@eliot-akira
Copy link
Collaborator

eliot-akira commented Sep 28, 2022

When deploying a demo site, I noticed it works fine when I visit the static file specifically, like example.com/index.html, but it has some unpredictable behavior with the root domain only, like example.com. Sometimes it takes a long time to load, or doesn't load at all.

I tracked down the cause (I think) to the service worker intercepting the visit to the host site and serving the WordPress site via the web worker, outside/without the iframe.

I'm still wrapping my head around how all of this works, but I figured I'd make a note of it. Will report back if I'm able to solve it.

@gziolo
Copy link
Member

gziolo commented Sep 28, 2022

That might be related to what I reported in #1.

@eliot-akira
Copy link
Collaborator Author

eliot-akira commented Sep 28, 2022

I forgot to mention, I'm seeing this issue on Firefox.

It seems specifying the static HTML file in the URL is consistently more reliable than without it. I'm guessing it's because the service worker skips the former URL, but intercepts the latter request since it doesn't have a file extension. (In my case it's using NGINX's index directive.)

That might be something to test for in #1, if the issue on Chrome still happens when visiting 127.0.0.1:8777/index.html.

As for a possible solution, I wonder if the service worker can recognize whether a fetch request is coming from inside an iframe, or from visiting the host site itself (in which case it should be bypassed completely, I'm assuming).

@adamziel
Copy link
Collaborator

adamziel commented Sep 29, 2022

The culprit is the service worker as you have found. Somewhere here it decides to serve the root domain request:

https://github.com/WordPress/wordpress-wasm/blob/31d97aced6655a740d075198b9faebc3dfa7ea5f/src/web/service-worker.js#L10-L18

In the short term, we could improve the path matching. In the longer term, that will only lead to more corner cases. I like your idea much more:

I wonder if the service worker can recognize whether a fetch request is coming from inside an iframe, or from visiting the host site itself

I wonder if iframe sandboxing could somehow help here. If not, perhaps it's possible to attach something unique to each request intended to go to WordPress, e.g. a cookie or a header. It would also help solve #9

@adamziel
Copy link
Collaborator

This StackOverflow answer offers some inspiration:

https://stackoverflow.com/questions/63848494/how-to-differ-regular-and-iframe-requests-through-a-service-worker

The service worker has access to self.clients and each client has a unique id and a frameType. There should be a way to only route requests coming from the client id associated with a specific iframe – this would also solve the multitab scenario.

@eliot-akira
Copy link
Collaborator Author

eliot-akira commented Sep 30, 2022

I explored the technique described in the StackOverflow link, and was able to make the service worker recognize if a fetch request is coming from the host window or an iframe.

However, long story short, I couldn't solve the original issue due to a kind of tautological loop: when the service worker wrongly intercepts a request to the host site, at that point nothing is loaded yet - so it's impossible for the worker to get the host window's client ID. At least that's what I think is happening.

I'll describe my attempt below, maybe some of it is still useful.


In the host window, the app calls the service worker when it's ready.

In src/web/app.mjs:

async function init() {

  const registration = await serviceWorkerReady;

  // Send message to get host window client ID
  registration.active.postMessage({ code: 'get-client-id' });

  ...
}

In the service worker, the event callback gets the host's client ID. It was tricky to achieve, because the worker's Clients.matchAll() method is asynchronous.

In src/web/service-worker.mjs:

/**
 * Get host window's client ID to distinguish fetch requests from inside iframe
 */

let windowClientId

const windowClientReady = new Promise((resolve, reject) => {
  self.addEventListener('message', async (event) => {

    if (event.data.code !== 'get-client-id') return;

    const clients = await self.clients.matchAll();

    for (const client of clients) {
      if (client.frameType !== 'top-level') continue;
      windowClientId = client.id;
      resolve();
      console.log( `[ServiceWorker] Got host window client ID: ${ windowClientId }` );
      break;
    }
  });  
});

The fetch event handler must await until the host window client ID is ready, then check it against the event's client ID. But the event callback is expected to be synchronous - so it was no longer possible to skip the intercept with a simple return statement. To get around this, it was necessary to intercept all requests, and manually call fetch( event.request ) for the "skipped" requests coming from host window.

self.addEventListener( 'fetch', async ( event ) => {

  const url = new URL( event.request.url );

  if (!windowClientId) await windowClientReady;

  // Don't process main window resources
  if (event.clientId === windowClientId) {
    console.log( `[ServiceWorker] Ignoring host window request: ${ url.pathname }` );
    return event.respondWith( fetch(event.request) );
  }
  
  ...
});	

Here's where I ran into the Catch-22 situation. When a user visits the host site URL without index.html, the service worker's fetch event callback is triggered - but at this point there is no document loaded, and no app.js to send the get-client-id message, so the worker cannot know the host window's client ID - as far as I can tell.


Perhaps another way to solve this: instead of host window, to get the iframe's client ID and only intercept fetch events with matching client ID.

@adamziel
Copy link
Collaborator

adamziel commented Oct 3, 2022

This is gold, thank you for these great explorations @eliot-akira!

Here's where I ran into the Catch-22 situation. When a user visits the host site URL without index.html, the service worker's fetch event callback is triggered - but at this point there is no document loaded, and no app.js to send the get-client-id message, so the worker cannot know the host window's client ID - as far as I can tell.

I would assume WordPress isn't running when there is no app.js loaded. If so, calling the fetch() and allowing it to return 404 sounds reasonable – what do you think?

@eliot-akira
Copy link
Collaborator Author

eliot-akira commented Oct 3, 2022

This is a tough nut to crack, I'm still trying different angles. The specific situation I haven't been able to solve yet, is distingushing between when the browser fetches the route / to get the host site, compared to when the iframe fetches / for the home page of the WordPress site. So neither of these cases should return 404, I think.

I've observed that sometimes the fetch event has no client ID, which makes it impossible to recognize where it's coming from. I read through the MDN docs, especially about FetchEvent.clientId, but there's no mention of such behavior.

At this point, my thinking is to not depend on event client ID since it's unreliable as far as I can tell. I'm trying a more manual way, by establishing a hand-shake exchange between host app and service worker (and after that the app sets iframe src), so the worker can know which fetch events are coming from the iframe.

@adamziel
Copy link
Collaborator

adamziel commented Oct 13, 2022

@eliot-akira This is a tough one indeed! I propose we bale out of resolving this entirely and just don't the main HTML index.html – perhaps wordpress.html would do? In this scenario requesting the / would:

  • Return a 404 if no service worker was registered – that's fine, we can funnel all activity through wordpress.html.
  • Return WordPress index.php if service worker was registered at some point.

The wordpress.html file could have an "iframe mode" where WP would be rendered in an iframe, and a "standalone mode" where it would only register the service worker and redirect to /. I imagine requesting one of the two modes with a query parameter such as /wordpress.html?mode=standalone.

Why? Because resolving this with index.html adds a lot of complexity that can be avoided. See below.


My perfect flowchart for resolving the root URL / looks like below.

  • Is service worker already registered?
    • No -> request index.html from the server
    • Yes -> is the request coming from a WordPress context (like an iframe)?
      • Yes -> render WordPress's index.php
      • No -> request index.html from the server

However, the implementation quickly gets unwieldy:

  • FetchEvent.clientId is indeed null at times as you mentioned, but we can use FetchEvent.resultingClientId
  • The service would need to check whether the client is a nested-frame or is it top-level
  • However, the check is async and the fetch event listener needs return something synchronous
  • Maybe we can make it wait with FetchEvent.waitUntil()? But we'd need to synchronize that with the actual request handler.
  • Or we could return event.respondWith() and decide there whether to forward the request to the server or to WordPress?

Either way it's more code branches to maintain and more opportunities for a mistake.

adamziel added a commit that referenced this issue Oct 13, 2022
…n the `/` route is requested

Before this PR, requesting `/` could mean two different things:

* Display index.html, or
* Display WordPress's index.php

Deciding which option is correct is hard. This commit removes the need
to decide.

Solves #24
adamziel added a commit that referenced this issue Oct 13, 2022
…n the `/` route is requested

Before this PR, requesting `/` could mean two different things:

* Display index.html, or
* Display WordPress's index.php

Deciding which option is correct is hard. This commit removes the need
to decide.

Solves #24
adamziel added a commit that referenced this issue Oct 13, 2022
…n the `/` route is requested

Before this PR, requesting `/` could mean two different things:

* Display index.html, or
* Display WordPress's index.php

Deciding which option is correct is hard. This commit removes the need
to decide.

Solves #24
adamziel added a commit that referenced this issue Oct 13, 2022
…n the `/` route is requested

Before this PR, requesting `/` could mean two different things:

* Display index.html, or
* Display WordPress's index.php

Deciding which option is correct is hard. This commit removes the need
to decide.

Solves #24
@adamziel
Copy link
Collaborator

I just renamed index.html to wordpress.html in e1235f8

I'm marking this as resolved for now, but I'm happy to reopen if you think it requires more thought @eliot-akira .

@adamziel
Copy link
Collaborator

adamziel commented Oct 13, 2022

Actually, I just realized there can be no standalone mode where the user navigates to /wp-login.php without any wrapping iframe or a script. The WASM instance itself is initialized and kept alive by JavaScript code loaded in an HTML document. Without a tab holding an HTML document there is no running WASM WordPress to generate the response.

@eliot-akira
Copy link
Collaborator Author

Thank you for looking into this question - I think changing the file name from index.html to anything else is a good enough solution to differentiate from the root route /.

If I wanted, I could redirect from a simpler demo site URL to append the HTML file name. But it does "break the illusion" a little, showing the static file's existence outside the WP site's permalinks. Ideally I'd love it to be seamless, like running the original.

In any case, it was a good learning experience for me to dig into this issue, to understand better how the service worker works, how the fetch event handles requests, how to communicate between host and worker process, etc.

It still feels surprising and magical, seeing the whole WordPress stack running in the browser. I'm looking forward to what creative ideas it makes possible, like interactive plugin documentation and tutorials.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants