Skip to content

Conversation

@sisp
Copy link
Contributor

@sisp sisp commented Jul 16, 2024

I've fixed the blog readtime calculation to ignore non-content text, i.e. any text nodes that are inside the following tags are omitted from calculating the readtime:

  • <object>
  • <script>
  • <style>
  • <svg>

The implementation is inspired by the search plugin parser.

Note that I've duplicated the void set from the search plugin. It might be cleaner to deduplicate it, but I wasn't sure where to best move it to import from. If you'd prefer deduplication, I'd appreciate your guidance on where to factor void out, @squidfunk.

Fixes #7367.

@squidfunk
Copy link
Owner

Note that I've duplicated the void set from the search plugin. It might be cleaner to deduplicate it, but I wasn't sure where to best move it to import from. If you'd prefer deduplication, I'd appreciate your guidance on where to factor void out, @squidfunk.

There's no need to integrate the void tags at all, as for them, the data handler is not called. Void tags are tags without children, i.e., leaf components. Please remove the void handling logic, as it does not influence the result.

@squidfunk
Copy link
Owner

Ah wait, no, we might need them to maintain the context. Can you please just import it from the search plugin for now and make a note in the code that we might refactor it at some point? Please don't do the refactoring now, we're heavily working behind the curtains right now and there will be fundamental changes to Material for MkDocs, which is why we currently try to keep things as stable and unchanging as possible.

@squidfunk
Copy link
Owner

... you might check if the _endtag handler is called for void tags. If it is, remove the void logic. If not, keep it. It's been too long since I've worked with Python's HTML SAX parser.

@sisp sisp force-pushed the fix/blog-readtime branch from 22054eb to 4c9c02f Compare July 16, 2024 13:42
@sisp
Copy link
Contributor Author

sisp commented Jul 16, 2024

I've checked: The _endtag is not called for void tags. Now, the void set is imported from the search plugin as you suggested.

@squidfunk
Copy link
Owner

Okay great, thanks for investigating. Note that this is an HTML4 parser, not HTML5, which we chose for speed and package size (as no further dependency is needed), so as I remember, this is why we need to check for void tags by ourselves. We'll check in the future if we can swap this out for an approach that directly operates on the HTML AST, which would be much more efficient, but for now, I think we can keep it.

@squidfunk
Copy link
Owner

From what I can see, this looks great, ready to merge! Thanks for your time!

@squidfunk squidfunk merged commit 6b13c56 into squidfunk:master Jul 16, 2024
@sisp
Copy link
Contributor Author

sisp commented Jul 16, 2024

Perfect, thanks for the helpful review and feedback! 🙏

@sisp sisp deleted the fix/blog-readtime branch July 16, 2024 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Blog readtime includes inline SVG text content

2 participants