Floki is less lenient with nested comments than browsers #612
Open
HTML doesn’t allow nested comments. However, both Firefox and Chromium are somewhat lenient about this, which can lead to surprising results when you parse a document with Floki (I tried this with 0.37.0):
raw_html = """
<!doctype html>
<body>
Before the comment<br>
<!--[if mso | IE]>
<div>
<!-- this is a nested comment -->
</div>
<![endif]-->
After the comment.
</body>
"""
parsed_html = raw_html |> Floki.parse_document!() |> Floki.raw_html()
File.write!("raw-html.html", raw_html)
File.write!("parsed-html.html", parsed_html)
raw-html.html looks exactly like the original string:
<!doctype html>
<body>
Before the comment<br>
<!--[if mso | IE]>
<div>
<!-- this is a nested comment -->
</div>
<![endif]-->
After the comment.
</body>
But parsed-html.html looks like this:
<body>
Before the comment<br/><!--[if mso | IE]>
<div>
<!-- this is a nested comment -->
<![endif]-->
After the comment.
</body>
Floki escapes the > of the outer comment to &gt;. Because browsers are lenient when handling nested comments, this changes the way the file is displayed.
I’m not sure if this could be considered a bug but I did find it somewhat unexpected.
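As a rough illustration of why the nesting matters (plain string handling on a shortened copy of the snippet above, not Floki's actual tokenizer): under the HTML parsing rules a comment ends at the first `-->`, so everything after the inner comment's terminator falls outside the outer comment. The names `comment` and `rest` are made up for this sketch.

```elixir
raw_html = """
<!--[if mso | IE]>
<div>
<!-- this is a nested comment -->
</div>
<![endif]-->
"""

# Split on the FIRST "-->" only: that is where a comment ends under the
# HTML rules, no matter how many "<!--" openers came before it.
[comment, rest] = String.split(raw_html, "-->", parts: 2)

# Text the parser treats as the comment (including the inner "<!--").
IO.inspect(comment, label: "comment")

# Text left outside the comment, which each parser then has to make sense of.
IO.inspect(rest, label: "rest")
```

Here `rest` still contains `</div>` and `<![endif]-->`, which is the part where Floki's output and the browsers' rendering differ.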

