
The strip_document option works on BeautifulSoup objects but not Tag objects #209

Open
@chrispy-snps

Description


The default value of the strip_document option is STRIP, which strips leading and trailing whitespace from the Markdown.
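To make the option concrete, here is a minimal pure-Python sketch of what "stripping the document" means for the final Markdown string. The helper apply_strip_document and the string constants are hypothetical stand-ins that only mirror the option by name. The sketch also assumes markdownify accepts left-only, right-only, and "no stripping" variants besides STRIP. This is not the library's code.

```python
from typing import Optional

# Hypothetical stand-ins for the strip_document modes (not markdownify's
# actual constant values; only the names are mirrored).
STRIP = "strip"
LSTRIP = "lstrip"
RSTRIP = "rstrip"


def apply_strip_document(markdown: str, mode: Optional[str] = STRIP) -> str:
    """Trim whitespace around the whole converted document according to mode."""
    if mode == STRIP:
        return markdown.strip()    # both ends
    if mode == LSTRIP:
        return markdown.lstrip()   # leading whitespace only
    if mode == RSTRIP:
        return markdown.rstrip()   # trailing whitespace only
    if mode is None:
        return markdown            # leave the output untouched
    raise ValueError(f"unknown strip_document mode: {mode!r}")


raw = "\n\n\nhello\n\n"
print(repr(apply_strip_document(raw)))          # 'hello'
print(repr(apply_strip_document(raw, LSTRIP)))  # 'hello\n\n'
print(repr(apply_strip_document(raw, None)))    # '\n\n\nhello\n\n'
```

The key point is that this trimming is meant to run once, on the output for the whole conversion, not on each element.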

If I have a BeautifulSoup object:

import bs4
import markdownify

html = """
<html>
 <body>
  <div>
   <p>hello</p>
  </div>
 </body>
</html>"""
soup = bs4.BeautifulSoup(html, "lxml")

strip_document is applied to the BeautifulSoup object:

print(repr(markdownify.MarkdownConverter().convert_soup(soup)))
# 'hello'

but not to a Tag object:

print(repr(markdownify.MarkdownConverter().convert_soup(soup.find("html"))))
# '\n\n\nhello\n\n'
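The behavior above is consistent with the strip being gated on the node being a BeautifulSoup instance rather than on it being the node passed to convert_soup. The following is a simplified pure-Python model of that hypothesis, not markdownify's implementation. The classes Element and Document are hypothetical stand-ins for bs4.Tag and bs4.BeautifulSoup (which really is a subclass of Tag). The functions convert_current and convert_proposed are likewise hypothetical. Because the model ignores whitespace-only text nodes, its unstripped output differs slightly from the real '\n\n\nhello\n\n'.

```python
from typing import List, Union


class Element:
    """Hypothetical stand-in for bs4.Tag: a named element with children."""

    def __init__(self, name: str, children: List[Union["Element", str]]):
        self.name = name
        self.children = children


class Document(Element):
    """Hypothetical stand-in for bs4.BeautifulSoup, which subclasses Tag."""

    def __init__(self, children: List[Union[Element, str]]):
        super().__init__("[document]", children)


def render(node: Union[Element, str]) -> str:
    # Text renders as-is; a <p> is padded with blank lines, loosely
    # imitating how block elements pad their Markdown output.
    if isinstance(node, str):
        return node
    inner = "".join(render(child) for child in node.children)
    if node.name == "p":
        return "\n\n" + inner + "\n\n"
    return inner


def convert_current(node: Element) -> str:
    """Models the reported behavior: strip only when given the document."""
    text = render(node)
    return text.strip() if isinstance(node, Document) else text


def convert_proposed(node: Element) -> str:
    """One possible fix: strip whatever node the caller passes in as the root."""
    return render(node).strip()


html_tag = Element("html", [Element("body", [Element("div", [Element("p", ["hello"])])])])
soup = Document([html_tag])

print(repr(convert_current(soup)))       # 'hello'
print(repr(convert_current(html_tag)))   # '\n\nhello\n\n'
print(repr(convert_proposed(html_tag)))  # 'hello'
```

Until the library changes, a caller-side workaround is to strip the result yourself, e.g. calling .strip() on the string returned by convert_soup(soup.find("html")).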

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
