htmst is a Python library for parsing HTML into an AST in which every node carries its source position.
uv add htmst
or
pip install htmst
from htmst import HtmlAst
html = """<span foo="bar">hi</span>"""
ast = HtmlAst(html)
print(ast.root.children[0].tag) # span
print(ast.root.children[0].start.row) # 0
print(ast.root.children[0].start.col) # 0
print(ast.root.children[0].end.row) # 0
print(ast.root.children[0].end.col) # 25
print(ast.root.children[0].attrs[0].name) # foo
print(ast.root.children[0].attrs[0].value) # bar
print(ast.root.children[0].attrs[0].start.row) # 0
print(ast.root.children[0].attrs[0].start.col) # 6
print(ast.root.children[0].attrs[0].end.row) # 0
print(ast.root.children[0].attrs[0].end.col) # 15
DoubleTagNode
: represents double tags
SingleTagNode
: represents single tags
AttrNode
: represents attributes
TextNode
: represents texts
CommentNode
: represents comments
DoctypeNode
: represents doctypes
Each node has a start and end position.
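A start/end pair can be mapped back to the original text. The sketch below does not use htmst itself: it takes plain (row, col) tuples, and the helper names `offset` and `snippet` are hypothetical. It assumes rows and columns are 0-based, columns count characters, lines are split on "\n" only, and the end position is exclusive, which is what the example output above suggests but is not confirmed here.

```python
def offset(source: str, row: int, col: int) -> int:
    """Convert a 0-based (row, col) pair into an index into source.

    Assumption: only "\n" starts a new row.
    """
    lines = source.split("\n")
    # Each preceding line contributes its length plus one for the "\n".
    return sum(len(line) + 1 for line in lines[:row]) + col


def snippet(source: str, start: tuple[int, int], end: tuple[int, int]) -> str:
    """Return the text from start (inclusive) to end (exclusive)."""
    return source[offset(source, *start):offset(source, *end)]


html = '<span foo="bar">hi</span>'
print(snippet(html, (0, 0), (0, 25)))  # <span foo="bar">hi</span>
print(snippet(html, (0, 6), (0, 15)))  # foo="bar"
```

With a parsed node, the tuples would presumably come from `node.start.row`, `node.start.col`, `node.end.row` and `node.end.col`.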
Contributions are welcome! Please read the contributing guidelines for more information.
This project is licensed under the MIT License.