Prolog is ideally suited for reasoning about HTML, XML and SGML documents, because such tree-shaped markup documents can be directly mapped to Prolog terms.
An HTML element with tag T, attributes As and children Cs could be mapped to the Prolog term element(T, As, Cs), and thus become amenable to fast and convenient Prolog-based reasoning.
For instance, the following HTML file, represented as a list of characters:
<html>
<head>
<title>
Hello!
</title>
</head>
<body style="padding-left: 5%; padding-right: 5%">
Hello.
</body>
</html>
can be directly mapped to the Prolog term:
[element(html, [],
         [element(head, [],
                  [element(title, [], [" Hello!\n "])]),
          element(body, [style="padding-left: 5%; padding-right: 5%"],
                  [" Hello.\n "])])]
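To illustrate the kind of reasoning such a term allows, here is a minimal sketch in plain Prolog. The predicate names sample_dom/1 and descendant_element/2 are hypothetical and chosen only for this example; the sample term is the one from the listing, written as a single list of elements.

```prolog
:- use_module(library(lists)).   % for member/2

% sample_dom(-DOM): the example document as a list of element/3 terms.
% (Hypothetical helper, only used to have something to query.)
sample_dom([element(html, [],
                    [element(head, [],
                             [element(title, [], [" Hello!\n "])]),
                     element(body, [style="padding-left: 5%; padding-right: 5%"],
                             [" Hello.\n "])])]).

% descendant_element(+Nodes, -Element): Element is an element/3 term
% that occurs in the list Nodes or anywhere below one of its members.
% Text nodes are skipped because they do not unify with element/3.
descendant_element(Nodes, Element) :-
    member(Node, Nodes),
    Node = element(_, _, Children),
    (   Element = Node
    ;   descendant_element(Children, Element)
    ).
```

With these definitions, the query `?- sample_dom(DOM), descendant_element(DOM, element(title, _, Text)).` binds Text to the child list of the title element, and `?- sample_dom(DOM), findall(T, descendant_element(DOM, element(T, _, _)), Ts).` collects the tags in document order: html, head, title, body.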
roxmltree looks like a useful Rust component to parse XML files and convert them to Prolog terms.
Another approach would be to use Tree-sitter-based parsers, which would cover many languages at once. Tree-sitter already has an HTML grammar, but sadly no XML grammar yet.
I have filed #596 for HTML.
This is now available via library(sgml).
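A minimal usage sketch follows. It assumes that library(sgml) exports load_html/3, that its first argument may be a list of characters, and that an empty options list is accepted; please check the library documentation of your system, since other Prolog systems may expect a different form of source. The wrapper name html_dom/2 is hypothetical.

```prolog
:- use_module(library(sgml)).

% html_dom(+Chars, -DOM): parse HTML given as a list of characters
% into a list of element/3 terms.
% Assumption: load_html/3 accepts a list of characters as its source
% and an empty list of options.
html_dom(Chars, DOM) :-
    load_html(Chars, DOM, []).
```

A query such as `?- html_dom("<html><head><title>Hello!</title></head><body>Hello.</body></html>", DOM).` should then yield a term of the shape shown at the top of this issue.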