HTML

The XML DataSource can also be used to read HTML. In this mode, the HTML is parsed into well-formed XML that can be queried using XPath expressions. The HTML parser can handle pages that are not well-formed (for example, pages with missing close tags or unquoted attributes) and will clean the tree so that it can be used for effective data acquisition. Pages served over HTTPS are supported.
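
As an illustration of the general idea only, the sketch below cleans loosely written HTML into a well-formed tree and then queries it with a path expression. It uses the Python standard library, not the XML DataSource's own parser, whose internals are not described here. The names `TreeCleaner` and `html_to_tree`, the `document` wrapper element, and the sample markup are all hypothetical, and the repair rules are deliberately minimal.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}
# Elements whose close tag is commonly omitted before a sibling of the same kind.
SELF_NESTING = {"li", "p", "td", "tr"}


class TreeCleaner(HTMLParser):
    """Builds a well-formed ElementTree from loosely written HTML (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")  # hypothetical wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # An open <li> (or <p>, <td>, <tr>) is implicitly closed by the next one.
        if tag in SELF_NESTING and self.stack[-1].tag == tag:
            self.stack.pop()
        # Valueless attributes arrive as None; store them as empty strings.
        element = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close everything up to the nearest matching open element;
        # a stray end tag with no match is ignored.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth].tag == tag:
                del self.stack[depth:]
                break

    def handle_data(self, data):
        if not data.strip():
            return
        current = self.stack[-1]
        if len(current):
            # Text that follows a child element belongs in that child's tail.
            last = current[-1]
            last.tail = (last.tail or "") + data
        else:
            current.text = (current.text or "") + data


def html_to_tree(html):
    """Parse an HTML string and return the root of a well-formed tree."""
    cleaner = TreeCleaner()
    cleaner.feed(html)
    cleaner.close()
    return cleaner.root


# Unquoted attribute and missing </li> close tags.
messy = "<ul class=items><li>Alpha<li>Beta<li>Gamma</ul>"
tree = html_to_tree(messy)

# ElementTree supports a subset of XPath, which is enough for this query.
names = [li.text for li in tree.findall(".//ul[@class='items']/li")]
print(names)  # prints ['Alpha', 'Beta', 'Gamma']
```

Once the tree is well-formed, the same kind of path expression that would be used against an XML source selects the list items, even though the original markup never closed them.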

HTML parsing is not available when Subtree Optimization is enabled.