Diffbot is a suite of ML-based products that make it easy to structure and integrate web data.
Diffbot Extract doesn’t require any rules to read the content on a page. It uses a computer vision model to classify a page into one of 20 possible types, and then transforms the raw HTML markup into JSON. The resulting structured JSON follows a consistent type-based ontology, so data from many different web sources can be extracted with the same schema.
See a usage example.
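As a rough sketch of how such a call might look from Python: the snippet below builds a request URL and reads typed objects out of the JSON response. The endpoint path (`/v3/analyze`), the `token` and `url` query parameters, and the response fields (`objects`, `type`, `title`, `pageUrl`) are assumptions drawn from Diffbot's public v3 API and should be checked against the current documentation. The helper names and the sample response are invented for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for automatic page-type detection plus extraction.
ANALYZE_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def build_request_url(page_url, token):
    """Return the GET URL for extracting `page_url` (hypothetical helper)."""
    query = urlencode({"token": token, "url": page_url})
    return f"{ANALYZE_ENDPOINT}?{query}"


def extract(page_url, token, timeout=30):
    """Call the API and return the decoded JSON body (needs a valid token)."""
    with urlopen(build_request_url(page_url, token), timeout=timeout) as resp:
        return json.load(resp)


def summarize(response):
    """Reduce a response to (type, title, pageUrl) tuples.

    Because every object carries a `type`, the same loop works no matter
    which site the page came from.
    """
    return [
        (obj.get("type"), obj.get("title"), obj.get("pageUrl"))
        for obj in response.get("objects", [])
    ]


# Invented sample shaped like the assumed response; not real API output.
SAMPLE_RESPONSE = {
    "type": "article",
    "objects": [
        {
            "type": "article",
            "title": "Example headline",
            "pageUrl": "https://example.com/post",
            "text": "Body text of the article.",
        }
    ],
}

if __name__ == "__main__":
    print(build_request_url("https://example.com/post", "YOUR_TOKEN"))
    for row in summarize(SAMPLE_RESPONSE):
        print(row)
```

Keeping the network call in `extract` separate from `summarize` means the parsing logic can be exercised against saved responses without spending API credits.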