GROBID is a machine learning library for extracting, parsing, and restructuring raw documents. It is designed primarily for parsing academic papers, where it performs particularly well. Note: large documents (e.g. dissertations) that exceed a certain number of elements might not be processed. This page covers how to use GROBID to parse articles for LangChain.
Installation
GROBID installation is described in detail at https://grobid.readthedocs.io/en/latest/Install-Grobid/. However, it is usually easier and less troublesome to run GROBID in a Docker container, as documented in the GROBID Docker guide.
Use GROBID with LangChain
Once GROBID is installed and running (you can check by visiting http://localhost:8070), you're ready to go. You can now use the GrobidParser to produce documents.
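As a minimal sketch, assuming the `langchain-community` package is installed and GROBID is listening on the default port 8070, the Python below first checks that the server responds and then loads a folder of PDFs through `GrobidParser` and `GenericLoader`. The `/api/isalive` health endpoint is taken from GROBID's service documentation; the helper names `grobid_is_alive` and `load_papers` and the `papers/` folder are hypothetical.

```python
"""Sketch: check a local GROBID server, then parse PDFs with LangChain's GrobidParser."""
import urllib.request

GROBID_URL = "http://localhost:8070"  # default address used on this page; adjust if needed


def grobid_is_alive(base_url: str = GROBID_URL, timeout: float = 5.0) -> bool:
    """Return True if GROBID answers its /api/isalive endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all subclasses of OSError,
        # so a refused connection, a timeout, or an error status all land here.
        return False


def load_papers(folder: str, segment_sentences: bool = False):
    """Parse every PDF in `folder` into LangChain Documents via GROBID (hypothetical helper)."""
    # Imported lazily so this module can be imported without langchain-community installed.
    from langchain_community.document_loaders.generic import GenericLoader
    from langchain_community.document_loaders.parsers import GrobidParser

    loader = GenericLoader.from_filesystem(
        folder,
        glob="*",
        suffixes=[".pdf"],
        # False: one Document per paragraph; True: one Document per sentence.
        parser=GrobidParser(segment_sentences=segment_sentences),
    )
    return loader.load()


if __name__ == "__main__":
    if not grobid_is_alive():
        raise SystemExit(f"GROBID is not reachable at {GROBID_URL}; start the server first.")
    docs = load_papers("papers/")  # hypothetical folder of PDF articles
    print(f"Loaded {len(docs)} documents")
```

With `segment_sentences=False`, each returned Document corresponds to a paragraph of a paper; setting it to `True` produces sentence-level chunks instead, which may suit retrieval over short passages.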

