arXiv is an open-access archive for 2 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.
arxiv: Python package.
PyMuPDF: Python package that converts PDF files downloaded from the arxiv.org site into plain text.
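As a rough illustration of the PDF-to-text step, the following is a minimal sketch, not the integration's actual code. The helper names `missing_packages` and `pdf_to_text` and the `REQUIRED` mapping are hypothetical, introduced here only for the example. It assumes PyMuPDF is importable under its long-standing module name `fitz` and that the PDF has already been downloaded to a local path.

```python
import importlib.util

# Hypothetical mapping of pip package name -> import name.
# PyMuPDF's import name differs from its pip name: it is imported as "fitz".
REQUIRED = {"arxiv": "arxiv", "PyMuPDF": "fitz"}


def missing_packages():
    """Return the pip names of required packages that cannot be imported."""
    return [
        pip_name
        for pip_name, module_name in REQUIRED.items()
        if importlib.util.find_spec(module_name) is None
    ]


def pdf_to_text(pdf_path):
    """Concatenate the plain text of every page of a local PDF file."""
    import fitz  # imported lazily so this module loads without PyMuPDF

    with fitz.open(pdf_path) as doc:
        # get_text() with no arguments returns the page content as plain text.
        return "\n".join(page.get_text() for page in doc)
```

Calling `missing_packages()` first gives a quick check of which of the two packages still needs to be installed (their pip names are `arxiv` and `pymupdf`). `pdf_to_text("paper.pdf")` would then return the extracted text, where `paper.pdf` stands in for any PDF downloaded from arxiv.org.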