Querying arXiv preprints using Airflow
Feb 2, 2020
Alexander Junge

I experimented with Apache Airflow to schedule hourly workflows that fetch recent preprints from several arXiv categories via the public arXiv.org REST API. The articles are then stored in a PostgreSQL database through a custom REST API built with FastAPI.
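
The fetch-and-store step can be sketched roughly as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the code from the repository. The query parameters (`search_query`, `sortBy`, `sortOrder`, `max_results`) are the ones documented for the arXiv API. The storage endpoint `/articles`, the article field names, and all helper functions are hypothetical stand-ins for the FastAPI service.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}  # arXiv returns an Atom feed


def build_query_url(category: str, max_results: int = 50) -> str:
    """Build an arXiv API URL for the newest submissions in one category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(atom_xml: str | bytes) -> list[dict]:
    """Extract a few fields from each <entry> element of an arXiv Atom feed."""
    root = ET.fromstring(atom_xml)
    articles = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        articles.append({
            # Hypothetical field names; the real schema lives in the FastAPI service.
            "arxiv_id": entry.findtext("atom:id", default="", namespaces=ATOM_NS).strip(),
            "title": " ".join(title.split()),  # collapse line breaks inside titles
            "published": entry.findtext("atom:published", default="", namespaces=ATOM_NS).strip(),
            "authors": [
                a.findtext("atom:name", default="", namespaces=ATOM_NS).strip()
                for a in entry.findall("atom:author", ATOM_NS)
            ],
        })
    return articles


def build_store_request(api_base: str, article: dict) -> urllib.request.Request:
    """Prepare a JSON POST to a hypothetical /articles endpoint of the storage API."""
    return urllib.request.Request(
        url=f"{api_base}/articles",
        data=json.dumps(article).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_and_store(category: str, api_base: str) -> int:
    """One scheduled run: query arXiv, then forward each article to the storage API."""
    with urllib.request.urlopen(build_query_url(category)) as resp:
        articles = parse_feed(resp.read())
    for article in articles:
        with urllib.request.urlopen(build_store_request(api_base, article)):
            pass  # a real task would check the response status here
    return len(articles)
```

In Airflow, a function like `fetch_and_store` could be wrapped in a `PythonOperator`, for example one task per category in a DAG with `schedule_interval="@hourly"`. Keeping the HTTP logic in plain functions makes it testable outside the scheduler.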

The setup looks like this:

The code is fully dockerized and available on GitHub along with more detailed documentation.