Category: Data Integrity

Using Apache Airflow and the Snowflake Data Warehouse to ingest Flume S3 data

Do you use Apache Flume to stage event-based log files in Amazon S3 before ingesting them into your database? Have you noticed .tmp files scattered throughout your S3 buckets, and wondered what they are and how to deal with them? This article describes a simple solution to this common problem using the Apache Airflow workflow manager and the Snowflake Data Warehouse.

Senior Staff Engineer