Using Apache Airflow and the Snowflake Data Warehouse to ingest Flume S3 data
Do you use Apache Flume to stage event-based log files in Amazon S3 before ingesting them in your database? Have you noticed .tmp
files scattered throughout S3? Have you wondered what they are and how to deal with them? This article describes a simple solution to this common problem, using the Apache Airflow workflow manager and the Snowflake Data Warehouse.