Data ingestion and workflow
Description: Lecture 5 introduces batch and stream ingestion approaches. Once our data is harmonised in a lakehouse, we need to create data artefacts and views to feed our analytic tasks. These artefacts can be created and processed offline on a schedule (i.e., batch) or in real time (i.e., streaming), depending on the nature of the data. This lecture introduces both concepts and the production platforms and tools that support them.
Department: Departamento de Matemáticas y Estadística - Facultad de Ciencias Exactas y Naturales
Institution: Universidad de Nariño
Date: June 20, 2026
Hours: 5
From: 07:00 am
To: 01:00 pm
Week 3 links
Resources
- Introductory Python course (optional video)
- Apache Kafka documentation (reference)
References
- Zaharia, M., et al. (2016). Apache Spark: a unified engine for big data processing. Communications of the ACM, 59(11), 56–65.
- Jarrahi, M. H., et al. (2023). The Principles of Data-Centric AI. (Course PDF.)
- Kreps, J., Narkhede, N., & Rao, J. (2011). Kafka: a distributed messaging system for log processing. (Recommended async.)