Data processing and analysis
Description: Lecture 4 introduces data processing. Once data is stored in a harmonised structure, we need methods to access and analyse it efficiently. This lecture presents scalable approaches to big data processing, such as the MapReduce programming model, together with data processing engines such as pandas, DuckDB, and Polars.
Department: Departamento de Matemáticas y Estadística - Facultad de Ciencias Exactas y Naturales
Institution: Universidad de Nariño
Date: June 13, 2026
Hours: 5
From: 07:00 am
To: 01:00 pm
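To make the MapReduce model from the lecture description concrete, here is a minimal single-process sketch. It uses word count, the classic example, and follows the map, shuffle, and reduce phases described by Dean and Ghemawat (2008). The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `map_reduce`) are hypothetical and for illustration only. They are not the API of any MapReduce framework. A real cluster would distribute these phases across many workers.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by key, as the framework would."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values that share a key into a single result."""
    return key, sum(values)


def map_reduce(documents: list[str]) -> dict[str, int]:
    # Hypothetical driver: on a cluster each document would be mapped on a
    # different worker; here the phases simply run one after another.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(map_reduce(docs))
    # → {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

The shuffle and reduce steps together amount to a group-by aggregation. pandas, DuckDB, and Polars all provide this operation directly, so on a single machine you would normally express the same computation as one group-by query.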
Week 2 links
Resources
- Introductory Python course (optional video)
- Introduction to data management (optional video)
References
- Zuboff, S. (2019). The age of surveillance capitalism (Chapter 3). PublicAffairs. (Course PDF, same as in week 1.)
- Dwork, C. (2006). Differential privacy. ICALP 2006 (LNCS 4052).
- Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107–113.
- Armbrust, M., et al. (2021). Lakehouse: A new generation of open platforms that unify data warehousing and advanced analytics. CIDR ’21.