-
Introduction to Big Data, methodology, and ecosystems
Date: June 06, 2026
Description: Lecture 1 introduces the course and how we will work through it. It also introduces the Big Data concept and its history, context, applications, and ecosystems. We present the problem-first principle and a data science methodology that will drive our work. We start implementing the first stage of the methodology: making data available.
-
Ethics, privacy, and foundations of data governance
Date: June 06, 2026
Description: Lecture 2 follows the first access to GEIH by asking what responsible use of that data requires. We treat harm, consent, and fairness not as a final checklist but as requirements that shape every technical choice in Big Data projects. The readings and a guided ethics audit on survey and map data prepare the accountable-practice frame for storage, processing, and analytics ahead.
-
Data storage and management
Date: June 13, 2026
Description: Lecture 3 introduces data storage and its evolution. We reflect on the need for storage and move from relational and non-relational storage concepts to current data architectures that support big data processing. We present the lakehouse concept and how to build one.
-
Data processing and analysis
Date: June 13, 2026
Description: Lecture 4 introduces data processing. Once data is stored in a harmonised structure, we need methods for efficient access and analysis. This lecture presents scalable big data processing approaches, such as the MapReduce programming model, and works with engines such as pandas, DuckDB, and Polars.
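As a small illustration of the MapReduce model named in this description, the sketch below counts words in plain Python. It is not course material: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are hypothetical, and everything runs in a single process, whereas a real engine would distribute each phase across workers.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Chain the three phases (hypothetical helper, for illustration only)."""
    return reduce_phase(shuffle_phase(map_phase(records)))


# Made-up sample input, not course data.
lines = ["big data big ideas", "data engines scale"]
counts = word_count(lines)
print(counts)
```

Because the map and reduce steps only ever look at one record or one key at a time, the same logic can be split across many machines, which is what makes the model scale.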
-
Data ingestion and workflow
Date: June 20, 2026
Description: Lecture 5 introduces batch and stream ingestion approaches. Once our data is harmonised in a lakehouse, we need to create data artefacts and views to feed our analytic tasks. These artefacts can be created and processed offline on a schedule (i.e., batch) or in real time (i.e., streaming), depending on the nature of the data. This lecture introduces both concepts and the production platforms and tools that support them.
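The difference between the two approaches can be sketched with a toy aggregate. The example below is an assumption-laden illustration, not one of the platforms covered in the lecture: `batch_mean`, `streaming_mean`, and the `readings` values are hypothetical names and data.

```python
def batch_mean(values):
    """Batch: wait until all values are available, then compute once."""
    values = list(values)
    return sum(values) / len(values)


def streaming_mean(events):
    """Streaming: update the result incrementally as each event arrives."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count  # running mean after this event


# Made-up sample readings, not course data.
readings = [10.0, 20.0, 30.0, 40.0]

batch_result = batch_mean(readings)
stream_results = list(streaming_mean(iter(readings)))

print(batch_result)
print(stream_results)
```

The batch version produces one answer after the whole input is available, while the streaming version produces an up-to-date answer after every event; once all events have arrived, both agree.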
-
Analytics and visualisation
Date: June 20, 2026
Description: Lecture 6 presents analytics and visualisation techniques. Once we have processed, harmonised, and ingested our data, the next step is to use it to address the data problems at hand. Visualisation tools support decision makers by presenting data in formats that are easier to understand and analyse in context (e.g., dashboards). More advanced analytics are possible when probabilistic models support prediction or classification processes.