Data Quality
Description: This lecture explores the crucial aspects of data quality in machine learning. We will cover data cleaning, preprocessing, and transformation techniques essential for preparing datasets for various ML algorithms. The lecture emphasizes practical approaches to handle missing values, outliers, data normalization, feature engineering, and data validation to ensure high-quality input for ML models.
Department: Centro de Estudios y Asesorías en Estadística (CEASE)
Institution: Universidad de Nariño
Date: June 07, 2025
Hours: 4
From: 10:00 am
To: 12:00 am
Resources
Books
- Bishop, C. (2009). Pattern Recognition and Machine Learning. Springer
- Deisenroth, M. P. et al. (2020). Mathematics for Machine Learning - Chapter 10
Papers and Reports
- Hotelling, H. (1933). Analysis of a Complex of Statistical Variables into Principal Components
- Tipping, M. E., Bishop, C. (1999). Probabilistic Principal Component Analysis
Web
- Meet the Data Quality Dimensions
- Advanced Data Science - Visualisation I
- Advanced Data Science - Visualisation II
- Scikit-learn Documentation
- TensorFlow Data Validation
- Registry of Research Data Repositories
- Potato Disease Dataset
- Chronic Disease Indicators