The Transformer Architecture
Description: This lecture introduces the transformer neural network architecture, which underlies modern Large Language Models (LLMs). We will begin by formalising the architecture and its training, and then introduce how to use and tailor LLMs for ML projects and daily activities.
Department: Centro de Estudios y Asesorías en Estadística (CEASE)
Institution: Universidad de Nariño
Date: July 12, 2025
Hours: 4
From: 10:00 am
To: 12:00 pm
Resources
Papers and Reports
- Vaswani, A., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems.
- Devlin, J., et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
- Radford, A., et al. (2019). Language Models are Unsupervised Multitask Learners. OpenAI Blog.
- OpenAI. (2023). GPT-4 Technical Report.
- Paleyes. (2025). LLM Performance for Code Generation on Noisy Tasks.
- Sendyka. (2025). Prompt Variability Effects on LLM Code Generation.