Session 10 - AI Systems

AI Systems

Christian Cabrera Jojoa

Senior Research Associate and Affiliated Lecturer

Department of Computer Science and Technology

University of Cambridge

chc79@cam.ac.uk

Last Time

The Data Science Process

Data Science Process

Supervised Learning


Given a training set of N example input-output pairs

$$(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$$

Each pair was generated by an unknown function $y = f(x)$.

The goal is to discover a function $h$ that approximates the true function $f$.

Regression Models

Regression Fit
Linear regression fit.

Linear Regression

Training Dataset

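Hypothesis Space: All possible linear functions of continuous-valued inputs and outputs.

Hypothesis: $h_w(x) = w_1 x + w_0$

Loss Function: $L(y, h_w(x)) = (y - h_w(x))^2$

Cost Function: $J(w) = \frac{1}{N} \sum_{j=1}^{N} (y_j - h_w(x_j))^2$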

Regression Models

Gradient Descent Algorithm
Gradient Descent Algorithm - Jacopo Bertolotti, CC0, via Wikimedia Commons.

Linear Regression

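Analytical Solution: for the hypothesis $h_w(x) = w_1 x + w_0$, set the partial derivatives of the squared-error cost to zero and solve for the weights:

$$w_1 = \frac{N \sum_j x_j y_j - \left(\sum_j x_j\right)\left(\sum_j y_j\right)}{N \sum_j x_j^2 - \left(\sum_j x_j\right)^2}, \qquad w_0 = \frac{\sum_j y_j - w_1 \sum_j x_j}{N}$$

Gradient Descent Algorithm:


Initialize w randomly
repeat
    for each w[i] in w
        Compute gradient: g = ∂Loss(w)/∂w[i]
        Update weight:   w[i] = w[i] - α * g
until convergence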

Hyperparameters: Learning rate, number of epochs, and batch size.
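The loop above can be sketched in NumPy as follows. This is a minimal illustration, not the lecture's implementation: it assumes a mean-squared-error cost, batch updates over the whole training set, and zero initialisation instead of random initialisation; the function name `gradient_descent` and the toy data are made up for the example.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, epochs=500):
    """Batch gradient descent for linear regression with a mean-squared-error cost."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)  # weights (zero start is an assumption of this sketch)
    b = 0.0                   # bias term
    for _ in range(epochs):
        residual = X @ w + b - y                   # prediction minus target, per example
        grad_w = 2 * (X.T @ residual) / n_samples  # gradient of the cost w.r.t. w
        grad_b = 2 * residual.mean()               # gradient of the cost w.r.t. b
        w -= alpha * grad_w
        b -= alpha * grad_b
    return w, b

# Toy, noise-free data generated from y = 3x + 2 (for illustration only).
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = 3 * X[:, 0] + 2
w, b = gradient_descent(X, y, alpha=0.5, epochs=2000)
```

The learning rate and the number of epochs are the hyperparameters to tune here; a learning rate that is too large makes the updates diverge, while one that is too small needs many more epochs.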

Multivariate Linear Regression


In these problems, each example $x_j$ is an n-element vector. The hypothesis space H now includes linear functions of multiple continuous-valued inputs and a single continuous output.

$$h_w(x_j) = w_0 + w_1 x_{j,1} + \cdots + w_n x_{j,n}$$

We want to find the weights $w$ that best fit the data.

In vector notation:

$$h_w(x) = w^\top x + b$$

Where $x$ is the feature vector, $w$ is the parameter vector, and $b$ is a bias term (the $w_0$ above).

Probabilistic Interpretation of Linear Regression


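The regression problem is considered as:

$$y = w^\top x + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2)$$

We want to maximise the likelihood function of the training data given the model parameters.

$$p(\mathbf{y} \mid X, w, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid w^\top x_n, \sigma^2)$$

Closed-form solution for linear regression:

$$w^* = (X^\top X)^{-1} X^\top \mathbf{y}$$

Where $X$ is the design matrix formed from the training inputs (one row per example) and $\mathbf{y}$ is the vector of all targets.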
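As a rough illustration of the closed-form solution, the sketch below solves the normal equation on synthetic data and compares it with NumPy's least-squares solver. The data, the true weights, and the noise level are invented for the example, and the bias is absorbed by adding a column of ones to the design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 1.5*x1 - 2.0*x2 + 0.5 + small noise.
X_raw = rng.uniform(-1, 1, size=(50, 2))
y = X_raw @ np.array([1.5, -2.0]) + 0.5 + rng.normal(0, 0.01, size=50)

# Design matrix with a leading column of ones so the bias is part of w.
X = np.column_stack([np.ones(len(X_raw)), X_raw])

# Normal equation: solve (X^T X) w = X^T y rather than forming the inverse explicitly.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Solving the linear system is usually preferred to computing the inverse because it is cheaper and numerically more stable.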

Probabilistic Interpretation
Probabilistic Interpretation - (Bishop, 2006).

Linear Basis Function Models


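The model becomes:

$$h_w(x) = w^\top \phi(x) = \sum_{j=0}^{M-1} w_j \phi_j(x)$$

Where $\phi(x) = (\phi_0(x), \ldots, \phi_{M-1}(x))^\top$ is a vector of basis functions.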

Basis Functions
Basis Functions - (Bishop, 2006).

Linear Basis Function Models


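If we apply a probabilistic interpretation, we need to maximise the likelihood of:

$$p(\mathbf{y} \mid X, w, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid w^\top \phi(x_n), \sigma^2)$$

After a similar process (See Chapter 3 in Bishop, 2006), the loss function becomes:

$$E(w) = \frac{1}{2} \sum_{n=1}^{N} \left(y_n - w^\top \phi(x_n)\right)^2$$

And the normal equation solution, where $\Phi$ is the design matrix of basis-function values:

$$w^* = (\Phi^\top \Phi)^{-1} \Phi^\top \mathbf{y}$$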

Linear Basis Function Models



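The design matrix is constructed as:

$$\Phi = \begin{pmatrix} \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_{M-1}(x_1) \\ \phi_0(x_2) & \phi_1(x_2) & \cdots & \phi_{M-1}(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_0(x_N) & \phi_1(x_N) & \cdots & \phi_{M-1}(x_N) \end{pmatrix}$$

By using basis functions $\phi_j$, we can capture complex, non-linear relationships between the input features and the target variable. The design matrix essentially acts as a bridge, allowing us to apply linear techniques to problems that are inherently non-linear in nature. We can try different sets of basis functions and use cross-validation to evaluate which design works better. This is essentially feature engineering.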
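A small sketch of this idea, assuming polynomial basis functions $\phi_j(x) = x^j$ (one possible choice among many) and a made-up sine-shaped target. The helper name `polynomial_design_matrix` is hypothetical.

```python
import numpy as np

def polynomial_design_matrix(x, degree):
    """Row n holds [phi_0(x_n), ..., phi_degree(x_n)] with phi_j(x) = x**j."""
    return np.vander(x, degree + 1, increasing=True)

# Non-linear target (assumed for illustration).
x = np.linspace(-1, 1, 30)
y = np.sin(np.pi * x)

Phi = polynomial_design_matrix(x, degree=5)

# The model is still linear in w, so ordinary least squares applies unchanged.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
max_error = np.max(np.abs(Phi @ w - y))
```

Changing `degree`, or swapping in Gaussian or sigmoidal basis functions, changes only how `Phi` is built; the fitting step stays the same.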

Linear Classifiers

SVM Separating Hyperplanes
SVM Separating Hyperplanes - Cyc, Public domain, via Wikimedia Commons.

A decision boundary is a line (or a surface in higher dimensions) that separates data into classes.


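The hypothesis is the result of passing a linear function through a threshold function:

$$h_w(x) = \text{Threshold}(w^\top x), \qquad \text{Threshold}(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{otherwise} \end{cases}$$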

Regression Fit
Threshold Function.

The Perceptron

Perceptron Architecture
Perceptron Architecture.

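The perceptron is a linear classifier model (i.e., linear discriminant), with hypothesis space defined by all the functions of the form:

$$h_w(x) = f(w^\top \phi(x))$$

The function $\phi$ is similar to the basis functions previously defined. $f$ is given by a step function of the form:

$$f(a) = \begin{cases} +1 & \text{if } a \ge 0 \\ -1 & \text{if } a < 0 \end{cases}$$

We want to find the weights $w$ that classify the training examples correctly.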
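One common way to search for such weights is the perceptron learning rule, sketched below under some assumptions: targets are coded as +1/-1, the basis function is the identity plus a constant feature for the bias, and the toy data is the logical AND function. The names `step` and `train_perceptron` are illustrative only.

```python
import numpy as np

def step(a):
    """Step activation: +1 when a >= 0, otherwise -1."""
    return np.where(a >= 0, 1, -1)

def train_perceptron(X, t, lr=1.0, max_epochs=100):
    """Perceptron learning rule: nudge w towards each misclassified example."""
    Xb = np.column_stack([np.ones(len(X)), X])  # constant feature absorbs the bias
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_n, t_n in zip(Xb, t):
            if step(w @ x_n) != t_n:
                w += lr * t_n * x_n
                errors += 1
        if errors == 0:  # every training example is classified correctly
            break
    return w

# Logical AND with targets in {-1, +1}; this data is linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1])
w = train_perceptron(X, t)
predictions = step(np.column_stack([np.ones(len(X)), X]) @ w)
```

The rule only converges when the classes are linearly separable; on data such as XOR the loop would run until `max_epochs` without reaching zero errors.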

Neural Networks

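Feedforward Neural Network:

The overall network function combines these stages. For sigmoidal output unit activation functions, it takes the form:

$$\hat{y}_k(x, w) = \sigma\left( \sum_{j=1}^{M} w_{kj}^{(2)} \, h\left( \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \right) + w_{k0}^{(2)} \right)$$

The bias parameters can be absorbed by defining $x_0 = 1$ and a hidden unit with constant output $z_0 = 1$:

$$\hat{y}_k(x, w) = \sigma\left( \sum_{j=0}^{M} w_{kj}^{(2)} \, h\left( \sum_{i=0}^{D} w_{ji}^{(1)} x_i \right) \right)$$

$\sigma$ and $h$ are continuous functions. The neural network is differentiable with respect to the parameters $w$.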

Two-layer Neural Network
Two-layer Neural Network - (Bishop, 2006).

Neural Networks

Sigmoid Function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Range: $(0, 1)$

Derivative: $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$


Logistic Curve

Logistic Curve - Qef, Public domain, via Wikimedia Commons.

Hyperbolic Tangent:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Range: $(-1, 1)$

Derivative: $\tanh'(x) = 1 - \tanh^2(x)$


Hyperbolic Tangent

Hyperbolic Tangent - Geek3, CC BY-SA 3.0 , via Wikimedia Commons.

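ReLU (Rectified Linear Unit):

$$\text{ReLU}(x) = \max(0, x)$$

Range: $[0, \infty)$

Derivative: $1$ if $x > 0$, $0$ if $x < 0$ (undefined at $x = 0$; usually set to $0$ in practice)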


Ramp Function

Ramp Function - Qef, Public domain, via Wikimedia Commons.
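The three activation functions and their derivatives can be written in a few lines of NumPy. The sketch below also compares the analytical sigmoid and tanh derivatives against a central finite difference; the function names and test points are chosen for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Uses the convention that the derivative at exactly 0 is 0.
    return (x > 0).astype(float)

# Compare analytical derivatives with central finite differences.
x = np.array([-2.0, -0.5, 0.5, 2.0])
eps = 1e-6
numeric_sigmoid_grad = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
numeric_tanh_grad = (np.tanh(x + eps) - np.tanh(x - eps)) / (2 * eps)
```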

Neural Networks

Training Process:


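Given a training set of N example input-output pairs

$$(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$$

Each pair was generated by an unknown function $f$:

$$y = f(x)$$

We want to find a hypothesis $\hat{y}(x, w)$ that minimises the error function:

$$E(w) = \frac{1}{2} \sum_{n=1}^{N} \lVert \hat{y}(x_n, w) - y_n \rVert^2$$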

Two-layer Neural Network
Two-layer Neural Network - (Bishop, 2006).

Neural Networks

Two-layer Neural Network
Two-layer Neural Network - (Bishop, 2006).

Error Backpropagation Algorithm:

1. Forward Pass: Compute all activations and outputs for an input vector.

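2. Error Evaluation: Evaluate the error for all the outputs using: $\delta_k = \hat{y}_k - y_k$

3. Backward Pass: Backpropagate errors for each hidden unit in the network using: $\delta_j = h'(a_j) \sum_k w_{kj} \delta_k$

4. Derivatives Evaluation: Evaluate the derivatives for each parameter using: $\frac{\partial E_n}{\partial w_{ji}} = \delta_j z_i$

Here $\hat{y}_k$ is the network output and $y_k$ the target for output unit $k$, $a_j$ is the weighted input to unit $j$, $h$ is its activation function, and $z_i$ is the output of the unit $i$ that feeds unit $j$.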
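The four steps can be sketched for a small two-layer network as follows. This is an illustrative setup, not the lecture's code: it assumes tanh hidden units, linear output units, a sum-of-squares error for a single example, and no explicit bias terms. One derivative is then checked against a finite difference.

```python
import numpy as np

def forward(x, W1, W2):
    a = W1 @ x       # weighted inputs to the hidden units
    z = np.tanh(a)   # hidden activations
    y_hat = W2 @ z   # linear output units
    return a, z, y_hat

def backprop(x, y, W1, W2):
    _, z, y_hat = forward(x, W1, W2)                 # 1. forward pass
    delta_out = y_hat - y                            # 2. output errors
    delta_hidden = (1 - z**2) * (W2.T @ delta_out)   # 3. backpropagated errors
    grad_W2 = np.outer(delta_out, z)                 # 4. derivatives per weight
    grad_W1 = np.outer(delta_hidden, x)
    return grad_W1, grad_W2

def error(x, y, W1, W2):
    return 0.5 * np.sum((forward(x, W1, W2)[2] - y) ** 2)

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=2)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
grad_W1, grad_W2 = backprop(x, y, W1, W2)

# Finite-difference check of one first-layer derivative.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (error(x, y, W1_plus, W2) - error(x, y, W1_minus, W2)) / (2 * eps)
```

A gradient check like the last few lines is a cheap way to catch sign or indexing mistakes before training.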

Neural Networks

Two-layer Neural Network
Two-layer Neural Network - (Bishop, 2006).

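Gradient Descent Update Rule:

$$w^{(\tau+1)} = w^{(\tau)} - \alpha \nabla E(w^{(\tau)})$$

Where $\alpha$ is the learning rate and $\tau$ indexes the iteration.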

Deep Learning

Deep Neural Network
Deep Neural Network with multiple hidden layers - QuantuMechaniX8, CC0, via Wikimedia Commons

Deep Learning


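Vanishing and Exploding Gradients: In deep networks, gradients can become very small (vanishing) or very large (exploding) during backpropagation, because the gradient at an early layer is a product of one factor per later layer. Common mitigations: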


  • Proper Weight Initialization: Xavier/Glorot
  • Batch Normalization: Normalize inputs/outputs
  • Modern Optimizers: Adam, RMSprop with adaptive learning rates
  • Regularisation: Dropout, L2, early stopping, and augmentation techniques
  • Network Architectures: Different architectures for different problems
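A back-of-the-envelope sketch of why gradients vanish: the sigmoid derivative is at most 0.25, so the product of activation derivatives across many layers shrinks geometrically. The calculation below looks only at that factor and ignores the weights, which is a simplifying assumption; the function name is hypothetical.

```python
def sigmoid_slope_bound(depth, max_slope=0.25):
    """Upper bound on the product of `depth` sigmoid derivatives (each is at most 0.25)."""
    return max_slope ** depth

# Bound on the activation-derivative factor for networks of increasing depth.
bounds = {depth: sigmoid_slope_bound(depth) for depth in (1, 5, 10, 20)}
```

With ten sigmoid layers the factor is already below one in a million, which is one reason ReLU activations and careful initialisation are common in deep networks.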
Deep Neural Network
Deep Neural Network with multiple hidden layers - QuantuMechaniX8, CC0, via Wikimedia Commons

Reinforcement Learning

RL Framework
Reinforcement Learning Framework - Megajuice, CC0, via Wikimedia Commons.

The RL framework is composed of:


  • Agent: The learner and decision maker
  • Environment: The world in which the agent operates
  • State: Current situation of the environment
  • Action: What the agent can do
  • Reward: Feedback from the environment

The environment is stochastic, meaning that the outcomes of actions taken by the agent in each state are not deterministic.

Reinforcement Learning

MDP Process
Markov Decision Process - waldoalvarez, CC BY-SA 4.0 , via Wikimedia Commons.

Markov Decision Process (MDP):

A mathematical framework for modeling sequential decision problems in fully observable, stochastic environments. The outcomes are partly random and partly under the control of a decision maker.

Reinforcement Learning

RL Framework
Reinforcement Learning Framework - Megajuice, CC0, via Wikimedia Commons.

An MDP is a 4-tuple:

$$(S, A, P, R)$$

Where:

  • $S$ is a set of states with initial state $s_0$
  • $A(s)$ is a set of actions in each state $s$
  • $P(s' \mid s, a)$ is a transition model that tells the probability of reaching $s'$, if the agent is in $s$ and performs action $a$
  • $R(s, a, s')$ is the reward function that tells the reward for every transition from $s$ to $s'$ through $a$
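To make the tuple concrete, the sketch below encodes a tiny MDP as a dictionary and solves it with value iteration. The states, actions, probabilities, rewards, and discount factor are all invented for illustration; each entry lists `(probability, next_state, reward)` outcomes for a state-action pair.

```python
GAMMA = 0.9  # discount factor (assumed)

# Hypothetical toy MDP: a machine that can run "slow" or "fast" and may overheat.
TRANSITIONS = {
    "cool": {"slow": [(1.0, "cool", 1.0)],
             "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)]},
    "warm": {"slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
             "fast": [(1.0, "overheated", -10.0)]},
    "overheated": {},  # terminal state with no actions
}

def q_value(V, outcomes, gamma):
    """Expected return of one action: sum over outcomes of P * (R + gamma * V)."""
    return sum(p * (r + gamma * V[s_next]) for p, s_next, r in outcomes)

def value_iteration(transitions, gamma=GAMMA, tol=1e-8):
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue
            best = max(q_value(V, outcomes, gamma) for outcomes in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(actions, key=lambda a: q_value(V, actions[a], gamma))
              for s, actions in transitions.items() if actions}
    return V, policy

V, policy = value_iteration(TRANSITIONS)
```

Because the transition model and rewards are known here, the agent can plan without interacting with the environment.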

Reinforcement Learning

RL Framework
Reinforcement Learning Framework - Megajuice, CC0, via Wikimedia Commons.

Model-Based RL Agent

  • Knows transition model and reward function
  • Can simulate outcomes before taking actions
  • Value Iteration, Policy Iteration

Model-Free RL Agent

  • Unknown transition model and reward function
  • Cannot simulate outcomes
  • Q-Learning, DQN
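A minimal tabular Q-learning sketch for the model-free case. The environment is a made-up five-state corridor with a reward of 1 at the right end, and the agent explores with uniformly random actions, which works here because Q-learning is off-policy. All names and hyperparameter values are assumptions for the example.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left, move right

def step(state, action):
    """Hypothetical corridor environment: reward 1 on reaching the goal state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # purely random exploration
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move Q(s, a) towards the bootstrapped target.
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q

Q = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
```

The agent never reads the transition model; it only observes sampled transitions, which is what distinguishes it from the model-based methods listed above.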

Reinforcement Learning

Policy Iteration

  • Value-function
  • Model-based with guaranteed convergence for finite and discrete problems.

MDP Process
Markov Decision Process - waldoalvarez, CC BY-SA 4.0, via Wikimedia Commons.

Q-Value

  • Q-function
  • Model-free and simple for small and discrete problems.

Q-Matrix
Transition Matrix (Q).

DQN

  • Neural Network
  • Model-free and complex for large and continuous problems.



Deep Neural Network
Deep Neural Network with multiple hidden layers - QuantuMechaniX8, CC0, via Wikimedia Commons

The Transformer Architecture

The main idea is to pay attention to the context of each word in a sentence when modelling language. For example, if the context is "Thanks for all the" and we want to know how likely it is that the next word is "fish":

$$P(\text{fish} \mid \text{Thanks for all the})$$

We want to discover the probability distribution over a vocabulary $V$ for the next word in a sequence:

$$P(w_t \mid w_{1:t-1}), \qquad w_t \in V$$

where $w_{1:t-1}$ is the sequence of words previous to $w_t$.

Transformer Architecture
Transformer Architecture - dvgodoy, CC BY 4.0 , via Wikimedia Commons

The Transformer Architecture

The transformer architecture solves this problem by:

  1. Tokenisation: Convert sentence into tokens.
  2. Input and Positional Embedding: Convert input tokens into ordered embedded vectors.
  3. Self-Attention: Determine the relevance of each word to others in the sequence.
  4. Feed-Forward Neural Network: Pass the attention outputs through a feed-forward neural network to consolidate learnt patterns.
  5. Residual Connections and Layer Normalisation: Apply residual connections and layer normalisation to stabilise and improve training.
  6. Output Layer: Use a linear layer followed by a softmax function to generate the final output probabilities.
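Step 3 is the core of the architecture. The sketch below shows single-head scaled dot-product self-attention with a causal mask, as used in decoder-style models, on random embeddings. It is a simplified illustration: the projection matrices are random rather than learnt, and multi-head attention, residual connections, and layer normalisation are left out.

```python
import numpy as np

def softmax(scores):
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)

def causal_self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product attention where each token sees only the past."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)  # positions after the query
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores)  # relevance of each earlier token to the current one
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8  # toy sizes (assumed)
X = rng.normal(size=(seq_len, d_model))  # stand-in for token plus positional embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
output, weights = causal_self_attention(X, W_q, W_k, W_v)
```

Each row of `weights` sums to one and is zero for future positions, so the output for a token is a weighted average of the value vectors of the tokens up to and including it.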
Transformer Decoder
Transformer Decoder Architecture - (Jurafsky et al., 2025)

Large Language Models (LLMs)

  • GPT-3: Known for generating human-like text, it can perform tasks such as translation, question answering, and text completion.
  • BERT: Excels in understanding the context of words in a sentence, making it ideal for tasks like sentiment analysis and named entity recognition.
  • T5 (Text-to-Text Transfer Transformer): Converts all NLP tasks into a text-to-text format, enabling it to handle tasks like summarisation and translation.
  • RoBERTa: An optimised version of BERT, it improves performance on tasks like text classification and language inference.
  • ...
Transformer Architecture
Transformer Architecture - dvgodoy, CC BY 4.0 , via Wikimedia Commons

Large Language Models (LLMs)

Training FLOP
Large-Scale AI Models Training - Epoch AI, CC BY 4.0 , via Wikimedia Commons

Large Language Models (LLMs)

Training Cost
Estimated Cost - Stanford Institute for Human-Centered Artificial Intelligence (permission obtained by email from the AI index research manager), CC BY-SA 4.0 , via Wikimedia Commons

Large Language Models (LLMs)

Fine-tuning an LLM is similar to training a neural network:

  1. Data Collection: Gather a large and relevant dataset for the specific domain or task.
  2. Pre-processing: Clean and pre-process the data to ensure it is in a suitable format for training.
  3. Model Selection: Choose a pre-trained LLM that is most suitable for the task at hand.
  4. Supervised Learning: Prompt engineering, error calculation, and adjusting weights using gradient descent and backpropagation.
  5. Evaluation: Assess the performance of the fine-tuned model using appropriate metrics and validation datasets.
  6. Deployment: Deploy the fine-tuned model for use in real-world applications.
Fine Tuning Process
Fine Tuning Process

Large Language Models (LLMs)

Retrieval-Augmented Generation (RAG) combines approaches from symbolic AI and databases:

  1. Data Collection: Gather a knowledge base that the RAG system can query.
  2. Pre-processing: Organise the knowledge base to ensure efficient retrieval and LLM integration.
  3. Model Selection: Choose a pre-trained LLM that can integrate with the retrieval system.
  4. Retrieval Integration: Retrieve relevant entries from the knowledge base and pass them to the LLM when answering queries.
  5. Evaluation: Assess the performance of the RAG system by using appropriate metrics.
  6. Deployment: Deploy the RAG system for real-time applications, ensuring it can access and retrieve information efficiently.
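The retrieval step can be illustrated with a toy, dependency-free sketch. It assumes a three-sentence knowledge base and uses bag-of-words cosine similarity in place of the learnt embeddings and vector database a real system would use; the LLM call itself is left out, and all names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical knowledge base; a real system would index many documents.
KNOWLEDGE_BASE = [
    "Edge servers are located close to end users.",
    "Ant colony optimisation uses pheromone trails to guide search.",
    "MLOps automates deployment and monitoring of ML models.",
]

def embed(text):
    """Toy embedding: word counts after lower-casing and stripping punctuation."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())

def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_rag_prompt(query, documents):
    """Place the retrieved context in front of the question for the LLM."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_rag_prompt("Where are edge servers located?", KNOWLEDGE_BASE)
```

The resulting `prompt` is what would be sent to the pre-trained LLM chosen in step 3.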
RAG Process
RAG Process

Large Language Models (LLMs)

Typical prompt-engineering workflow:

  1. Task definition: Specify what output format and style you need.
  2. Baseline prompt: Write a clear instruction (zero-shot) or add 1-5 examples (few-shot).
  3. Iterate and test: Evaluate outputs, add system messages, or reorder examples to reduce errors and bias.
  4. Guardrails: Include refusals, safety clauses, or value alignment statements.
  5. Automation: Use prompt templates or tools like LangChain/LlamaIndex to inject dynamic context.
  6. Deployment: Store the prompt with version control and monitor performance over time.
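Steps 1, 2, and 5 can be sketched as a small prompt template in plain Python. The task (sentiment classification), the example reviews, and the function name are assumptions for illustration; no particular LLM API or prompt library is implied.

```python
# Hypothetical few-shot examples for a sentiment task.
FEW_SHOT_EXAMPLES = [
    ("The service was fast.", "Positive"),
    ("The app keeps crashing.", "Negative"),
]

def build_few_shot_prompt(text, examples=FEW_SHOT_EXAMPLES):
    """Assemble an instruction, a few worked examples, and the new input."""
    parts = [
        "Classify the sentiment of the review as Positive, Negative, or Neutral. "
        "Reply with a single word."
    ]
    for review, label in examples:
        parts.append(f"Review: {review}\nSentiment: {label}")
    parts.append(f"Review: {text}\nSentiment:")  # the model completes this line
    return "\n\n".join(parts)

prompt = build_few_shot_prompt("Delivery took two weeks.")
```

Keeping the template in a function makes it easy to version, test, and reorder examples during the iterate-and-test step.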
Prompt Engineering Process
Prompt Engineering Process

The Data Science Process

Data Science Process

Machine Learning Pipeline

Data Assess Pipeline

AI Systems

AI System

Service Placement Problem

Dynamic Service Placement in Edge Computing

  • Edge servers are located close to end users, allowing for local data processing.
  • Services run on edge servers, which have limited resources.
  • The challenge is to determine the optimal allocation of services and edge servers to minimize latency while considering resource constraints.
  • This challenge is referred to as the Service Placement Problem.

Objective Functions:


Subject to:

Pareto

Ant-Colony Optimisation

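Ant Colony Optimization Algorithm

$$p_{ij}(t) = \frac{[\tau_{ij}(t)]^{\alpha} \, [\eta_{ij}]^{\beta}}{\sum_{k \in \text{allowed}} [\tau_{ik}(t)]^{\alpha} \, [\eta_{ik}]^{\beta}}$$

$$\tau_{ij}(t+1) = (1 - \rho) \, \tau_{ij}(t) + \Delta\tau_{ij}$$

Where: $p_{ij}(t)$ is the probability of moving from node $i$ to node $j$, $\tau_{ij}(t)$ is the pheromone level on edge $(i, j)$ at time $t$, $\eta_{ij}$ is the heuristic information (e.g., inverse of distance), $\alpha$ and $\beta$ are parameters to control the influence of pheromone and heuristic information, $\rho$ is the pheromone evaporation rate, and $\Delta\tau_{ij}$ is the change in pheromone level.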
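The two ACO rules can be sketched in NumPy as below: a transition-probability function for one ant at one node, and a pheromone update with evaporation. The distances, parameter values, and function names are invented for the example, and the heuristic is taken to be the inverse of distance.

```python
import numpy as np

def transition_probabilities(tau_row, eta_row, allowed, alpha=1.0, beta=2.0):
    """Probability of moving to each candidate node; disallowed nodes get zero."""
    weights = np.where(allowed, tau_row**alpha * eta_row**beta, 0.0)
    return weights / weights.sum()

def evaporate_and_deposit(tau, delta_tau, rho=0.5):
    """Pheromone update: evaporate a fraction rho, then add the new deposits."""
    return (1 - rho) * tau + delta_tau

# Toy example: three candidate nodes, the third already visited (not allowed).
distances = np.array([1.0, 2.0, 4.0])
eta = 1.0 / distances  # heuristic information: inverse of distance
tau = np.ones(3)       # uniform initial pheromone
allowed = np.array([True, True, False])

p = transition_probabilities(tau, eta, allowed)
tau_next = evaporate_and_deposit(tau, delta_tau=np.array([0.1, 0.0, 0.0]))
```

With uniform pheromone the choice is driven by the heuristic alone; as deposits accumulate on good edges, the pheromone term takes over.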

This execution time does not suit low-latency requirements, but that is how ACO is designed.

We should analyse the problem first:

Variables we cannot reduce

  • Number of services
  • Number of iterations
  • Number of ants

We can reduce the number of servers. How?

We can pre-select edge servers by predicting user locations.

ACO Smart City

We select edge servers close to the users' current and predicted future locations. We used two approaches that cluster historical trips and use these clusters to predict the next link in the user's path:

  • Bayesian Classifier
  • Hidden Markov Model
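As a much simpler stand-in for the two models above, the sketch below predicts the next link from first-order transition counts over historical trips. It is neither the Bayesian classifier nor the hidden Markov model used in the work; the trips, link names, and function names are made up to show the shape of the prediction problem.

```python
from collections import Counter, defaultdict

def fit_transition_counts(trips):
    """Count how often each link is followed by each other link in past trips."""
    counts = defaultdict(Counter)
    for trip in trips:
        for current_link, next_link in zip(trip, trip[1:]):
            counts[current_link][next_link] += 1
    return counts

def predict_next_link(counts, current_link):
    """Most frequent successor of the current link, or None if it was never seen."""
    if current_link not in counts:
        return None
    return counts[current_link].most_common(1)[0][0]

# Hypothetical historical trips, each a sequence of street links.
trips = [["A", "B", "C"], ["A", "B", "D"], ["E", "B", "C"]]
counts = fit_transition_counts(trips)
next_link = predict_next_link(counts, "B")
```

The table of counts grows with the number of links in the city, which is why model size and training data become a concern at city scale.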
MAACO Algorithm

MAACO results

Bayesian Classifier

  • Transition matrix depends on the number of streets in a city.
  • A lot of data (i.e., trips) are needed to train the model.
  • Training time is now an issue!
  • We assumed a limited number of streets in our work.

Hidden Markov Model

  • Frequency matrix depends on the number of streets in a city.
  • A lot of data (i.e., trips) are needed to train the model.
  • Training time is now an issue!
  • We assumed a limited number of streets in our work.

Again, new design decisions are needed to deploy these algorithms in the real world.


Hardware considerations:

  • Data Collection: Ensure sufficient storage capacity for large datasets and high-speed data transfer capabilities.
  • Model Training: Invest in powerful GPUs or TPUs to handle intensive computations and reduce training time.
  • Model Deployment: Consider edge devices for real-time processing and scalability of the deployment infrastructure.
  • Maintenance and Updates: Plan for hardware upgrades and maintenance to accommodate evolving model requirements.
Decentralised Deployment

Software considerations:

  • Data Management: Implement efficient data preprocessing and cleaning pipelines to ensure high-quality input for models.
  • Model Development: Utilize frameworks like TensorFlow or PyTorch for building and experimenting with different model architectures.
  • Version Control: Use tools like Git to manage code versions and collaborate effectively with team members.
  • Continuous Integration/Continuous Deployment (CI/CD): Set up automated testing and deployment pipelines to streamline updates and ensure reliability.
  • Scalability: Design software architecture to support scaling, such as using microservices or serverless computing for flexible resource management.
  • Security: Implement robust security measures to protect data privacy and model integrity.
Decentralised Deployment

AI as a Service


SOA is a design pattern in which services are provided between components, through a communication protocol over a network.


Microservices are an architectural style that structures an application as a collection of small, autonomous services. Each microservice is self-contained and implements a business capability.


The concept of "Everything as a Service" (XaaS) extends the principles of SOA and microservices by offering comprehensive services over the internet. XaaS encompasses a wide range of services, including infrastructure, platforms, and software.

AI System

AI as a Service (AIaaS) enables us to access and expose AI capabilities over the internet. We can integrate AI tools such as machine learning models, natural language processing, and computer vision into our applications leveraging SOA and microservices features.

AI System

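from flask import Flask, request, jsonify

app = Flask(__name__)


class SentimentAnalysisService:
    """Wraps a sentiment model whose predict(text) returns a numeric score."""

    def __init__(self, model):
        self.model = model

    def analyze_sentiment(self, text):
        # Map the score to a label; the thresholds suggest a score in roughly [-1, 1].
        sentiment_score = self.model.predict(text)
        if sentiment_score > 0.5:
            return "Positive"
        elif sentiment_score < -0.5:
            return "Negative"
        else:
            return "Neutral"

...  # elided in the original; `service` (used below) must be defined here

@app.route('/analyze', methods=['POST'])
def analyze():
    # Fall back to an empty dict when the request body is missing or not valid JSON.
    data = request.get_json(silent=True) or {}
    text_to_analyze = data.get('text', '')
    sentiment = service.analyze_sentiment(text_to_analyze)
    return jsonify({'sentiment': sentiment})
...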
AI System

MLOps

MLOps
MLOps - Cmbreuel, CC BY-SA 4.0 , via Wikimedia Commons.

MLOps is a set of practices and tools that support deploying and maintaining ML models in production reliably and efficiently. The goal is to automate and streamline the ML pipeline. These practices and tools include all the pipeline stages from data collection, model training, and deployment to monitoring and governance. We aim to ensure that ML models are robust, scalable, and continuously delivering value.

  • Automated data collection
  • Automated model training and validation
  • Continuous integration and continuous deployment
  • Monitoring and logging
  • Governance and compliance
  • Scalability and reliability

import logging
from datetime import datetime

import mlflow

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class ModelMonitor:
    def __init__(self, model_name):
        self.model_name = model_name
        # Assumes an MLflow tracking server is reachable at this address.
        mlflow.set_tracking_uri("http://localhost:5000")

    def log_prediction(self, input_data, prediction,
                       actual=None, model_version="1.0"):
        """Log model predictions for monitoring"""
        with mlflow.start_run():
            mlflow.log_params({
                "input_size": len(input_data),
                "model_version": model_version,
                "timestamp": datetime.now().isoformat()
            })

            # log_metric expects a numeric value, so the prediction must be a scalar.
            mlflow.log_metric("prediction", prediction)

            if actual is not None:
                mlflow.log_metric("actual", actual)
                mlflow.log_metric("error", abs(prediction - actual))

            logger.info(f"Prediction logged: {prediction}")

    def calculate_drift(self, current_stats, baseline_stats):
        """Not shown in the original slide; must return a numeric drift score."""
        raise NotImplementedError

    def monitor_drift(self, current_stats, baseline_stats):
        """Monitor for data drift"""
        drift_score = self.calculate_drift(current_stats, baseline_stats)
        # Use an explicit run: logging outside one leaves an implicit run active,
        # and a later mlflow.start_run() in log_prediction would then fail.
        with mlflow.start_run():
            mlflow.log_metric("drift_score", drift_score)
        if drift_score > 0.1:  # Threshold
            logger.warning(f"Data drift detected: {drift_score}")
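The monitor above calls `calculate_drift`, which the slide does not show. One possible sketch, under the assumption that the statistics are dictionaries with `"mean"` and `"std"` keys, scores drift as the shift in the mean measured in baseline standard deviations. Both the statistic and the dictionary layout are assumptions; measures such as the population stability index or a Kolmogorov-Smirnov test are common alternatives.

```python
def calculate_drift(current_stats, baseline_stats):
    """Hypothetical drift score: absolute mean shift in units of the baseline std."""
    shift = abs(current_stats["mean"] - baseline_stats["mean"])
    baseline_std = baseline_stats["std"]
    if baseline_std == 0:
        # A constant baseline feature: any shift at all counts as maximal drift.
        return 0.0 if shift == 0 else float("inf")
    return shift / baseline_std

# Example with made-up feature statistics.
baseline = {"mean": 5.0, "std": 2.0}
current = {"mean": 5.5, "std": 2.1}
drift_score = calculate_drift(current, baseline)
```

With these numbers the score is 0.25, which exceeds the 0.1 threshold used in the monitor and would trigger the warning.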

Deep Neural Network
Deep Neural Network with multiple hidden layers - QuantuMechaniX8, CC0, via Wikimedia Commons
DOA Architecture

RAG Process
RAG Process
DOA Architecture

DOA Debugger

Conclusions

Overview

  • AI Systems
  • AI as a Service
  • MLOps

Course Summary

  • Machine Learning Context
  • Machine Learning Definition
  • The Problem First
  • Data Orientation
  • Data Quality
  • Supervised and Reinforcement Learning
  • Large Language Models
  • AI Systems

AI History - Machine Learning Age (2001 - present)

Timeline axis: 1940 to 2030

Periods:

  • First AI Winter (1974-1980)
  • Second AI Winter (1987-1994)
  • Big Data (2000-2012)

Milestones:

  • Artificial Neuron (McCulloch & Pitts, 1943)
  • Information Theory (Shannon, 1948)
  • Cybernetics (Wiener, 1948)
  • Updating Rule (Hebb, 1949)
  • Computing Machinery and Intelligence (Turing, 1950)
  • SNARC (Minsky, 1951)
  • AI Term (Dartmouth Workshop, 1956)
  • GPS (Newell & Simon, 1957)
  • Advice Taker (McCarthy, 1958)
  • Back-Propagation (Kelley, 1960)
  • Perceptrons (Rosenblatt, 1962)
  • ELIZA (MIT, 1966)
  • ALPAC Report (USA, 1966)
  • The DENDRAL (Buchanan, 1969)
  • Perceptrons Book (Minsky & Papert, 1969)
  • PROLOG (1972)
  • MYCIN (Stanford, 1972)
  • Lighthill Report (UK, 1973)
  • FRAMES (1975)
  • Hopfield net (1982)
  • R1 (McDermott, 1982)
  • Parallel Distributed Processing (Rumelhart & McClelland, 1986)
  • Bayesian Networks (Pearl, 1988)
  • Reinforcement Learning (Sutton, 1988)
  • Image Recognition (LeCun et al., 1990)
  • Deep Blue beats Kasparov (IBM, 1997)
  • Deep Learning (Hinton, 2006)
  • Watson wins Jeopardy (2011)
  • AlexNet (Krizhevsky, 2012)
  • GANs (Goodfellow, 2014)
  • AlphaGo beats Lee Sedol (DeepMind, 2016)
  • Transformer (Vaswani, 2017)
  • AlphaFold (DeepMind, 2018)
  • GPT-1 (OpenAI, 2018)
  • BERT (Google, 2019)
  • Chinchilla (DeepMind, 2022)
  • ChatGPT (OpenAI, 2022)
  • LLaMA (Meta AI, 2023)
  • Claude 2 (Anthropic, 2023)
  • phi-3 (Microsoft, 2024)
  • Gemini 1.5 (Google DeepMind, 2024)
  • Qwen3 (Alibaba, 2025)
  • R1 (DeepSeek, 2025)

ML Definition

Our ML projects must have a purpose...


Risks

The ML Adoption Process

AI Adoption

Many Thanks!

chc79@cam.ac.uk
