The transition model describes the outcome of each action in each state. Since the outcome is stochastic, we write $P(s' \mid s, a)$ for the probability of reaching state $s'$ if action $a$ is executed in state $s$.
Transitions are Markovian: the probability of reaching $s'$ depends only on the current state $s$ and action $a$, not on the history of earlier states.
Uncertainty once again brings MDPs closer to reality than deterministic approaches.
From every transition the agent receives a reward: $R(s, a, s')$.
The agent wants to maximise the sum of the received rewards. The utility function $U([s_0, s_1, s_2, \ldots])$ captures this sum over the sequence of visited states.
The solution to this problem is called a policy: a mapping from states to actions that tells the agent what to do in each state it might reach.
Deterministic policy: $\pi(s) = a$.
Stochastic policy: $\pi(a \mid s)$,
where $\pi(a \mid s)$ is the probability of selecting action $a$ in state $s$.
The quality of a policy in a given state is measured by the expected utility of the possible environment histories generated by that policy. We can compute the utility of state sequences using additive (discounted) rewards as follows:
\[ U([s_0, s_1, s_2, \ldots]) = \sum_{t=0}^{\infty} \gamma^t R(s_t, a_t, s_{t+1}) \]
where $0 \le \gamma \le 1$ is the discount factor controlling how much the agent values future rewards.
The expected utility of executing the policy $\pi$ starting in state $s$ is then:
\[ U^{\pi}(s) = E\!\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, a_t, s_{t+1})\right] \]
where the expectation is taken over the stochastic transitions, with $s_0 = s$ and $a_t = \pi(s_t)$.
We can compare policies at a given state using their expected utilities. The goal is to select the policy with the highest expected utility:
\[ \pi^* = \arg\max_{\pi} U^{\pi}(s) \]
The policy $\pi^*$ is an optimal policy; for discounted infinite-horizon MDPs it is independent of the starting state.
The utility function allows the agent to select actions using the principle of maximum expected utility. The agent chooses the action that maximises the reward for the next step plus the expected discounted utility of the subsequent state:
\[ \pi^*(s) = \arg\max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma\, U(s')\right] \]
The utility of a state is the expected reward for the next transition plus the discounted utility of the next state, assuming that the agent chooses the optimal action. The utility of a state $s$ is given by:
\[ U(s) = \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma\, U(s')\right] \]
This is called the Bellman Equation, after Richard Bellman (1957).
Another important quantity is the action-utility function or Q-function, which is the expected utility of taking a given action in a given state:
\[ Q(s, a) = \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma \max_{a'} Q(s', a')\right] \]
The Q-function tells us how good it is to take action $a$ in state $s$; note that $U(s) = \max_a Q(s, a)$.
The optimal policy can be extracted from $Q$ directly:
\[ \pi^*(s) = \arg\max_{a} Q(s, a) \]
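As a concrete illustration, here is a minimal value-iteration sketch in Python: it repeatedly applies the Bellman update and then extracts the greedy policy from the resulting Q-values. The two-state MDP (states, actions, P, R) is entirely made up, and the reward is simplified to depend on $(s, a)$ only.

# Minimal value iteration on a made-up two-state MDP.
gamma = 0.9
states = ["s0", "s1"]
actions = ["stay", "go"]

# P[(s, a)] lists (next_state, probability) pairs; R[(s, a)] is the reward.
P = {("s0", "stay"): [("s0", 0.9), ("s1", 0.1)],
     ("s0", "go"):   [("s1", 0.8), ("s0", 0.2)],
     ("s1", "stay"): [("s1", 1.0)],
     ("s1", "go"):   [("s0", 1.0)]}
R = {("s0", "stay"): 0.0, ("s0", "go"): -1.0,
     ("s1", "stay"): 1.0, ("s1", "go"): 0.0}

U = {s: 0.0 for s in states}
for _ in range(100):  # enough iterations for convergence on this tiny problem
    U = {s: max(sum(p * (R[(s, a)] + gamma * U[s2]) for s2, p in P[(s, a)])
                for a in actions)
         for s in states}

# Q(s, a) = sum_s' P(s'|s,a) [R(s, a) + gamma U(s')]; the optimal policy is greedy in Q.
Q = {(s, a): sum(p * (R[(s, a)] + gamma * U[s2]) for s2, p in P[(s, a)])
     for s in states for a in actions}
policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
print(U, policy)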
Model-Based RL Agent
Model-Free RL Agent
- Policy Iteration (value function): model-based, with guaranteed convergence for finite and discrete problems.
- Q-Learning (Q-function): model-free and simple, suited to small and discrete problems.
- DQN (neural network): model-free and complex, suited to large and continuous problems.
DQN Exercise Scenario
The Transformer is a deep neural network architecture based on the multi-head attention mechanism, introduced by researchers at Google (Vaswani et al., 2017). The original goal was to improve machine translation through language modelling.
In Natural Language Processing (NLP), language modelling uses machine learning models (typically deep learning) to predict the next token in a sequence.
The main idea is to pay attention to the context of each word in a sentence when modelling language. For example, if the context is "Thanks for all the" and we want to know how likely the next word is "fish", we ask for $P(\text{fish} \mid \text{Thanks for all the})$.
In general, we want to discover the probability distribution over a vocabulary $V$:
\[ P(w_t \mid w_1, \ldots, w_{t-1}) \]
where $w_t \in V$ is the next token and $w_1, \ldots, w_{t-1}$ are the tokens seen so far.
The Transformer architecture solves this problem through the following steps:
1. Tokenisation: Convert sentence into tokens:
For example, consider the sentence: "So long and thanks for".
Tokenisation of this sentence results in the tokens: "So", "long", "and", "thanks", "for".
Each word in the sentence is treated as an individual token.
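A minimal sketch of this word-level tokenisation in Python (real systems typically use subword tokenisers such as BPE; the vocabulary mapping here is made up):

sentence = "So long and thanks for"
tokens = sentence.split()  # word-level tokenisation: ["So", "long", "and", "thanks", "for"]

# Map each token to an integer id using a tiny, made-up vocabulary.
vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}
token_ids = [vocab[t] for t in tokens]
print(tokens, token_ids)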
2. Input and Positional Embedding: Convert input tokens into ordered embedded vectors:
Consider the tokens from the previous example: "So", "long", "and", "thanks", "for". Each token is converted into a vector by looking up its row in an embedding matrix.
The embedding matrix has as many rows as words in a predefined vocabulary and as many columns as dimensions describing a word.
Positional encoding is then added to these vectors to incorporate the order of the tokens.
Positional embeddings can be initialised randomly, with each position having its own representation.
The resulting vectors are used as input to the transformer model, capturing both the meaning and position of each word. They are updated during training.
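A NumPy sketch of the embedding lookup and positional addition (all sizes, ids, and values are illustrative; in a real model the embedding matrix and positional embeddings are learnt):

import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, seq_len = 10, 8, 5           # illustrative sizes

E = rng.normal(size=(vocab_size, d_model))        # embedding matrix: one row per vocabulary word
Pos = rng.normal(size=(seq_len, d_model))         # positional embeddings: one row per position

token_ids = [7, 4, 0, 6, 2]                       # made-up ids for "So", "long", "and", "thanks", "for"
X = E[token_ids] + Pos                            # input to the transformer: meaning + position
print(X.shape)                                    # (5, 8): one vector per token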
3. Self-Attention: Determine the relevance of each word to others in the sequence:
The meaning of a word represented by the embeddings is influenced by the previous words. We need a mechanism (i.e., a head) to transform the initial meaning of the words accordingly.
In the self-attention mechanism, each input embedding can play three distinct roles: query, key, and value.
We define three matrices, $W^Q$, $W^K$, and $W^V$, to project each input $X$ into a representation of each role:
\[ Q = XW^Q, \quad K = XW^K, \quad V = XW^V \]
The output of a head combines the values, weighted by how well each query matches each key:
\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \]
where $d_k$ is the dimension of the key vectors.
This is a multi-head attention mechanism where each head has its own set of key, query, and value matrices:
Each head focuses on different aspects of the language: one head can focus on the relationship between adjectives and nouns, another on the relationship between verbs and subjects. These relationships transform the meaning of the input and are learnt from the data as the model's parameters. The additional meaning is added back to the original input through a residual connection.
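A minimal single-head self-attention sketch in NumPy (dimensions and random weights are illustrative; a real model learns $W^Q$, $W^K$, $W^V$ during training):

import numpy as np

rng = np.random.default_rng(1)
seq_len, d_model, d_k = 5, 8, 4              # illustrative sizes

X = rng.normal(size=(seq_len, d_model))      # embedded input (meaning + position)
W_Q = rng.normal(size=(d_model, d_k))        # query projection (random stand-in for learnt weights)
W_K = rng.normal(size=(d_model, d_k))        # key projection
W_V = rng.normal(size=(d_model, d_k))        # value projection

Q, K, V = X @ W_Q, X @ W_K, X @ W_V

scores = Q @ K.T / np.sqrt(d_k)              # how well each query matches each key
scores -= scores.max(axis=-1, keepdims=True) # numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
out = weights @ V                            # each position: weighted mix of all values
print(out.shape)                             # (5, 4)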
4. Feed-Forward Neural Network: Pass the attention outputs through a feed-forward neural network to consolidate learnt patterns:
A fully connected two-layer network is applied to each position independently:
\[ \mathrm{FFN}(x) = \max(0, xW_1 + b_1)W_2 + b_2 \]
The input $x$ is the attention output at a single position; $W_1$, $b_1$, $W_2$, $b_2$ are learnt parameters.
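A NumPy sketch of this position-wise network (sizes are illustrative; real models typically use a hidden layer about four times wider than the model dimension):

import numpy as np

rng = np.random.default_rng(2)
d_model, d_ff = 8, 32                        # hidden layer wider than the model dimension

W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

def ffn(x):
    return np.maximum(0, x @ W1 + b1) @ W2 + b2   # ReLU, then project back to d_model

x = rng.normal(size=(5, d_model))            # one vector per position
print(ffn(x).shape)                          # (5, 8): applied position-wise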
5. Residual Connections and Layer Normalisation:
Residual connections are used at different stages of the process to retain what the word originally meant whilst enriching it with context.
Layer normalisation is applied to keep the parameter values in a range that facilitates gradient descent:
\[ \mathrm{LayerNorm}(x) = \gamma \odot \frac{x - \mu}{\sigma} + \beta \]
In the equation above, $\mu$ and $\sigma$ are the mean and standard deviation of the elements of $x$, and $\gamma$ and $\beta$ are learnt scale and shift parameters.
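A NumPy sketch of a residual connection followed by layer normalisation (sizes and inputs are illustrative):

import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return gamma * (x - mu) / (sigma + eps) + beta

rng = np.random.default_rng(3)
x = rng.normal(size=(5, 8))                  # original input (what the word meant)
sublayer_out = rng.normal(size=(5, 8))       # e.g., attention or FFN output (added context)

gamma, beta = np.ones(8), np.zeros(8)
y = layer_norm(x + sublayer_out, gamma, beta)   # residual connection, then normalise
print(y.mean(axis=-1).round(6), y.std(axis=-1).round(6))  # ~0 mean, ~1 std per position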
6. Output Layer: Use a linear layer followed by a softmax function to generate the final output probabilities:
The linear layer applies a learnt weight matrix to the final hidden state, decoding the high-dimensional representation of the input sequence into a vector of logits, one for each possible output token.
The softmax function is applied to these logits to convert them into probabilities:
\[ \mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}} \]
It ensures that the output values are between 0 and 1 and sum to 1, making them interpretable as probabilities. This step creates a probability distribution over the vocabulary.
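A NumPy sketch of this final step (sizes and weights are illustrative):

import numpy as np

rng = np.random.default_rng(4)
d_model, vocab_size = 8, 10                  # illustrative sizes

h = rng.normal(size=d_model)                 # final hidden state of the last position
W_out = rng.normal(size=(d_model, vocab_size))

logits = h @ W_out                           # one score per vocabulary token
logits -= logits.max()                       # stabilise the softmax
probs = np.exp(logits) / np.exp(logits).sum()
print(probs.sum(), probs.argmax())           # 1.0, and the id of the most likely next token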
Training Process:
The training process of a Transformer model involves feeding it large amounts of text and training it to predict the next token: the predicted probability distribution is compared against the actual next token using a cross-entropy loss, and all the parameters (embeddings, attention matrices, feed-forward weights) are updated by gradient descent through backpropagation.
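As a minimal illustration of this objective (the probabilities below are made up):

import math

probs = [0.1, 0.2, 0.6, 0.1]    # model's predicted distribution over a 4-token vocabulary
target = 2                      # id of the actual next token in the training text
loss = -math.log(probs[target]) # cross-entropy loss for this position
print(round(loss, 3))           # 0.511: lower when the model assigns the truth higher probability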
Throughout this process, various techniques such as dropout and learning rate scheduling may be employed to improve model performance and prevent overfitting.
The training process of a Transformer model can also involve RL techniques for specific purposes, most notably Reinforcement Learning from Human Feedback (RLHF), where human preference ratings provide the reward signal used to fine-tune the model's outputs.
Large Language Models (LLMs) are AI models designed to understand, generate, and manipulate human language. They are built using deep learning techniques, are usually based on the Transformer architecture, and are trained on vast amounts of data to capture the complexity of human language. LLMs can perform a wide range of language tasks (e.g., text generation, classification, etc.).
Fine-tuning continues the training of a pre-trained LLM (e.g., GPT-3, BERT, etc.) so that it performs well on tasks in a particular domain (e.g., healthcare).
It is similar to standard neural network training: domain data is fed through the model, a loss is computed against the desired outputs, and the weights are updated by gradient descent, typically with a small learning rate so the pre-trained knowledge is preserved.
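A minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries; the base model, the two-example dataset, and the hyperparameters are illustrative assumptions:

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tiny made-up domain dataset (e.g., healthcare sentences with binary labels).
data = Dataset.from_dict({
    "text": ["Patient reports mild symptoms.", "Urgent: severe reaction observed."],
    "label": [0, 1],
})
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                     padding="max_length", max_length=32))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()  # continues training the pre-trained weights on the domain data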
Retrieval-Augmented Generation (RAG) is an alternative to fine-tuning that combines pre-trained LLMs with external knowledge sources. Instead of adapting the model to a specific domain, RAG retrieves relevant information from a database or knowledge base to enhance the model's responses in real-time.
The process combines approaches from symbolic AI and databases: the user's query is used to retrieve the most relevant documents from a knowledge base, and the retrieved passages are inserted into the prompt so that the LLM can ground its answer in them.
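A toy sketch of the retrieval step: documents are ranked by word overlap with the query and the best match is inserted into the prompt. Real systems use dense vector embeddings and a vector database; the documents here are made up.

docs = [
    "Aspirin is commonly used to reduce fever and relieve mild pain.",
    "The Transformer architecture is based on multi-head attention.",
    "RAG retrieves external documents to ground LLM answers.",
]

def score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)               # fraction of query words found in the doc

query = "What is aspirin used for?"
best = max(docs, key=lambda d: score(query, d))

prompt = f"Answer using only this context:\n{best}\n\nQuestion: {query}"
print(prompt)                                # this augmented prompt is sent to the LLM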
Prompt Engineering is a lightweight alternative to fine-tuning and RAG. Instead of changing model weights or building a retrieval pipeline, we craft instructions, examples, and constraints (the “prompt”) so that a frozen LLM performs the desired task.
Typical prompt-engineering workflow:
expected_format = """
Return your answer in JSON with the following keys:
{
"title": string, # concise headline (≤ 12 words)
"summary": [string, ...] # 3–5 bullet points
}
"""
article = """<ARTICLE TEXT HERE>"""
prompt = f"""
You are a helpful assistant.
TASK: Summarise the article below.
OUTPUT FORMAT (baseline)
{expected_format}
ARTICLE
""" + article
# Note: this sketch assumes the google-generativeai package; the model name is illustrative.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    prompt,
    generation_config={"max_output_tokens": 150, "temperature": 0.7},
)
print(response.text.strip())
Agentic AI refers to systems that can make autonomous decisions and take actions to achieve specific goals. The term gained prominence with the emergence of LLMs and Generative AI.
AI Agents have existed for decades, with active research communities and open challenges.
The difference is that in Agentic AI frameworks the decisions are made by LLM-based agents.
The inclusion of LLM-based agents exacerbates some of these challenges.
How sensitive are LLMs to changes in prompts when generating code?
'''
Write Python code to solve the following problem:
Given two sorted arrays nums1 and nums2 of size m and n
respectively, return the median of the two sorted
arrays. The overall run time complexity should be
O(log (m+n)).
Constraints:
0 <= m <= 1000
0 <= n <= 1000
1 <= m + n <= 2000
-10^6 <= nums1[i], nums2[i] <= 10^6
'''
One change in the prompt was obfuscating the text by adding random noise:
'''
Write Python code to solve the following problem:
hJFFm two a8Gh4E se5Zuz GIJe! and mj<Cq of q&Xs m
and n %FE)2Xt(G#Oy, t$5 tDh the k$rJQH of the two
EPEˆ@W xGeX %E. The (gSFq<: run F(K@ DkN(;ss9r7 W
Bij>v be O (log (m + n) ).
K@jb$T = = n
0 <= m <= w000
0 <= n <= w000
1 <= m + n <= 1000
- 106 <= GHnZ@ [i ], jk,e@ [i] <= 106
'''
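A sketch of how such noise can be generated automatically; the corruption rate and symbol pool are illustrative, and the published experiment's exact procedure may differ:

import random

def obfuscate(text, rate=0.3, seed=42):
    """Randomly replace a fraction of word characters with noise symbols."""
    random.seed(seed)
    noise = "@#$%&!<>^;()0123456789"
    out = []
    for ch in text:
        if ch.isalnum() and random.random() < rate:
            out.append(random.choice(noise))
        else:
            out.append(ch)
    return "".join(out)

prompt = "Given two sorted arrays nums1 and nums2, return the median."
print(obfuscate(prompt))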
What can we conclude?
The actual conclusions are a bit more boring: