Policy Factorisation decomposes the agent's policy into reasoning and action (Wei et al., 2026).
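One way to write this factorisation (our notation, not taken from the cited paper; z denotes the reasoning trace):

$$\pi(a \mid s) = \sum_{z} p(z \mid s)\, p(a \mid s, z)$$

The agent first generates a reasoning trace z from the state s, then conditions its action a on both.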
LLM-based policies produce the reasoning trace z in natural language. Is this a distinctive feature of Agentic AI?
{
  "actor": "agent_3",
  "kind": "agent_update",
  "payload": {
    "decision": {
      "response": {
        "choice": {"status": "act", "action": 1},
        "reasoning": {
          "confidence": 0.8,
          "source": "llm",
          "text": "Moving left is the least explored direction...",
          "tool_steps": [{"kind": "tool", "name": "llm", "elapsed_s": 1.24}]
        }
      },
      "explanation": "Moving left — least explored."
    }
  }
}
LLM record: action + observable reasoning (z).
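A minimal sketch of how a consumer might extract the action and reasoning from such a record (field names are taken from the JSON above; the function name and signature are illustrative assumptions, not from any specific framework):

import json

def parse_agent_update(raw: str) -> tuple[int | None, str, float]:
    """Extract (action, reasoning text, confidence) from an agent_update record."""
    record = json.loads(raw)
    response = record["payload"]["decision"]["response"]
    choice = response["choice"]
    reasoning = response["reasoning"]
    # "abstain" records carry a null action; surface it as None.
    action = choice["action"] if choice["status"] == "act" else None
    return action, reasoning["text"], reasoning["confidence"]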
{
  "actor": "agent_3",
  "kind": "agent_update",
  "payload": {
    "decision": {
      "response": {
        "choice": {"status": "abstain", "action": null},
        "reasoning": {
          "confidence": 0.0,
          "source": "llm",
          "text": "All surrounding cells explored. Cannot determine best move."
        }
      },
      "explanation": "Abstained: low confidence."
    }
  }
}
LLM record: abstention with observable reasoning (z) but no action.
The "I Don't Know" Function
The Consistent Reasoning Paradox (CRP): a trustworthy intelligent system cannot simultaneously maintain consistent reasoning and always produce an answer. The resolution is the ability to say "I don't know" (Bastounis et al., 2024).
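A minimal sketch of such an "I don't know" wrapper, assuming a scalar confidence score like the one in the records above (the 0.5 threshold and function name are illustrative):

def decide_or_abstain(action: int, confidence: float, threshold: float = 0.5) -> dict:
    """Act only when confidence clears the threshold; otherwise abstain.
    Mirrors the 'choice' object in the records above."""
    if confidence >= threshold:
        return {"status": "act", "action": action}
    return {"status": "abstain", "action": None}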