ICMS, Bayes Centre - Mitigating Intellectual Debt in AI Systems

Mitigating Intellectual Debt in AI Systems

Christian Cabrera Jojoa

Assistant Research Professor

Department of Computer Science and Technology

University of Cambridge

chc79@cam.ac.uk


Outline




  • The AI Adoption Process
  • AI-based Software Systems
    1. Intellectual Debt
  • AI as a Service
    1. The Data Dichotomy
  • Data-Oriented Architectures (DOAs)
  • Data-Oriented Debugger
  • DOAgent Library
  • Conclusions

The AI Adoption Process


The AI Adoption Process

AI Puzzle

The AI Adoption Process

AI Adoption

Software systems are the interfaces between AI technologies and our socio-technical systems.

Socio-technical systems include the people, institutions, infrastructure, and digital technologies that cooperate to serve society, for example:

  • Government agencies
  • Hospitals
  • Industries
  • Universities
  • Research institutes
  • ...


AI-based Software Systems


AI-based Software Systems

AI-based software systems are data-driven. Unlike traditional systems, their behaviour cannot be fully predefined by developers: ML components learn it from data and operate as black boxes that propagate uncertainty into the wider software system.

AI System

AI-based Software Systems

Intellectual Debt: Practitioners deploy data-driven systems that work in practice but whose inner workings they do not fully understand. This threatens transparency, safety, and trust, increasing the risk of AI's negative social impact (Zittrain, 2022).


AI-based Software Systems

The "Technocentric" View

Single Model
ML System?
https://xkcd.com/1838/, CC BY-NC 2.5 , via XKCD

AI-based Software Systems

Threats to AI applications and their promises


  • Mundane applications for problems we did not know we had
  • Disregard for social and environmental implications
  • Unrealistic expectations and hype
  • Exclusion of diverse perspectives and voices
  • Unsustainable technologies
  • Increased inequality and digital divide
  • Security and privacy concerns
  • ...

AI-based Software Systems

The Systems View
AI System
AI System
How are software systems currently designed, developed, and deployed?

AI as a Service


AI as a Service

Service-Oriented Architecture (SOA) is a design pattern in which components provide services to one another through a communication protocol over a network.


Microservices are an architectural style that structures an application as a collection of small, autonomous services. Each microservice is self-contained and exposes a business capability, which is implemented through objects (i.e., object-oriented programming, OOP).


The concept of "Everything as a Service" (XaaS) extends the principles of SOA and microservices by offering comprehensive services over the internet. XaaS encompasses a wide range of services, including infrastructure, platforms, and software.

AI System

AI as a Service

AI as a Service (AIaaS) enables us to access and expose AI capabilities over the internet. We can integrate AI tools such as machine learning models, natural language processing, and computer vision into our applications by leveraging SOA and microservices.

AI System
AI as a Service


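from flask import Flask, request, jsonify

app = Flask(__name__)


class SentimentAnalysisService:
    """Wraps a trained model and maps its score to a sentiment label."""

    def __init__(self, model):
        self.model = model

    def analyze_sentiment(self, text):
        # The raw score is used internally and never leaves the service.
        sentiment_score = self.model.predict(text)
        if sentiment_score > 0.5:
            return "Positive"
        elif sentiment_score < -0.5:
            return "Negative"
        else:
            return "Neutral"

...  # elided: load a trained model and create `service = SentimentAnalysisService(model)`

@app.route('/analyze', methods=['POST'])
def analyze():
    # silent=True yields None (instead of an error) for a non-JSON body.
    data = request.get_json(silent=True) or {}
    text_to_analyze = data.get('text', '')
    sentiment = service.analyze_sentiment(text_to_analyze)
    # Only the label is exposed; the score stays hidden behind the interface.
    return jsonify({'sentiment': sentiment})
...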

Focus on Operations:

  • Separation of concerns
  • High availability
  • Scalability
  • Low latency

Data is secondary and hidden behind services' interfaces.

AI as a Service

The Data Dichotomy: “While data-driven systems are about exposing data, service-oriented architectures and object-oriented programming are about hiding data.” (Stopford, 2016). We need to design systems prioritising data!
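To make the dichotomy concrete, here is a minimal sketch that contrasts the two styles. It assumes a toy word-counting scorer (`toy_score`, hypothetical) in place of a trained model and reuses the ±0.5 thresholds from the SentimentAnalysisService example. The service-style class computes the score and discards it behind its interface. The data-oriented variant appends the input, score, and label to a log that any other component can read. The class names and record fields are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a trained model: +1 per 'good', -1 per 'bad'."""
    words = text.lower().split()
    return float(words.count("good") - words.count("bad"))


def label_for(score: float) -> str:
    # Same thresholds as the SentimentAnalysisService example.
    if score > 0.5:
        return "Positive"
    if score < -0.5:
        return "Negative"
    return "Neutral"


class HiddenSentimentService:
    """Service style: the score is computed and discarded behind the interface."""

    def __init__(self, score_fn: Callable[[str], float]):
        self._score_fn = score_fn

    def analyze(self, text: str) -> str:
        return label_for(self._score_fn(text))


@dataclass
class DataOrientedSentiment:
    """Data-oriented style: every decision is written to a shared, readable log."""

    score_fn: Callable[[str], float]
    log: list = field(default_factory=list)

    def analyze(self, text: str) -> str:
        score = self.score_fn(text)
        label = label_for(score)
        self.log.append({"input": text, "score": score, "label": label})
        return label


hidden = HiddenSentimentService(toy_score)
exposed = DataOrientedSentiment(toy_score)
print(hidden.analyze("good good day"))   # Positive
print(exposed.analyze("good good day"))  # Positive
print(exposed.log)  # [{'input': 'good good day', 'score': 2.0, 'label': 'Positive'}]
```

Both classes return the same label. Only the second lets a later debugging or monitoring step ask *why* that label was produced.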


Data-Orientation


Data-Orientation


Data-Oriented Architecture (DOA) is an architectural style developed to address the requirements of data-intensive systems that operate in real time without centralised servers (Vorhemus, 2017).

DOA Architecture

Data-First Systems

  • Data is available by design
  • Traceability and monitoring
  • Interpretability
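The data-first properties above (data available by design, traceability) can be illustrated with a minimal sketch. It assumes a hypothetical append-only `SharedLog` through which components exchange all their data. Each record names its producer and the records it was `derived_from`, so any output can be traced back to its raw inputs. The class, the field names, and the sensor/detector pipeline are illustrative assumptions, not a specific DOA implementation.

```python
import itertools


class SharedLog:
    """Hypothetical append-only data medium: all components read and write here."""

    def __init__(self):
        self.records = []
        self._ids = itertools.count(1)

    def append(self, producer, kind, value, derived_from=()):
        rec_id = f"r{next(self._ids)}"
        self.records.append({"id": rec_id, "producer": producer, "kind": kind,
                             "value": value, "derived_from": list(derived_from)})
        return rec_id

    def get(self, rec_id):
        return next(r for r in self.records if r["id"] == rec_id)

    def lineage(self, rec_id):
        """Walk derived_from links back to the raw inputs (depth-first)."""
        seen, stack, order = set(), [rec_id], []
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            order.append(current)
            stack.extend(self.get(current)["derived_from"])
        return order


log = SharedLog()
readings = [3.0, 4.0, 5.0]
raw = log.append("sensor", "reading", readings)
mean = log.append("preprocessor", "mean", sum(readings) / len(readings), derived_from=[raw])
alert = log.append("detector", "alert", False, derived_from=[mean])
print(log.lineage(alert))  # ['r3', 'r2', 'r1']
```

Because intermediate values are records rather than private state, monitoring and debugging tools need only read the log.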

Data-Orientation



Prioritise Decentralisation

  • Super-low latency requirements
  • Privacy by design
Decentralisation
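One way to read "privacy by design" through local processing is sketched below. It assumes hypothetical `EdgeNode` devices that keep their raw readings local and share only aggregate summaries (count and total), from which a global mean is computed. The names, the readings, and the choice of a mean as the shared statistic are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class EdgeNode:
    """Hypothetical device that processes its raw readings locally."""
    name: str
    readings: list  # raw data: stays on the node, never shared

    def local_summary(self) -> dict:
        # Only aggregate statistics are shared with other nodes.
        return {"node": self.name, "count": len(self.readings), "total": sum(self.readings)}


def combine(summaries: list) -> float:
    """Global mean computed purely from the shared summaries."""
    count = sum(s["count"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / count


nodes = [EdgeNode("home-a", [20.0, 22.0]), EdgeNode("home-b", [18.0, 19.0, 20.0])]
summaries = [n.local_summary() for n in nodes]
print(summaries)
print(combine(summaries))  # 19.8
```

Each node does its own computation and no raw reading crosses the network. This also avoids a round trip to a central server.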

Data-Orientation



Openness

  • Sustainable solutions
  • Data ownership

Data-Orientation



DOA Survey

Most of the surveyed works adopt DOA principles only partially to handle data-intensive requirements. The survey results also show that diverse tools can support the adoption of DOA principles, including Apache Kafka, Spark Streaming, Hadoop Distributed File System, MQTT, and RabbitMQ.


Data-Orientation

Data-Oriented Architectures make data available by design, facilitating monitoring and maintenance. Decentralisation supports local data processing, reducing latency and improving privacy by respecting data ownership. Openness enables the management of resource-constrained environments by exploiting the computing power of everyday devices (Cabrera et al., 2025).



How can we exploit these properties to address the intellectual debt problem in AI-based systems?


Data-Oriented Debugger


Data-Oriented Debugger

Deep Neural Network
Deep Neural Network with multiple hidden layers - QuantuMechaniX8, CC0, via Wikimedia Commons
AI System

Data-Oriented Debugger

RAG Process
DOA Architecture

Data-Oriented Debugger

DOA Debugger

DOAgent Library


DOAgent Library

Multi-agent
Two rival teams of agents - Jordan K. Terry, CC BY-SA 4.0, via Wikimedia Commons
AI System

DOAgent Library


from doagent.core import FileSharedData
from doagent.core import StubAgent

shared_data = FileSharedData()
agent = StubAgent("agent-1", shared_data)
agent.write(kind="note", payload={"text": "Hello"})
for r in shared_data.listen("note"):
    print(r.id, r.payload)

Data-first Principle

Agents communicate through a data medium

  • Shared data adapter with CRUD and listen operations
  • Adapters (currently): InMemory, File (JSONL), and MongoDB
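To illustrate what "CRUD plus listen" means for a shared-data adapter, here is a minimal in-memory sketch. It does **not** reproduce DOAgent's actual classes or signatures. `InMemoryStore`, `Record`, and their methods are hypothetical names chosen for this example. As a simplification, `listen` here only yields the records of a kind stored so far. A real adapter would typically also deliver records written afterwards.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class Record:
    id: str
    kind: str
    actor: str
    payload: dict


@dataclass
class InMemoryStore:
    """Hypothetical shared-data adapter: create/read/update/delete plus listen()."""
    _records: dict = field(default_factory=dict)
    _ids: Any = field(default_factory=lambda: itertools.count(1))

    def create(self, kind: str, actor: str, payload: dict) -> Record:
        rec = Record(f"rec-{next(self._ids)}", kind, actor, dict(payload))
        self._records[rec.id] = rec
        return rec

    def read(self, rec_id: str) -> Optional[Record]:
        return self._records.get(rec_id)

    def update(self, rec_id: str, **changes: Any) -> Record:
        rec = self._records[rec_id]
        rec.payload.update(changes)
        return rec

    def delete(self, rec_id: str) -> None:
        self._records.pop(rec_id, None)

    def listen(self, kind: str) -> Iterator[Record]:
        # Simplification: a snapshot of the records of `kind`, in insertion order.
        return (r for r in list(self._records.values()) if r.kind == kind)


store = InMemoryStore()
store.create("note", "agent-1", {"text": "Hello"})
store.create("note", "agent-2", {"text": "Hi"})
for r in store.listen("note"):
    print(r.id, r.payload)
```

Swapping this class for a file- or database-backed one would leave the agents untouched. That is the point of keeping communication behind a single data-medium interface.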

DOAgent Library


{
  "id": "rec-abc123",
  "timestamp": "2026-02-22T10:00:00Z",
  "actor": "agent_0",
  "kind": "agent_update",
  "payload": { "action": 2, "round": 1 },
  "provenance": { "created_by": "agent_0", "derived_from": ["out-1"] },
  "accountability": { "owner": "team-a", "policy_id": "pol-1" }
}

Data-first Principle

Agents communicate through a data medium

  • Data model storing agent updates, environment outcomes, and traces
  • Logging levels control which trace, explanation, provenance, and accountability records are written
  • Provenance: created_by, derived_from, used_tools
  • Accountability: owner, policy_id, responsibility_scope
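The record layout above can be sketched with dataclasses. The sketch builds a record shaped like the JSON example, with provenance (`created_by`, `derived_from`, `used_tools`) and accountability (`owner`, `policy_id`, `responsibility_scope`) sections. The `SECTIONS_BY_LEVEL` mapping from logging level to optional sections is a **hypothetical** assumption made for illustration, not DOAgent's documented behaviour. The default value for `responsibility_scope` is likewise made up.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Provenance:
    created_by: str
    derived_from: list = field(default_factory=list)
    used_tools: list = field(default_factory=list)


@dataclass
class Accountability:
    owner: str
    policy_id: str
    responsibility_scope: str = "agent"  # hypothetical default


# Hypothetical mapping from logging level to the optional sections that are written.
SECTIONS_BY_LEVEL = {0: set(), 1: {"provenance"}, 2: {"provenance", "accountability"}}


def make_record(actor, kind, payload, provenance, accountability, level=2):
    record = {
        "id": f"rec-{uuid.uuid4().hex[:6]}",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "actor": actor,
        "kind": kind,
        "payload": payload,
    }
    sections = SECTIONS_BY_LEVEL[level]
    if "provenance" in sections:
        record["provenance"] = asdict(provenance)
    if "accountability" in sections:
        record["accountability"] = asdict(accountability)
    return record


rec = make_record("agent_0", "agent_update", {"action": 2, "round": 1},
                  Provenance("agent_0", ["out-1"]), Accountability("team-a", "pol-1"))
print(json.dumps(rec, indent=2))
```

Lower levels give smaller logs. Higher levels keep the links needed to answer "who did this, from what, under which policy" after the fact.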

DOAgent Library


from doagent.core import Topology
from doagent.core import TopologyConfig
from doagent.core import select_routing

config = TopologyConfig(mode=Topology.FEDERATED)
decision = select_routing(config)

Decentralisation Principle

Support for heterogeneous communication topologies

  • Topology: Centralised, federated, p2p
  • TopologyConfig and select_routing: visibility filters determine which records each agent sees
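A visibility filter of the kind described above might look like the following sketch. The rules are **assumptions** chosen for illustration: centralised means everyone sees all records, federated means agents see records from their own group, and p2p means agents see their own records and their direct neighbours'. `Mode` and `visible_records` are hypothetical names, deliberately distinct from DOAgent's `Topology` and `select_routing`.

```python
from enum import Enum


class Mode(Enum):
    CENTRALISED = "centralised"
    FEDERATED = "federated"
    P2P = "p2p"


def visible_records(viewer, records, mode, groups=None, neighbours=None):
    """Return the records `viewer` may read under a (hypothetical) visibility rule."""
    if mode is Mode.CENTRALISED:
        return list(records)
    if mode is Mode.FEDERATED:
        group = groups[viewer]
        return [r for r in records if groups[r["actor"]] == group]
    if mode is Mode.P2P:
        allowed = {viewer} | set(neighbours.get(viewer, ()))
        return [r for r in records if r["actor"] in allowed]
    raise ValueError(f"unknown mode: {mode}")


records = [{"id": "au-1", "actor": "a0"}, {"id": "au-2", "actor": "a1"},
           {"id": "au-3", "actor": "a2"}]
groups = {"a0": "g1", "a1": "g1", "a2": "g2"}
neighbours = {"a0": ["a2"]}
for mode in Mode:
    ids = [r["id"] for r in visible_records("a0", records, mode, groups, neighbours)]
    print(mode.value, ids)
```

Because all agents write to the same data medium, changing the topology here amounts to changing a read filter rather than rewiring agent-to-agent connections.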

DOAgent Library


from doagent.core import InMemoryParticipationRegistry
from doagent.core import ParticipationRecord

registry = InMemoryParticipationRegistry()
registry.register(ParticipationRecord(
    agent_id="agent-1",
    capabilities=["compute"]))

Openness Principle

Agents can join and leave at any time

  • ParticipationRegistry: Register and query which agents are present to support join/leave and resource exchange
  • ParticipationRecord: agent identifier, capabilities, and resources
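The join/leave behaviour can be sketched with a small in-memory registry keyed by agent identifier. `Participant` and `Registry` are hypothetical names for this example and are not DOAgent's `ParticipationRecord` or `InMemoryParticipationRegistry`. The capability strings and the `cpu_cores` resource are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    agent_id: str
    capabilities: list
    resources: dict = field(default_factory=dict)


class Registry:
    """Hypothetical in-memory participation registry supporting join/leave."""

    def __init__(self):
        self._present = {}

    def join(self, participant):
        self._present[participant.agent_id] = participant

    def leave(self, agent_id):
        self._present.pop(agent_id, None)  # leaving twice is harmless

    def with_capability(self, capability):
        return sorted(a for a, p in self._present.items() if capability in p.capabilities)


registry = Registry()
registry.join(Participant("agent-1", ["compute"], {"cpu_cores": 4}))
registry.join(Participant("agent-2", ["compute", "storage"]))
registry.join(Participant("phone-1", ["sensing"]))
print(registry.with_capability("compute"))  # ['agent-1', 'agent-2']
registry.leave("agent-1")                   # e.g. the device runs low on battery
print(registry.with_capability("compute"))  # ['agent-2']
```

Work can then be routed only to agents that are currently present, so devices can come and go without reconfiguring the system.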

DOAgent Library

DOAgent Architecture
DOAgent Library


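# doagent imports omitted on the slide; shared_data is a shared-data adapter,
# e.g. FileSharedData() as in the earlier example.
raw_env = make_grid_env(width=6, height=6, agent_ids=[...])
registry = PolicyRegistry()
registry.register("pol", pol_fn)  # pol_fn: user-supplied decision logic
configs = [GridAgentConfig(...)]
session = Session(shared_data, RunConfig(), topology_config=...)
# Wrap the raw environment so outcomes and traces are recorded.
env = session.wrap_env(raw_env, env_actor="env")
agents = session.create_agents(env, configs, registry)
obs = env.reset()
done = False
while not done:
    actions = [a.decide(obs) for a in agents]
    obs, rewards, done = env.step(actions)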

What users provide

  • Environments: Use built-in (e.g. PettingZoo) or customised environments. The library wraps them so outcomes and traces are recorded
  • Agents: Define which agents take part in the system. The library creates them from config files and connects them to shared data
  • Policies: Plug in decision logic via policy adapters (LLM, rules, RL, or custom). The library records decisions and optional explanations

DOAgent Library

Grid-world

GridWorld validation example

  • Dependency-free grid-world mapping scenario: agents discover cells and landmarks under partial observations
  • Each round, agents read the shared map (from visible records), choose a move according to their policy, and publish an agent_update
  • Configurable topology (centralised, federated, p2p) and visibility. Optional energy-based participation (join/leave)
  • Runs from a YAML config. The Session records outcomes, traces, and agent_updates transparently
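A dependency-free grid-world mapping loop in the spirit of this example is sketched below. It is **not** DOAgent's environment. Several details are assumptions: the action ids, the observation radius (own cell plus four neighbours), the corner start positions, and the greedy "prefer unmapped cells" policy. Agents build the shared map only from published `agent_update` records and read the map as it stood at the start of each round.

```python
import random

MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # hypothetical action ids


def observe(pos, width, height):
    """Partial observation: the agent's cell and its 4-neighbours inside the grid."""
    x, y = pos
    cells = {(x, y)} | {(x + dx, y + dy) for dx, dy in MOVES.values()}
    return {(cx, cy) for cx, cy in cells if 0 <= cx < width and 0 <= cy < height}


def greedy_policy(pos, known, width, height, rng):
    """Prefer a move whose target cell is not yet on the shared map."""
    options = []
    for action, (dx, dy) in MOVES.items():
        target = (pos[0] + dx, pos[1] + dy)
        if 0 <= target[0] < width and 0 <= target[1] < height:
            options.append((target in known, action, target))
    unexplored = [o for o in options if not o[0]]
    _, action, target = rng.choice(unexplored or options)
    return action, target


def run(width=6, height=6, agent_ids=("agent_0", "agent_1"), rounds=10, seed=0):
    rng = random.Random(seed)
    positions = {a: ((0, 0) if i == 0 else (width - 1, height - 1))
                 for i, a in enumerate(agent_ids)}
    records = []  # shared medium: agents see the map only through these records
    for rnd in range(1, rounds + 1):
        known = set()
        for r in records:
            known |= set(map(tuple, r["payload"]["observed"]))
        for agent in agent_ids:
            obs = observe(positions[agent], width, height)
            action, target = greedy_policy(positions[agent], known | obs, width, height, rng)
            records.append({"kind": "agent_update", "actor": agent,
                            "payload": {"round": rnd, "action": action,
                                        "observed": sorted(obs)}})
            positions[agent] = target
    return records


records = run()
discovered = set()
for r in records:
    discovered |= set(map(tuple, r["payload"]["observed"]))
print(len(records), "agent_updates;", len(discovered), "of 36 cells discovered")
```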

DOAgent Library


{"id":"out-1","kind":"outcome","actor":"env","payload":{...}}
{"id":"au-1","kind":"agent_update","actor":"agent_0","payload":{"action":2,"round":1}}
{"id":"tr-1","kind":"trace","payload":{"from_id":"out-0","to_id":"out-1","enabled_by_id":"au-1","round":1}}

Stored records

  • Outcome: env state after each step (observations per agent, done flags). One per distinct state when dedup is on
  • Agent_update: per-agent decision and action each round. Links to the outcome it enabled
  • Trace: from_id, to_id, enabled_by_id. Links outcome-to-outcome via the agent_update that caused the transition
  • Collection-per-kind (e.g. outcome.jsonl, agent_update.jsonl, trace.jsonl with FileSharedData)
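The collection-per-kind layout can be reproduced with the standard library alone. In this sketch each record is appended as one JSON line to `<kind>.jsonl`. A trace record links two outcomes through the agent_update that enabled the transition. `write_record` and `load_collection` are hypothetical helpers, not FileSharedData's implementation. The outcome payloads are placeholders.

```python
import json
import tempfile
from pathlib import Path


def write_record(records_dir: Path, record: dict) -> None:
    """Append one record as a JSON line to the collection named after its kind."""
    path = records_dir / f"{record['kind']}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def load_collection(records_dir: Path, kind: str) -> list:
    path = records_dir / f"{kind}.jsonl"
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


records_dir = Path(tempfile.mkdtemp())
write_record(records_dir, {"id": "out-0", "kind": "outcome", "actor": "env", "payload": {}})
write_record(records_dir, {"id": "au-1", "kind": "agent_update", "actor": "agent_0",
                           "payload": {"action": 2, "round": 1}})
write_record(records_dir, {"id": "out-1", "kind": "outcome", "actor": "env", "payload": {}})
# The trace links the two outcomes via the agent_update that caused the transition.
write_record(records_dir, {"id": "tr-1", "kind": "trace",
                           "payload": {"from_id": "out-0", "to_id": "out-1",
                                       "enabled_by_id": "au-1", "round": 1}})
print(sorted(p.name for p in records_dir.iterdir()))
# ['agent_update.jsonl', 'outcome.jsonl', 'trace.jsonl']
```

One file per kind keeps each collection easy to stream or load independently, which is what the attribution analysis relies on.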

DOAgent Library

Trace graph

DOAgent Library


traces = load_jsonl(records_dir / "trace.jsonl")
outcomes = load_jsonl(records_dir / "outcome.jsonl")
agent_updates = load_jsonl(records_dir / "agent_update.jsonl")
attribution = compute_attribution(traces, outcomes, agent_updates)

Causal attribution analysis

  • Given the trace graph (from_id, to_id, enabled_by_id) and outcome payloads (observations per agent), attribute each state transition to the enabling agent
  • Per agent: cumulative cells discovered over time, total discovery, productive vs redundant moves (decision effectiveness)
  • DOAgent enables this: the analysis uses only shared records (trace, outcome, agent_update), with no access to policy or environment internals. The same script works for any run at logging level 2
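The attribution idea can be sketched as follows. Transitions are walked in round order, and the cells that appear in the `to_id` outcome but were not previously known are credited to the actor of the enabling agent_update. A transition counts as productive if it revealed at least one new cell and as redundant otherwise. `compute_attribution_sketch` is a hypothetical function, not the `compute_attribution` used in the slide's script. The outcome payload shape `{"observations": {agent_id: [[x, y], ...]}}` is also an assumption.

```python
from collections import defaultdict


def compute_attribution_sketch(traces, outcomes, agent_updates):
    """Credit newly observed cells in each transition to the enabling agent.

    Assumes (hypothetically) outcome payloads shaped like
    {"observations": {agent_id: [[x, y], ...]}}.
    """
    outcome_by_id = {o["id"]: o for o in outcomes}
    actor_by_update = {u["id"]: u["actor"] for u in agent_updates}

    def cells(outcome_id):
        obs = outcome_by_id[outcome_id]["payload"]["observations"]
        return {tuple(c) for per_agent in obs.values() for c in per_agent}

    known = set()
    stats = defaultdict(lambda: {"discovered": 0, "productive": 0, "redundant": 0})
    for tr in sorted(traces, key=lambda t: t["payload"]["round"]):
        p = tr["payload"]
        known |= cells(p["from_id"])
        new_cells = cells(p["to_id"]) - known
        agent = actor_by_update[p["enabled_by_id"]]
        stats[agent]["discovered"] += len(new_cells)
        stats[agent]["productive" if new_cells else "redundant"] += 1
        known |= new_cells
    return dict(stats)


def outcome(oid, a0, a1):
    return {"id": oid, "kind": "outcome", "payload": {"observations": {"agent_0": a0, "agent_1": a1}}}


outcomes = [outcome("out-0", [[0, 0]], [[5, 5]]), outcome("out-1", [[1, 0]], [[5, 5]]),
            outcome("out-2", [[1, 0]], [[4, 5]]), outcome("out-3", [[0, 0]], [[4, 5]])]
agent_updates = [{"id": "au-1", "actor": "agent_0"}, {"id": "au-2", "actor": "agent_1"},
                 {"id": "au-3", "actor": "agent_0"}]
traces = [{"id": f"tr-{i}", "kind": "trace",
           "payload": {"from_id": f"out-{i - 1}", "to_id": f"out-{i}",
                       "enabled_by_id": f"au-{i}", "round": r}}
          for i, r in [(1, 1), (2, 1), (3, 2)]]
print(compute_attribution_sketch(traces, outcomes, agent_updates))
```

Like the analysis described above, the sketch reads only records (traces, outcomes, agent_updates). It never inspects a policy or the environment's internal state.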

DOAgent Library

Causal attribution results

Causal attribution results: Left presents per-agent cumulative discovery over rounds. Centre shows total cells discovered per agent. Right presents decision effectiveness (productive vs redundant transitions per agent). All derived from shared records. No policy or environment internals required.


DOAgent Library

Topology comparison results

Conclusions

  • AI Software Systems are the interface between socio-technical systems and AI technologies.
  • We do not always understand the inner workings of these novel systems, which generates Intellectual Debt.
  • One cause of Intellectual Debt is the Data Dichotomy generated by current software architecture paradigms.
  • Data-Oriented Architectures (DOAs) offer an alternative that avoids the dichotomy and addresses Intellectual Debt by facilitating traceability and interpretability.

Many Thanks!

chc79@cam.ac.uk
