Methodology

This page explains how we turn annual reports from UK-listed public companies into the data powering the dashboard, and the decisions behind each step in the pipeline. For a deeper dive, the full technical report is coming soon.

Overview

The AI Risk Observatory processes annual reports from UK-listed public companies through a two-stage AI classification pipeline. The dataset spans all annual reports published between 2020 and 2026 by 1,362 companies, totalling 9,821 filings. Of these, 4,637 filings contain at least one AI-relevant mention, and after quality filters 4,084 carry meaningful AI signal. Because annual reports can run to hundreds of pages, we extract only the relevant AI mentions and their surrounding context — giving us 24,189 annotated text chunks in total.

The pipeline follows three stages:

  1. Extract all relevant AI mentions from each filing.
  2. Broadly classify the type of AI mentioned into one or more of six categories: Adoption, Risk, Harm, Vendor, General/other/ambiguous, or None (false positive).
  3. For each Adoption, Risk, and Vendor mention, classify it into a detailed sub-taxonomy.

We also run a substantiveness classifier to measure the depth of each mention, rating it on a scale from boilerplate to substantive.

The pipeline is illustrated below. Phase 1 labels are not mutually exclusive, so those counts sum to more than the number of extracted reports.

Filings examined: 9,821
Reports that mention AI: 4,637 (47% of filings)
Reports that don't mention AI: 5,184 (53% of filings)

Phase 1: Classify the Type of AI Mention

AI mentions are classified into six categories.

Adoption: 2,916 reports (63% of extracted)
Risk: 1,783 reports (38% of extracted)
Harm: 7 reports (0% of extracted)
Vendor: 1,001 reports (22% of extracted)
General, other, or ambiguous: 3,052 reports (66% of extracted)
None (including false positive): 553 reports (12% of extracted)

Phase 2: Detailed Taxonomies

From Phase 1, only Adoption, Risk, and Vendor mentions are processed further into the following subcategories.

Adoption

Traditional AI (non-LLM): 2,484
LLM: 1,158
Ambiguous AI adoption type: 862
Agentic AI: 563

Risk

Strategic/competitive: 1,174
Operational/technical: 1,110
Cybersecurity: 1,042
Regulatory/compliance: 936
Reputational/ethical: 732
Third party/supply chain: 505
Information integrity: 388
Workforce impacts: 332
National security: 96
Environmental impact: 84

Vendor

Other: 505
Microsoft: 322
Internal: 198
OpenAI: 164
Google: 162
Undisclosed: 156
Nvidia: 127
Amazon / AWS: 121
Meta: 54
Salesforce: 34
IBM: 23
UK AI Vendors: 22
Databricks: 14
Anthropic: 12
Snowflake: 11
Arm: 9
Palantir: 6
xAI / Grok: 5
Open-Source Model: 5
Mistral: 2
Cohere: 1
Hugging Face: 0
Pinecone: 0

1. Data

Scope

To measure AI risk, adoption, and vendor dependence across the UK economy, we process all annual reports published by all public companies in the UK. There are 1,660 public companies listed on UK markets (LSE Main Market, AIM Market, and AQSE). After excluding companies not registered in the UK (e.g. Irish or Canadian companies listed on these exchanges) and firms without filings available via Companies House, our working universe is approximately 1,362 companies. Each company files, on average, one annual report per year.1

The current report universe breaks down across exchange segments as follows.

Segment | Number of companies | Number of reports
Main Market | 776 | 7,827
Main Market (FTSE 350) | 289 | 3,638
Main Market (FTSE 100) | 85 | 1,359
Main Market (FTSE 250) | 204 | 2,279
AIM | 489 | 1,462
Aquis Exchange | 33 | 74

Decisions & Rationale

Why annual reports? Unlike earnings calls, press releases, or public media, annual reports are audited, structured, and published on a consistent cadence — making them a reliable, high-signal source of information. UK public companies must publish annual accounts, a strategic report, a directors' report, and an auditor's report under the Companies Act 2006. All listed companies share that statutory core, but Main Market issuers face tighter deadlines and more detailed disclosure rules than AIM and AQSE companies.5

This makes annual reports well suited to tracking trends across the UK economy over time. There are two primary limitations: (1) they are inherently backward-looking, often with a significant delay; and (2) their highly regulated nature means many statements are boilerplate and contain little real information.2

Why 2020–2026? We chose this window to capture a pre-ChatGPT baseline (before the late-2022 inflection) and the rapid adoption cycle that followed.

How do we map to CNI? The UK's Critical National Infrastructure (CNI) framework defines 13 distinct sectors. Each company in our database has an ISIC sector code that only partially maps to CNI sectors. We take a conservative approach, using an LLM classifier to assign CNI sectors to companies that do not map directly from ISIC; when no assignment can be made, we use an “Other” CNI category.3 A major limitation of CNI analysis via annual reports is that some sectors, such as Space, Emergency Services, or Civil Nuclear, have few public companies or suppliers represented.4
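
As a rough illustration of this two-step mapping (direct lookup first, LLM fallback second, per footnote 3), the sketch below uses a small, purely illustrative lookup table and a placeholder `llm_classify` callable; it is not our production code.

```python
# Illustrative ISIC-to-CNI mapping: direct lookup first, LLM classifier as fallback.
# The lookup table is a tiny illustrative subset, not the full production mapping.
DIRECT_ISIC_TO_CNI = {
    "3510": "Energy",          # electric power generation, transmission and distribution
    "3600": "Water",           # water collection, treatment and supply
    "6110": "Communications",  # wired telecommunications activities
}

def map_company_to_cni(isic_code: str, company_description: str, llm_classify) -> str:
    """Assign a CNI sector: direct ISIC lookup, then LLM fallback, else 'Other'."""
    if isic_code in DIRECT_ISIC_TO_CNI:
        return DIRECT_ISIC_TO_CNI[isic_code]
    # Ambiguous case: ask an LLM classifier (injected as a callable) to choose one of
    # the 13 CNI sectors, or return None if no confident assignment can be made.
    sector = llm_classify(company_description)
    return sector if sector else "Other"
```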

Data Provider Acknowledgment

Converting PDFs to clean, structured text is technically demanding, and doing so at this scale would have exceeded our compute budget. We partnered with FinancialReports.eu, a third-party financial data provider, to obtain all annual reports in our scope in Markdown format. Their filings API and generous support made this project possible.

2. Pre-processing

Chunking Approach

Once each annual report is in structured Markdown text, we split it into chunks using a sliding-window approach that respects paragraph and section boundaries, with generous padding around each AI mention. An AI keyword filter isolates sections that explicitly mention AI or closely related techniques; only those sections are retained for further annotation as AI mentions. Each chunk carries metadata: company identifier, reporting year, release month, report section (e.g. Risk Factors, Strategy), and a stable chunk ID for traceability.
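
As a rough sketch of this step (the keyword list, window size, and field names are illustrative assumptions, not the exact production values):

```python
import re

# Illustrative keyword pattern; the production filter covers a broader set of AI terms.
AI_PATTERN = re.compile(
    r"\b(artificial intelligence|machine learning|large language model|generative ai|"
    r"deep learning|neural network|llm|genai)\b",
    re.IGNORECASE,
)

def extract_ai_chunks(paragraphs, company_id, year, month, section, padding=2):
    """Yield chunks around paragraphs that mention AI, with metadata and a stable chunk ID.

    `paragraphs` is the report body split on paragraph/section boundaries;
    `padding` controls how many neighbouring paragraphs are kept as context.
    """
    for i, para in enumerate(paragraphs):
        if not AI_PATTERN.search(para):
            continue
        start, end = max(0, i - padding), min(len(paragraphs), i + padding + 1)
        yield {
            "chunk_id": f"{company_id}-{year}-{section}-{i}",
            "company_id": company_id,
            "year": year,
            "release_month": month,
            "section": section,
            "text": "\n\n".join(paragraphs[start:end]),
        }
```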

Chunking Results

The table below shows filings with AI mentions and the number of AI mentions extracted per year.

Year | Number of filings | Filings with AI mention (% of total) | Count of AI mentions
2020 | 1,007 | 199 (20%) | 492
2021 | 1,328 | 364 (27%) | 1,095
2022 | 1,853 | 526 (28%) | 1,856
2023 | 1,905 | 701 (37%) | 2,310
2024 | 1,828 | 1,009 (55%) | 5,091
2025 | 1,561 | 1,023 (66%) | 6,560
2026 | 339 | 262 (77%) | 3,253
Total | 9,821 | 4,084 (42%) | 20,657

3. Processing

Phase 1: Mention-Type Classification

First, each chunk is passed to an LLM classifier that decides whether the text contains a genuine AI mention and, if so, assigns one or more mention-type labels. Chunks assigned only the None label are filtered out as false positives before Phase 2.

The Phase 1 classifier uses the following taxonomy:

Label | Definition
Adoption | Real current deployment, implementation, rollout, pilot, or use of AI by the company or for its clients.
Risk | AI directly attributed as the source of a risk or downside to the firm, another party, or society at large.
Harm | AI described as causing an actual past or ongoing injury, damage, or loss.
Vendor reference | A provider of AI technology, model, platform, compute infrastructure, or AI hardware is referenced.
General, other, or ambiguous | AI mentioned but too high-level, vague, or otherwise outside the adoption, risk, harm, and vendor categories.
None | No real AI mention / false positive. Exclusive; cannot co-occur with other labels.
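
A minimal sketch of a Phase 1 call with a strict response format is shown below. The snake_case label names match those used in our prompting rules, but the prompt wording and the `call_llm` wrapper are assumptions for illustration, not the exact production prompt.

```python
import json

PHASE1_LABELS = {
    "adoption", "risk", "harm", "vendor_reference",
    "general_other_or_ambiguous", "none",
}

def classify_mention_type(chunk_text: str, call_llm) -> list[dict]:
    """Phase 1: ask an LLM for one or more mention-type labels with confidences.

    `call_llm` is a placeholder for whatever client wraps the model; it should
    return a JSON string such as: [{"label": "risk", "confidence": 0.8}].
    """
    prompt = (
        "Classify the AI mention(s) in the text below. Return a JSON list of "
        "objects with 'label' (one of: " + ", ".join(sorted(PHASE1_LABELS)) + ") "
        "and 'confidence' (0 to 1). Only label what is explicitly attributed to AI; "
        "use 'none' when there is no real AI mention, and never alongside other labels.\n\n"
        + chunk_text
    )
    raw = call_llm(prompt, temperature=0)  # temperature zero for reproducibility
    annotations = json.loads(raw)
    labels = {a["label"] for a in annotations}
    # Validation mirroring the QA rules: permitted labels only, 'none' is exclusive.
    if not labels or not labels <= PHASE1_LABELS or ("none" in labels and len(labels) > 1):
        raise ValueError(f"Invalid Phase 1 labels: {labels}")
    return annotations
```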

Phase 1 Label Distribution Over Time

Distribution of Phase 1 mention-type labels across all AI-mentioning filings, by year. Labels are not mutually exclusive, so a single filing can contribute to multiple categories.

[Chart: AI Mention Types Over Time. Each bar shows how many reports per year were tagged with each mention type (Adoption, AI Risk Mentioned, Vendor, General / Other / Ambiguous, Harm, None / False Positive), counting a report when it contains at least one mention with a confidence score ≥ 0.2. p = partial year; 2026 data is not a full-year sample.]
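
The report-level tagging behind this chart can be reproduced with a simple aggregation over chunk annotations, assuming each annotation carries a label and a confidence score (field names are illustrative):

```python
def report_mention_types(chunk_annotations, threshold=0.2):
    """Tag a report with every mention type that appears in at least one of its
    chunks with confidence >= threshold (the 0.2 cut-off used in the chart above)."""
    types = set()
    for ann in chunk_annotations:  # one entry per (chunk, label) pair
        if ann["confidence"] >= threshold:
            types.add(ann["label"])
    return types
```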

Phase 2: Deep-Taxonomy Classification

Chunks that passed Phase 1 are routed to dedicated classifiers according to their mention types: Adoption, Risk, and Vendor mentions each go through their own LLM classifier, and chunks tagged as Risk are also scored for substantiveness. A routing sketch follows the taxonomies below. The taxonomies are as follows:

Adoption Taxonomy

Label | Definition
Traditional AI/ML | AI that is not LLM or agentic AI, such as computer vision, predictive analytics, fraud detection, recommendation engines, anomaly detection, or ML-enabled robotic process automation.
LLM/GenAI | Large language models and GenAI, including GPT, ChatGPT, Gemini, Claude, Copilot, text generation, NLP chatbots, document summarization, or code generation.
Agentic systems | AI systems or agents that autonomously execute tasks and take actions with limited human oversight. AI assistants, copilots, and decision-support tools are not agentic unless autonomous execution is clear.
Ambiguous | Current AI adoption is present, but too vague to classify as traditional AI, LLM, or agentic without guessing.

Risk Taxonomy

Label | Definition
Strategic / competitive | AI-driven competitive disadvantage, displacement, failure to adapt, or pricing and margin erosion.
Operational / technical | AI reliability, accuracy, safety, or model-risk failures that degrade decisions or operations, including unsafe employee AI use.
Cybersecurity | AI-enabled attacks, fraud, breach pathways, or adversarial abuse.
Workforce impacts | AI-driven displacement or skills gaps.
Regulatory / compliance | AI-specific legal, regulatory, privacy, or IP liability, compliance burden, or enforcement exposure.
Information integrity | AI-enabled misinformation, deepfakes, content authenticity manipulation, or similar information integrity failures.
Reputational / ethical | Trust, fairness, ethics, or rights concerns.
Third-party / supply chain | Dependency on AI vendors, concentration risk, or exposure to failures or misuse of AI in the company supply chain.
Environmental impact | Energy, carbon, or resource-burden risk.
National security | AI-linked geopolitical or security destabilisation, or exposure of critical systems.
None | No attributable risk category (or too vague to assign one).

Vendor Taxonomy

Vendors are tagged against a predefined list of named providers: OpenAI, Microsoft, Google, Amazon / AWS, Nvidia, Salesforce, Databricks, IBM, Snowflake, Meta, Anthropic, xAI / Grok, Palantir, Arm, Mistral, Cohere, Hugging Face, Pinecone, and UK AI vendors (Darktrace, Quantexa, Featurespace, Faculty AI, BenevolentAI). Additional categories cover open-source models, internal AI model development or deployment, undisclosed third-party AI vendors, and other named providers outside the predefined list.
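
As an illustration of how classifier output might be normalised against this list, the sketch below maps free-text vendor strings onto the predefined categories; the alias table is a small assumed subset (e.g. grouping Azure under Microsoft is an assumption), not the production mapping.

```python
# Illustrative alias table mapping free-text vendor strings onto the predefined list.
CANONICAL_VENDORS = {
    "openai": "OpenAI", "chatgpt": "OpenAI",
    "microsoft": "Microsoft", "azure": "Microsoft",   # grouping is an assumption
    "google": "Google", "aws": "Amazon / AWS", "amazon": "Amazon / AWS",
    "nvidia": "Nvidia", "darktrace": "UK AI Vendors", "quantexa": "UK AI Vendors",
}

def normalise_vendor(name: str) -> str:
    """Map a vendor string returned by the classifier onto the predefined list;
    anything unmatched falls into the 'Other' bucket."""
    return CANONICAL_VENDORS.get(name.strip().lower(), "Other")
```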

Substantiveness

Level | Definition
Boilerplate | Generic AI language; could appear in many reports unchanged.
Moderate | A specific area is identified, but without concrete mechanisms, metrics, or mitigation steps.
Substantive | Concrete mechanism, tangible action, commitment, metric, or timeline.
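
Putting Phase 2 together, the routing might look like the sketch below, where `classifiers` holds placeholder callables for the adoption, risk, vendor, and substantiveness classifiers (names and field layout are illustrative):

```python
def run_phase2(chunk, classifiers):
    """Route a Phase 1-labelled chunk to the relevant Phase 2 classifiers.

    `chunk` carries its text and Phase 1 labels; `classifiers` is a dict of
    callables keyed by 'adoption', 'risk', 'vendor', and 'substantiveness'.
    """
    out = {}
    labels = chunk["labels"]
    if "adoption" in labels:
        out["adoption_type"] = classifiers["adoption"](chunk["text"])
    if "risk" in labels:
        out["risk_categories"] = classifiers["risk"](chunk["text"])
        # Risk chunks are also rated on the boilerplate-to-substantive scale.
        out["substantiveness"] = classifiers["substantiveness"](chunk["text"])
    if "vendor_reference" in labels:
        out["vendors"] = classifiers["vendor"](chunk["text"])
    return out
```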

4. Quality Assurance

We enforce structured outputs and explicit validation rules to reduce noise and improve reproducibility. We apply the following checks:

  • Structured outputs — classifiers write to strict JSON response schemas; malformed outputs or labels outside the permitted set are retried or flagged.
  • Conservative prompting — prompts require explicit AI attribution and discourage over-labelling; the default outcome is none or general_other_or_ambiguous.
  • Temperature zero — all classifier calls use temperature zero for deterministic, reproducible outputs.
  • Chunk-level traceability — every annotation maps back to a company, year, and report section via a stable chunk ID.
  • QA scripts — we run QA tests across each pipeline stage, checking primarily for anomalies and out-of-distribution outputs:
    • Document size, length, duplication, fiscal-year-match, and text anomalies (non-Markdown formatting, unexpected characters).
    • Outlier analysis on the distribution of Phase 1 and Phase 2 labels per company, report, and year; AI mentions extracted per report; and chunk creation keywords (see the outlier sketch after this list).

    All flagged outputs were manually reviewed.

  • Human review — the dataset is vast, and while we have made every effort to audit anomalies arising from data processing, some errors and misclassifications may remain. Our data is available for download. If you spot an issue, please file it on the repository.
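
As an illustration of the outlier checks, the sketch below flags reports whose extracted AI-mention count sits far from the yearly distribution; the z-score threshold and field names are assumptions, not the exact production test.

```python
from statistics import mean, stdev

def flag_mention_count_outliers(reports, z_threshold=3.0):
    """Flag reports whose AI-mention count deviates strongly from the mean for
    their year; flagged reports are sent to manual review."""
    by_year = {}
    for r in reports:  # each r: {"report_id": ..., "year": ..., "mention_count": ...}
        by_year.setdefault(r["year"], []).append(r)
    flagged = []
    for year, group in by_year.items():
        counts = [r["mention_count"] for r in group]
        if len(counts) < 2:
            continue  # not enough reports in this year to estimate a spread
        mu, sigma = mean(counts), stdev(counts)
        for r in group:
            if sigma > 0 and abs(r["mention_count"] - mu) / sigma > z_threshold:
                flagged.append(r["report_id"])
    return flagged
```
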
Annotation

Labeled Examples

Browse a sample of annotated text chunks. Select one to view the excerpt, its metadata, and the taxonomy labels applied by both phases of the classifier.

Phase 1 mention type: Adoption
Phase 2 (Adoption taxonomy): Traditional AI (non-LLM)

Full Excerpt Text

![img-14.jpeg](img-14.jpeg) # onfido Onfido is building the new identity standard for the internet. Its AI-based technology assesses whether a user's government-issued ID is genuine or fraudulent, and then compares it against their facial biometrics. Using computer vision and a number of other AI technologies, Onfido can verify against 4,500 different types of identity documents across 195 countries, using techniques like "facial liveness" to see patterns invisible to the human eye. Onfido was founded in 2012 and has offices in London, San Francisco, New York, Lisbon, Paris, New Delhi and Singapore. The company has attracted over 1,500 customers in 60 countries worldwide, including industry leaders such as GoCardless, Nutmeg, Bitstamp and Revolut. These customers are choosing Onfido over others because of its ability to scale, speed in on-boarding new customers (15 seconds for flash verification), preventing fraud, and its advanced biometric technology. Augmentum invested an additional £3.7 million in a convertible loan note ("CLN") in December 2019 as part of a £4.7 million round. This converted into equity when Onfido raised an additional £64.7 million in April 2020.

Footnotes

  1. Some companies have multiple subsidiaries with separate filings, while others were recently listed or spun off and therefore have fewer years of filings available. This means the per-company filing count is not uniform across the dataset.
  2. To address the boilerplate problem we apply a substantiveness classifier (see Phase 2 above) that rates each mention on a scale from boilerplate to substantive, allowing users to filter to high-signal disclosures.
  3. The ISIC-to-CNI mapping follows two steps: a direct lookup for ISIC codes that clearly correspond to a CNI sector, followed by an LLM classifier for ambiguous cases. Companies that cannot be assigned to any CNI sector are labelled “Other”.
  4. The following CNI sectors have particularly low public-company representation in our dataset: Space (0), Emergency Services (0), Civil Nuclear (2), Water (18), Defence (20), Government (20), Data Infrastructure (22), Communications (28), Chemicals (34). Conclusions drawn about these sectors should be treated with caution.
  5. Main Market issuers are generally subject to FCA disclosure and listing rules, including a four-month reporting deadline, while AIM and AQSE companies typically have up to six months. The auditor's formal opinion covers the financial statements, not the annual report narrative as a whole.