WIP — Data and labels are in active iteration

About

This page describes the dataset and explains, in plain language, how we turn annual-report text into the dashboard metrics.

Dataset

The AI Risk Observatory dataset includes a metadata.csv file that maps each excerpt to its company, year, source report, and other metadata. It also provides the excerpts themselves, each annotated with classifier-assigned labels: mention type, adoption maturity, risk taxonomy, vendor references, signal strength, and substantiveness scores. The LLM classifiers' reasoning is included as well, to support quality-assurance review.

The full dataset, processing pipeline, and documentation are available on GitHub.

View on GitHub

Method Summary

This dashboard currently covers 150 company-year reports (2022-2024) from 50 companies across 16 sectors. The pipeline extracted 1538 AI-related text chunks from 45 of those companies, then classified each chunk into structured labels.

The method is intentionally staged: find potential AI text first, then classify what that text is about, then aggregate labels to report-level trends.

Company-Year Reports: 150
Extracted Chunks: 1538
Reports With AI Signal: 109
Reports With AI Risk Signal: 87

Methodology in a Nutshell

The pipeline has three stages: pre-processing, processing, and post-processing. The diagram below shows the end-to-end flow.

Pre Processing

1.1 List companies and years to analyze, with metadata
1.2 Fetch annual reports
1.3 Convert reports to Markdown
1.4 Extract all excerpts that mention AI

Processing

2.1 For all AI mention excerpts
2.2 Run a mention type classifier
2.3 Run Phase 2 classifiers (risk, adoption, vendor)
2.4 Run the boilerplate level classifier

Post Processing

3.1 Aggregate classifications across chunks and reports
3.2 Compute metrics and trends
3.3 Produce a structured dataset
3.4 Visualize on the dashboard

1. Pre-processing

We collect annual reports, convert them to normalized markdown text, then detect AI keyword hits and build chunk windows around those hits.

1.1 Candidate retrieval is recall-first: keyword matching is intentionally broad (for example AI, artificial intelligence, machine learning, LLM, GPT, GenAI, Copilot). This catches more true candidates at the cost of false positives, which later classification stages filter out.

1.2 Context windows are merged: overlapping hits are deduplicated into a single chunk so nearby mentions are analyzed together.

1.3 Long/noisy blocks are cleaned: very long table rows and formatting noise are reduced so classifiers see readable text.

1.4 Traceability is preserved: each chunk keeps source report identifiers, section hints, offsets, and matched keywords.
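The steps above can be sketched in Python. This is an illustrative, recall-first matcher under assumed settings: the keyword list is taken from the examples given, while the window size, regex form, and function name are hypothetical, not the pipeline's actual configuration.

```python
import re

# Broad, recall-first keyword patterns (examples from the method text).
# \b word boundaries keep "AI" from matching inside other words, while
# IGNORECASE deliberately over-matches; false positives are filtered later.
AI_KEYWORDS = re.compile(
    r"\b(AI|artificial intelligence|machine learning|LLM|GPT|GenAI|Copilot)\b",
    re.IGNORECASE,
)
WINDOW = 200  # hypothetical context window, in characters, on each side of a hit


def find_chunks(text: str) -> list[tuple[int, int]]:
    """Return merged (start, end) character spans around keyword hits."""
    spans: list[tuple[int, int]] = []
    for m in AI_KEYWORDS.finditer(text):
        start = max(0, m.start() - WINDOW)
        end = min(len(text), m.end() + WINDOW)
        if spans and start <= spans[-1][1]:
            # Overlapping windows merge into one chunk, so nearby
            # mentions are analyzed together.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans
```

Each resulting span would then be cut into a chunk that carries its source report identifiers, offsets, and matched keywords.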

2. Processing

Processing is two-phase. Phase 1 decides what type of AI mention is in a chunk. Phase 2 adds deeper labels.

2.1 Mention type labels: adoption, risk, harm, vendor, general_ambiguous, none. Labels can co-occur, with one exception: none stands alone and means no real AI mention (a false positive).

This chart is shown here because mention type is the Phase 1 gate: it determines how chunks are routed to downstream classifiers, so changes here flow through the rest of the pipeline.

Mention Types Over Time

Chart series: Adoption, Risk, Vendor, General / Ambiguous, Harm.

Each bar shows how many reports per year were tagged with each mention type (confidence ≥ 0.2).
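The caption's counting rule can be sketched as follows. Only the 0.2 threshold comes from the text; the per-report data shape and function name are assumptions for illustration.

```python
from collections import Counter

THRESHOLD = 0.2  # default confidence cutoff stated in the method text


def reports_per_year_type(reports):
    """Count reports tagged with each mention type, per year.

    `reports` is a list of (year, {mention_type: confidence}) pairs;
    this shape is illustrative, not the dataset's actual schema.
    """
    counts = Counter()
    for year, labels in reports:
        for mtype, conf in labels.items():
            if conf >= THRESHOLD:
                counts[(year, mtype)] += 1
    return counts
```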

2.2 Routing logic: the risk classifier only runs on chunks tagged risk, adoption classifier only on adoption chunks, and vendor classifier only on vendor chunks. If Phase 1 misses a tag, Phase 2 for that branch will not run.

2.3 Taxonomies used: adoption type = non_llm, llm, agentic. Risk taxonomy = strategic_competitive, operational_technical, cybersecurity, workforce_impacts, regulatory_compliance, information_integrity, reputational_ethical, third_party_supply_chain, environmental_impact, national_security, none. Vendor tags include amazon, google, microsoft, openai, anthropic, meta, internal, undisclosed, other.

2.4 Signal and substantiveness: adoption uses a 0-3 signal scale, while risk and vendor use 1-3 (weak implicit to explicit). Risk chunks also get a substantiveness label (boilerplate/moderate/substantive).
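The routing rule in 2.2 is simple enough to state as code. A minimal sketch, assuming hypothetical classifier names; the mapping itself follows the text, with each Phase 2 branch running only when its tag is present:

```python
# Phase 1 tags gate Phase 2: each deeper classifier runs only on chunks
# carrying the matching mention-type tag. Classifier names are hypothetical.
PHASE2_ROUTES = {
    "risk": "risk_classifier",
    "adoption": "adoption_classifier",
    "vendor": "vendor_classifier",
}


def route(mention_types: set[str]) -> list[str]:
    """Return the Phase 2 classifiers to run for one chunk's Phase 1 tags."""
    # If Phase 1 missed a tag, the corresponding branch never runs.
    return sorted(PHASE2_ROUTES[t] for t in mention_types if t in PHASE2_ROUTES)
```

Note that a chunk tagged only harm or general_ambiguous triggers no Phase 2 classifier at all.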

Exact Taxonomy Reference (Canonical Labels)

Labels are shown exactly as stored in classifier outputs and dataset fields for transparency.

Mention Type Taxonomy

adoption: Current use, rollout, pilot, implementation, or delivery of AI systems by the company (or for clients).
risk: AI is described as a downside or exposure for the company.
harm: AI is described as causing or enabling harm (for example misinformation, fraud, abuse, safety incidents).
vendor: A named AI model/vendor/platform provider is referenced.
general_ambiguous: AI is mentioned, but the text is too high-level or vague for adoption/risk/harm/vendor.
none: No real AI mention / false positive. This label is exclusive (it should not co-occur with others).

Adoption Taxonomy

non_llm: Traditional AI/ML (non-LLM), such as predictive models, computer vision, and detection/classification systems.
llm: Large language model / GenAI use (for example GPT/ChatGPT/Gemini/Claude/Copilot-style deployments).
agentic: Autonomous or agent-based workflows with limited human intervention (can co-occur with llm).

Adoption signal scale: 0 absent, 1 weak implicit, 2 strong implicit, 3 explicit.

Risk Taxonomy

strategic_competitive: AI-driven competitive disadvantage, disruption, or failure to adapt.
operational_technical: Reliability/accuracy/model-risk failures that degrade operations or decisions.
cybersecurity: AI-enabled attacks/fraud/breach pathways or adversarial AI abuse.
workforce_impacts: AI-related displacement, skills gaps, or risky employee AI usage.
regulatory_compliance: AI-specific legal/regulatory/privacy/IP liability and compliance burden.
information_integrity: AI-enabled misinformation, deepfakes, or authenticity manipulation.
reputational_ethical: AI-linked trust, fairness, ethics, or rights concerns.
third_party_supply_chain: Dependency on external AI vendors/providers and concentration exposure.
environmental_impact: AI-related energy, carbon, or resource-burden risk.
national_security: AI-linked geopolitical/security destabilization or critical-systems exposure.
none: No attributable AI-risk category (or too vague to assign one).

Risk signal scale: 1 weak implicit, 2 strong implicit, 3 explicit.

Vendor Taxonomy

amazon: Amazon / AWS / Bedrock / Titan / related Amazon AI model platforms.
google: Google / Vertex AI / Gemini / DeepMind / related Google AI model platforms.
microsoft: Microsoft / Azure AI / Copilot / Azure OpenAI Service.
openai: OpenAI / GPT / ChatGPT references.
anthropic: Anthropic / Claude references.
meta: Meta AI / Llama references.
internal: Explicitly in-house or proprietary model development/deployment.
undisclosed: Third-party AI provider is implied but not named.
other: Named provider outside the predefined list (with free-text vendor name in metadata).

Vendor signal scale: 1 weak implicit, 2 strong implicit, 3 explicit.

Substantiveness Levels

boilerplate: Generic AI language with low information density; could appear in many reports unchanged.
moderate: Specific area is identified, but with limited mechanism, metrics, or mitigation detail.
substantive: Concrete mechanism and/or tangible action, commitment, metric, system detail, or timeline.

3. Post-processing

Chunk outputs are normalized and aggregated into both per-chunk and per-report views.

3.1 Confidence handling: report-level adoption/risk trend counts apply a confidence threshold (default 0.2) wherever per-label confidence scores exist; explicit risk-signal entries are retained. Signal heatmaps bin values into weak/strong/explicit.

3.2 Legacy compatibility: older risk labels (for example regulatory, workforce) are mapped to current canonical labels so longitudinal charts stay comparable.

3.3 Report denominator is explicit: we keep no-signal reports in the report-level dataset to show blind spots, not just positive cases.
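Two of the normalization rules above lend themselves to a short sketch: the legacy label mapping from 3.2 and the signal binning from 3.1. Only the two mapping entries named in the text are included; the repo may define more.

```python
# Map legacy risk labels to current canonical labels (examples from the
# method text; the actual mapping in the repo may contain more entries).
LEGACY_RISK_MAP = {
    "regulatory": "regulatory_compliance",
    "workforce": "workforce_impacts",
}

# Bin 1-3 signal strengths for heatmaps.
SIGNAL_BINS = {1: "weak", 2: "strong", 3: "explicit"}


def canonical_risk(label: str) -> str:
    """Return the canonical form of a possibly-legacy risk label."""
    return LEGACY_RISK_MAP.get(label, label)


def bin_signal(signal: int) -> str:
    """Bin a 1-3 signal value into weak/strong/explicit."""
    return SIGNAL_BINS[signal]
```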

Quality Controls

We use schema-constrained outputs, deterministic settings, and explicit validation/reconciliation tools to reduce noise and improve reproducibility.

Structured outputs: classifiers write to strict response schemas (Pydantic + JSON schema), reducing malformed labels.
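The closed label set and the exclusivity rule for none can both be enforced at the schema level. The repo uses Pydantic response schemas; the stdlib sketch below illustrates the same two constraints without that dependency (class and field names are hypothetical):

```python
from dataclasses import dataclass, field

# Canonical mention-type labels from the taxonomy reference.
MENTION_TYPES = {"adoption", "risk", "harm", "vendor", "general_ambiguous", "none"}


@dataclass
class MentionTypeOutput:
    labels: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        unknown = self.labels - MENTION_TYPES
        if unknown:
            # Reject labels outside the taxonomy (malformed classifier output).
            raise ValueError(f"unknown mention types: {sorted(unknown)}")
        if "none" in self.labels and len(self.labels) > 1:
            # 'none' is exclusive and must not co-occur with other labels.
            raise ValueError("'none' cannot co-occur with other labels")
```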

Conservative prompting: prompts require explicit AI attribution and discourage category over-assignment.

Testing and reconciliation: repo scripts support QA checks, human-vs-LLM disagreement review, and merge-back of reconciled labels.

Known limitations: this release is still primarily LLM-labeled and keyword-seeded, so it can miss subtle non-keyword AI references and can still include some ambiguous cases.