Your AI Got Worse. It Didn't Tell You.
2026-03-23
Agora — Temporal Drift Detection for Enterprise AI
Validated on SEC EDGAR filings. 186 confirmed filers. Exact filing dates.
The Problem
You deploy a financial AI model. It scores 90% on your evaluation set. You ship it.
Eighteen months later, it's processing M&A disclosure sentences with entirely different vocabulary than it was trained on — SPAC mergers, pandemic-era disposal structures, CARES Act terminology — and producing confident wrong answers. Binary accuracy looks fine. Nothing alerts. Your model is quietly failing on exactly the transactions that matter most.
This is not a hypothetical.
What We Measured
We trained a financial entity detector on SEC EDGAR filings from 2017–2018 — using exact filing dates from the EDGAR API, confirmed across 186 public companies. We tested it forward in time.
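The split itself is simple to reproduce in outline. A minimal sketch, with hypothetical records standing in for the real EDGAR-derived data: train only on sentences filed through 2018, and evaluate strictly forward in time.

```python
from datetime import date

# Hypothetical records: (sentence, entity_label, exact filing date from the EDGAR API).
filings = [
    ("Revenue increased 4% year over year.", "Revenue", date(2017, 8, 3)),
    ("The Company completed the business combination.", "BusinessCombination", date(2020, 11, 19)),
    ("Interest expense rose due to new borrowings.", "InterestExpense", date(2023, 2, 14)),
]

def temporal_split(records, train_end=date(2018, 12, 31)):
    """Split on exact filing date: train on <= train_end, test strictly forward in time."""
    train = [r for r in records if r[2] <= train_end]
    test = [r for r in records if r[2] > train_end]
    return train, test

train, test = temporal_split(filings)
print(len(train), len(test))  # 1 train sentence, 2 forward-in-time test sentences
```

Using the filing date (not the fiscal period) as the split key is what prevents future vocabulary from leaking into training.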
Overall accuracy degraded 7.6 points over four years.
That's unremarkable. Here's what isn't:
The Killer Chart
| Time Gap | M&A Entity F1 | Drop |
|----------|---------------|------|
| ID (2017–18, train period) | 0.614 | — |
| 1 year out (2019) | 0.463 | −24.6% |
| 2 years out (2020) | 0.281 | −54.2% |
| 4+ years out (2022–24) | 0.182 | −70.4% |
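The drop column is relative to the in-distribution baseline, not absolute F1 points. A quick check of the arithmetic:

```python
# F1 values from the table; drops are relative to the in-distribution baseline.
baseline = 0.614
forward_f1 = {"2019": 0.463, "2020": 0.281, "2022-24": 0.182}

for period, f1 in forward_f1.items():
    rel_drop = (f1 - baseline) / baseline * 100
    print(f"{period}: {rel_drop:+.1f}%")
```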
M&A-related entity types — business combinations, acquisitions, disposal groups — collapsed 70% in relative F1.
Core financial entities (Revenue, InterestExpense, Depreciation) held longer, then also collapsed. The M&A signal went first because deal and SPAC vocabulary turns over fastest. The model never knew.
This result held across two model families (TF-IDF and BERT) and three separate temporal methodologies. It's not a measurement artifact. The drift is real and it's method-agnostic.
Why It Happens
EDGAR filing dates let us pinpoint exactly when vocabulary shifted:
- Feb–Mar 2020: Exact EDGAR timestamps show a dense cluster of pandemic-driven disclosures — CARES Act transactions, force majeure-adjacent filings — introducing constructions a 2017–18 model never saw.
- 2022–2024: Post-SPAC merger hangover, crypto firm consolidations, EV startup M&A. Entire new XBRL entity vocabulary the model cannot generalize to.
The model doesn't alert. It keeps producing answers. Confidence stays high. The drift is invisible from the outside.
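One outside signal that does exist is vocabulary turnover itself. A toy sketch of the out-of-vocabulary rate a monitor could track; the vocabularies here are illustrative, not the production detector:

```python
def oov_rate(train_vocab, new_tokens):
    """Fraction of incoming tokens unseen during training — a simple
    proxy for the vocabulary turnover that drives silent drift."""
    if not new_tokens:
        return 0.0
    unseen = [t for t in new_tokens if t not in train_vocab]
    return len(unseen) / len(new_tokens)

# Illustrative vocabularies: a 2017-18 model never saw SPAC/CARES-era terms.
train_vocab = {"revenue", "interest", "expense", "depreciation", "acquisition"}
incoming = ["spac", "merger", "cares", "act", "revenue", "expense"]
print(f"OOV rate: {oov_rate(train_vocab, incoming):.0%}")  # 4 of 6 tokens unseen -> 67%
```

A rising OOV rate on incoming filings is visible without any labels, which is exactly what makes it useful when confidence scores stay flat.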
What Agora Detects
Not just "accuracy went down."
Agora surfaces:
- When drift started — mapped to real filing dates, not vague "model age"
- Which entity types are affected — so you know whether it's core performance or edge cases
- Why — correlated to economic events and vocabulary turnover periods
- Before it costs you — catch the degradation curve early, not after it's compounded
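To make "when drift started" concrete, here is a minimal sketch of per-entity onset detection, with hypothetical F1 series and a made-up threshold; this illustrates the idea, not Agora's actual algorithm:

```python
# Hypothetical per-entity F1 time series, keyed by evaluation period.
history = {
    "BusinessCombination": [("2018", 0.61), ("2019", 0.46), ("2020", 0.28)],
    "Revenue":             [("2018", 0.82), ("2019", 0.81), ("2020", 0.79)],
}

def drift_onset(series, threshold=0.15):
    """Return the first period whose relative F1 drop from the
    baseline (first entry) exceeds the threshold, else None."""
    baseline = series[0][1]
    for period, f1 in series[1:]:
        if (baseline - f1) / baseline > threshold:
            return period
    return None

for entity, series in history.items():
    print(entity, "->", drift_onset(series))  # flags M&A entities in 2019; Revenue stays quiet
```

Decomposing by entity type is the point: the aggregate number hides a collapse that a per-entity view flags a year earlier.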
We have the receipts: 94,770 sentences with exact EDGAR filing metadata, decomposed by entity category.
The Stakes
M&A disclosure entities degraded 70%. These are the sentences describing deal structures, acquisition targets, disposal groups — the exact language that drives deal risk decisions, compliance review, and diligence workflows.
If your AI processes financial documents, contract language, or regulatory filings — and you haven't measured temporal drift by entity type — you don't know what it's getting wrong today.
Agora
Built to answer: what's degraded, when did it start, and why.
Purpose-built for enterprise AI buyers who need provable accuracy over time — not just point-in-time benchmarks.
Methodology: FiNER-139 dataset (1.1M sentences from SEC EDGAR). Temporal splits using exact EDGAR API filing dates. 186 confirmed filers. Validated across TF-IDF and BERT model families and three independent temporal methodologies.