Working notes from sandbox exploration. Not yet folded into the final recommendation. Revisit when drafting deliverables.
Attribution note: This file mixes two sources.
- Sections 1–10, 11, 12, 13, 14 (observations and synthesis): Andrea's own findings from Tuesday 2026-04-28 sandbox work + her interpretive moves (e.g., dedupe→drift hypothesis, "subtract not add," "D&B's slowness IS the audit," "confidence as unifying primitive"). Claude summarized and structured them.
- Sunday 2026-04-26 framings carried in (e.g., the "three concrete surfaces" structure, signal confidence tiering, the "substrate vs surface" terminology): these came from the prior Claude session, not Andrea. They're useful scaffolds but not yet endorsed positions — treat as inputs to the deliverables, not premises.
Terminology note: Where Claude wrote "substrate vs surface" it meant "what TC captures internally" vs "what TC shows the customer." Andrea found the term jargon-y and didn't fully buy it. Recommended replacement language for the deliverables: capture / confidence / drill-down.
The 6 PDFs linked from the brief:
- Reference Perplexity Sonar as a model option — not in the dropdown
- Show a model lineup that doesn't match the UI (no GPT-5 family, no Reasoning Effort or Verbosity controls in docs)
- Oldest article: 2021. Product has clearly evolved past its own documentation.
- The article specifically about History Logs — the substrate-verification surface my recommendation depends on — is also from 2021. The doc telling customers how to "verify what the AI returned for each record" is older than most of the AI features being verified.
Why it matters for the demo:
- Same shape of gap as Signal 2's customer pain: "I'm not sure if this is a data quality issue or if we set something up wrong." If the docs can't help an attentive candidate figure out the product, customers can't self-diagnose either.
- Concrete fodder for the "Walk Us Through Your Process" bonus question: "I tried the docs first; they're 2–5 years out of date. I had to navigate the UI to figure out the actual product."
- Corroborates that the transparency gap is systemic, not just per-record. The pattern: TC ships features, doesn't maintain the explanation/transparency layer around them.
The "AI Provider" dropdown offers two options. Both route to OpenAI's models — Azure is just OpenAI-on-Microsoft-infrastructure for compliance reasons.
Asymmetric naming surface: OpenAI direct shows GPT-5 Nano / Mini / 4o Search Preview / 4o Mini. Azure shows gpt-5.3-chat / gpt-5.2-chat. Customers must know that 5.3-chat is OpenAI on Azure — UI doesn't unify or annotate.
Currency lag: Both providers are behind the frontier (GPT-5.5 exists; the product tops out at GPT-5 Mini direct, gpt-5.3-chat on Azure). No "last refreshed on X date" indicator. A RevOps admin configuring a flow has to trust that the dropdown is current.
Demo angle:
- Don't fold into recommendation — scope creep, separate problem from per-record transparency.
- Q&A material if asked about extensibility or "why isn't this a multi-model decision?"
- One-liner usable in deck: "TC's model abstraction is narrower than what most enterprise buyers expect from Bedrock-style services. That's an H2 conversation."
Cross-check against Bryan's stated growth direction (from 4/9 interview):
Bryan said TC's expansion target is non-profits and hospitality — verticals with messy data. He explicitly noted "tech companies struggling right now" — pulling away from tech, not toward it. So:
| Segment | Cloud reflex | Bedrock relevance |
|---|---|---|
| Existing tech-company customers | Often AWS | Real — Bedrock would help retain |
| Microsoft-shop enterprises | Azure | Already covered |
| Non-profits (Bryan's target) | Microsoft + Google nonprofit grants | Low priority |
| Hospitality (Bryan's target) | Microsoft / legacy on-prem | Not the gating constraint |
Bedrock is a retention / friction-reduction lever for existing tech customers, not an expansion lever for Bryan's stated verticals. That argues against folding it into the recommendation. If Q&A surfaces it: "Bedrock would help retain tech-company customers on AWS, but it's not aligned with the non-profit/hospitality expansion you mentioned in our 4/9 conversation. Different verticals, different cloud reflexes. So I'd defer it."
The thought: evals (golden test sets, calibration scoring, per-category accuracy) are a structured way to make AI quality legible. Where do they fit in the recommendation?
Where they could help:
- TC publishing per-model accuracy benchmarks across the 8 industry categories
- Customers running evals on their seed accounts to validate confidence before scaling enrichment
- Eval reports as a component of the exportable transparency artifact (the D&B-style report Signal 8 references)
Where they fall short:
- Evals serve engineering credibility more than RevOps daily clarity. RevOps doesn't want a benchmark; they want per-record explanation.
- Eval-as-marketing risks becoming theater (vanity numbers without the substrate to back them up).
- Eval cost / setup complexity may exclude SMB customers entirely.
Where this lands:
- Evals are more relevant to the technical-buyer audience (Ernesto-flavored) than the daily-user audience (RevOps).
- Probably belongs in the Validation Plan deliverable ("how would you test this works before scaling?") rather than the core recommendation.
- Worth raising in Q&A with Ernesto if he asks about engineering rigor.
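A minimal sketch of the customer-run eval idea above (structure and field values are illustrative, not TC's API): score enrichment output on a hand-labeled golden set of seed accounts, broken out per category, before scaling.

```python
from collections import defaultdict

def score_against_golden_set(predictions, golden):
    """Per-category accuracy of enrichment output vs. a hand-labeled golden set.

    predictions / golden: dicts mapping account_id -> industry label.
    Only accounts present in the golden set are scored.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for account_id, expected in golden.items():
        total[expected] += 1
        if predictions.get(account_id) == expected:
            correct[expected] += 1
    return {category: correct[category] / total[category] for category in total}

# Usage idea: ~50 seed accounts the customer already knows well, scored once per model/config.
golden = {"001A": "Technology", "001B": "Retail/Consumer", "001C": "Healthcare"}
predictions = {"001A": "Technology", "001B": "Retail/Consumer", "001C": "Technology"}
print(score_against_golden_set(predictions, golden))
# {'Technology': 1.0, 'Retail/Consumer': 1.0, 'Healthcare': 0.0}
```

This is the Validation Plan shape, not the core recommendation: it gives the technical buyer a repeatable check without promising RevOps a per-record explanation.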
Two distinct users with different needs from the same underlying substrate:
| User | What they need | Trigger context |
|---|---|---|
| Legal / compliance / procurement | Auditable, point-in-time, exportable documentation | External: lawsuit, audit, RFP, deal review (Signals 1, 3, 5, 8) |
| RevOps / data team | Diagnostic clarity — which model? what sources? can I trust this row? | Internal: campaign quality, territory routing, CRM hygiene (Signals 2, 4, 6) |
Same substrate, different surfaces:
- Legal wants the artifact (PDF, formal, can be handed off)
- RevOps wants in-Salesforce, real-time, per-record visibility
- Both rely on TC actually capturing source/confidence/model/settings per classification — that's the substrate work
Maps to the three-surface scaffold proposed in the Sunday Claude session (not yet endorsed by Andrea — keep as a candidate structure for the deliverables, not a foregone conclusion):
- Inline explainability on Account record → RevOps daily use
- Confidence-filtered bulk review queue → RevOps efficiency at scale
- Exportable transparency artifact → Legal / procurement
Demo angle:
- Lead the recommendation with the user split. Frames the transparency problem as two users, two surfaces, one substrate — not just "make it more transparent."
- Bryan will recognize the customer voice (legal vs RevOps split is exactly what Signals 1 and 4 articulate).
- Scott will recognize the UX framing (different users, different moments).
- Ernesto will recognize the architecture framing (one substrate cleanly serving multiple surfaces).
- Three panelists, one structure.
D&B is referenced twice in the brief (Signal 3's churn and Signal 8's RFPs), both times compliance-flavored.
What D&B has actually won on:
- SIC code = U.S. government standard taxonomy. Familiar to legal/audit/regulatory teams. Defensible.
- Source attribution = per-record provenance.
- Data transparency report = formal exportable artifact for legal hand-off.
All three are compliance/legal surface, not RevOps daily-use surface. D&B's RevOps UX is reportedly clunky (older platform, Hoovers-era data layouts) — that's what TC could differentiate on, while still solving the legal hand-off problem.
Reinforces Section 4 above: D&B has solved for the legal user well and ignored the RevOps user. TC has the opposite gap. The recommendation should serve both, but the framing TC uses to sell against D&B is "D&B can satisfy your lawyers; we can satisfy your lawyers AND your RevOps team in their own workflow." The transparency artifact is table stakes; the inline RevOps surface is the differentiator.
Also notable: SIC vs TC's 8 categories. D&B uses a regulator-recognized taxonomy. TC's 8 categories are internal (per the brief). If a customer needs SIC for legal, TC's classification — even if perfectly accurate — won't satisfy. Worth flagging in the deck: "TC's 8-category schema is operationally useful but not regulator-recognized. The exportable transparency artifact may need to map TC categories to SIC/NAICS for legal-grade defensibility — that's an integration point, not a model accuracy problem."
Opened Amazon Advertising LLC's record. TC's managed package exposes a panel of fields under "Other Data":
What's there: the "what" of enrichment — firmographic values, current state.
What's missing — and this IS the recommendation:
- No OCompany_Industry_Source
- No OCompany_Industry_Confidence
- No OCompany_Industry_Reasoning
- No OCompany_Last_Enriched_By_Model
The substrate captures the value but not the provenance. The recommendation closes the loop.
Bonus nuance — Primary + Secondary Industry. TC's data model has more granularity than the brief's 8-category framing acknowledges. Worth flagging in the deck or Q&A: "TC's classifier already produces Primary + Secondary tagging — that's substrate that could power richer customer-facing classification, but the surface flattens to a single field."
Several attempted runs of AI Enrichment with OpenAI as the provider produced nothing:
- Active toggle stayed green
- Scheduled flows showed Deleted status without populating Next Run Time
- History Logs remained empty
- No error messages anywhere
- Azure variants (5.3-chat with and without web search) worked — same flow shape, different provider
This is Signal 2 in operation, on a candidate using the product. "I'm not sure if this is a data quality issue or if I set something up wrong" — repeated for ~45 minutes with no diagnostic surface to investigate.
What the demo gets:
- Strongest "I lived this" moment of the afternoon
- Concrete failure mode that mirrors customer-reported pain
- A failure type the recommendation specifically addresses: when classification fails, where is that recorded? Right now: nowhere visible.
Possible root causes (untriaged):
- OpenAI API key budget exhausted in the sandbox
- A specific OpenAI model deprecated / unavailable
- TC integration health issue
The root cause matters less than the diagnostic experience — silent, no logs, no clues, no path forward.
Walk Us Through Your Process angle: lead with this. "I built the same flow twice — once on OpenAI, once on Azure. Azure worked. OpenAI failed silently. No error message, no log entry, no indication of what went wrong. That's the experience your $55K customer is describing in Signal 2."
The CSV intentionally seeds name variants:
- Instagram / Instagram Inc / Instagram LLC
- Slack Technologies / Slack Technologies LLC
- GitHub / GitHub Inc
- LinkedIn / LinkedIn Corporation
- WhatsApp / WhatsApp LLC / WhatsApp International
Hypothesis: dedupe / normalization upstream changes the string AI Enrichment sees, which can change classification. Same conceptual company → different record name → different industry tag.
Test plan: with the 2 working Azure flows, compare classifications across variant pairs. Same model, same web setting. If Instagram and Instagram Inc come back differently → confirmed.
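A rough sketch of that comparison (the variant groups come from the seeded CSV; how classifications get pulled from the Azure flows' output fields is left out and would be the manual part):

```python
# Variant groups seeded by the CSV: same conceptual company, different record names.
VARIANT_GROUPS = {
    "Instagram": ["Instagram", "Instagram Inc", "Instagram LLC"],
    "Slack": ["Slack Technologies", "Slack Technologies LLC"],
    "GitHub": ["GitHub", "GitHub Inc"],
    "LinkedIn": ["LinkedIn", "LinkedIn Corporation"],
    "WhatsApp": ["WhatsApp", "WhatsApp LLC", "WhatsApp International"],
}

def drift_report(classifications):
    """classifications: dict of record name -> industry string from one flow run.

    Returns the groups where name variants of the same company came back with
    different labels -- evidence for the dedupe -> drift hypothesis.
    """
    drifted = {}
    for company, variants in VARIANT_GROUPS.items():
        labels = {name: classifications[name] for name in variants if name in classifications}
        if len(set(labels.values())) > 1:
            drifted[company] = labels
    return drifted
```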
Signal connections:
- Signal 4 (data team reviews every classification): paying a tax for unpredictability across name variants
- Signal 6 (CS tickets: "Why is this account tagged X when it should be Y?"): same conceptual company → different records → different classifications
- Signal 2 (data quality vs setup confusion): two black boxes stacked — dedupe AND classification
TC's products built but not connected:
- Internal Match (dedupe), Normalized Account Name, AI Enrichment — separate features, separate logic
- AI Enrichment likely consumes raw Account Name rather than the normalized variant
- That choice isn't surfaced anywhere
Sub-recommendation candidate: AI Enrichment should consume Normalized Account Name (or at minimum expose which input it used in History Logs). Cheap, high-leverage across signals 2/4/6.
Don't make this THE recommendation — it's a supporting thread for the transparency thesis, not a replacement.
Every field dropdown across Salesforce + TC managed package + integrations exposes 100+ options:
Multiple variants of "Account Name" alone:
- Name (standard)
- Account_Name__pc (Person Account formula)
- Account_Name__p
- Account Name (Normalized) (TC custom)
- D&B Global Ultimate Account Name (D&B integration)
- Moodys Account Name (Moodys integration)
- Simplified Account Name
Configuration screens (flow context fields, field mapping, criteria builders, page layouts) all ask users to pick from these dropdowns with no contextual hints, no grouping, no "TC owns this" vs "you own this" demarcation.
This is RevOps' daily reality. They're in this forest every time they configure or troubleshoot anything. Signal 2's customer confusion is partly this — there are 100 ways to set up the same thing wrong, and the surface gives no help disambiguating.
What this means for the recommendation:
- "More fields" is not the answer. TC already has plenty.
- The inline explainability surface needs to be designed — contextual presentation, "what TC populates vs what you populate" wayfinding
- The exportable artifact should be curated — structured narrative, not a field dump
- The substrate-vs-surface thesis applies at the meta-level: TC's product is field-rich and design-poor. The recommendation adds design, not more fields.
Walk Us Through Your Process angle:
"Every dropdown click felt like a small bet on whether I'd picked the right field. Most of the time the wrong choice failed silently — I'd configure something, run it, get nothing, and have no way to tell whether the field was wrong or the enrichment was broken. That's the customer experience your Signal 2 customer is describing. The cognitive overload is part of the transparency problem."
Beyond field volume (Section 9), the configuration surfaces themselves are unforgiving:
- Flow builder asks for context fields, target fields, model choices, reasoning effort, verbosity, web search toggles, trigger settings, entry criteria, schedule timing, batch size — all in nested panels
- One wrong picklist selection (e.g., picking Account_Name__pc instead of Account.Name, or a lookup ID type instead of a string field) silently changes flow behavior
- No validation feedback at config time — the flow saves, activates, runs, produces an output, and you have no way to know whether the output reflects the model's reasoning OR your misconfigured input
- No validation feedback at run time either — History Logs (when they exist) capture input/output values but don't flag "you mapped a lookup ID type field as context, the model received an opaque ID string"
Lived experience as the candidate:
- Spent ~45 minutes on flow configuration before realizing one context field was a lookup ID instead of a string
- Spent another ~45 minutes troubleshooting OpenAI flows that silently failed
- Discovered scheduled flows showed Deleted status with Active toggle still green — UI states inconsistent
- Could not find History Logs in TC's UI without docs (and the docs are 5 years old)
What this means for customer behavior:
- A discrepancy in classification could be the model, the input, the configuration, OR the UI lying to you about state
- Customer in Signal 2 saying "not sure if it's a data quality issue or if we set something up wrong" — they're sitting in the same UI, with the same lack of feedback, blaming themselves OR TC at random
- A "bad classification" report from CS (Signal 6) might actually be a misconfigured flow producing the bad output, but no one can tell because the surface doesn't differentiate
Recommendation implication — beyond per-record explainability:
The transparency layer needs to surface not just "why did the AI pick this industry?" but "what configuration produced this enrichment?" Both belong in the History Log structure. Both belong on the record. The audit trail has to include:
1. Inputs — which fields, with which values, were sent to the model
2. Configuration — model, settings, timestamp
3. Outputs — values + sources
4. State — was this run successful, partial, or silently failed
Without (1) and (2), customers can't tell their config from TC's classifier. With them, they can isolate the variable.
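As a sketch of the shape a History Log entry would need to take to support that isolation (field names are illustrative, not TC's schema), the four pieces live together on one record per run:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EnrichmentAuditEntry:
    # 1. Inputs -- which fields, with which values, were sent to the model
    input_fields: dict            # e.g. {"Account.Name": "Instagram Inc"}
    # 2. Configuration -- model, settings, timestamp
    provider: str                 # "OpenAI" / "Azure"
    model: str                    # e.g. "gpt-5.3-chat"
    web_search: bool
    run_timestamp: str            # ISO 8601
    # 3. Outputs -- values + sources
    output_values: dict = field(default_factory=dict)   # e.g. {"Industry": "Social Media"}
    sources: list = field(default_factory=list)         # URLs, or "model-internal"
    reasoning: Optional[str] = None
    # 4. State -- was this run successful, partial, or silently failed
    status: str = "unknown"       # "success" | "partial" | "failed" | "no_response"
    error_detail: Optional[str] = None
```

With all four captured, a "bad classification" ticket can be split into config error vs. model error by reading one entry, instead of guessing.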
Empirical finding from the 3-flow Azure comparison (5.3-chat with web, 5.3-chat without web, 5.2-chat without web) run against all 34 imported accounts:
31 unique industry strings produced. Examples:
Advertising Services, Cloud Computing, Music Streaming, Music, Consumer Electronics, Software Development, Information Technology and Services, Internet Services, Internet Services and Products, Technology, Social Media, Social Networking, E-commerce, E-Commerce, E-Commerce Retail, Software & IT Services, Software, Application Software, Software as a Service (SaaS), Internet, Telecommunications, Streaming Services, Online Video Sharing and Streaming, Internet Content & Information, Internet Messaging Services, Grocery Stores, Supermarkets & Grocery Stores, Supermarkets and Grocery Stores, Grocery Retail, Retail
Three competing taxonomies coexist:
| Taxonomy | Where it lives | Categories | Visible to customer? |
|---|---|---|---|
| Brief's "8 categories" | Marketing / interview brief | 8 (Tech, FS&I, Healthcare, Manufacturing, Retail/Consumer, Media/Telecom, Prof Services, Energy) | No — only in brief |
| Salesforce standard Industry picklist | Account object (where enrichment writes) | 32 (Agriculture, Apparel, Banking, Biotechnology, Chemicals, Communications, Construction, Consulting, ...) | Yes — visible in record |
| Model output | Free-text strings from LLM | 31+ unique strings across 34 records | Yes — written to whatever target field is configured |
Zero of the 31 model outputs match either of the other two taxonomies.
Why this is the strongest demo finding of the day:
- It's empirical (34 records, 3 columns, reproducible)
- It's not a quality issue — it's a product coherence issue
- The brief itself (the artifact selling the customer on TC) describes a taxonomy that doesn't exist in the product
- Whatever field type the customer configures (Text, picklist, custom), the model output won't match the brief's framing
Demo line:
"The brief told me TC classifies into 8 categories. The standard Industry field on the Account object has 32 picklist values. The model wrote 31 distinct free-text strings across my 34 records, none of which match the 8 OR the 32. That's three competing taxonomies in one product, with no reconciliation visible to the customer. That's not an enrichment quality problem — that's a product coherence problem."
Was the picklist constraint the issue? Custom Text(255) fields were used as enrichment targets. Even if the standard Industry picklist had been used, the model's free-text outputs wouldn't have matched its 32 values. The picklist would have either silently coerced or rejected — both are failure modes the customer can't see. The product allowed Text fields with no warning. Whatever choice the customer makes, the surface doesn't reconcile to a stable taxonomy.
Even TC's own pre-built sample flow doesn't enforce the 8 categories. Inspecting Bryan's New Lead Flow: 4/22/2026 (the sample flow shipped in the demo sandbox) showed the AI Enrichment step's target field was the standard Salesforce Lead.Industry picklist — the same 32-value list as on the Account object. The brief's 8 categories aren't on the Lead picklist either. The 8-category framework has now failed to appear in four places it could have lived:
| Where the 8 categories could have lived | What's actually there |
|---|---|
| Standard SF Industry picklist on Account | 32 standard SF values (Agriculture, Apparel, Banking, ...) |
| Standard SF Industry picklist on Lead | Same 32 standard SF values |
| Bryan's pre-built TC sample flow target field (created 4/22/2026) | The 32-value standard picklist |
| Model output across 34 records × 3 configs | 31 unique free-text strings, none matching either taxonomy |
The 8-category framework exists in the brief and nowhere in the product — not in either object's picklist, not in TC's own demo flow that the CPO built six days ago, not in the model's output. This isn't a configuration choice, and it's not Andrea's custom field. It's the product TC ships, with the demo flow TC's own CPO built, missing the taxonomy TC's own brief describes.
Observation: TC's Account object already exposes integration fields:
- D&B Global Ultimate Account Name (Formula field) — D&B integration exists
- Moodys Account Name (custom field) — Moody's integration exists
- Normalized Account Name, Simplified Account Name — TC's normalization fields
D&B is referenced in Signal 3 (churn) and Signal 8 (RFPs). The customer pain about D&B is consistent: its compliance / source attribution / SIC code surface is what wins the legal hand-off. TC is positioned as a competitor to D&B but already has the integration plumbing.
The synthesis recommendation this opens:
LLM (TC's current) → free-text industry + sources + reasoning
↓
Map to closest SIC/NAICS via lookup
↓
Cross-reference with D&B (if account has D-U-N-S)
↓
Surface BOTH on the record + flag agreement / disagreement
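A rough sketch of the reconciliation step (the SIC lookup values and D&B inputs are placeholders; the real mapping would live as the maintained static table described in Tier 2 below):

```python
# Hypothetical static lookup: LLM free-text industry -> closest SIC code.
TC_TO_SIC = {
    "Social Media": "7375",                   # Information Retrieval Services
    "Grocery Retail": "5411",                 # Grocery Stores
    "Software as a Service (SaaS)": "7372",   # Prepackaged Software
}

def reconcile(llm_industry, dnb_sic=None):
    """Map the LLM's free-text industry to SIC, compare with D&B, flag agreement."""
    mapped_sic = TC_TO_SIC.get(llm_industry)
    if dnb_sic is None:
        return {"tc_sic": mapped_sic, "dnb_sic": None, "agreement": "no_dnb_record"}
    if mapped_sic is None:
        return {"tc_sic": None, "dnb_sic": dnb_sic, "agreement": "unmapped"}
    return {
        "tc_sic": mapped_sic,
        "dnb_sic": dnb_sic,
        "agreement": "agree" if mapped_sic == dnb_sic else "disagree",
    }
```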
Why this fits 6-8 weeks:
| Tier | Effort | Surface |
|---|---|---|
| Tier 1 — Surface what's already there | 2-3 weeks | For records with D&B data, show D&B's classification alongside TC's. Highlight agreement / disagreement visually. UI work. |
| Tier 2 — Stable taxonomy via mapping | ~1 week | Static lookup mapping TC categories ↔ SIC/NAICS. Include in exportable transparency artifact. Solves the legal hand-off problem cheaply. |
| Tier 3 (H2) | Bigger | Use D&B as active reconciliation source. Auto-prioritize records where TC and D&B disagree as review queue. |
Tiers 1 + 2 = 3-4 weeks. Fits the EM's 6-8 week constraint with margin.
Why this is competitively sharper than the original "show your work" recommendation:
"TC isn't competing with D&B on data. TC is integrating with D&B and adding the workflow layer D&B doesn't have. D&B has the compliance taxonomy. TC has the Salesforce-native UX. Together: legal gets their SIC codes, RevOps gets their inline workflow, customer doesn't choose between trust and usability."
How this restructures the three-surface scaffold (Sunday Claude proposal — still a candidate, not endorsed):
| Surface | Now powered by |
|---|---|
| Inline indicator on Account record | TC's reasoning + D&B's classification + agreement signal |
| Review queue (records where TC and D&B disagree) | Auto-prioritized — confidence is derived, not invented |
| Exportable artifact for legal | TC's classification + D&B's SIC code + sources |
The trust-transfer mechanic — D&B's "staleness" is the feature, not the bug.
Sharpened framing of why this synthesis works:
| Property | LLM (TC's current) | D&B |
|---|---|---|
| Speed | Real-time | Months stale |
| Determinism | Non-deterministic | Deterministic (analyst-verified, source-cited) |
| Trust profile | Uncertain, varies by run | Audited, regulator-accepted, decades of incumbency |
| Why it has those properties | Skips human verification → fast | Includes human verification → slow but credible |
These are complementary trade-offs, not competing axes. Cross-referencing them mechanically transfers D&B's trust to the matched LLM output:
D&B's slowness IS the audit. Customers in regulated industries (Signals 1, 3, 5, 8) intuitively know the LLM is fast because none of D&B's verification is happening — that's why they keep asking for sources/audit/confidence. TC's job isn't to choose between fast and verified. TC's job is to be the bridge that transfers verification to speed.
Demo line:
"LLMs are fast and non-deterministic. D&B is slow and verified. TC's product can be the bridge that mechanically transfers D&B's trust to the LLM's speed. Where they agree, the customer gets both. Where they disagree, RevOps gets the review queue. Compliance hand-off and workflow efficiency stop being separate problems."
Caveats to acknowledge:
- Requires the customer to have a D&B subscription. Best for enterprise (which is the at-risk segment per Signals 1, 3, 5, 8). Doesn't address SMB tier.
- D&B coverage is patchy for Bryan's expansion verticals (non-profits, hospitality). Architecture needs LLM-only fallback with appropriate trust signaling.
- "No D&B record" is its own surface state — not a failure, an honest reflection of evidence available.
What this absorbs from elsewhere in the file:
- Section 5 (D&B as benchmark): D&B's compliance edge → now leveraged, not competed with
- Section 6 (OCompany_* fields are substrate without provenance): D&B provides provenance per fact
- Section 11 (three competing taxonomies): SIC/NAICS becomes the stable cross-reference
- Section 4 (legal vs UX users): both served, with one shared substrate
A meta-observation that reframes how the recommendation should be presented.
The tension running through Sections 4, 6, 7, 9, 10, 11:
- Substrate IS missing (sources, confidence, reasoning, model, config audit) — Sections 4, 6, 7
- Surface is ALREADY overloaded (100+ fields, silent misconfiguration, conflicting taxonomies, every dropdown a landmine) — Sections 9, 10, 11
The reflexive answer ("expose the substrate") is wrong. Adding 5 sidecar fields per classification (Industry_Source, Industry_Confidence, Industry_Reasoning, Industry_Model, Industry_Settings) makes Section 9's cognitive overload worse, not better. RevOps already drowns in field clutter. More substrate exposure on the same dense surface is theater — and theater that adds cognitive cost.
The right framing: substrate enables a simpler, more trustworthy surface.
Capture all of it — sources, confidence, reasoning, model, settings — and use that substrate to power a UI that shows less by default, with depth available on demand:
| Scenario | Default surface | Drill-down (one click) |
|---|---|---|
| TC + D&B agree on Industry | One value + green ✓ "verified" | Both reasonings, sources, agreement evidence |
| TC + D&B disagree | Conflict indicator + RevOps action prompt | Both classifications, sources for each, reasoning trace |
| D&B has no record on this account | TC's value + "single source — review recommended" amber tag | TC's full reasoning, model used, sources |
| Classification failed (cf. Section 7) | Visible failure state with diagnostic surface | Error type, last attempt, suggested fix |
The default daily view gets quieter, not noisier. The substrate is doing all the work to enable that quiet.
Configuration UI gets the same treatment. The 9-flow matrix exposed Provider × Model × Reasoning Effort × Verbosity × Web Search as 5 independent dimensions with no defaults, no presets. That's admin-as-engineer, not admin-as-RevOps. Subtract here too:
- 2-3 presets ("Fast / Balanced / Comprehensive") visible by default
- Underlying config available but collapsed
- Most customers consume defaults; power users get the dials
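A sketch of what those presets could collapse the five dimensions into (specific models and setting values are placeholders, not proposals):

```python
# Each preset pins all five dimensions; the full matrix stays available under an
# "Advanced" disclosure for power users.
FLOW_PRESETS = {
    "Fast": {
        "provider": "Azure", "model": "gpt-5.2-chat",
        "reasoning_effort": "low", "verbosity": "low", "web_search": False,
    },
    "Balanced": {
        "provider": "Azure", "model": "gpt-5.3-chat",
        "reasoning_effort": "medium", "verbosity": "medium", "web_search": True,
    },
    "Comprehensive": {
        "provider": "Azure", "model": "gpt-5.3-chat",
        "reasoning_effort": "high", "verbosity": "high", "web_search": True,
    },
}
```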
Foundation observation (why subtract matters): user error is everywhere across this surface. Old/irrelevant docs without current screenshots. Salesforce UI's intrinsic density. Layered managed packages adding fields. Multiple objects, fields, relationships. Configuration screens with no validation feedback. Every additional surface element multiplies the chance of silent misconfiguration. The customer's "I'm not sure if it's a data quality issue or if I set it up wrong" (Signal 2) IS user error — and the UI architecture is making user error inevitable.
Why this lands with all three panelists:
The recommendation, rephrased one more time:
"The substrate work isn't to expose more. It's to enable less. Capture sources, confidence, reasoning, model — all of it — and use that substrate to support a UI that shows ONE classification value, with a verification indicator, and a one-click drill-down for the cases that warrant scrutiny. The customer's daily experience gets simpler; the audit trail gets stronger; both happen at once. That's the design principle — subtract, not add."
Sub-observation tied to data model legibility: beyond surface simplification, the data model itself needs to be legible — the customer needs to know which fields TC owns, which they own, and which are derived. Today's UI doesn't differentiate. That's a labeling/grouping problem, not just a "fewer fields" problem.
Confidence is the single derived signal that ties the whole recommendation together. Re-reading earlier sections through this lens:
| Section | What it points at | How "confidence" resolves it |
|---|---|---|
| 6 (OCompany substrate, no provenance) | Substrate captures values, not how sure | Confidence is the missing field |
| 7 (OpenAI silent failure) | No diagnostic surface when classification fails | Failed = "no confidence available," visible state distinct from "low confidence" |
| 8 (dedupe → classification drift) | Same entity, different name → different output | Variance across name variants signals low confidence |
| 11 (three competing taxonomies) | 31 unique strings across 34 records | High variance across runs/models → low confidence |
| 12 (TC + D&B synthesis) | Cross-reference creates trust | Agreement = high confidence; disagreement = low |
| 13 (subtract, not add) | Surface is overloaded | One indicator (confidence) drives all default surface decisions |
| Signal 4 ("bulk overrides with confidence filtering") | Manual review of all records is killing turnaround | High-confidence records skip review automatically |
| Signal 1, 3, 5 (compliance / legal asking RevOps) | Need defensible classification | High-confidence-with-D&B-agreement is the defensible export |
| Signal 2 (data quality vs. setup) | Customer can't diagnose | Visible failure states + confidence reasoning gives them a thread to pull |
Confidence-driven surface (the default RevOps view):
| State | Surface | Drill-down available |
|---|---|---|
| High confidence (TC + D&B agree, sources cited) | Green ✓ verified — just the value | Both reasonings, sources, model, settings |
| Medium confidence (TC alone, D&B no record) | Amber single-source — review optional | TC's full reasoning, sources, suggestion to validate |
| Low confidence (TC + D&B disagree, OR high variance across runs) | Red conflict — review prioritized | Both classifications, side-by-side reasoning, action prompt |
| Failed (Section 7) | Gray diagnostic state — error type visible | Last attempt, error context, suggested fix |
| No data | "Not yet enriched" | Trigger or schedule action |
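A sketch of how that state could be derived mechanically from what the substrate already captures (field names and the exact decision order are illustrative):

```python
def derive_confidence_state(run):
    """run: dict with keys status, tc_industry, dnb_industry, variant_disagreement.

    Returns one of the five surface states from the table above.
    """
    if run.get("status") == "failed":
        return "failed"              # gray diagnostic state, not an empty record
    if run.get("tc_industry") is None:
        return "not_yet_enriched"
    if run.get("dnb_industry") is None:
        return "medium"              # single source -- review optional
    if run["tc_industry"] != run["dnb_industry"] or run.get("variant_disagreement"):
        return "low"                 # conflict or run-to-run variance -- review prioritized
    return "high"                    # TC + D&B agree -- green verified
```

The point of the sketch: every branch reads from fields the recommendation already says to capture; nothing here requires a new model, only the substrate plus one derivation.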
Why this is the strongest framing for the panel:
Demo narrative arc this enables: