2026-04-28 — Demo observations & assignment threads

Working notes from sandbox exploration. Not yet folded into the final recommendation. Revisit when drafting deliverables.

Attribution note: This file mixes two sources.
- Sections 1–10, 11, 12, 13, 14 (observations and synthesis): Andrea's own findings from Tuesday 2026-04-28 sandbox work + her interpretive moves (e.g., dedupe→drift hypothesis, "subtract not add," "D&B's slowness IS the audit," "confidence as unifying primitive"). Claude summarized and structured them.
- Sunday 2026-04-26 framings carried in (e.g., the "three concrete surfaces" structure, signal confidence tiering, the "substrate vs surface" terminology): these came from the prior Claude session, not Andrea. They're useful scaffolds but not yet endorsed positions — treat as inputs to the deliverables, not premises.

Terminology note: Where Claude wrote "substrate vs surface" it meant "what TC captures internally" vs "what TC shows the customer." Andrea found the term jargon-y and didn't fully buy it. Recommended replacement language for the deliverables: capture / confidence / drill-down.

1. Support docs are badly out of date

The 6 PDFs linked from the brief:
- Reference Perplexity Sonar as a model option — not in the dropdown
- Show a model lineup that doesn't match the UI (no GPT-5 family, no Reasoning Effort or Verbosity controls in docs)
- Oldest article: 2021. Product has clearly evolved past its own documentation.
- The article specifically about History Logs — the substrate-verification surface my recommendation depends on — is also from 2021. The doc telling customers how to "verify what the AI returned for each record" is older than most of the AI features being verified.

Why it matters for the demo:
- Same shape of gap as Signal 2's customer pain: "I'm not sure if this is a data quality issue or if we set something up wrong." If the docs can't help an attentive candidate figure out the product, customers can't self-diagnose either.
- Concrete fodder for the "Walk Us Through Your Process" bonus question: "I tried the docs first; they're 2–5 years out of date. I had to navigate the UI to figure out the actual product."
- Corroborates that the transparency gap is systemic, not just per-record. The pattern: TC ships features, doesn't maintain the explanation/transparency layer around them.

2. Provider abstraction: OpenAI + Azure (both routing to OpenAI), no Anthropic / Bedrock / Google

The "AI Provider" dropdown offers two options. Both route to OpenAI's models — Azure is just OpenAI-on-Microsoft-infrastructure for compliance reasons.

No Anthropic Claude
No Google Gemini
No AWS Bedrock (which would unlock the same procurement foot-in-the-door for AWS-shop enterprises that Azure unlocks for Microsoft shops)
TC's "provider" choice is really a deployment toggle, not a model marketplace

Asymmetric naming surface: OpenAI direct shows GPT-5 Nano / Mini / 4o Search Preview / 4o Mini. Azure shows gpt-5.3-chat / gpt-5.2-chat. Customers must know that 5.3-chat is OpenAI on Azure — UI doesn't unify or annotate.

Currency lag: Both providers are behind frontier (GPT-5.5 exists; product tops out at GPT-5 Mini direct, gpt-5.3-chat on Azure). No "last refreshed on X date" indicator. RevOps configuring a flow trusts the dropdown is current.

Demo angle:
- Don't fold into recommendation — scope creep, separate problem from per-record transparency.
- Q&A material if asked about extensibility or "why isn't this a multi-model decision?"
- One-liner usable in deck: "TC's model abstraction is narrower than what most enterprise buyers expect from Bedrock-style services. That's an H2 conversation."

Cross-check against Bryan's stated growth direction (from 4/9 interview):
Bryan said TC's expansion target is non-profits and hospitality — verticals with messy data. He explicitly noted "tech companies struggling right now" — pulling away from tech, not toward it. So:

Segment	Cloud reflex	Bedrock relevance
Existing tech-company customers	Often AWS	Real — Bedrock would help retain
Microsoft-shop enterprises	Azure	Already covered
Non-profits (Bryan's target)	Microsoft + Google nonprofit grants	Low priority
Hospitality (Bryan's target)	Microsoft / legacy on-prem	Not the gating constraint

Bedrock is a retention / friction-reduction lever for existing tech customers, not an expansion lever for Bryan's stated verticals. Argues against folding into recommendation. If Q&A surfaces it: "Bedrock would help retain tech-company customers on AWS, but it's not aligned with the non-profit/hospitality expansion you mentioned in our 4/9 conversation. Different verticals, different cloud reflexes. So I'd defer it."

3. Could evals be part of the transparency answer?

The thought: evals (golden test sets, calibration scoring, per-category accuracy) are a structured way to make AI quality legible. Where do they fit in the recommendation?

Where they could help:
- TC publishing per-model accuracy benchmarks across the 8 industry categories
- Customers running evals on their seed accounts to validate confidence before scaling enrichment
- Eval reports as a component of the exportable transparency artifact (the D&B-style report Signal 8 references)

Where they fall short:
- Evals serve engineering credibility more than RevOps daily clarity. RevOps doesn't want a benchmark; they want per-record explanation.
- Eval-as-marketing risks becoming theater (vanity numbers without the substrate to back them up).
- Eval cost / setup complexity may exclude SMB customers entirely.

Where this lands:
- Evals are more relevant to the technical-buyer audience (Ernesto-flavored) than the daily-user audience (RevOps).
- Probably belongs in the Validation Plan deliverable ("how would you test this works before scaling?") rather than the core recommendation.
- Worth raising in Q&A with Ernesto if he asks about engineering rigor.

4. Compliance / legal vs. user experience / understanding

Two distinct users with different needs from the same underlying substrate:

User	What they need	Trigger context
Legal / compliance / procurement	Auditable, point-in-time, exportable documentation	External: lawsuit, audit, RFP, deal review (Signals 1, 3, 5, 8)
RevOps / data team	Diagnostic clarity — which model? what sources? can I trust this row?	Internal: campaign quality, territory routing, CRM hygiene (Signals 2, 4, 6)

Same substrate, different surfaces:
- Legal wants the artifact (PDF, formal, can be handed off)
- RevOps wants in-Salesforce, real-time, per-record visibility
- Both rely on TC actually capturing source/confidence/model/settings per classification — that's the substrate work

Maps to the three-surface scaffold proposed in the Sunday Claude session (not yet endorsed by Andrea — keep as a candidate structure for the deliverables, not a foregone conclusion):
- Inline explainability on Account record → RevOps daily use
- Confidence-filtered bulk review queue → RevOps efficiency at scale
- Exportable transparency artifact → Legal / procurement

Demo angle:
- Lead the recommendation with the user split. Frames the transparency problem as two users, two surfaces, one substrate — not just "make it more transparent."
- Bryan will recognize the customer voice (legal vs RevOps split is exactly what Signals 1 and 4 articulate).
- Scott will recognize the UX framing (different users, different moments).
- Ernesto will recognize the architecture framing (one substrate cleanly serving multiple surfaces).
- Three panelists, one structure.

5. D&B as a benchmark — note their wedge is compliance, not UX

D&B is referenced twice in the brief, both compliance-flavored:

Signal 3 (churn): "D&B has SIC codes and source attribution baked in. That made the compliance conversation a lot easier."
Signal 8 (RFPs): "D&B's pitch includes SIC code attribution and a 'data transparency report' — documentation showing where each classification came from and how confident the system is." + "come up in two RFPs in the last 60 days as a requirement."

What D&B has actually won on:
- SIC code = U.S. government standard taxonomy. Familiar to legal/audit/regulatory teams. Defensible.
- Source attribution = per-record provenance.
- Data transparency report = formal exportable artifact for legal hand-off.

All three are compliance/legal surface, not RevOps daily-use surface. D&B's RevOps UX is reportedly clunky (older platform, Hoovers-era data layouts) — that's what TC could differentiate on, while still solving the legal hand-off problem.

Reinforces #4 above: D&B has solved the legal user well, ignored the RevOps user. TC has the opposite gap. Recommendation should serve both, but the framing TC uses to sell against D&B is "D&B can satisfy your lawyers; we can satisfy your lawyers AND your RevOps team in their own workflow." The transparency artifact is table stakes; the inline RevOps surface is the differentiator.

Also notable: SIC vs TC's 8 categories. D&B uses a regulator-recognized taxonomy. TC's 8 categories are internal (per the brief). If a customer needs SIC for legal, TC's classification — even if perfectly accurate — won't satisfy. Worth flagging in the deck: "TC's 8-category schema is operationally useful but not regulator-recognized. The exportable transparency artifact may need to map TC categories to SIC/NAICS for legal-grade defensibility — that's an integration point, not a model accuracy problem."

6. The OCompany_* fields ARE TC's substrate — and they confirm the gap

Opened Amazon Advertising LLC's record. TC's managed package exposes a panel of fields under "Other Data":

OCompany Name, ID, Entity Type
OParent Name, ID
OUltimate Parent Name, ID
OCompany Primary Industry, Secondary Industry ← richer than the brief's 8-category framing
OCompany Employee Count, Revenue, Is Acquired
OCompany Website ✓ (populated from CSV import)

What's there: the "what" of enrichment — firmographic values, current state.

What's missing — and this IS the recommendation:
- No OCompany_Industry_Source
- No OCompany_Industry_Confidence
- No OCompany_Industry_Reasoning
- No OCompany_Last_Enriched_By_Model

The substrate captures the value. It does not capture the provenance. Recommendation closes the loop.

Bonus nuance — Primary + Secondary Industry. TC's data model has more granularity than the brief's 8-category framing acknowledges. Worth flagging in the deck or Q&A: "TC's classifier already produces Primary + Secondary tagging — that's substrate that could power richer customer-facing classification, but the surface flattens to a single field."

7. OpenAI flows fail silently — Signal 2 lived through

Several attempted runs of AI Enrichment with OpenAI as provider produced nothing:
- Active toggle stayed green
- Scheduled flows showed Deleted status without populating Next Run Time
- History Logs remained empty
- No error messages anywhere
- Azure variants (5.3-chat with and without web search) worked — same flow shape, different provider

This is Signal 2 in operation, on a candidate using the product. "I'm not sure if this is a data quality issue or if I set something up wrong" — repeated for ~45 minutes with no diagnostic surface to investigate.

What the demo gets:
- Strongest "I lived this" moment of the afternoon
- Concrete failure mode that mirrors customer-reported pain
- A failure type the recommendation specifically addresses: when classification fails, where is that recorded? Right now: nowhere visible.

Possible root causes (untriaged):
- OpenAI API key budget exhausted in the sandbox
- A specific OpenAI model deprecated / unavailable
- TC integration health issue

The root cause matters less than the diagnostic experience — silent, no logs, no clues, no path forward.

Walk Us Through Your Process angle: lead with this. "I built the same flow twice — once on OpenAI, once on Azure. Azure worked. OpenAI failed silently. No error message, no log entry, no indication of what went wrong. That's the experience your $55K customer is describing in Signal 2."

8. Dedupe / normalization upstream changes AI input

The CSV intentionally seeds name variants:
- Instagram / Instagram Inc / Instagram LLC
- Slack Technologies / Slack Technologies LLC
- GitHub / GitHub Inc
- LinkedIn / LinkedIn Corporation
- WhatsApp / WhatsApp LLC / WhatsApp International

Hypothesis: dedupe / normalization upstream changes the string AI Enrichment sees, which can change classification. Same conceptual company → different record name → different industry tag.

Test plan: with the 2 working Azure flows, compare classifications across variant pairs. Same model, same web setting. If Instagram and Instagram Inc come back differently → confirmed.

Signal connections:
- Signal 4 (data team reviews every classification): paying a tax for unpredictability across name variants
- Signal 6 (CS tickets: "Why is this account tagged X when it should be Y?"): same conceptual company → different records → different classifications
- Signal 2 (data quality vs setup confusion): two black boxes stacked — dedupe AND classification

TC's products built but not connected:
- Internal Match (dedupe), Normalized Account Name, AI Enrichment — separate features, separate logic
- AI Enrichment likely consumes raw Account Name rather than the normalized variant
- That choice isn't surfaced anywhere

Sub-recommendation candidate: AI Enrichment should consume Normalized Account Name (or at minimum expose which input it used in History Logs). Cheap, high-leverage across signals 2/4/6.

Don't make this THE recommendation — it's a supporting thread for the transparency thesis, not a replacement.

9. Field / label cognitive overload — every dropdown is a landmine

Every field dropdown across Salesforce + TC managed package + integrations exposes 100+ options:

Multiple variants of "Account Name" alone:
- Name (standard)
- Account_Name__pc (Person Account formula)
- Account_Name__p
- Account Name (Normalized) (TC custom)
- D&B Global Ultimate Account Name (D&B integration)
- Moodys Account Name (Moodys integration)
- Simplified Account Name

Configuration screens (flow context fields, field mapping, criteria builders, page layouts) all ask users to pick from these dropdowns with no contextual hints, no grouping, no "TC owns this" vs "you own this" demarcation.

This is RevOps' daily reality. They're in this forest every time they configure or troubleshoot anything. Signal 2's customer confusion is partly this — there are 100 ways to set up the same thing wrong, and the surface gives no help disambiguating.

What this means for the recommendation:
- "More fields" is not the answer. TC already has plenty.
- The inline explainability surface needs to be designed — contextual presentation, "what TC populates vs what you populate" wayfinding
- The exportable artifact should be curated — structured narrative, not a field dump
- The substrate-vs-surface thesis applies at the meta-level: TC's product is field-rich and design-poor. The recommendation adds design, not more fields.

Walk Us Through Your Process angle:
"Every dropdown click felt like a small bet on whether I'd picked the right field. Most of the time the wrong choice failed silently — I'd configure something, run it, get nothing, and have no way to tell whether the field was wrong or the enrichment was broken. That's the customer experience your Signal 2 customer is describing. The cognitive overload is part of the transparency problem."

10. UI is unforgiving AND feedback is absent — fat-fingering produces undiagnosable outputs

Beyond field volume (Section 9), the configuration surfaces themselves are unforgiving:
- Flow builder asks for context fields, target fields, model choices, reasoning effort, verbosity, web search toggles, trigger settings, entry criteria, schedule timing, batch size — all in nested panels
- One wrong picklist selection (e.g., picking Account_Name__pc instead of Account.Name, or a lookup ID type instead of a string field) silently changes flow behavior
- No validation feedback at config time — the flow saves, activates, runs, produces an output, and you have no way to know whether the output reflects the model's reasoning OR your misconfigured input
- No validation feedback at run time either — History Logs (when they exist) capture input/output values but don't flag "you mapped a lookup ID type field as context, the model received an opaque ID string"

Lived experience as the candidate:
- Spent ~45 minutes on flow configuration before realizing one context field was a lookup ID instead of a string
- Spent another ~45 minutes troubleshooting OpenAI flows that silently failed
- Discovered scheduled flows showed Deleted status with Active toggle still green — UI states inconsistent
- Could not find History Logs in TC's UI without docs (and the docs are 5 years old)

What this means for customer behavior:
- A discrepancy in classification could be the model, the input, the configuration, OR the UI lying to you about state
- Customer in Signal 2 saying "not sure if it's a data quality issue or if we set something up wrong" — they're sitting in the same UI, with the same lack of feedback, blaming themselves OR TC at random
- A "bad classification" report from CS (Signal 6) might actually be a misconfigured flow producing the bad output, but no one can tell because the surface doesn't differentiate

Recommendation implication — beyond per-record explainability:
The transparency layer needs to surface not just "why did the AI pick this industry?" but "what configuration produced this enrichment?" Both belong in the History Log structure. Both belong on the record. The audit trail has to include:
1. Inputs — which fields, with which values, were sent to the model
2. Configuration — model, settings, timestamp
3. Outputs — values + sources
4. State — was this run successful, partial, or silently failed

Without (1) and (2), customers can't tell their config from TC's classifier. With them, they can isolate the variable.

11. THE HEADLINE — three competing industry taxonomies, no reconciliation

Empirical finding from the 3-flow Azure A/B run (5.3-chat with web, 5.3-chat without web, 5.2-chat without web) on all 34 imported accounts:

31 unique industry strings produced. Examples:
Advertising Services, Cloud Computing, Music Streaming, Music, Consumer Electronics, Software Development, Information Technology and Services, Internet Services, Internet Services and Products, Technology, Social Media, Social Networking, E-commerce, E-Commerce, E-Commerce Retail, Software & IT Services, Software, Application Software, Software as a Service (SaaS), Internet, Telecommunications, Streaming Services, Online Video Sharing and Streaming, Internet Content & Information, Internet Messaging Services, Grocery Stores, Supermarkets & Grocery Stores, Supermarkets and Grocery Stores, Grocery Retail, Retail, Music

Three competing taxonomies coexist:

Taxonomy	Where it lives	Categories	Visible to customer?
Brief's "8 categories"	Marketing / interview brief	8 (Tech, FS&I, Healthcare, Manufacturing, Retail/Consumer, Media/Telecom, Prof Services, Energy)	No — only in brief
Salesforce standard Industry picklist	Account object (where enrichment writes)	32 (Agriculture, Apparel, Banking, Biotechnology, Chemicals, Communications, Construction, Consulting, ...)	Yes — visible in record
Model output	Free-text strings from LLM	31+ unique strings across 34 records	Yes — written to whatever target field is configured

Zero of the 31 model outputs match either of the other two taxonomies.

Why this is the strongest demo finding of the day:
- It's empirical (34 records, 3 columns, reproducible)
- It's not a quality issue — it's a product coherence issue
- The brief itself (the artifact selling the customer on TC) describes a taxonomy that doesn't exist in the product
- Whatever field type the customer configures (Text, picklist, custom), the model output won't match the brief's framing

Demo line:
"The brief told me TC classifies into 8 categories. The standard Industry field on the Account object has 32 picklist values. The model wrote 31 distinct free-text strings across my 34 records, none of which match the 8 OR the 32. That's three competing taxonomies in one product, with no reconciliation visible to the customer. That's not an enrichment quality problem — that's a product coherence problem."

Was the picklist constraint the issue? Custom Text(255) fields were used as enrichment targets. Even if the standard Industry picklist had been used, the model's free-text outputs wouldn't have matched its 32 values. The picklist would have either silently coerced or rejected — both are failure modes the customer can't see. The product allowed Text fields with no warning. Whatever choice the customer makes, the surface doesn't reconcile to a stable taxonomy.

Even TC's own pre-built sample flow doesn't enforce the 8 categories. When inspecting Bryan's New Lead Flow: 4/22/2026 (the sample flow shipped in the demo sandbox), the AI Enrichment step's target field was the standard Salesforce Lead.Industry picklist — the same 32-value list as on the Account object. The brief's 8 categories aren't on the Lead picklist either. The 8-category framework has now failed to appear in four places it could have lived:

Where the 8 categories could have lived	What's actually there
Standard SF Industry picklist on Account	32 standard SF values (Agriculture, Apparel, Banking, ...)
Standard SF Industry picklist on Lead	Same 32 standard SF values
Bryan's pre-built TC sample flow target field (created 4/22/2026)	The 32-value standard picklist
Model output across 34 records × 3 configs	31 unique free-text strings, none matching either taxonomy

The 8-category framework exists in the brief and nowhere in the product — not in either object's picklist, not in TC's own demo flow that the CPO built six days ago, not in the model's output. This isn't a configuration choice, and it's not Andrea's custom field. It's the product TC ships, with the demo flow TC's own CPO built, missing the taxonomy TC's own brief describes.

12. D&B integration as the synthesis — already partially built

Observation: TC's Account object already exposes integration fields:
- D&B Global Ultimate Account Name (Formula field) — D&B integration exists
- Moodys Account Name (custom field) — Moody's integration exists
- Normalized Account Name, Simplified Account Name — TC's normalization fields

D&B is referenced in Signal 3 (churn) and Signal 8 (RFPs). Customer pain about D&B is consistent: their compliance / source attribution / SIC code surface is what wins legal hand-off. TC has been positioned as a competitor against D&B but has the integration plumbing already.

The synthesis recommendation this opens:

LLM (TC's current) → free-text industry + sources + reasoning
                          ↓
                Map to closest SIC/NAICS via lookup
                          ↓
                Cross-reference with D&B (if account has D-U-N-S)
                          ↓
            Surface BOTH on the record + flag agreement / disagreement

Why this fits 6-8 weeks:

Tier	Effort	Surface
Tier 1 — Surface what's already there	2-3 weeks	For records with D&B data, show D&B's classification alongside TC's. Highlight agreement / disagreement visually. UI work.
Tier 2 — Stable taxonomy via mapping	~1 week	Static lookup mapping TC categories ↔ SIC/NAICS. Include in exportable transparency artifact. Solves the legal hand-off problem cheaply.
Tier 3 (H2)	Bigger	Use D&B as active reconciliation source. Auto-prioritize records where TC and D&B disagree as review queue.

Tiers 1 + 2 = 3-4 weeks. Fits the EM's 6-8 week constraint with margin.

Why this is competitively sharper than the original "show your work" recommendation:

"TC isn't competing with D&B on data. TC is integrating with D&B and adding the workflow layer D&B doesn't have. D&B has the compliance taxonomy. TC has the Salesforce-native UX. Together: legal gets their SIC codes, RevOps gets their inline workflow, customer doesn't choose between trust and usability."

How this restructures the three-surface scaffold (Sunday Claude proposal — still a candidate, not endorsed):

Surface	Now powered by
Inline indicator on Account record	TC's reasoning + D&B's classification + agreement signal
Review queue (records where TC and D&B disagree)	Auto-prioritized — confidence is derived, not invented
Exportable artifact for legal	TC's classification + D&B's SIC code + sources

The trust-transfer mechanic — D&B's "staleness" is the feature, not the bug.

Sharpened framing of why this synthesis works:

Property	LLM (TC's current)	D&B
Speed	Real-time	Months stale
Determinism	Non-deterministic	Deterministic (analyst-verified, source-cited)
Trust profile	Uncertain, varies by run	Audited, regulator-accepted, decades of incumbency
Why it has those properties	Skips human verification → fast	Includes human verification → slow but credible

These are complementary trade-offs, not competing axes. Cross-referencing them mechanically transfers D&B's trust to the matched LLM output:

Both agree → LLM speed + D&B credibility = highest-confidence enrichment available
Disagree → known unknown, surfaced for review (Signal 4's bulk-review queue gets automatic prioritization for free)
LLM-only (D&B has no record) → speed without verification, flagged so customer applies scrutiny
D&B-only (LLM failed) → graceful degradation to verified data (Section 7's silent-failure gets a fallback)

D&B's slowness IS the audit. Customers in regulated industries (Signals 1, 3, 5, 8) intuitively know LLM is fast because none of D&B's verification is happening — that's why they keep asking for sources/audit/confidence. TC's job isn't to choose between fast and verified. TC's job is to be the bridge that transfers verification to speed.

Demo line:
"LLMs are fast and non-deterministic. D&B is slow and verified. TC's product can be the bridge that mechanically transfers D&B's trust to the LLM's speed. Where they agree, the customer gets both. Where they disagree, RevOps gets the review queue. Compliance hand-off and workflow efficiency stop being separate problems."

Caveats to acknowledge:
- Requires the customer to have a D&B subscription. Best for enterprise (which is the at-risk segment per Signals 1, 3, 5, 8). Doesn't address SMB tier.
- D&B coverage is patchy for Bryan's expansion verticals (non-profits, hospitality). Architecture needs LLM-only fallback with appropriate trust signaling.
- "No D&B record" is its own surface state — not a failure, an honest reflection of evidence available.

What this absorbs from elsewhere in the file:
- Section 5 (D&B as benchmark): D&B's compliance edge → now leveraged, not competed with
- Section 6 (OCompany_* fields are substrate without provenance): D&B provides provenance per fact
- Section 11 (three competing taxonomies): SIC/NAICS becomes the stable cross-reference
- Section 4 (legal vs UX users): both served, with one shared substrate

13. Design principle — subtract, not add (the meta-thesis)

A meta-observation that reframes how the recommendation should be presented.

The tension running through Sections 4, 6, 7, 9, 10, 11:
- Substrate IS missing (sources, confidence, reasoning, model, config audit) — Sections 4, 6, 7
- Surface is ALREADY overloaded (100+ fields, silent misconfiguration, conflicting taxonomies, every dropdown a landmine) — Sections 9, 10, 11

The reflexive answer ("expose the substrate") is wrong. Adding 5 sidecar fields per classification (Industry_Source, Industry_Confidence, Industry_Reasoning, Industry_Model, Industry_Settings) makes Section 9's cognitive overload worse, not better. RevOps already drowns in field clutter. More substrate exposure on the same dense surface is theater — and theater that loads more cognitive cost.

The right framing: substrate enables a simpler, more trustworthy surface.

Capture all of it — sources, confidence, reasoning, model, settings — and use that substrate to power a UI that shows less by default, with depth available on demand:

Scenario	Default surface	Drill-down (one click)
TC + D&B agree on Industry	One value + green ✓ "verified"	Both reasonings, sources, agreement evidence
TC + D&B disagree	Conflict indicator + RevOps action prompt	Both classifications, sources for each, reasoning trace
D&B has no record on this account	TC's value + "single source — review recommended" amber tag	TC's full reasoning, model used, sources
Classification failed (cf. Section 7)	Visible failure state with diagnostic surface	Error type, last attempt, suggested fix

The default daily view gets quieter, not noisier. The substrate is doing all the work to enable that quiet.

Configuration UI gets the same treatment. The 9-flow matrix exposed Provider × Model × Reasoning Effort × Verbosity × Web Search as 5 independent dimensions with no defaults, no presets. That's admin-as-engineer, not admin-as-RevOps. Subtract here too:
- 2-3 presets ("Fast / Balanced / Comprehensive") visible by default
- Underlying config available but collapsed
- Most customers consume defaults; power users get the dials

Foundation observation (why subtract matters): user error is everywhere across this surface. Old/irrelevant docs without current screenshots. Salesforce UI's intrinsic density. Layered managed packages adding fields. Multiple objects, fields, relationships. Configuration screens with no validation feedback. Every additional surface element multiplies the chance of silent misconfiguration. The customer's "I'm not sure if it's a data quality issue or if I set it up wrong" (Signal 2) IS user error — and the UI architecture is making user error inevitable.

Why this lands with all three panelists:

Bryan (customer voice): RevOps isn't asking for more dropdowns. They're asking for clarity. Signals 1, 2, 4, 6 are all variations of "reduce my cognitive load." Subtract directly answers that.
Scott (Director of Product Design): "Subtract not add" is core Apple / Tufte / Norman product thinking. Design principle as the lever, not "more transparency" as the lever. He'll appreciate that the recommendation isn't UX-cosmetic.
Ernesto (CTO, architecture): Substrate engineering is still required — just powering a simpler surface, not surfacing itself directly. He doesn't lose the work; he gets a cleaner build target.

The recommendation, rephrased one more time:

"The substrate work isn't to expose more. It's to enable less. Capture sources, confidence, reasoning, model — all of it — and use that substrate to support a UI that shows ONE classification value, with a verification indicator, and a one-click drill-down for the cases that warrant scrutiny. The customer's daily experience gets simpler; the audit trail gets stronger; both happen at once. That's the design principle — subtract, not add."

Sub-observation tied to data model legibility: beyond surface simplification, the data model itself needs to be legible — the customer needs to know which fields TC owns vs. they own vs. are derived. Today's UI doesn't differentiate. That's a labeling/grouping problem, not just a "fewer fields" problem.

14. Confidence is the unifying primitive

The single derived signal that ties the whole recommendation together. Re-reading earlier sections through this lens:

Section	What it points at	How "confidence" resolves it
6 (OCompany substrate, no provenance)	Substrate captures values, not how sure	Confidence is the missing field
7 (OpenAI silent failure)	No diagnostic surface when classification fails	Failed = "no confidence available," visible state distinct from "low confidence"
8 (dedupe → classification drift)	Same entity, different name → different output	Variance across name variants signals low confidence
11 (three competing taxonomies)	31 unique strings across 34 records	High variance across runs/models → low confidence
12 (TC + D&B synthesis)	Cross-reference creates trust	Agreement = high confidence; disagreement = low
13 (subtract, not add)	Surface is overloaded	One indicator (confidence) drives all default surface decisions
Signal 4 ("bulk overrides with confidence filtering")	Manual review of all records is killing turnaround	High-confidence records skip review automatically
Signal 1, 3, 5 (compliance / legal asking RevOps)	Need defensible classification	High-confidence-with-D&B-agreement is the defensible export
Signal 2 (data quality vs. setup)	Customer can't diagnose	Visible failure states + confidence reasoning gives them a thread to pull

Confidence-driven surface (the default RevOps view):

State	Surface	Drill-down available
High confidence (TC + D&B agree, sources cited)	Green ✓ verified — just the value	Both reasonings, sources, model, settings
Medium confidence (TC alone, D&B no record)	Amber single-source — review optional	TC's full reasoning, sources, suggestion to validate
Low confidence (TC + D&B disagree, OR high variance across runs)	Red conflict — review prioritized	Both classifications, side-by-side reasoning, action prompt
Failed (Section 7)	Gray diagnostic state — error type visible	Last attempt, error context, suggested fix
No data	"Not yet enriched"	Trigger or schedule action

Why this is the strongest framing for the panel:

Signal 4 customer asked for "bulk overrides with confidence filtering" — that's a solution they named. The underlying problem is "review is killing turnaround." Confidence as primitive solves that problem AND adjacent ones (legal hand-off, silent failure diagnostics, taxonomy stability) — none of which a feature-shaped "bulk override + confidence filter" UI would.
PM discipline visible: building the primitive, not the asked-for feature. Bryan's brief literally rewards this — "the candidates who do best are the ones who can tell us what they chose, what they didn't, and why." This is "what I didn't choose: I didn't build the feature Signal 4 named."
Architecture clarity: confidence is derived, not invented. Inputs are LLM output variance, D&B agreement, source count, model uncertainty. Ernesto-friendly — there's no hand-waving about how the score gets computed.
Design clarity: one signal driving the surface is "subtract" in action (Section 13). Scott-friendly.

Demo narrative arc this enables:

Problem framing: customers across signals 1, 2, 3, 4, 5, 6, 8 are all asking the same question in different forms — "can I trust this classification?" — but TC's product gives them no way to answer. Today's surface conflates "TC said so" with "this is reliable."
Recommendation: capture substrate (sources, model, D&B cross-reference) and use it to compute a confidence signal that drives a simpler, more trustworthy surface.
Three concrete surfaces (Section 12 mapping): inline indicator on record, confidence-prioritized review queue, exportable artifact for legal.
Validation: can be tested before code with customer interviews ("if you saw this confidence state on this record, would you skip review or escalate?"). Falsifiable, prototype-able.

Open threads / things to come back to

Test the Instagram / Instagram Inc hypothesis — same model, same web setting, compare across variant pairs (Section 8)
Diagnose OpenAI failure — Test Authentication on the OpenAI integration; capture screenshot whether it passes or fails
Confirm what AI Data Enrichment Target Fields dropdown actually exposes (citations? sources? raw response? or only structured firmographic outputs?)
Confirm what History Logs structure preserves per execution (input/output only, or also model + settings?)
Empirical results from the (now Azure-only) flow matrix — pending