
2026-04-30 — Late-night shower-thought session

Demo: Thu 2026-04-30 @ 2:00 PM PT (Bryan Licas / CPO, Ernesto Valdes / CTO, Scott Wilton / Director of Product Design)

A continuation conversation that started Wed evening and rolled past midnight. Andrea worked through three more strategic threads — model defaults, AI determinism education, scaling — that pushed the recommendation into stronger product-philosophy territory than where it had landed at end of Wed. Then iterated on a Slide 1 visual: whiteboard meme → Gemini cartoon (rejected) → Canva-edited version with pointing hands.

Captured here so the framings (especially "remove RevOps from the middle") aren't lost.


Section 1 — The shower thought, two parts

Andrea: ok help me work through a shower thought I just had. so actually, it's got two parts to it.

first is related to the dials for reasoning and verbosity tuning. for SMBs, it should default to highest settings only because with the increased scale with frontier models it's pretty much guaranteed the output is of higher quality and is more reliable. it's a waste of time and tokens to do anything less than medium in a lot of cases, especially if end users don't understand that using LLMs is effectively spicy autocorrect.

second thing - because turning the dials up is directly proportional to the increasing costs dependent on volume of requests priced based on whatever provider's current $/MTok is, an API use budget calculator would be useful for both SMBs and enterprise. […]

ultimately, YOLO mode is for SMBs because they don't have an alternative - they probably don't need or can't afford D&B and they will have to eat the risk of some failures. whereas enterprise can't afford the failures, even if it can be brought down to 5% or less; so they'd be willing to pay for the last mile to get that last 5-10% with a proper D&B sub and integration

Claude: Strong thinking. Three things to work through.

Part 1 — SMB defaults: right in spirit, but "default to highest" has a real problem

For a single record, the cost difference between Low and High reasoning is pennies. For a 50,000-account SMB enrichment run, it can be the difference between $20 and $400. SMBs feel that delta acutely — and are exactly the ones least likely to monitor token spend until the surprise bill hits.

Smart defaults, not max defaults.

| Preset | Reasoning | Verbosity | Web search | When it's right |
|---|---|---|---|---|
| Fast | Low | Low | Off | Bulk re-runs of stable data, weekly hygiene sweeps, large backfills |
| Balanced ⭐ (default) | Medium | Medium | On | Day-to-day enrichment of net-new accounts |
| Comprehensive | High | High | On | High-stakes records (large deals, compliance-relevant accounts), pre-RFP research |

The "spicy autocorrect" framing argues for Balanced as default, not Comprehensive. Low truly is mostly junk. Medium is where the model starts thinking about distractors. High pays 5–10× for the last few percentage points — matters for high-stakes records, not bulk hygiene.

"The default behavior shouldn't be the smartest possible model on the highest setting — it should be the setting that produces reliable output at sustainable cost. SMB customers can't afford to default to Comprehensive any more than they can afford to default to Fast. Balanced is the right anchor; the other two presets exist for the cases that actually warrant them."
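The preset scheme above is small enough to sketch as config. A minimal version, assuming hypothetical names (`Preset`, `PRESETS`) — this is an illustration of the three-preset shape, not TC's actual settings model:

```python
# The preset table as a config sketch. Names and values are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Preset:
    reasoning: str    # reasoning-effort dial
    verbosity: str    # output-verbosity dial
    web_search: bool  # whether live web search is enabled

PRESETS = {
    "Fast":          Preset("low", "low", web_search=False),
    "Balanced":      Preset("medium", "medium", web_search=True),
    "Comprehensive": Preset("high", "high", web_search=True),
}

# Balanced is the anchor; the other two exist for cases that warrant them.
DEFAULT_PRESET = "Balanced"
```

Encoding the default in data rather than UI logic makes the "smart defaults, raw dials buried deeper" layering easy to enforce later.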

Part 2 — Budget calculator: the OSS / vibe-code question

Dispatched a subagent to scan the landscape. Findings:

| Category | Real options? | Best names |
|---|---|---|
| Cost calculators | Yes | tokencost.app, Helicone LLM Cost, pricepertoken.com |
| Multi-provider $/MTok | Yes | pricepertoken.com (300+ models, daily updates) |
| Budget tracking + forecasting libs | Yes (partial) | tokencost PyPI pkg, LiteLLM (proxy + SDK with built-in cost tracking + per-project budgets), Langfuse, Helicone |
| Salesforce-native | No | Build territory — but consume tokencost's model_prices.json rather than maintain pricing |

Bottom line: Pre-flight bulk-run estimator is a thin LWC over a known pricing table + token estimator. The forecasting math (tokens × calls × price + reasoning-effort buffer) is trivial. The value-add is the Salesforce-native UX, not the calculator engine. TC builds the LWC, not the data.
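The forecasting math really is trivial. A sketch of the estimator core, with made-up prices, token counts, and a hypothetical `REASONING_BUFFER` multiplier standing in for the reasoning-effort overhead — real numbers would come from a maintained pricing table like tokencost's model_prices.json:

```python
# Pre-flight bulk-run cost estimator: tokens × calls × $/MTok + reasoning buffer.
# All numbers below are illustrative, not real provider pricing.

# Hypothetical output-token multipliers: higher presets spend more tokens
# on reasoning before emitting the final answer.
REASONING_BUFFER = {"Fast": 1.0, "Balanced": 2.5, "Comprehensive": 8.0}

def estimate_run_cost(records, input_tokens_per_record, output_tokens_per_record,
                      input_price_per_mtok, output_price_per_mtok,
                      preset="Balanced"):
    """Return estimated USD cost for a bulk enrichment run."""
    buffer = REASONING_BUFFER[preset]
    input_cost = records * input_tokens_per_record * input_price_per_mtok / 1_000_000
    output_cost = (records * output_tokens_per_record * buffer
                   * output_price_per_mtok / 1_000_000)
    return input_cost + output_cost

# 50,000-account run at hypothetical $2.50 in / $10.00 out per MTok:
fast = estimate_run_cost(50_000, 1_200, 150, 2.50, 10.00, "Fast")
comp = estimate_run_cost(50_000, 1_200, 150, 2.50, 10.00, "Comprehensive")
print(f"Fast: ${fast:,.0f}  Comprehensive: ${comp:,.0f}")
# → Fast: $225  Comprehensive: $750
```

The point of the sketch is the product claim, not the arithmetic: the spread between presets only shows up at bulk scale, which is exactly when the customer needs to see it before clicking run.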

The wrinkle: surfacing cost transparently raises the "why not call OpenAI directly" objection

When customers see Comprehensive at $680 vs. Balanced at $140, some will ask: "Couldn't I just call OpenAI directly and skip your markup?" Three answers, in order of strength:

  1. Volume discounts. TC's aggregate spend gives them better $/MTok than any single mid-market customer would get direct.
  2. Salesforce-native pipeline. The LLM call is small. Dedupe, normalization, classification, write-back, audit log, review queue — that's the product.
  3. Captured record + memory loop + D&B integration. Stage 2/3 value can't be DIY'd by calling OpenAI directly.

Three-tier framing emerges

| Tier | Customer profile | Default preset | D&B integration | Memory loop | Budget calculator |
|---|---|---|---|---|---|
| SMB | <100 employees, <$10M revenue, no D&B sub | Balanced | Off (price-prohibitive) | Off (no scale to compound) | Critical — they need to know the bill |
| Mid-market | 100–1,000 employees, mixed regulatory exposure | Balanced (Comprehensive for compliance accounts) | Optional (tier upgrade) | Optional | Useful — budgets exist, oversight exists |
| Enterprise | 1,000+ employees, FS / public co / IPO-prep | Comprehensive | Required | Active | Optional — finance team owns FinOps elsewhere |

The calculator's value is highest for SMB — exactly the segment Apex Data is targeting with their free tier. "Apex doesn't tell you what their bulk runs cost. We do — before you click." That's a sharper Apex differentiator than "ours compounds."

Push-back on "YOLO mode" framing

Lands accurate but reads dismissive in front of an SMB customer. Reframe: SMB tier accepts statistical risk in exchange for cost-effective coverage. Enterprise tier pays for the last mile. Same distinction, framed as deliberate product choice, not a "you don't have an alternative" concession.


Section 2 — Hide Low reasoning entirely; D&B as AI trust layer; scaling honesty

Andrea: ah yeah, defaults! that's where I was starting to warm up to. I am also going one step further and considering that by default Low reasoning shouldn't even be an option generally and bury it deep in settings or not even offer it at all.

another related shower thought - not sure if it's possible to educate people on AI in general but if end users could understand that AI outputs are non-deterministic by nature, then at least selling D&B integration or some other vetted IRL/by humans datasets as a match/comparison is a huge bolster for validating veracity of the outputs.

the other thing I can't tell based on my limited scope of sandbox testing is how this works when there are many more context fields and many more enrichment fields.

Claude: Three sharp threads.

1. Hide / remove Low reasoning entirely — directionally right

A 10× cost saving sounds great until you realize Low produces garbage on anything but trivial tasks, which means re-running on Medium anyway. You spend 1.1× total to learn what 1× would have told you.

But three legitimate Low use cases:
- Bulk hygiene sweeps on stable data (high-volume, low-stakes, cost-sensitive)
- Pre-scoring for triage ("does this account look enrichment-worthy?")
- Cost-constrained backfills (real SMB / mid-market scenario)

These share a property: the user understands what they're trading away. Admin-mode decision, not default-mode.

| Layer | Visibility | Who reaches it |
|---|---|---|
| Default preset (Balanced) | One click | Everyone |
| Other presets (Fast, Comprehensive) | One more click | Engaged users |
| Power-user mode: raw dials, including Low reasoning | Buried in settings | Admins who explicitly opt in |
| Enterprise-tier-only Low option | Off entirely for SMB | Customers who've articulated bulk-hygiene workflows |

"Low reasoning isn't a default-shaped option. It's a power-user opt-in for specific workflows. We could surface it in the UI for everyone, but most customers who'd pick it from a dropdown without context will pick it for the wrong reasons. The right design is: smart defaults for everyone, raw dials available for the customer who's earned the context."

2. AI non-determinism + D&B as trust anchor — this is huge

End users — including sophisticated RevOps people — operate on a deterministic mental model. "Same input → same output. If it gave a different answer this time, something's broken." That model is correct for every other system they use. It is completely wrong for LLMs.

That mismatch is where:
- Customers report "bugs" that aren't bugs (run-to-run variance)
- Customers lose trust permanently after seeing inconsistent output
- CS spends 40% more time on classification tickets explaining things that aren't broken
- Customers ask "why is it different this time?" and expect answers that don't exist

Educating users out of the deterministic model is hard, but you don't have to. Pairing with a deterministic source does the work for you.

| Source | Property | What it gives the customer |
|---|---|---|
| LLM alone | Non-deterministic, probabilistic | Speed and breadth, but "is this trustworthy?" is unanswerable |
| LLM + deterministic anchor (D&B) | Probabilistic + verified-by-humans | "Trustworthy when they agree; surfaced for review when they don't" |

The deterministic anchor isn't replacing the LLM. It's giving the customer a stable reference point that their existing mental model can latch onto. They don't have to learn what LLMs are. They just have to read a single agreement indicator.

This reframes the D&B integration from "compliance feature" to "AI trust layer for non-AI-fluent users." That's the entire SMB and mid-market segment, plus a meaningful share of non-FS enterprise.

"D&B isn't just compliance — it's the deterministic anchor that lets non-AI-fluent users trust the AI's output. They don't have to learn what 'non-deterministic' means. They just have to read a single agreement indicator: green if TC and D&B agree, amber if D&B doesn't have the record, red if they disagree. That's not theater — that's a trust mechanism that works without AI literacy."
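The single agreement indicator in that quote is simple enough to make concrete. A sketch, with hypothetical names (`Signal`, `classify_agreement`) — not a real TC or D&B API:

```python
# Green/amber/red trust signal from comparing an LLM enrichment value
# against the deterministic D&B anchor. Names are illustrative.
from enum import Enum
from typing import Optional

class Signal(Enum):
    GREEN = "TC and D&B agree"
    AMBER = "D&B has no record to compare against"
    RED = "TC and D&B disagree — route to review queue"

def classify_agreement(tc_value: str, dnb_value: Optional[str]) -> Signal:
    """Map one enriched field + its D&B counterpart to a trust signal."""
    if dnb_value is None:
        return Signal.AMBER  # no deterministic anchor available
    # Normalize before comparing so formatting noise doesn't read as disagreement
    if tc_value.strip().lower() == dnb_value.strip().lower():
        return Signal.GREEN
    return Signal.RED

print(classify_agreement("SaaS", "SaaS").name)     # GREEN
print(classify_agreement("SaaS", None).name)       # AMBER
print(classify_agreement("SaaS", "Fintech").name)  # RED
```

A real version would need fuzzier matching per field type (revenue ranges, classification taxonomies), but the user-facing contract stays this small: one value in, one of three colors out, no AI literacy required.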

Strongest answer to "why not call OpenAI directly": because direct OpenAI calls have no deterministic anchor. TC isn't selling the call. TC is selling the AI + verification scaffold around the call. Trust is the real procurement gate for AI products right now.

Blind spot to surface in UX: D&B has wrong / outdated data sometimes. Patchy non-Anglo coverage, lag on private companies, classification disagreements. "D&B last verified this record on 2025-03-12. Agreement signal is 8 months old." Keeps the trust anchor itself transparent.

3. Scaling to many fields — flag the unknown honestly

Sandbox sample is one enrichment field × 34 accounts. Can't infer behavior at 12 fields × 50K accounts.

Probably scales linearly: token cost, run time per record, audit log size.

Might NOT scale gracefully:

| Concern | Why it might break |
|---|---|
| Prompt context window | Many context fields = bigger prompt. Cost per record balloons faster than field count alone suggests. |
| Reasoning quality at scale | Frontier models handle ~10–20 simultaneous classifications well. At 30+, public benchmarks show degradation. |
| Confidence calibration | If the model produces 12 outputs per record, are the confidence scores correlated? Probably yes — a bad day is bad on all 12. |
| D&B agreement at scale | D&B doesn't even produce some signals (revenue ranges, employee count, parent hierarchy). |
| UI density | An inline confidence indicator on the Account record looks great for one field. For 12 fields, the panel becomes a wall. |
| Memory loop quality | A per-tenant corpus for 12 fields = 12 sub-corpora with different decay rates. Multi-tenancy gets messier. |

Demo phrasing — name the unknown:

"My sandbox testing was scoped to one classification field across 34 accounts. The recommendation generalizes architecturally, but I'd flag two open questions before scaling to many enrichment fields: prompt-context budget and UI density. At 10+ fields per record, the confidence panel becomes its own design problem — and the cost economics shift because context-window pressure scales faster than field count. I'd want to instrument and test those before committing to a 'works for all enrichment types' positioning."

Bryan explicitly rewards "what you don't know." Most candidates won't acknowledge sandbox-scope limitations. Phrasing fix: "Based on how LLM workloads typically scale..." — labels it as candidate-side speculation, not data.


Section 3 — The 99th-percentile question

Andrea: ok but what's an even clearer product philosophy than what 99% of PMs would arrive at?

Claude (one sentence):

The product's job isn't enrichment. It's removing the human in the middle from the conversation.

The frame

Today's loop:

Sales asks RevOps a question. RevOps asks the AI. RevOps doesn't fully trust the answer. RevOps verifies. Legal asks RevOps a question. RevOps verifies again. Deal slows. RevOps is the bottleneck.

Most products try to make the AI more accurate. That doesn't fix the loop — it just changes the percentage of times RevOps verifies. They still have to be in the middle.

The 99th-percentile move: design the product so RevOps doesn't have to be there at all. Sales gets an answer that comes with its own verification. Legal gets an export they can read directly. The CEO's review prep doesn't trigger a Slack thread to RevOps. The data flows through.

What changes

| Before | After |
|---|---|
| The product is the classification | The product is the artifact that lets RevOps hand off the conversation |
| D&B integration is for compliance | D&B integration is what makes the artifact defensible without RevOps in the room |
| Confidence indicator is a UI element | Confidence indicator is the signal that tells RevOps whether they can hand off or have to engage |
| Memory loop builds a moat | Memory loop reduces the number of cases where RevOps has to engage at all |
| Success metric: classification accuracy | Success metric: RevOps's involvement in enrichment-adjacent work, trending toward zero |
| Competition: D&B, ZoomInfo, Apex | Competition: the Slack message that says "hey, can you check this account before legal gets back to me?" |

The architectural choices stay the same. The positioning changes substantially.

Three sharp implications

1. The captured record is the actual product. The classification is the byproduct. Customers don't really care if a particular account is tagged "Software" or "SaaS" — they care that the tag has a defensible story behind it. Whatever TC builds, the part that survives a vendor switch is the captured record. "Pay us for the data you build, not the data we generate."

2. Cede the data battle. Win the workflow battle. TC will never have better data than D&B. They will never be cheaper than Apex. They have a structural advantage neither can copy: the workflow surface where the data is consumed. D&B can't put their data inline on the Account record where RevOps lives. Apex can't build the corrections corpus. "D&B handles data; we handle the conversation that data exists for."

3. The customer isn't RevOps. The customer is the executive who notices RevOps is overwhelmed. RevOps is the user. The buyer is whoever signs off on budget and watches metrics. That person doesn't care about enrichment quality — they care about whether RevOps is the bottleneck for the deal review, the compliance audit, the M&A diligence prep. TC's pitch to the buyer is "we get RevOps out of the middle."

Possible Slide 1 headline shift

Current: "Can I trust this classification?"

99th-percentile version (one layer deeper): "Whose job is it to defend this answer when someone asks?"

That's the real question across all eight signals. Pattern: RevOps is the human in the middle, and the product's job is to give them a way out. Whether or not the swap makes the deck — having it in your head when you talk through the slides will make every section land sharper.

"It's not transparency. It's organizational throughput. The whole company moves faster because the data doesn't need a human translator anymore."


Section 4 — The Slide 1 visual: whiteboard meme

Andrea generated a whiteboard-style meme: "AI DATA ENRICHMENT WOES — Organizational chaos around Traction Complete." Five organizational-chart icons (Purchasing, Customer Support, Data Teams, Legal/Compliance, Users/Sales) all pointing weapons at a frazzled RevOps figure in the center. "Justify the data. Prove it's accurate. Fix it. Explain it. Own it." Bottom-line corner: "Everyone's pointing. No one's solving."

Claude: This is the picture of what "human in the middle" actually looks like. Stronger than any prose could be.

What it does that words can't:
- Five accusations on one person — visual reality of organizational pressure
- Questions are organization-coded by color; structurally different per function
- "Justify, Prove, Fix, Explain, Own" — those aren't data tasks, they're human-mediator tasks
- "Everyone's pointing. No one's solving." names the equilibrium

Recommendation: Replace Slide 1's right pane with this image. Navy ledger keeps the question on the left ("Can I trust this classification?"). Right pane = image. Stats become a small ribbon or get cut.

Caveat — gun imagery may land jarringly with three executives. Cleanest substitute: replace guns with arrows or pointing hands. Loses 10% of urgency, gains universal palatability.


Section 5 — Toning down the guns to meme references

Andrea: I'm going to tone down the gun hands back to meme reference simplicity and without the angry faces - it's meant to be a funny referential meme, but yes it's a little risque for an interview perhaps....

Claude: Smart edit. Pointing hands (Spider-Man-meme-style or 👉 emoji-style) carry the same "five accusations one target" payload while reading as knowing humor about a shared frustration instead of literal aggression.

Why this version probably lands better:
- Self-aware tone — finding humor in customer pain without minimizing it reads as someone who's sat with the problem long enough to be at home with it
- Bryan came from Customer Success — has personally been the human in the middle. A wry diagram lands differently than a clinical one
- Meme literacy is a soft signal of being native to the modern PM environment

Refinement: keep the speech bubbles legible. They're verbatim-flavored signals from the brief. The image is doing a serious analytical job dressed in a funny outfit — not a funny image with serious notes glued on.

Verbal opener when you talk through it:

"This is the eight signals as one picture, with apologies to the meme."

Self-aware framing is the unlock. Signals to the room you're presenting it consciously, gives them permission to smile, lands the analysis cleanly afterward.


Section 6 — The Gemini cartoon (rejected)

Andrea generated a polished Gemini cartoon — five characters in a chaotic Salesforce office scene, RevOps central with name tag, shouting speech bubbles.

Claude (honest read): Less effective than the whiteboard.

The fix: Go back to the whiteboard. Swap gun hands → pointing hands (Spider-Man meme style) or bold colored arrows. Keep everything else.

Why whiteboard is the right format here: This is a strategic diagram, not a cartoon. The whiteboard aesthetic is the format strategy work uses (McKinsey, IDEO, design-thinking workshops). It signals "I thought through this systematically." The cartoon signals "I made an illustration." Bryan / Ernesto / Scott will read whiteboard sketches as native PM work.


Section 7 — The Canva-edited final version

Andrea didn't have ChatGPT credits, so brought the whiteboard image into Canva Pro and replaced the gun hands with pointing hands. Came out clean.

Claude: This works. Pointing hands carry the same payload without the friction. Canva did a respectable job — hands look natural where the guns were.

What's working:
- Five hands all pointing at RevOps from five directions
- Frazzled blue-tear RevOps figure with arms-up surrender pose doing emotional work
- Speech bubbles still verbatim from the brief — Bryan will recognize his own customers' language
- Subhead "Justify the data. Prove it's accurate. Fix it. Explain it. Own it." doing heavy lifting (verbs that aren't data tasks)
- "Bottom line: Everyone's pointing. No one's solving." — closing punch
- Outcomes box with X bullets — reinforces "this is real, not just funny"

Three typo fixes flagged (easy in Canva):
- Customer Support: "How do we koow" → "know"
- Outcomes: "Slaw deals" → "Slow deals"
- Outcomes: "Wasted" missing second word — original was "Wasted spend"

Bryan / Scott / Ernesto will spot typos. A typo softens the "I worked hard on this" signal.

Verbal opener for this slide:

"Before I get to the recommendation, I want to show you the eight signals as one picture — with apologies to the meme. RevOps isn't drowning in data. They're drowning in five different stakeholder questions about the same data, and they're the only person who can answer any of them. The product I'm recommending isn't 'more accurate AI.' It's giving RevOps a way to step out of the middle. That's the whole deck in one image."

Three of the eight signals are literally on this image word-for-word. Analytical evidence wearing a meme.


Section 8 — Wrap-up and time check

Andrea (on Claude saying "two hours and change until 2pm PT"): it's past midnight so you're thinking 2am. you really gotta get better at figuring out time but I know it's a human thing

Claude (acknowledging): Fair correction. Time is a genuinely weak spot — I have today's date but no clock. ~14 hours to 2pm PT.


What this session produced

Key framings to keep within reach for tomorrow