A simple shared document capturing every human correction to AI output — "AI wrote X, I changed it to Y, because Z" — that simultaneously builds a prompt library, quality checklist, EU AI Act governance trail, and competitive moat.
A structured 90-minute workshop where partners decompose each task into "judgment verbs" vs. "execution verbs," surfacing tacit quality criteria that become both the first AI prompts and the seed of the compound knowledge engine.
Every AI failure in PRMA consulting traces to the same root cause — asking AI to perform two cognitively distinct jobs in one prompt. The fix: decompose every task into an information-processing layer and a judgment layer, with a human gate at the seam.
AI (information processing) → Human gate → AI (formatting/output) → Human review

Four experiments in four weeks, sequenced not by strategic importance but by feedback speed: building AI intuition on zero-stakes material before touching anything consequential.
The confidentiality barrier blocking AI on CDP tasks may not legally exist — most client NDAs predate AI and are silent on it. One partner, three contracts, and a highlighter could unlock the highest-value tasks in weeks.
Q1. "Walk me through the last GVD section you wrote from scratch — what were you actually doing for the first two hours?"
Reveals: where time actually goes — searching, structuring, finding analogues, or writing.
Q2. "On a typical GVD project — how many hours does the team spend on work that feels like assembly versus work that feels like judgment?"
Reveals: the actual AI-addressable percentage of project hours. If assembly >40%, the business case is self-funding.
Q3. "When was the last time a client asked you to move faster than you could — and what did you lose because of it?"
Reveals: cost of inaction. One lost project makes this a survival conversation.
Q4. "In your standard client agreements — does anything specifically address AI processing of client data, or are the confidentiality clauses silent on that?"
Reveals: whether the CDP wall legally exists. 40% chance it doesn't — NDAs predate AI.
Q5. "When you open a CDP today, what systems does that data live in? Do you use Microsoft 365?"
Reveals: the fastest infrastructure path. If M365 → Copilot in 2-4 weeks.
Q6. "When AI output is wrong in your domain — what does 'wrong' look like specifically?"
Reveals: the specific failure mode to design review gates around.
Q7. "When you review a junior's GVD draft — what percentage of your comments are about form versus content?"
Reveals: if mostly form → AI owns 70-80% of drafting. Single biggest ROI indicator.
Q8. "When you tried AI for slides and it was terrible — what specifically was bad?"
Reveals: which sub-task failed. Argument failure = capability problem. Visual failure = tool mismatch (fixable).
Q9. "Is there anyone whose reaction to AI adoption you're privately worried about?"
Reveals: the phantom veto. In small firms, the real block is often social, not technical.
Q10. "Is there a version of AI-augmented work that you'd be uncomfortable with — not because of confidentiality, but because it changes what the job feels like?"
Reveals: the professional identity ceiling — if reviewing AI output feels like a demotion from authorship.
Q11. "If I told you 2 of 4 experiments would produce embarrassing results — would that feel like failure or learning?"
Reveals: experimental tolerance. "Failure" = reframe as research. "Learning" = green light.
Q12. "Last project you finished faster than expected — did you keep the full fee, or adjust?"
Reveals: billing psychology. Kept fee = AI is pure upside. Discounted = time-billing trap to address first.
Q13. "Three years from now, if this firm is known for something — what? And does AI help or hurt that story?"
Reveals: whether AI is central to positioning or backstage infrastructure.
Q14. "What would have to be true in 3 months for you to feel this worked — and what would feel like it damaged something you care about?"
Reveals: the success definition AND the real protection boundary.
Q15. "How do you train someone to know what AI got wrong — if they don't yet know what right looks like?"
Reveals: whether AI adoption is compatible with the firm's growth model.
Three ideas converge into a single artifact:
A Google Sheet started on Monday that captures "AI wrote X, I changed it to Y, because Z" simultaneously produces operational learning, regulatory compliance, and competitive moat.
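A minimal sketch of what one log row could look like as structured data, assuming illustrative column names (the sheet itself needs nothing beyond these six columns):

```python
# Illustrative schema for the correction log ("AI wrote X, I changed it to Y, because Z").
# Column names are assumptions; the log can live in a Google Sheet or a plain CSV.
import csv
from dataclasses import dataclass, asdict, fields
from datetime import date

@dataclass
class Correction:
    logged_on: str   # ISO date of the correction
    task: str        # e.g. "GVD section skeleton"
    ai_wrote: str    # X: the original AI output (or an excerpt)
    changed_to: str  # Y: the human-corrected version
    because: str     # Z: the tacit quality criterion the edit encodes
    reviewer: str    # who made the call (the governance trail)

def log_correction(path: str, c: Correction) -> None:
    """Append one correction; the same file doubles as prompt-library input."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(Correction)])
        if f.tell() == 0:  # write the header once, on first use
            writer.writeheader()
        writer.writerow(asdict(c))

log_correction("corrections.csv", Correction(
    logged_on=date.today().isoformat(),
    task="Meeting summary",
    ai_wrote="The committee approved the dossier.",
    changed_to="The committee approved the dossier with conditions.",
    because="Omitting conditionality is the 'competent but subtly wrong' failure mode.",
    reviewer="Partner A",
))
```

Because Z is captured alongside X and Y, the same rows feed the prompt library (recurring Z's become prompt instructions), the quality checklist, and the EU AI Act trail (human oversight is documented per output).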
Every task is actually two tasks masquerading as one:
Universal workflow: AI (processing) → Human gate → AI (formatting) → Human review
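As a sketch, that workflow is four stages with the judgment call pinned to the seam; the function names below are placeholders for whatever integration the firm lands on (Claude, Copilot, local model), not a working implementation:

```python
# Minimal sketch of the two-layer workflow: AI (processing) -> human gate ->
# AI (formatting) -> human review. The ai_* callables are placeholders.
from typing import Callable

def run_task(
    source_material: str,
    ai_process: Callable[[str], str],   # information-processing layer
    ai_format: Callable[[str], str],    # formatting/output layer
    human_gate: Callable[[str], str],   # partner edits/approves the substance
    human_review: Callable[[str], str], # final sign-off on the deliverable
) -> str:
    draft = ai_process(source_material)  # extraction, structuring, summarising
    approved = human_gate(draft)         # judgment happens HERE, at the seam
    formatted = ai_format(approved)      # layout, tone, template conformance
    return human_review(formatted)       # human stays accountable for the output
```

The design point is that the two AI calls never touch each other's output directly; everything crosses a human at the seam.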
| Tier | Solution | Timeline | Cost | Data Guarantee |
|---|---|---|---|---|
| 0 | De-identification / anonymization | This week | €0 | CDP data never enters AI |
| 1a | Claude Teams / ChatGPT Enterprise | 1-2 weeks | €25-40/user/mo | Zero data retention, DPA |
| 1b | Microsoft 365 Copilot (if on M365) | 2-4 weeks | €30/user/mo | EU tenant, existing DPA |
| 2 | Azure OpenAI private deployment | 4-8 weeks | €200-800/mo | Private tenant, EU region |
| 3 | Local model (Ollama + Llama 3.3 70B) | 2-3 months | €3,500-5,000 one-time | Air-gapped, absolute |
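Tier 0 is the only row a partner can run this week. A minimal sketch of that de-identification pass, assuming the sensitive terms (client, product, indication) are listed per project by a human; the term list, placeholder format, and example names are all illustrative:

```python
# Minimal Tier 0 sketch: replace known sensitive terms with stable placeholders
# before any text leaves the firm, and keep a local mapping to reverse it after.
# The term list is maintained per project by a human; this is not automated NER.
import re

def deidentify(text: str, terms: list[str]) -> tuple[str, dict[str, str]]:
    mapping: dict[str, str] = {}
    for i, term in enumerate(terms, start=1):
        placeholder = f"[ENTITY_{i}]"
        mapping[placeholder] = term
        text = re.sub(re.escape(term), placeholder, text, flags=re.IGNORECASE)
    return text, mapping

def reidentify(text: str, mapping: dict[str, str]) -> str:
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text

# Hypothetical client and product names, for illustration only:
safe, mapping = deidentify(
    "AcmePharm's launch of Examplimab in Germany slipped to Q3.",
    ["AcmePharm", "Examplimab"],
)
# safe == "[ENTITY_1]'s launch of [ENTITY_2] in Germany slipped to Q3."
# Send `safe` to the AI, then run reidentify(ai_output, mapping) locally.
```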
| Week | Task | Tool | Measure | Decision Gate |
|---|---|---|---|---|
| 1 | Meeting summary | MacWhisper + Claude | Summary matches expert notes? Edit time vs. write-from-scratch time | If editing >80% of writing time → prompt redesign |
| 2 | Landscape research (known area) | Claude | Hallucination rate, coverage, what was missed | If fact-check every claim → first-draft accelerator only |
| 3 | GVD section skeleton | Claude + template | Structure quality, time to publishable | If structure consistently sound → strong scaffolding tool |
| 4 | HTA review summary | Claude | Quality vs. expert blind comparison | Gap reveals where human expertise is irreplaceable |
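The Week 1 gate is stated in time, but a text-level proxy can be logged automatically alongside it. A minimal sketch using the standard library's difflib; the metric and the threshold below are assumptions, not the time-based gate itself:

```python
# Cheap proxy for the Week 1 gate: how much of the AI draft survived review?
# The stated measure is edit time vs. write-from-scratch time; this logs a
# text-level companion metric with no extra tooling.
from difflib import SequenceMatcher

def survival_ratio(ai_draft: str, final: str) -> float:
    """1.0 = untouched draft; values near 0 mean it was effectively rewritten."""
    return SequenceMatcher(None, ai_draft, final).ratio()

ratio = survival_ratio(
    "The HTA body granted reimbursement.",
    "The HTA body granted conditional reimbursement, pending new RWE.",
)
if ratio < 0.5:  # illustrative cutoff, not the 80%-of-writing-time gate
    print(f"Draft mostly rewritten (ratio={ratio:.2f}) -> consider prompt redesign")
```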
| Phase | Duration | Activity |
|---|---|---|
| 1 | 20 min | Verb extraction on a neutral example — judgment verbs vs. execution verbs |
| 2 | 35 min | Independent decomposition of 2-3 tasks each, then compare |
| 3 | 20 min | Write one prompt per partner using quality criteria as the specification |
| 4 | 15 min | Live experiment in the room — test the prompt, react, iterate |
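A minimal sketch of the Phase 3 move, turning a partner's stated quality criteria into the prompt's specification; the criteria shown are illustrative examples, not the firm's actual standards:

```python
# Phase 3 sketch: quality criteria become the prompt's specification.
def build_prompt(task: str, criteria: list[str], inputs: str) -> str:
    spec = "\n".join(f"- {c}" for c in criteria)
    return (
        f"Task: {task}\n"
        f"Quality criteria your output will be judged against:\n{spec}\n"
        f"Do the information-processing only; flag, do not resolve, "
        f"anything that requires judgment.\n\n"
        f"Input material:\n{inputs}"
    )

print(build_prompt(
    task="Draft the skeleton of a GVD burden-of-illness section",
    criteria=[
        "Every claim traceable to a cited source in the input",
        "Structure mirrors the template headings exactly",
        "Unsupported but plausible statements marked [NEEDS SOURCE]",
    ],
    inputs="<paste de-identified source extracts here>",
))
```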
| Member | Key Breakthrough | Round |
|---|---|---|
| Idea Generator | The compound knowledge engine concept — every AI interaction produces training data for a firm-specific intelligence layer | R1, R2 |
| Reality Checker | The Week 1 kill shot analysis (transcription failure masquerading as AI failure, "competent but subtly wrong" summary destroying trust) | R3 |
| Market Scanner | "Immediate vs. delayed quality signal" as the real classification system. EU AI Act governance trail as competitive moat | R1, R3 |
| First Principles | The slide failure contains the general theory of why AI adoption fails in knowledge work — undifferentiated task blobs judged on the hardest sub-task | R2, R4 |
| Wild Card | "What do clients pay for: time, output, or judgment?" — the billing model question that restructures the entire ROI | R4, R5 |