A simple shared document capturing every human correction to AI output — "AI wrote X, I changed it to Y, because Z" — that simultaneously builds a prompt library, quality checklist, EU AI Act governance trail, and competitive moat.
A structured 90-minute workshop where partners decompose each task into "judgment verbs" vs. "execution verbs," surfacing tacit quality criteria that become both the first AI prompts and the seed of the compound knowledge engine.
Every AI failure in PRMA consulting traces to the same root cause — asking AI to perform two cognitively distinct jobs in one prompt. The fix: decompose every task into an information-processing layer and a judgment layer, with a human gate at the seam.
AI (information processing) → Human gate → AI (formatting/output) → Human review

Four experiments in four weeks, sequenced not by strategic importance but by feedback speed: building AI intuition on zero-stakes material before touching anything consequential.
The confidentiality barrier blocking AI on CDP tasks may not legally exist — most client NDAs predate AI and are silent on it. One partner, three contracts, and a highlighter could unlock the highest-value tasks in weeks.
Q1. "Walk me through the last GVD section you wrote from scratch — what were you actually doing for the first two hours?"
Reveals: where time actually goes — searching, structuring, finding analogues, or writing.
Q2. "On a typical GVD project — how many hours does the team spend on work that feels like assembly versus work that feels like judgment?"
Reveals: the actual AI-addressable percentage of project hours. If assembly >40%, the business case is self-funding.
Q3. "When was the last time a client asked you to move faster than you could — and what did you lose because of it?"
Reveals: cost of inaction. One lost project makes this a survival conversation.
Q4. "In your standard client agreements — does anything specifically address AI processing of client data, or are the confidentiality clauses silent on that?"
Reveals: whether the CDP wall legally exists. 40% chance it doesn't — NDAs predate AI.
Q5. "When you open a CDP today, what systems does that data live in? Do you use Microsoft 365?"
Reveals: the fastest infrastructure path. If M365 → Copilot in 2-4 weeks.
Q6. "When AI output is wrong in your domain — what does 'wrong' look like specifically?"
Reveals: the specific failure mode to design review gates around.
Q7. "When you review a junior's GVD draft — what percentage of your comments are about form versus content?"
Reveals: if mostly form → AI owns 70-80% of drafting. Single biggest ROI indicator.
Q8. "When you tried AI for slides and it was terrible — what specifically was bad?"
Reveals: which sub-task failed. Argument failure = capability problem. Visual failure = tool mismatch (fixable).
Q9. "Is there anyone whose reaction to AI adoption you're privately worried about?"
Reveals: the phantom veto. In small firms, the real block is often social, not technical.
Q10. "Is there a version of AI-augmented work that you'd be uncomfortable with — not because of confidentiality, but because it changes what the job feels like?"
Reveals: the professional identity ceiling — if reviewing AI output feels like a demotion from authorship.
Q11. "If I told you 2 of 4 experiments would produce embarrassing results — would that feel like failure or learning?"
Reveals: experimental tolerance. "Failure" = reframe as research. "Learning" = green light.
Q12. "Last project you finished faster than expected — did you keep the full fee, or adjust?"
Reveals: billing psychology. Kept fee = AI is pure upside. Discounted = time-billing trap to address first.
Q13. "Three years from now, if this firm is known for something — what? And does AI help or hurt that story?"
Reveals: whether AI is central to positioning or backstage infrastructure.
Q14. "What would have to be true in 3 months for you to feel this worked — and what would feel like it damaged something you care about?"
Reveals: the success definition AND the real protection boundary.
Q15. "How do you train someone to know what AI got wrong — if they don't yet know what right looks like?"
Reveals: whether AI adoption is compatible with the firm's growth model.
Three ideas converge into a single artifact:
A Google Sheet started on Monday that captures "AI wrote X, I changed it to Y, because Z" simultaneously produces operational learning, regulatory compliance, and competitive moat.
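A minimal sketch of what one log row could look like as structured data, assuming illustrative column names (the sheet itself needs nothing beyond these six columns):

```python
# Illustrative schema for the correction log ("AI wrote X, I changed it to Y, because Z").
# Column names are assumptions; the log can live in a Google Sheet or a plain CSV.
import csv
from dataclasses import dataclass, asdict, fields
from datetime import date

@dataclass
class Correction:
    logged_on: str   # ISO date of the correction
    task: str        # e.g. "GVD section skeleton"
    ai_wrote: str    # X: the original AI output (or an excerpt)
    changed_to: str  # Y: the human-corrected version
    because: str     # Z: the tacit quality criterion the edit encodes
    reviewer: str    # who made the call (the governance trail)

def log_correction(path: str, c: Correction) -> None:
    """Append one correction; the same file doubles as prompt-library input."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(Correction)])
        if f.tell() == 0:  # write the header once, on first use
            writer.writeheader()
        writer.writerow(asdict(c))

log_correction("corrections.csv", Correction(
    logged_on=date.today().isoformat(),
    task="Meeting summary",
    ai_wrote="The committee approved the dossier.",
    changed_to="The committee approved the dossier with conditions.",
    because="Omitting conditionality is the 'competent but subtly wrong' failure mode.",
    reviewer="Partner A",
))
```

Because Z is captured alongside X and Y, the same rows feed the prompt library (recurring Z's become prompt instructions), the quality checklist, and the EU AI Act trail (human oversight is documented per output).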
Every task is actually two tasks masquerading as one:
Universal workflow: AI (processing) → Human gate → AI (formatting) → Human review
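As a sketch, that workflow is four stages with the judgment call pinned to the seam; the function names below are placeholders for whatever integration the firm lands on (Claude, Copilot, local model), not a working implementation:

```python
# Minimal sketch of the two-layer workflow: AI (processing) -> human gate ->
# AI (formatting) -> human review. The ai_* callables are placeholders.
from typing import Callable

def run_task(
    source_material: str,
    ai_process: Callable[[str], str],   # information-processing layer
    ai_format: Callable[[str], str],    # formatting/output layer
    human_gate: Callable[[str], str],   # partner edits/approves the substance
    human_review: Callable[[str], str], # final sign-off on the deliverable
) -> str:
    draft = ai_process(source_material)  # extraction, structuring, summarising
    approved = human_gate(draft)         # judgment happens HERE, at the seam
    formatted = ai_format(approved)      # layout, tone, template conformance
    return human_review(formatted)       # human stays accountable for the output
```

The design point is that the two AI calls never touch each other's output directly; everything crosses a human at the seam.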
| Tier | Solution | Timeline | Cost | Data Guarantee |
|---|---|---|---|---|
| 0 | De-identification / anonymization | This week | €0 | CDP data never enters AI |
| 1a | Claude Teams / ChatGPT Enterprise | 1-2 weeks | €25-40/user/mo | Zero data retention, DPA |
| 1b | Microsoft 365 Copilot (if on M365) | 2-4 weeks | €30/user/mo | EU tenant, existing DPA |
| 2 | Azure OpenAI private deployment | 4-8 weeks | €200-800/mo | Private tenant, EU region |
| 3 | Local model (Ollama + Llama 3.3 70B) | 2-3 months | €3,500-5,000 one-time | Air-gapped, absolute |
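Tier 0 is the only row a partner can run this week. A minimal sketch of that de-identification pass, assuming the sensitive terms (client, product, indication) are listed per project by a human; the term list, placeholder format, and example names are all illustrative:

```python
# Minimal Tier 0 sketch: replace known sensitive terms with stable placeholders
# before any text leaves the firm, and keep a local mapping to reverse it after.
# The term list is maintained per project by a human; this is not automated NER.
import re

def deidentify(text: str, terms: list[str]) -> tuple[str, dict[str, str]]:
    mapping: dict[str, str] = {}
    for i, term in enumerate(terms, start=1):
        placeholder = f"[ENTITY_{i}]"
        mapping[placeholder] = term
        text = re.sub(re.escape(term), placeholder, text, flags=re.IGNORECASE)
    return text, mapping

def reidentify(text: str, mapping: dict[str, str]) -> str:
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text

# Hypothetical client and product names, for illustration only:
safe, mapping = deidentify(
    "AcmePharm's launch of Examplimab in Germany slipped to Q3.",
    ["AcmePharm", "Examplimab"],
)
# safe == "[ENTITY_1]'s launch of [ENTITY_2] in Germany slipped to Q3."
# Send `safe` to the AI, then run reidentify(ai_output, mapping) locally.
```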
| Week | Task | Tool | Measure | Decision Gate |
|---|---|---|---|---|
| 1 | Meeting summary | MacWhisper + Claude | Summary matches expert notes? Edit time vs. write-from-scratch time | If editing >80% of writing time → prompt redesign |
| 2 | Landscape research (known area) | Claude | Hallucination rate, coverage, what was missed | If fact-check every claim → first-draft accelerator only |
| 3 | GVD section skeleton | Claude + template | Structure quality, time to publishable | If structure consistently sound → strong scaffolding tool |
| 4 | HTA review summary | Claude | Quality vs. expert blind comparison | Gap reveals where human expertise is irreplaceable |
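The Week 1 gate is stated in time, but a text-level proxy can be logged automatically alongside it. A minimal sketch using the standard library's difflib; the metric and the threshold below are assumptions, not the time-based gate itself:

```python
# Cheap proxy for the Week 1 gate: how much of the AI draft survived review?
# The stated measure is edit time vs. write-from-scratch time; this logs a
# text-level companion metric with no extra tooling.
from difflib import SequenceMatcher

def survival_ratio(ai_draft: str, final: str) -> float:
    """1.0 = untouched draft; values near 0 mean it was effectively rewritten."""
    return SequenceMatcher(None, ai_draft, final).ratio()

ratio = survival_ratio(
    "The HTA body granted reimbursement.",
    "The HTA body granted conditional reimbursement, pending new RWE.",
)
if ratio < 0.5:  # illustrative cutoff, not the 80%-of-writing-time gate
    print(f"Draft mostly rewritten (ratio={ratio:.2f}) -> consider prompt redesign")
```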
| Phase | Duration | Activity |
|---|---|---|
| 1 | 20 min | Verb extraction on a neutral example — judgment verbs vs. execution verbs |
| 2 | 35 min | Independent decomposition of 2-3 tasks each, then compare |
| 3 | 20 min | Write one prompt per partner using quality criteria as the specification |
| 4 | 15 min | Live experiment in the room — test the prompt, react, iterate |
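A minimal sketch of the Phase 3 move, turning a partner's stated quality criteria into the prompt's specification; the criteria shown are illustrative examples, not the firm's actual standards:

```python
# Phase 3 sketch: quality criteria become the prompt's specification.
def build_prompt(task: str, criteria: list[str], inputs: str) -> str:
    spec = "\n".join(f"- {c}" for c in criteria)
    return (
        f"Task: {task}\n"
        f"Quality criteria your output will be judged against:\n{spec}\n"
        f"Do the information-processing only; flag, do not resolve, "
        f"anything that requires judgment.\n\n"
        f"Input material:\n{inputs}"
    )

print(build_prompt(
    task="Draft the skeleton of a GVD burden-of-illness section",
    criteria=[
        "Every claim traceable to a cited source in the input",
        "Structure mirrors the template headings exactly",
        "Unsupported but plausible statements marked [NEEDS SOURCE]",
    ],
    inputs="<paste de-identified source extracts here>",
))
```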
| Member | Key Breakthrough | Round |
|---|---|---|
| Idea Generator | The compound knowledge engine concept — every AI interaction produces training data for a firm-specific intelligence layer | R1, R2 |
| Reality Checker | The Week 1 kill shot analysis (transcription failure masquerading as AI failure, "competent but subtly wrong" summary destroying trust) | R3 |
| Market Scanner | "Immediate vs. delayed quality signal" as the real classification system. EU AI Act governance trail as competitive moat | R1, R3 |
| First Principles | The slide failure contains the general theory of why AI adoption fails in knowledge work — undifferentiated task blobs judged on the hardest sub-task | R2, R4 |
| Wild Card | "What do clients pay for: time, output, or judgment?" — the billing model question that restructures the entire ROI | R4, R5 |