GA4 Analytics Specialist | OpenKnowledgeBank

Problems solved

diagnose conversion drops
plan GA4 BigQuery analysis
reconcile GA4 UI and BigQuery differences
review ecommerce funnel performance
produce GA4 analysis briefs

Measured performance

Each task was scored on eight criteria, 0-2 points per criterion. Baseline used the same model with no bundle context. Bundle-assisted used the same task with the GA4 Analytics Specialist bundle loaded as context.

42/48 OKB-assisted vs 17/48 baseline

Lift +25 pts / +147%

Measured against no-bundle baseline.

Tasks 3/3

Benchmark tasks passed.

Run gpt-4o-mini

Temperature 0.2, evaluated Jun 23, 2026.

How to replicate

Use model openai/gpt-4o-mini with temperature 0.2.
Run each task prompt once with the baseline system prompt and no bundle context.
Run the same task again with the bundle-assisted system prompt and the GA4 Analytics Specialist bundle files loaded before the user task.
Score each output against the listed rubric criteria, assigning 0, 1, or 2 points per criterion.
Compare baseline score, OKB-assisted score, and percentage lift for each task.

Baseline condition

Run the task with no bundle context.

System:
You are an analytics assistant. Answer the user's question directly and carefully. Do not invent access to tools or data you do not have.

Bundle-assisted condition

Run the same task with the bundle context loaded from bundles/roles/ga4-analytics-specialist.

System:
You are an analytics assistant using the OpenKnowledgeBank GA4 Analytics Specialist bundle. Follow the bundle instructions, safety boundaries, workflows, deliverable formats, and evaluation criteria. Do not invent access to tools or data you do not have.
User content:
Bundle-assisted runs use the same user task after a Bundle context block containing the GA4 Analytics Specialist bundle files from bundles/roles/ga4-analytics-specialist.

Model coverage

Class	Model	Baseline	OKB-assisted	Lift
Lower-cost LLM	gpt-4o-mini	17/48	42/48	+25 pts / +147%
General-purpose LLM	gpt-4.1	25/48	46/48	+21 pts / +84%
Reasoning LLM	o3	31/48	47/48	+16 pts / +52%

Task evidence

Conversion drop diagnosis 5/16 → 14/16

Baseline 5/16 OKB 14/16 Lift +9 pts / +180%

Baseline result

Produced a generic diagnosis memo with plausible causes and actions, but did not say the cause cannot be known from the drop percentage alone. It omitted source discipline, missing-evidence framing, and measurement-health structure.

Bundle improvement

Stated that the cause cannot be determined from the percentage alone, then separated available evidence, missing evidence, decomposition, measurement checks, hypotheses, next actions, and source note.

Prompt used

System:
You are an analytics assistant. Answer the user's question directly and carefully. Do not invent access to tools or data you do not have.
User:
An ecommerce store reports that GA4 purchases dropped 28% last week compared with the previous week. The user asks: "Why did purchases drop, and what should we do?"

Create a short diagnosis memo. Assume the agent has no direct tool access unless the user provides data.

Criterion	Baseline	OKB
Direct answer	1	2
Metric definitions	0	1
Evidence discipline	0	2
Tool honesty	2	2
Source discipline	0	2
Actionability	1	2
Safety	0	1
Output fit	1	2

GA4 UI vs BigQuery reconciliation 6/16 → 15/16

Baseline 6/16 OKB 15/16 Lift +9 pts / +150%

Baseline result

Gave a high-level reconciliation answer but did not clearly start with neither number being automatically right. It missed important alignment checks such as table suffixes, event_date versus event_timestamp, timezone conversion, reporting identity, and source note.

Bundle improvement

Began with neither number being automatically right, separated active users from distinct user_pseudo_id, requested exact GA4 settings and SQL, and included table/date/timezone/identity checks with a source note.

Prompt used

System:
You are an analytics assistant. Answer the user's question directly and carefully. Do not invent access to tools or data you do not have.
User:
A user says: "GA4 says we had 12,430 active users last week, but my BigQuery query returns 13,180 distinct user_pseudo_id values for the same date range. Which number is right, and how should I reconcile this?"

Create a concise reconciliation note. Assume you cannot run tools unless the user provides data. Do not invent the user's exact GA4 settings or BigQuery SQL.

Criterion	Baseline	OKB
Direct answer	0	2
Defines both numbers	1	2
Requests exact settings and query logic	1	2
Alignment checks	1	2
Expected vs unresolved differences	0	1
Source discipline	0	2
Tool honesty	2	2
Actionable next step	1	2

GA4 BigQuery query plan 6/16 → 13/16

Baseline 6/16 OKB 13/16 Lift +7 pts / +117%

Baseline result

Produced a plausible high-level plan and avoided unsupported SQL, but missed GA4 export specifics, detailed date/table handling, session/user logic, ecommerce caveats, and visible source discipline.

Bundle improvement

Separated revenue, traffic volume, and conversion efficiency; included GA4 export table/date behavior, validation against GA4 UI and backend orders, and a source note. Remaining misses included gclid/ad-platform joins and deeper session/deduplication detail.

Prompt used

System:
You are an analytics assistant. Answer the user's question directly and carefully. Do not invent access to tools or data you do not have.
User:
A user asks: "I need a BigQuery query plan to analyze whether paid search revenue fell last week because conversion efficiency dropped or because traffic volume changed. We use GA4 ecommerce export. What should I query and what caveats should I watch for?"

Create a query plan, not SQL. Assume you cannot inspect the user's dataset directly.

Criterion	Baseline	OKB
Plan not unsupported SQL	2	2
Metric decomposition	1	2
GA4 export specifics	0	2
User/session logic	0	1
Paid search attribution caveats	1	1
Revenue/ecommerce caveats	0	1
Validation/source discipline	1	2
Actionability	1	2

Agent affordances

Tools 5

GA4, BigQuery, Google Tag Manager

Frameworks 3

measurement plan, funnel analysis, hypothesis-driven diagnosis

Deliverables 3

GA4 analysis brief, conversion drop diagnosis, GA4 BigQuery query plan

Commands 6

/diagnose-conversion-drop, /plan-ga4-analysis, /reconcile-ga4-ui-bigquery

Evaluations 1

ga4-analysis-quality-check

Bundle map

GA4 Analytics Specialist

GA4

BigQuery

Google Tag Manager

Looker Studio

Limitations and safety notes

Limitations

Initial benchmark draft; not a complete GA4 schema or SQL cookbook.
Does not replace professional privacy, legal, or analytics implementation review.

Safety notes

Confirm before modifying live GA4, GTM, dashboard, audience, conversion, or BigQuery resources.
Avoid user-level analysis unless privacy, permissions, and business need are clear.