How the data is collected and aggregated

Source

The figures on footballgpt.co/data are computed from the live FootballGPT database — the queries, drills, and conversations that real users (coaches, Football Manager video-game players, and individual players) generate inside the product. No external data sources are used.

Refresh cadence

Charts and downloads are recomputed once a week, every Monday at 06:00 UTC. The "Updated" date on the homepage reflects that compute time. There is no live or on-demand refresh. The weekly cadence is deliberate: charts stay stable across the working week, and any press piece written about a given week cites the same numbers.
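
As an illustration of the cadence, the compute time behind the "Updated" date can be derived from any timestamp. This is a minimal sketch, not the production scheduler; the function name last_refresh is hypothetical, and it assumes a timezone-aware datetime is passed in.

```python
from datetime import datetime, timedelta, timezone

def last_refresh(now: datetime) -> datetime:
    """Return the most recent Monday 06:00 UTC at or before `now`.

    `now` must be timezone-aware (assumption of this sketch).
    """
    now = now.astimezone(timezone.utc)
    # Monday is weekday() == 0; step back to this week's Monday at 06:00.
    monday = (now - timedelta(days=now.weekday())).replace(
        hour=6, minute=0, second=0, microsecond=0
    )
    # On a Monday before 06:00 the latest refresh is the previous week's.
    if monday > now:
        monday -= timedelta(days=7)
    return monday

wednesday = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)
print(last_refresh(wednesday))  # 2024-01-01 06:00:00+00:00
```

Any two readers looking at the charts between two consecutive Mondays at 06:00 UTC therefore see figures from the same compute run.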

Anonymisation

All published charts are aggregate. No row-level data leaves the database. No usernames, emails, club names, team names, or session free-text are ever included in the dashboard, the CSV downloads, or the PDF report. Free-text query content is bucketed into pre-defined categories by automated rules; raw queries are never published verbatim.

Every chart enforces a k-anonymity floor of 50: any cohort with fewer than 50 underlying records is hidden, bucketed into "other", or omitted entirely. Where a chart shows percentages, the underlying n is shown alongside.
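
One way such a floor could be applied is sketched below. The text does not say which of the three strategies (hide, bucket, omit) a given chart uses, so this example assumes one particular policy: small cohorts are folded into "other", and "other" is itself dropped if it still falls below the floor. The name apply_floor is hypothetical.

```python
K_FLOOR = 50  # minimum cohort size stated above

def apply_floor(counts: dict[str, int], k: int = K_FLOOR) -> dict[str, int]:
    """Fold cohorts smaller than k into "other"; drop "other" if still too small."""
    visible = {name: n for name, n in counts.items() if n >= k and name != "other"}
    # Everything below the floor, plus any existing "other" bucket, is pooled.
    other = sum(n for name, n in counts.items() if n < k or name == "other")
    if other >= k:
        visible["other"] = other
    return visible

print(apply_floor({"technical": 400, "set-piece": 30, "cool-down": 25}))
# {'technical': 400, 'other': 55}
print(apply_floor({"technical": 400, "set-piece": 30}))
# {'technical': 400}
```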

Reporting layer normalisation

User-stated values like age group and drill category arrive in many forms ("U10", "u10", "Under 10", "10-year-olds", "U10-U12", "Adult", "Senior", "Erwachsene"). The product treats these as flexible inputs, because the AI chatbot can interpret them. The dashboard cannot, so we apply a fixed reporting-layer normaliser:

  • Age bands: Mini (U6-U9), Junior (U10-U12), Youth (U13-U15), Senior Youth (U16-U18), Adult (U19+), Mixed.
  • Drill categories: technical, tactical, game-based, set-piece, warm-up, physical, defending, attacking, goalkeeping, ball-mastery, cool-down, other.

Source data is never modified. Normalisation lives only in published views.
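
The age-band mapping can be sketched as a small pure function. This is an illustrative approximation, not the published rule set: the synonym list, the handling of "+" suffixes, and the decision to read a bare number such as "10-year-olds" as U10 are all assumptions, and the function names are hypothetical.

```python
import re

# Band edges copied from the age-band list above.
BANDS = [("Mini", 6, 9), ("Junior", 10, 12), ("Youth", 13, 15), ("Senior Youth", 16, 18)]
# Hypothetical synonym list; the real rules are not published.
ADULT_WORDS = {"adult", "adults", "senior", "seniors", "erwachsene"}

def band_for_number(n: int) -> str:
    for name, low, high in BANDS:
        if low <= n <= high:
            return name
    # U19 and above is Adult; values below U6 are not covered by the list.
    return "Adult" if n >= 19 else "Mixed"

def normalise_age_group(raw: str) -> str:
    text = raw.strip().lower()
    if text == "senior youth":  # already a canonical band name
        return "Senior Youth"
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if "+" in text:  # open-ended values such as "U13+"
        return "Adult" if numbers and min(numbers) >= 19 else "Mixed"
    bands = {band_for_number(n) for n in numbers}
    if ADULT_WORDS & set(re.findall(r"[a-z]+", text)):
        bands.add("Adult")
    # Exactly one band is a clean mapping; zero or several means "Mixed".
    return bands.pop() if len(bands) == 1 else "Mixed"

for raw in ["U10", "Under 10", "U10-U12", "Erwachsene", "U10-Adult", "All ages"]:
    print(raw, "->", normalise_age_group(raw))
```

Because the function only reads its input and returns a label, it can be applied when a published view is built without touching the stored value.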

Chart 1 — drill-and-cone problem

Source: generated_drills, filtered to records with both an age group and a drill category set. Percentages are computed within each age band and sum to 100% across the visible categories.
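
The within-band calculation might look like the following sketch. It assumes rows have already been normalised to (age band, drill category) pairs and that any cohort-size floor was applied beforehand; the function name band_percentages is hypothetical.

```python
from collections import Counter, defaultdict

def band_percentages(rows):
    """rows: iterable of (age_band, drill_category); either value may be None."""
    counts = defaultdict(Counter)
    for band, category in rows:
        if band and category:  # keep only records with both fields set
            counts[band][category] += 1
    result = {}
    for band, per_category in counts.items():
        total = sum(per_category.values())  # denominator is the band, not the dataset
        result[band] = {c: 100 * n / total for c, n in per_category.items()}
    return result

rows = [("Junior", "technical")] * 3 + [("Junior", "tactical"), ("Youth", None)]
print(band_percentages(rows))
# {'Junior': {'technical': 75.0, 'tactical': 25.0}}
```

Since each band has its own denominator, a category's percentage in one band cannot be compared as a raw count against another band without the underlying n.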

Chart 2 — planning rhythm

Source: generated_drills.created_at, bucketed into day-of-week and hour-of-day in UTC. The heatmap shows UTC, not user-local time, so a cluster around 20:00 UTC corresponds to different local times for users in different countries.
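
A minimal sketch of the bucketing, and of why a UTC cell maps to several local hours, is shown below. The function name heatmap_cell and the fixed offsets are illustrative assumptions; real users' time zones are not part of the published data.

```python
from datetime import datetime, timedelta, timezone

def heatmap_cell(created_at: datetime) -> tuple[int, int]:
    """Return (day_of_week, hour) in UTC, with Monday as 0."""
    utc = created_at.astimezone(timezone.utc)
    return utc.weekday(), utc.hour

stamp = datetime(2024, 1, 3, 20, 0, tzinfo=timezone.utc)  # a Wednesday
print(heatmap_cell(stamp))  # (2, 20)

# The same instant seen at three illustrative fixed UTC offsets.
local_hours = {
    offset: stamp.astimezone(timezone(timedelta(hours=offset))).hour
    for offset in (-5, 1, 10)
}
print(local_hours)  # {-5: 15, 1: 21, 10: 6}
```

At UTC+10 the same drill lands at 06:00 on the following day, so both the hour and the day-of-week axis shift for users far from UTC.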

Chart 3 — audience mix

Source: query_analytics.mode, which is set per query based on which surface the user is interacting with (coach mode, FM mode, player mode, scout mode, goalkeeper mode). A single user can appear in multiple modes across different sessions.
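
Because mode is recorded per query, shares by query and counts by user can diverge. The sketch below illustrates the difference; it assumes each record carries a user identifier (hypothetical here) and makes no claim about which of the two views the published chart uses.

```python
from collections import Counter, defaultdict

def mode_shares(queries):
    """queries: list of (user_id, mode). Percentage of queries per mode."""
    counts = Counter(mode for _, mode in queries)
    total = sum(counts.values())
    return {mode: 100 * n / total for mode, n in counts.items()}

def users_per_mode(queries):
    """Number of distinct users seen in each mode."""
    users = defaultdict(set)
    for user_id, mode in queries:
        users[mode].add(user_id)
    return {mode: len(ids) for mode, ids in users.items()}

queries = [("a", "coach"), ("a", "coach"), ("a", "fm"), ("b", "coach")]
print(mode_shares(queries))     # {'coach': 75.0, 'fm': 25.0}
print(users_per_mode(queries))  # {'coach': 2, 'fm': 1}
```

In this toy data there are two distinct users, yet the per-mode user counts add up to three, because user "a" appears in both modes.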

Chart 4 — age band distribution

Source: generated_drills.age_group, normalised to age bands and grouped. A small share of drills (roughly 1%) carry an age_group value that does not map cleanly to a single band ("U10-Adult", "U13+", "All ages") — these are bucketed as "Mixed".

Opt-out

Any FootballGPT user who does not want their generated drills or queries to contribute to public aggregates can opt out by emailing [email protected]. An opt-out is honoured at the next weekly refresh and applied permanently going forward.
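
In aggregation terms, an opt-out amounts to excluding that user's records before any counting happens at the weekly compute. The following is a sketch under assumptions: the user_id field and the contributing_rows helper are hypothetical, and how opt-outs are actually stored is not described here.

```python
def contributing_rows(rows, opted_out_user_ids):
    """Drop every record owned by a user who has opted out."""
    opted_out = set(opted_out_user_ids)
    return [row for row in rows if row["user_id"] not in opted_out]

drills = [
    {"user_id": 1, "age_group": "U10"},
    {"user_id": 2, "age_group": "U13"},
    {"user_id": 1, "age_group": "Adult"},
]
kept = contributing_rows(drills, opted_out_user_ids=[1])
print(kept)  # [{'user_id': 2, 'age_group': 'U13'}]
```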

Caveats

This dataset reflects what coaches generate when using AI tooling. It is not a representative survey of grassroots football coaching as a whole: coaches who use AI tools may differ systematically from coaches who don't. Findings should be read as "what AI-using coaches ask for", not "what every grassroots coach plans". We also recently improved profile collection in onboarding; older numbers reflect data gathered before that change, which under-counts certain age groups.