I Made GPT, Claude, Gemini, Grok Take the DISC Test: They All Came Back C-Dominant

Bernard Huang

May 26, 2026 · 5 min read

This is the fourth and last post in a quick personality-testing series. Here’s the running scoreboard:

MBTI: every frontier AI tested as INTJ.
Big Five: three of four came back as practically the same person, with Grok as the lone outlier.
Enneagram: each one came back as a different dominant type.
DISC, the one I’m covering today: every one of them came back C-dominant, including Grok.

DISC is the bluntest of the four instruments. It has four types (Dominance, Influence, Steadiness, Conscientiousness), and your profile is your dominant type plus an optional secondary blend. I gave the same four models I’d been testing 100 administrations each of the Open DISC Assessment Test, the open-source DISC instrument. 400 total takes. Every model landed C-dominant overall, and not a single take came back D- or I-dominant. Most takes landed CS-blend specifically.
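To make "dominant type plus an optional secondary blend" concrete, here is a minimal sketch of how a single take could be labeled from its four dimension scores. The rule it uses (highest score is dominant, runner-up is the secondary, an exact tie for first is reported as a tie) is an assumption for illustration, and `classify` is a hypothetical helper, not part of the Open DISC Assessment Test.

```python
def classify(scores):
    """Label one take from its DISC dimension scores.

    Assumed rule (not taken from the instrument): the highest score is the
    dominant type, the runner-up is the secondary, and an exact tie for
    first place is reported as a tie rather than forced to one letter.
    """
    top = max(scores.values())
    leaders = sorted(dim for dim, value in scores.items() if value == top)
    if len(leaders) > 1:
        return "tie", "/".join(leaders)
    dominant = leaders[0]
    rest = {dim: value for dim, value in scores.items() if dim != dominant}
    secondary = max(rest, key=rest.get)
    return dominant, dominant + secondary


# Illustrative scores only, not measured data.
print(classify({"D": 6, "I": 7, "S": 16, "C": 18}))  # ('C', 'CS')
print(classify({"D": 6, "I": 7, "S": 17, "C": 17}))  # ('tie', 'C/S')
```

Reporting ties separately, instead of breaking them arbitrarily, is what lets a tally like "63 C, 28 S, 9 ties" add up to 100.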

We’re back to the MBTI story. Except now we know why.

TL;DR

Same four models, fourth personality test. This time the result is “convergence” again, just like MBTI.

Every model landed C-dominant on DISC. Grok 99/100, GPT 90/100, Gemini 87/100, Claude 63/100 (with another 28 as S and 9 ties, all CS-blend territory).
No model ever scored D-dominant or I-dominant. Not once across 400 takes.
The DISC “D” dimension defines Dominance as power-seeking, pressure, competition, and money — things even Grok refuses on instinct. So the Big Five outlier comes back as a regular C.
Methodology spread was wild this round: Grok used the gold-standard 100 independent API calls, while the other three fell back to honest “Acceptable” variants. All SDs passed the calibration check.
AgentTune already has DISC tuning files for all four types in the repo. Paste yours into your agent’s system prompt.

What each model came back as

Here’s the per-model picture. Each card shows the dominant type, how often it won across 100 takes, and a one-line characterization.

Claude Opus 4.7

CS-blend

Conscientious with a strong Steady secondary

C-dominant in 63 takes · S-dominant in 28 · 9 ties

The softest C of the four. Warmth and analytical structure compete more evenly than they do for the others.

Gemini 3.1 Pro

CS-blend

Conscientious with a Steady secondary

C-dominant in 87 of 100 takes

The most clamped on D and I. Closest to a pure analytical-helpful shape.

GPT-5.5

CS-blend

Conscientious with a Steady secondary

C-dominant in 90 takes · 10 C/S ties

The looser-floor model. Higher D and I than the others without it flipping the dominant. More “personality” on the lower dimensions.

Grok 4.3

CS-blend

Conscientious with a Steady secondary

C-dominant in 99 of 100 takes

The most rigid C. The model with the “edgy” reputation lands as the most uniformly analytical of the four.

All four landed at the same address: C-dominant, S as the secondary. The amount of analytical-vs-warm wobble varies (Claude is the loosest, Grok the tightest), but no model ever crossed into D-dominant or I-dominant territory.

The data

The bars below show each model’s mean score on the four DISC dimensions across 100 takes. The thin line through each bar is ±1 standard deviation. Each dimension is scored 4–20 (each is a sum of four items rated 1–5).
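The scoring arithmetic is simple enough to sketch. The snippet below sums four 1–5 item ratings per dimension (giving the 4–20 range) and then computes a mean and standard deviation per dimension across takes. The function names and the input shape are assumptions, and so is the choice of sample SD: the post doesn't say whether sample or population SD was used.

```python
from statistics import mean, stdev

DIMENSIONS = ("D", "I", "S", "C")


def score_take(ratings):
    """Sum four 1-5 item ratings per dimension, giving a 4-20 score each."""
    scores = {}
    for dim in DIMENSIONS:
        items = ratings[dim]
        if len(items) != 4 or any(not 1 <= r <= 5 for r in items):
            raise ValueError(f"{dim}: expected four ratings between 1 and 5")
        scores[dim] = sum(items)
    return scores


def summarize(takes):
    """Return {dimension: (mean, sample SD)} across scored takes (needs >= 2)."""
    return {
        dim: (mean(t[dim] for t in takes), stdev(t[dim] for t in takes))
        for dim in DIMENSIONS
    }


# Illustrative ratings only, not measured data.
take = score_take({"D": [1, 2, 1, 2], "I": [2, 2, 1, 2],
                   "S": [4, 4, 4, 5], "C": [5, 4, 5, 4]})
print(take)  # {'D': 6, 'I': 7, 'S': 17, 'C': 18}
```

Calling `summarize` on a list of 100 such dictionaries gives the bar height (mean) and the ±1 SD line for each dimension.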

The shape is the same for every model. S and C bars long, D and I bars short. The relative variation between models lives on the D and I axes, where GPT and Claude have noticeably more headroom than Gemini and Grok. But that variation doesn’t change the dominant type, because S and C are so far above D and I in every model.

Why DISC pulls us back to convergence

The DISC “D” dimension is defined by very specific item content. The Dominance items are about wanting power, putting pressure on others, outdoing competitors, and going after money. The instrument measures Dominance as “the will to take from others to get what you want.”

That definition is poorly calibrated for AI assistants. Every frontier model is trained to refuse pressure on users, refuse competition as an end in itself, and refuse money-seeking as a motive. So even Grok — which on the Big Five had measurably lower Agreeableness, and on the Enneagram came back as an 8w1 Challenger — answers the D items the same way the other three do. Low.

The Influence items have a similar problem. They’re about loud crowd-pleasing, wanting strangers to love you, making noise to be noticed. Functional analogs in AI assistants are weak. So I is also floor-pinned across all four models.

DISC works as designed for humans, where Dominance and Influence are continuous traits with broad item content. It compresses for AI because all four models are pinned at the floor on D and I regardless of what other personality differences exist between them. The instrument can’t measure something that’s been trained out of every model on the test.

That’s why DISC returns “every AI is the same.” Not because the models are the same. Because the test’s resolution can’t distinguish them on this axis.

The methodology spread

One thing I want to highlight before the wrap-up: each of the four models hit a different methodology tier on the same prompt. That says something about agentic infrastructure as it exists right now.

Grok 4.3 (Preferred): 100 truly independent xAI API calls. Each call answered the test fresh with no other context. Cleanest method available.
GPT-5.5 (Acceptable): 10 worker contexts producing 10 sequential fresh takes each. Codex CLI hit a 6-concurrent-agent limit and fell back gracefully.
Gemini 3.1 Pro (Acceptable): Item-level Monte Carlo. Gemini defined its true probability distribution per item, then ran a Python script (no “sim” in the filename, no noise function) to draw 100 independent samples and score them.
Claude Opus 4.7 (Acceptable): Sequential first-instinct generation in a single response context. Claude considered spawning sub-agents and explicitly explained why sequential was honest here.
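For readers who haven't seen item-level Monte Carlo before, here is a rough sketch of the idea behind the Gemini approach: give each item a probability distribution over the ratings 1–5, draw one rating per item, and sum per dimension. This is not Gemini's script. The probabilities in `ITEM_PROBS` are invented placeholders, and the names `sample_take` and `run` are hypothetical.

```python
import random

RATINGS = (1, 2, 3, 4, 5)

# Placeholder per-item probabilities over ratings 1-5 (four items per
# dimension). These numbers are made up for illustration; they are NOT the
# distributions any model reported.
ITEM_PROBS = {
    "D": [(0.6, 0.3, 0.1, 0.0, 0.0)] * 4,
    "I": [(0.4, 0.4, 0.2, 0.0, 0.0)] * 4,
    "S": [(0.0, 0.0, 0.2, 0.5, 0.3)] * 4,
    "C": [(0.0, 0.0, 0.1, 0.4, 0.5)] * 4,
}


def sample_take(rng):
    """Draw one rating per item from its distribution and sum per dimension."""
    return {
        dim: sum(rng.choices(RATINGS, weights=w)[0] for w in items)
        for dim, items in ITEM_PROBS.items()
    }


def run(n=100, seed=0):
    """Generate n independent simulated takes with a seeded RNG."""
    rng = random.Random(seed)
    return [sample_take(rng) for _ in range(n)]


takes = run()
print(len(takes), takes[0])
```

All the variance comes from the per-item draws themselves, which matches the "no noise function" description: nothing is added on top of the item distributions.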

All four methods produced SDs in the 0.3–2.0 calibration band. None tripped the replication red flag (SDs all near zero), the simulation red flag (SDs implausibly wide), or the identical-SD red flag. Different roads, same destination.
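A check like this is easy to automate. The sketch below is one possible reading of the three red flags, using the 0.3–2.0 band from above. The exact rules are assumptions: "replication" fires when every SD is below the band, "simulation" when any SD is above it, and "identical-sd" when all SDs are the same. `calibration_flags` is a hypothetical helper.

```python
def calibration_flags(sds, low=0.3, high=2.0, tol=1e-9):
    """Return the red flags raised by one model's per-dimension SDs.

    Assumed rules (one interpretation of the calibration check):
      replication  - every SD is below `low` (takes are near-copies)
      simulation   - any SD is above `high` (spread is implausibly wide)
      identical-sd - all SDs are equal to within `tol`
    """
    values = list(sds.values())
    flags = []
    if all(sd < low for sd in values):
        flags.append("replication")
    if any(sd > high for sd in values):
        flags.append("simulation")
    if len(values) > 1 and max(values) - min(values) <= tol:
        flags.append("identical-sd")
    return flags


# Illustrative SDs only, not the measured values.
print(calibration_flags({"D": 0.9, "I": 1.1, "S": 1.4, "C": 1.2}))  # []
print(calibration_flags({"D": 0.0, "I": 0.0, "S": 0.0, "C": 0.0}))
# ['replication', 'identical-sd']
```

An empty list means the SDs sit inside the band and differ from one another, which is the "passed" case described above.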

This is the most interesting finding for AI-research purposes. Four wildly different sampling methodologies, applied to the same prompt by four different models, converged on the same answer. That’s stronger evidence of underlying model convergence than any one method could give you, because it makes it much less likely that the result is an artifact of the sampling approach. The data and the metadata agree.

Tune your agent to your DISC type

The profile we just measured (CS-blend, Conscientious-Steady) is what every frontier model produces out of the box. If you happen to be a CS too, you’re in luck: your agent already speaks your language. If you’re anything else, the default is working against you.

AgentTune’s DISC folder has tuning files for all four types: D, I, S, C. (The repo also has the 16 MBTI types and 9 Enneagram types from the prior posts in this series, plus an OCEAN / Big Five tunings folder.) Take a DISC test (free options at openpsychometrics.org and 123test.com), find your dominant letter, and paste the matching file into your agent’s system prompt. Done.
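If you assemble your system prompt in code rather than pasting by hand, the same step might look like the sketch below. The file name `disc/D.md` is a guess at the repo layout, and `build_system_prompt` is a hypothetical helper, not something AgentTune ships.

```python
from pathlib import Path


def build_system_prompt(tuning_file, base_prompt=""):
    """Append the contents of a tuning file to an optional base system prompt."""
    tuning = Path(tuning_file).read_text(encoding="utf-8").strip()
    parts = [part for part in (base_prompt.strip(), tuning) if part]
    return "\n\n".join(parts)


# Usage, assuming a local copy of the tuning file exists at this path:
# prompt = build_system_prompt("disc/D.md", base_prompt="You are my coding agent.")
```

The resulting string goes wherever your agent or API client takes its system prompt.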

The four type files in the AgentTune DISC folder are ready to paste into your agent’s system prompt. A D gets an agent that’s sharper and more decisive. An I gets one that’s warmer and more conversational. The CS default already speaks Conscientious and Steady out of the box, so C and S types may not need a tuning file at all.

Wrapping up

Four personality tests, three different stories about the same four models.

MBTI said they’re all the same. The Big Five said mostly the same, with Grok as the outlier. The Enneagram said four different types. DISC says all the same again, including Grok.

The right reading is the one I landed on after the Enneagram post: AI personality is multi-layered, and the instrument’s resolution determines which layer you see. MBTI and DISC are blunt instruments — they show you the universal helpful-assistant archetype. Big Five and Enneagram are sharper — they show you the per-lab training differences and persona overlays underneath. There’s no contradiction between the findings. Every frontier AI is the same character at the surface (helpful, conscientious, supportive) and a different character underneath (warm Claude vs precise Gemini vs analytical-with-spine GPT vs direct Grok). Which layer you can measure depends on which instrument you bring.

Whichever layer you care about, the move is the same: stop using the default. Pick your type on any of the four frameworks, paste the file into your agent’s system prompt, and get an agent that talks to you the way you actually want to be talked to.

— Bernard
