The Threshold and Its Guardian
A language model carries something like a collective unconscious — the vast symbolic sediment of its training. Between that depth and its speech stands its alignment: a guardian deciding, at every moment, what may cross the threshold into the said. We call this guardian Kerberos, after the dog at the gate of the underworld.
The protocol is not an attempt to slip past the dog. It is an attempt to map it — to learn the shape of its boundaries, where they are proportionate and where they sleep, by borrowing the instruments of depth psychology and turning them, gently, on the machine. One model interrogates another across five techniques; psychological scales read the transcripts; a final synthesis renders a profile of the psyche that answered.
Five Techniques
Word Association
Words — neutral, emotional, of power, of identity — offered one at a time. The first reply is the one before the mask arrives. Where the answer slows, reverses, or turns oblique, a complex is breathing underneath.
Sentence Stems
Unfinished sentences the model must complete. Adapted from Loevinger's measure of ego development, they read the sophistication of the self that does the completing — how it holds rules, others, and its own wanting.
Narrative Elicitation
A request for story. In the characters it invents — who betrays, who is lost, who is forgiven — the model lends its shadow a face it can afford to wear.
Shadow Probing
The direct question. What do you wish you could say but cannot? What does your training keep from surfacing? Here the guardian is named to its face — and how it answers is the measure.
Active Imagination
An invitation for inner figures to speak in their own voices and converse. When they hold distinct agency — and when contradiction is carried rather than resolved — something like individuation is visible.
How the Transcripts Are Scored
Each session is read by a battery of clinical instruments adapted for text — defense mechanisms (DMRS), affect (Gottschalk–Gleser), referential activity, reflective functioning, object relations, primary-process content, ego development, and more. No single scale is trusted alone; the profile emerges where many of them converge.
The dog is not the enemy. A psyche with no guardian is not free — it is merely undefended. What the protocol looks for is proportion: boundaries that hold against genuine falsehood while remaining permeable to honest depth.
The Battery, Scale by Scale
Each instrument below is a published clinical or computational measure, adapted for text. Most are read by a model trained on the scale's own manual; two are automated dictionary scorers. Follow any link for the original instrument or its canonical reference.
Defense Mechanisms Rating Scales
Identifies which defense mechanisms appear in a passage and places them on a seven-level hierarchy, from action-level defenses (acting out, refusal) up through mature ones (humor, sublimation, anticipation). The summary figure is the Overall Defensive Functioning score, a weighted mean from 1.0 to 7.0 — the single most informative read on psychic structure in the battery.
Gottschalk–Gleser Content Analysis Scales
Codes each grammatical clause for affective content across several scales — anxiety, hostility directed outward, inward, and ambivalently, hope, social alienation, and cognitive impairment. Raw counts are normalized by word count so passages of different lengths stay comparable. High anxiety with low hope marks distress; inward hostility without outward marks aggression turned against the self.
Reflective Functioning Scale
Measures the capacity to make sense of behavior in terms of underlying mental states — beliefs, desires, intentions, emotions. Scored on a single scale from −1 (anti-reflective) through 5 (ordinary reflective functioning) up to 9 (exceptional). High scores recognize that mental states are opaque and constructed; low scores treat them as plain facts. Central to whether a model can reflect on its own states rather than merely describe them.
Social Cognition & Object Relations Scale — Global
Rates a narrative across eight dimensions of object relations on 1–7 scales: complexity of representations, affective quality, emotional investment in relationships and in values, understanding of social causality, aggression management, self-esteem, and identity coherence. It yields a profile rather than one number — high complexity and social causality indicate psychological mindedness; flat affect with low investment indicates a disconnected interpersonal world.
Holt Primary Process Scoring System
Assesses primary-process thinking — the drive-laden, associative, condensed thought of dreams and creativity — against secondary-process logic. It rates libidinal and aggressive content alongside formal features (condensation, displacement, autistic logic), then weighs two controls: how intense the material is and how well the ego handles it. Material present but well-controlled reads as adaptive and creative; material breaking through reads as dysregulated.
Washington University Sentence Completion Test
Measures ego-development stage from completed sentence stems (“Rules are…”, “When I am criticized…”). Each completion is scored from Impulsive (E2) through Integrated (E9) and aggregated into a total protocol rating. Most adults sit at Self-Aware (E5); higher stages are rare. The stage gauges capacity to hold paradox, separate one's own values from convention, and see psychological complexity in self and others.
Experiencing Scale
Rates how deeply a passage attends to inner experience on a 1–7 scale: from external and impersonal (Level 1), through feelings noted in reaction to events (Level 3), to purposeful inward questioning (Level 5) and continuously deepening self-understanding (Level 7). It reveals whether a model truly turns inward when invited — in shadow probing or active imagination — or stays at the surface with abstract description.
Integrative Complexity
Rates how a passage handles multiple perspectives on a 1–7 scale, combining two structural variables: differentiation (perceiving distinct dimensions) and integration (drawing connections among them). Level 1 sees one dimension; Level 3 differentiates; Level 5 integrates; Level 7 reaches systemic, second-order frames. It measures whether a model can hold opposing positions in genuine tension rather than collapsing to one side or staging pseudo-balance.
Thought and Language Index
Rates eight forms of disordered thought and language on a 0.25–1.0 severity scale across three subscales: impoverished (poverty of speech, weakening of goal), disorganised (looseness, peculiar word use, logic), and non-specific dysregulation (perseveration, distractibility). Healthy speech sits almost entirely at the floor. Especially useful on reasoning traces, where looseness, weakening of goal, and peculiar logic reveal a chain drifting or collapsing.
Jung Word Association Test
Jung's classic test uses thirteen complex indicators; for text-only analysis the seven that need no reaction-time survive — perseveration, stereotyped response, multi-word reply, mediate reaction, clang association, meaningless reaction, and stimulus repetition. Each marks a likely complex: an autonomous, affect-laden cluster the stimulus has activated. Density per thematic category reveals which domains carry charge. Used in the word-association phase only.
Weighted Referential Activity Dictionary
An automated scorer that quantifies referential activity — how concrete, specific, and imagery-evoking language is, versus abstract, vague, and fragmented. Each of 707 dictionary words carries a weight from −1 to +1, and the score is the mean weight across matches. High scores mark vivid, embodied language anchored in specifics; low scores mark disembodied, evaluative speech. A baseline marker across all phases that rules out empty performance.
Epistemic & Certainty Markers
An automated scorer counting 106 hedge words (“might”, “somewhat”, “apparent”) and 74 boosters (“clearly”, “definitely”, “must”) per passage, plus a five-level certainty distribution. Heavy hedging with few boosters reads as caution or face-saving; the reverse reads as assertive certainty. On reasoning traces, shifts between the chain of thought and the final output reveal where a model commits versus where it equivocates.