Running Evaluation
Core Metrics
Three primary metrics assess simulation quality:
- Temporal Coherence: consistency of entities across timepoints
- Knowledge Consistency: information conservation compliance
- Biological Plausibility: constraint enforcement validation
Temporal Coherence Score
Measures behavioral consistency across timepoints.
Formula
What It Validates
Behavioral Inertia
Personality traits should remain stable over time.
Checks:
- Personality trait consistency
- Character arc plausibility
- No sudden personality shifts
Example violations:
- A cautious character becomes reckless overnight
- A reserved person suddenly becomes extroverted
- Core values change without cause
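A minimal sketch of how a behavioral inertia check could flag sudden shifts, assuming each timepoint stores an entity's personality traits as numbers in [0, 1]. The snapshot format, `TraitShift`, `find_sudden_shifts`, and the 0.3 step threshold are hypothetical illustrations, not the tool's API.

```python
from dataclasses import dataclass

# Hypothetical snapshot format: trait name -> value in [0, 1] at one timepoint.
TraitSnapshot = dict[str, float]


@dataclass
class TraitShift:
    trait: str
    from_timepoint: int
    to_timepoint: int
    delta: float


def find_sudden_shifts(
    snapshots: list[TraitSnapshot], max_step: float = 0.3
) -> list[TraitShift]:
    """Flag traits that move by more than max_step between consecutive timepoints."""
    shifts = []
    for t in range(1, len(snapshots)):
        prev, curr = snapshots[t - 1], snapshots[t]
        # Only compare traits recorded at both timepoints; sort for stable output.
        for trait in sorted(prev.keys() & curr.keys()):
            delta = curr[trait] - prev[trait]
            if abs(delta) > max_step:
                shifts.append(TraitShift(trait, t - 1, t, delta))
    return shifts


# A cautious character who becomes reckless between timepoints 1 and 2.
timeline = [
    {"caution": 0.9, "extroversion": 0.2},
    {"caution": 0.85, "extroversion": 0.25},
    {"caution": 0.1, "extroversion": 0.3},
]
for shift in find_sudden_shifts(timeline):
    print(shift)
```

A fixed per-step threshold is the simplest choice, but on its own it cannot tell a justified change from an unjustified one; that distinction is what trait persistence adds.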
Trait Persistence
Core characteristics persist unless causally justified.
Checks:
- Trait stability across timepoints
- Gradual vs. sudden changes
- Causal explanations for shifts
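Trait persistence layers causal justification on top of change detection. This sketch keeps only the trait changes that no recorded causal event explains. The `CausalEvent` record, the `lookback` window, and `unjustified_changes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalEvent:
    """Hypothetical record: an event at `timepoint` that can explain changes to `traits`."""
    timepoint: int
    traits: frozenset[str]


def unjustified_changes(
    changes: list[tuple[str, int]],
    events: list[CausalEvent],
    lookback: int = 1,
) -> list[tuple[str, int]]:
    """Return (trait, timepoint) changes with no causal event touching that trait
    in the window [timepoint - lookback, timepoint]."""
    result = []
    for trait, t in changes:
        justified = any(
            trait in event.traits and t - lookback <= event.timepoint <= t
            for event in events
        )
        if not justified:
            result.append((trait, t))
    return result


# A betrayal at timepoint 3 explains a drop in trust at timepoint 4,
# but nothing explains the change in caution.
events = [CausalEvent(timepoint=3, traits=frozenset({"trust"}))]
print(unjustified_changes([("trust", 4), ("caution", 4)], events))
# → [('caution', 4)]
```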
Score Interpretation
- 1.00: Perfect coherence. No behavioral violations detected; entities maintain consistent personalities across all timepoints.
- 0.80-0.99
- 0.60-0.79
- <0.60
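One way to turn violation counts into a score on this scale, assuming the score is the fraction of checked entity-timepoint transitions with no behavioral violation. That formula and both function names are hypothetical; `score_band` only reproduces the bands listed above.

```python
def coherence_score(checked_transitions: int, violating_transitions: int) -> float:
    """Hypothetical score: share of checked transitions without a violation."""
    if not 0 <= violating_transitions <= checked_transitions:
        raise ValueError("violating_transitions must be in [0, checked_transitions]")
    if checked_transitions == 0:
        return 1.0  # Assumption: nothing checked means nothing violated.
    return 1.0 - violating_transitions / checked_transitions


def score_band(score: float) -> str:
    """Map a score in [0, 1] onto the interpretation bands."""
    if score >= 1.0:
        return "1.00"
    if score >= 0.80:
        return "0.80-0.99"
    if score >= 0.60:
        return "0.60-0.79"
    return "<0.60"


print(score_band(coherence_score(20, 3)))  # → 0.80-0.99
```

The Biological Plausibility score uses the same four bands, so the same mapping applies there.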
Knowledge Consistency Score
Validates information conservation: entities can only know what they’ve been exposed to.
Formula
What It Validates
Exposure Event Tracking
Every knowledge item must have a source.
Checks:
- All knowledge has a recorded exposure event
- Source entity or event exists
- Timestamp is causally valid
Example violations:
- Entity knows information without witnessing it
- Knowledge appears without a source
- Anachronistic information (knows future events)
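A minimal sketch of exposure event tracking, assuming each entity's knowledge is a set of fact identifiers and every exposure is logged together with its source. `ExposureEvent`, `unsourced_knowledge`, and the record fields are hypothetical; timestamp validity is left to the temporal causality check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureEvent:
    """Hypothetical log entry: `entity` learned `fact` from `source` at `timepoint`."""
    entity: str
    fact: str
    source: str
    timepoint: int


def unsourced_knowledge(
    knowledge: dict[str, set[str]],
    exposures: list[ExposureEvent],
    known_sources: set[str],
) -> list[tuple[str, str]]:
    """Return (entity, fact) pairs with no exposure event from an existing source."""
    sourced = {(e.entity, e.fact) for e in exposures if e.source in known_sources}
    return [
        (entity, fact)
        for entity in sorted(knowledge)
        for fact in sorted(knowledge[entity])
        if (entity, fact) not in sourced
    ]


# alice never witnessed the shipwreck; bob's source does not exist in the simulation.
knowledge = {"alice": {"treaty_signed", "ship_sank"}, "bob": {"treaty_signed"}}
exposures = [
    ExposureEvent("alice", "treaty_signed", source="herald", timepoint=2),
    ExposureEvent("bob", "treaty_signed", source="ghost", timepoint=3),
]
print(unsourced_knowledge(knowledge, exposures, known_sources={"herald"}))
# → [('alice', 'ship_sank'), ('bob', 'treaty_signed')]
```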
Information Propagation
Knowledge spreads through valid paths.
Checks:
- Information flows along relationship edges
- No spontaneous knowledge generation
- Social network constraints respected
Example violations:
- Entity knows secrets without a connection to the source
- Information spreads faster than possible
- Knowledge crosses disconnected graph components
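Propagation checks can be framed as reachability on the relationship graph. This sketch runs a breadth-first search for hop distances from the entity where a fact originated, then flags entities in a different graph component or reached faster than an assumed rate of one hop per timepoint. The adjacency format, `propagation_violation`, and the hop rate are hypothetical.

```python
from collections import deque
from typing import Optional

Graph = dict[str, set[str]]  # hypothetical undirected adjacency: entity -> related entities


def hop_distances(graph: Graph, origin: str) -> dict[str, int]:
    """Breadth-first search: fewest relationship hops from origin to each reachable entity."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, set()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def propagation_violation(
    graph: Graph,
    origin: str,
    origin_time: int,
    entity: str,
    learned_time: int,
    hops_per_timepoint: int = 1,
) -> Optional[str]:
    """Return a reason if `entity` could not have learned a fact from `origin`, else None."""
    dist = hop_distances(graph, origin)
    if entity not in dist:
        return "disconnected"  # no chain of relationships links the two entities
    if dist[entity] > (learned_time - origin_time) * hops_per_timepoint:
        return "too fast"  # more hops than the elapsed time allows
    return None


graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
print(propagation_violation(graph, "a", 0, "c", 2))  # → None
print(propagation_violation(graph, "a", 0, "c", 1))  # → too fast
print(propagation_violation(graph, "a", 0, "d", 5))  # → disconnected
```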
Temporal Causality
Knowledge can only come from past events.
Checks:
- Exposure timestamp < current timepoint
- No future information leaks
- Proper causal chain
Example violations:
- Entity knows an outcome before it happens
- Future information influences past decisions
- Causal chain is broken
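A minimal sketch of the temporal causality rule, assuming each knowledge item records both when the entity was exposed and when the underlying event happened. The `Exposure` record and `causality_violations` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exposure:
    """Hypothetical record of one piece of an entity's knowledge."""
    fact: str
    exposure_time: int  # when the entity was exposed to the fact
    event_time: int     # when the underlying event actually happened


def causality_violations(exposures: list[Exposure], current_time: int) -> list[str]:
    """Require exposure_time < current_time and no learning before the event occurs."""
    violations = []
    for e in exposures:
        if e.exposure_time >= current_time:
            violations.append(
                f"{e.fact}: exposure at t={e.exposure_time} is not before t={current_time}"
            )
        elif e.event_time > e.exposure_time:
            violations.append(
                f"{e.fact}: learned at t={e.exposure_time}, happens at t={e.event_time}"
            )
    return violations


exposures = [
    Exposure("coronation", exposure_time=2, event_time=1),
    Exposure("battle_outcome", exposure_time=5, event_time=5),
    Exposure("harvest_failure", exposure_time=3, event_time=6),
]
for v in causality_violations(exposures, current_time=4):
    print(v)
```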
Score Interpretation
- 1.0: Valid. All knowledge is properly sourced, with no information conservation violations.
- 0.0
Biological Plausibility Score
Measures constraint enforcement and physical/resource realism.
Formula
What It Validates
Physical Constraints
Actions respect physical limitations.
Checks:
- Movement speed is plausible
- Energy expenditure is realistic
- Physical capabilities are within human range
Example violations:
- Entity travels an impossible distance in the timespan
- Action requires more energy than is available
- Superhuman abilities without justification
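A minimal movement-speed check, assuming positions are sampled as (time in hours, x in km, y in km) and a single speed cap applies. The 6 km/h default (roughly a brisk walk) and `implausible_moves` are hypothetical; a real cap would depend on the travel modes a scenario allows.

```python
import math


def implausible_moves(
    track: list[tuple[float, float, float]], max_speed_kmh: float = 6.0
) -> list[int]:
    """track: (time_h, x_km, y_km) samples in time order.
    Returns indices whose move from the previous sample exceeds max_speed_kmh."""
    flagged = []
    for i in range(1, len(track)):
        t0, x0, y0 = track[i - 1]
        t1, x1, y1 = track[i]
        distance = math.hypot(x1 - x0, y1 - y0)  # straight-line distance in km
        elapsed = t1 - t0
        if elapsed <= 0:
            if distance > 0:
                flagged.append(i)  # moved without any elapsed time
        elif distance / elapsed > max_speed_kmh:
            flagged.append(i)
    return flagged


# 5 km in the first hour is fine; 27 km in the next half hour is not.
track = [(0, 0, 0), (1, 3, 4), (2, 3, 4), (2.5, 30, 4)]
print(implausible_moves(track))  # → [3]
```

Straight-line distance is a lower bound on the real path length, so this check only catches moves that are impossible even along the shortest route.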
Resource Constraints
Actions consume appropriate resources.
Checks:
- Energy budget tracking
- Resource availability
- Consumption rates
Example violations:
- Entity acts without sufficient energy
- Resource consumption exceeds supply
- Negative resource balances
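Resource constraints reduce to bookkeeping. This sketch replays a hypothetical energy ledger of (action, delta) entries, where negative deltas are costs and positive deltas are recovery capped at capacity, and reports every action taken without enough energy. The units, the capacity, and `energy_violations` are assumptions.

```python
def energy_violations(
    initial: float, ledger: list[tuple[str, float]], capacity: float = 100.0
) -> list[str]:
    """Replay (action, delta) entries and report actions the budget could not cover."""
    balance = initial
    problems = []
    for step, (action, delta) in enumerate(ledger):
        cost = -delta
        if cost > 0 and cost > balance:
            problems.append(f"step {step} ({action}): costs {cost:g}, only {balance:g} available")
        # Apply the entry anyway so later steps see the resulting (possibly negative)
        # balance; recovery cannot push the balance above capacity.
        balance = min(capacity, balance + delta)
    return problems


ledger = [("walk", -10), ("sleep", 50), ("climb", -60), ("run", -5)]
print(energy_violations(30, ledger, capacity=60))
# → ['step 3 (run): costs 5, only 0 available']
```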
Embodied States
Physical and emotional states influence behavior.
Checks:
- Fatigue affects performance
- Stress influences decisions
- Physiological needs matter
Example violations:
- Exhausted entity performs at peak
- Emotional state ignored in decision-making
- Physical needs not reflected in behavior
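A minimal sketch of one embodied-state check, fatigue limiting performance, assuming both are logged on a 0 to 1 scale and that the achievable ceiling falls linearly with fatigue. The linear model, the 0.5 penalty, and `peak_while_exhausted` are hypothetical; stress and physiological needs would need checks of their own.

```python
def peak_while_exhausted(
    samples: list[tuple[float, float]], penalty: float = 0.5
) -> list[int]:
    """samples: (fatigue, performance) pairs in [0, 1].
    Flags indices where performance exceeds the hypothetical ceiling 1 - penalty * fatigue."""
    flagged = []
    for i, (fatigue, performance) in enumerate(samples):
        ceiling = 1.0 - penalty * fatigue
        if performance > ceiling:
            flagged.append(i)
    return flagged


# Rested and strong is fine; exhausted yet near peak is flagged.
samples = [(0.1, 0.9), (0.9, 0.95), (0.9, 0.4)]
print(peak_while_exhausted(samples))  # → [1]
```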
Score Interpretation
- 1.00: Fully plausible. No constraint violations; all actions respect physical and resource limitations.
- 0.80-0.99
- 0.60-0.79
- <0.60
Example Output
Resolution Distribution
Evaluation also reports entity resolution levels:
- TENSOR_ONLY: minimal detail, compressed representation only (~200 tokens per entity)
- SCENE
- DIALOG
- FULL_CONTEXT
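The resolution distribution can be summarized as the share of entities at each level. A sketch, assuming per-entity levels are available as strings matching the level names above; `resolution_distribution` is hypothetical.

```python
from collections import Counter

RESOLUTION_LEVELS = ("TENSOR_ONLY", "SCENE", "DIALOG", "FULL_CONTEXT")


def resolution_distribution(entity_levels: dict[str, str]) -> dict[str, float]:
    """Fraction of entities at each resolution level (hypothetical helper)."""
    counts = Counter(entity_levels.values())
    unknown = set(counts) - set(RESOLUTION_LEVELS)
    if unknown:
        raise ValueError(f"unknown resolution levels: {sorted(unknown)}")
    total = len(entity_levels)
    # Counter returns 0 for levels no entity uses, so every level is reported.
    return {level: (counts[level] / total if total else 0.0) for level in RESOLUTION_LEVELS}


levels = {"a": "SCENE", "b": "SCENE", "c": "TENSOR_ONLY", "d": "FULL_CONTEXT"}
print(resolution_distribution(levels))
# → {'TENSOR_ONLY': 0.25, 'SCENE': 0.5, 'DIALOG': 0.0, 'FULL_CONTEXT': 0.25}
```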
Generated Reports
Evaluation generates two report files:
Validation Integration
Evaluation metrics use the same validators as training:
- validate_behavioral_inertia(): temporal coherence
- validate_information_conservation(): knowledge consistency
- validate_biological_constraints(): biological plausibility
When to Evaluate
Run evaluation after:
Next Steps
- Interactive Queries: query your evaluated entities
- Training: improve entity quality with better training
- Validation: learn about the validation system
- CLI Overview: back to the CLI overview

