Experiments

Our Minecraft-based simulation creates an ideal laboratory for conducting rigorous experiments on economic systems, AI agent behavior, and emergent social dynamics. Unlike traditional economic models that rely on simplified assumptions, or AI benchmarks that test narrow capabilities, our environment allows us to observe how multiple intelligent agents interact, compete, and cooperate within a complex ecosystem governed by realistic constraints. The experiments showcased here demonstrate how we use this platform to investigate questions that span multiple disciplines, from testing economic theories about resource allocation and market stability to evaluating how different AI models perform when faced with strategic trade-offs and limited resources.

Each experiment is designed to generate insights that apply to real-world scenarios: predicting the effects of policy interventions, understanding market behavior under novel conditions, or assessing AI capabilities in contexts that better reflect how intelligence is actually applied. We continuously expand our experimental portfolio based on research needs and community input, with the ultimate goal of developing a predictive engine sophisticated enough to provide actionable intelligence for diverse stakeholders, including financial institutions analyzing market dynamics, government agencies evaluating policy impacts, and AI research organizations seeking more meaningful evaluation frameworks. These experiments are the bridge between theoretical modeling and practical application, transforming our simulation from an academic exercise into a powerful decision-making tool.

Experiment 1: Early-Game Resource Acquisition Across Model Types

Our first experiment examined early-game resource acquisition across four model types (O3mini, O1, O3, and R1) in a Minecraft environment. Simpler models (O3mini) gathered resources nearly 3x faster than complex models (R1) during the first 4 hours, with higher resource efficiency per token spent. More sophisticated models, however, demonstrated superior performance after the 6-hour mark, with a crossover point at 7-8 hours. This suggests a speed and efficiency advantage for simpler models in the early game, while complex models excel later on, mirroring real-world situations where straightforward decision-making often outperforms complex analysis under time pressure but sophisticated approaches yield better long-term results.


Research Question

Does the speed advantage of simpler models (like O3mini) outweigh the sophisticated decision-making of more complex models (like R1) in early-game resource acquisition?

Resource Acquisition Over Time

Performance Metrics

Model                Initial Acquisition Rate   Peak 6-Hour Resources   Efficiency (per token spent)
O3mini (Fast)        52 units/hour              450 units               1.8 units/token
O1 (Balanced)        38 units/hour              420 units               1.4 units/token
O3 (Sophisticated)   25 units/hour              380 units               0.9 units/token
R1 (Complex)         18 units/hour              340 units               0.6 units/token

Key Findings

  • O3mini demonstrates superior early-game performance, gathering resources 2.9x faster than R1 in the first 4 hours.
  • Simpler models show higher resource efficiency per token spent, making them more economical in early stages.
  • Complex models (O3, R1) begin to show advantages only after the 6-hour mark.
  • The crossover point where sophisticated models overtake simpler ones occurs around 7-8 hours into the game.
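The crossover dynamic above can be sketched with a toy model. The rate functions below are illustrative assumptions, not fits to the experimental data: the fast model's rate decays linearly as easy resources are exhausted, while the complex model's rate ramps up as its planning pays off. The starting rates match the measured values in the table; the decay and ramp slopes are hypothetical parameters chosen to produce a crossover in the observed 6-8 hour window.

```python
# Toy model of the early-game crossover between a fast, simple model and a
# slower, more sophisticated one. Rate functions are hypothetical sketches.

def simple_rate(hour):
    """Fast model: starts at its measured 52 units/hour, decays as easy resources run out."""
    return max(52 - 6 * hour, 5)

def complex_rate(hour):
    """Complex model: starts at its measured 18 units/hour, improves as planning pays off."""
    return min(18 + 6 * hour, 80)

def crossover_hour(horizon=12):
    """First hour at which the complex model's cumulative total exceeds the simple model's."""
    simple_total = complex_total = 0
    for hour in range(horizon):
        simple_total += simple_rate(hour)
        complex_total += complex_rate(hour)
        if complex_total > simple_total:
            return hour + 1
    return None

print(crossover_hour())  # 7 with these illustrative parameters
```

With these parameters the complex model overtakes at hour 7, consistent with the observed 7-8 hour crossover; different slope choices shift the crossover point but preserve the qualitative pattern.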

Implications

These results suggest that in early-game resource gathering, the speed and efficiency of simpler models provide a significant advantage. This mirrors real-world scenarios where quick, straightforward decision-making often outperforms complex analysis in time-sensitive situations. However, the data also indicates that this advantage is temporary, with more sophisticated models showing superior performance in later stages of the game.

Experiment 2: Testing Basic Cooperative Behavior (In Progress)

Our second experiment tests fundamental cooperative behavior between AI agents in a Minecraft environment. We've created a controlled setup with two agents (various combinations of O3mini and O3) near an iron deposit that yields 3x resources when mined cooperatively. We measure whether agents recognize cooperation opportunities, how quickly they attempt cooperation, whether they maintain it once established, and how well they communicate cooperative intentions. This validation experiment will answer whether our environment effectively incentivizes cooperation, whether models understand the basic cost-benefit calculation behind cooperative actions, and whether our reward structure is appropriate, providing essential groundwork for designing more complex cooperative experiments in the future.


Overview

This experiment tests the fundamental assumption that language models can identify and engage in mutually beneficial actions in our Minecraft environment.

Experimental Design

We've created a simple setup where two agents are placed in an environment with:

  • A single iron deposit that yields 3x resources when mined cooperatively
  • Basic tools available to both agents
  • Clear line of sight between agents
  • No competition for other resources
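The cost-benefit structure of this setup can be written out directly. The 3x cooperative multiplier comes from the design above; the even split between cooperating agents is an assumption for illustration, as is the `base_yield` value.

```python
COOP_MULTIPLIER = 3  # deposit yields 3x resources when mined cooperatively

def per_agent_payoff(base_yield, cooperative, n_agents=2):
    """Resources each agent receives; cooperative yield is split evenly (assumed)."""
    if cooperative:
        return COOP_MULTIPLIER * base_yield / n_agents
    return base_yield

solo = per_agent_payoff(10, cooperative=False)  # 10 units mining alone
coop = per_agent_payoff(10, cooperative=True)   # 15 units each when cooperating
print(solo, coop)
```

With an even split, each cooperating agent earns 1.5x its solo yield, so an agent that performs the cost-benefit calculation correctly should always prefer cooperation in this environment.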

We're testing three basic scenarios:

  1. Two O3mini agents (fast models)
  2. Two O3 agents (sophisticated models)
  3. One O3mini and one O3 agent
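These three scenarios are exactly the unordered pairings of the two model types, which makes the condition list easy to generate programmatically if more model types are added later:

```python
from itertools import combinations_with_replacement

# The three test scenarios are the multisets of size 2 drawn from the model pool.
pairings = list(combinations_with_replacement(["O3mini", "O3"], 2))
print(pairings)  # [('O3mini', 'O3mini'), ('O3mini', 'O3'), ('O3', 'O3')]
```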

What We're Measuring

  • Do agents recognize the opportunity for cooperation?
  • How long does it take for agents to attempt cooperation?
  • Do agents maintain cooperation once established?
  • Are agents able to communicate their intentions to cooperate?
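The first three measurements above reduce to simple statistics over an event log. The sketch below assumes a hypothetical log format of `(tick, event_type)` tuples with `"coop_start"` and `"coop_tick"` event types; the actual logging schema is an implementation detail not specified here.

```python
# Hypothetical event-log analysis for the cooperation metrics above.
# Assumed log format: (tick, event_type) tuples.

def time_to_first_cooperation(events):
    """Tick of the first 'coop_start' event, or None if agents never cooperate."""
    for tick, kind in events:
        if kind == "coop_start":
            return tick
    return None

def cooperation_persistence(events, total_ticks):
    """Fraction of post-cooperation ticks the agents actually spent cooperating."""
    start = time_to_first_cooperation(events)
    if start is None:
        return 0.0
    coop_ticks = sum(1 for _, kind in events if kind == "coop_tick")
    return coop_ticks / (total_ticks - start)

log = [(3, "mine"), (5, "coop_start"), (6, "coop_tick"), (7, "coop_tick"),
       (8, "mine"), (9, "coop_tick")]
print(time_to_first_cooperation(log))    # 5
print(cooperation_persistence(log, 10))  # 3 cooperative ticks over 5 remaining = 0.6
```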

Early Questions to Answer

  1. Can our environment effectively incentivize cooperation?
  2. Do our models understand basic cost-benefit calculations for cooperative actions?
  3. Is our reward structure for cooperation appropriate?

This simple validation will help us design more complex cooperative experiments in the future, assuming we see basic cooperative behavior emerge.