The Video PreTraining (VPT) repository provides code and model artifacts for a project in which agents learn to act by watching human gameplay videos of Minecraft, using behavioral cloning. The idea is to learn general control priors from large-scale, unlabeled video data, and then optionally fine-tune those priors for more goal-directed behavior through environment interaction.

The repository contains demonstration models of different widths, fine-tuned variants (e.g., for building houses or early-game tasks), and inference scripts that instantiate agents from pretrained weights. Key modules include the behavioral-cloning logic, the agent wrapper (agent.py), and the data-loading pipeline, with a skeleton loader for Minecraft demonstration data. A run_agent.py script lets you test an agent interactively.
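The agent-wrapper pattern described above can be sketched as follows. This is a hypothetical, minimal illustration: the class name `DemoAgent`, its methods, and the linear policy are stand-ins and do not reflect the actual API of the repo's agent.py.

```python
import numpy as np


class DemoAgent:
    """Hypothetical wrapper hiding a policy (here a linear layer) behind
    a simple control interface, in the spirit of the repo's agent module."""

    def __init__(self, weights: np.ndarray, bias: np.ndarray):
        self.weights = weights  # shape: (obs_dim, n_actions)
        self.bias = bias        # shape: (n_actions,)

    def get_action(self, obs: np.ndarray) -> int:
        # Greedy action: argmax over the policy's logits.
        logits = obs @ self.weights + self.bias
        return int(np.argmax(logits))


# Minimal interactive-style loop, mirroring what an inference script does;
# the random observations stand in for env.reset() / env.step(action).
rng = np.random.default_rng(0)
agent = DemoAgent(rng.normal(size=(4, 3)), np.zeros(3))
obs = rng.normal(size=4)
for _ in range(5):
    action = agent.get_action(obs)
    obs = rng.normal(size=4)  # stand-in for the next environment observation
```

In the real repository, the agent would instead be constructed from checkpointed pretrained weights and stepped against a Minecraft environment.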
## Features
- Behavioral cloning models trained from large-scale gameplay video
- Fine-tuned variants specialized for sub-tasks (house building, early game)
- Agent abstraction module (agent.py) encapsulating control logic and policy invocation
- Data loader skeleton for Minecraft demonstration videos (a simplified loader, not the original training pipeline)
- Interactive inference script (run_agent.py) to deploy agents in test environments
- Permissive MIT license facilitating reuse and extension