Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations


Figure 1: Frames taken from a physically implausible video, in which a yellow cube seems to disappear behind the occluder.


Figure 2: The ADEPT model consists of two parts. A. The perception module segments objects and then extracts coarse object attributes from each object segment, approximating all non-occluders as ellipsoids. B. The reasoning module tracks and updates beliefs based on the perception results, using the particle filter algorithm and an extended stochastic physics engine.


From infancy, humans have expectations about how objects will move and interact. Even young children expect objects not to move through one another, teleport, or disappear. They are surprised by mismatches between physical expectations and perceptual observations, even in unfamiliar scenes with completely novel objects. A model that exhibits human-like understanding of physics should be similarly surprised and adjust its beliefs accordingly. We propose ADEPT, a model that uses a coarse, object-centric representation with approximate geometry for dynamic 3D scene understanding. Inference combines deep recognition networks, an extended probabilistic physics simulation, and particle filtering to form predictions and expectations across occlusion. We also present a new test set for measuring violations of physical expectations, built from a range of scenarios derived from developmental psychology. We systematically compare ADEPT, baseline models, and human expectations on this test set. ADEPT outperforms standard network architectures in discriminating physically implausible scenes from plausible ones, and often performs this discrimination as well as people do. We will release all code and data.
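The idea of tracking beliefs with a particle filter and scoring surprise when observations contradict them can be illustrated with a toy, one-dimensional sketch. This is not ADEPT's implementation. The extended stochastic physics engine is swapped for a hypothetical constant-velocity model with position noise, the perception module is replaced by noisy scalar positions (or None when nothing is seen), and the names `surprise_trace`, `OCC_LO`, `OCC_HI`, `OBS_STD`, `EPS`, as well as the velocity prior, are all illustrative assumptions. Per-frame surprise is taken here as the negative log of the particle-averaged observation likelihood, a common choice that is assumed rather than taken from the paper.

```python
import math
import random

# Hypothetical 1D scene: an occluder hides every position in [OCC_LO, OCC_HI].
OCC_LO, OCC_HI = 3.5, 6.5
OBS_STD = 0.2   # assumed noise of the perceived object position
EPS = 1e-4      # assumed floor likelihood for observations a particle deems impossible


def occluded(x):
    return OCC_LO <= x <= OCC_HI


def physics_step(particle, rng):
    """Toy stand-in for a stochastic physics engine: velocity is constant per
    particle, and only the position is perturbed by Gaussian noise."""
    x, v = particle
    return (x + v + rng.gauss(0.0, 0.05), v)


def likelihood(particle, obs):
    """p(obs | particle); obs is a perceived position, or None if nothing is seen."""
    x, _ = particle
    if obs is None:
        # Seeing nothing is expected only if the object is behind the occluder.
        return 1.0 if occluded(x) else EPS
    if occluded(x):
        return EPS  # an object behind the occluder should not be visible
    z = (obs - x) / OBS_STD
    return math.exp(-0.5 * z * z) + EPS  # unnormalized Gaussian plus floor


def surprise_trace(observations, n_particles=1000, seed=0):
    """Bootstrap particle filter; returns -log p(obs_t | beliefs) for each frame t >= 1."""
    rng = random.Random(seed)
    # Beliefs start around the first (assumed visible) observation; the
    # velocity prior N(1.0, 0.1) is an assumption for this toy scene.
    particles = [(rng.gauss(observations[0], 0.1), rng.gauss(1.0, 0.1))
                 for _ in range(n_particles)]
    trace = []
    for obs in observations[1:]:
        particles = [physics_step(p, rng) for p in particles]    # predict
        weights = [likelihood(p, obs) for p in particles]        # weigh
        trace.append(-math.log(sum(weights) / n_particles))      # surprise
        particles = rng.choices(particles, weights=weights, k=n_particles)  # resample
    return trace


# The object passes behind the occluder and reappears (plausible),
# or never reappears (implausible, as in Figure 1).
plausible = [0, 1, 2, 3, None, None, None, 7, 8, 9, 10, 11, 12]
implausible = [0, 1, 2, 3] + [None] * 9
print(max(surprise_trace(plausible)), max(surprise_trace(implausible)))
```

In the plausible sequence, surprise stays low because an empty observation is expected while the particles lie behind the occluder. In the implausible one, once every particle predicts that the object has emerged, a frame with no observation can only be explained at the floor likelihood `EPS`, so peak surprise rises toward `-log(EPS)`.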
