Humans learn by interacting with the physical world before they ever speak a word, but today's language-based AI models lack this grounding in space. Fei-Fei Li argues that language alone cannot capture how we navigate crowds, catch keys, or make split-second decisions. In her view, spatial intelligence is the missing piece that will take AI from generating text to understanding how the world works.