It tries to match a background sound to the environment, then tries to identify subjects, and what they're doing, and the exact moments when their activity should cause sounds, and where in the stereo ...