As others have pointed out, humans train on existing codebases as well, and then use that knowledge to build clean-room implementations.
What they don't do is read the source of the product they're clean-rooming. That's kinda disqualifying. It's impossible to know whether the GCC source is in 4.6's training set, but it would be kinda weird if it wasn't.
All said and done, that it's even possible is remarkable. Maybe these all go into training the next Opus or Sonnet and we start getting models that can create efficient compilers from scratch. That would be something!