Ever heard a song, or some other set of sounds, and thought you could make out a word or phrase that, on closer examination, wasn't really there? I'm not talking about misheard lyrics, but lyrics that don't exist at that point in the song at all. Well, there's a reason for that. Because of the way the song is layered, the set of frequencies its instruments play is close enough to the set of frequencies a human voice would produce that the brain can perceive it as speech.
Don't believe me? Here's a video of various songs broken up into sets of notes based on the frequencies in their audio files, and then played solely on a virtual piano. There is no other instrument being played here, simply a piano. See how much of it you can make out:
No, the video doesn't just consist of All Star, but it is a common enough song that you should be able to pick out at least some of the lyrics. So, why exactly does this work?
As a human speaks, the frequency of their voice shifts to create the sounds of different letters and syllables. At the same time, their voice cuts in and out between those sounds, which is also part of what makes each syllable recognizable. By copying these frequencies precisely, at the precise times they occur, any instrument can be used to simulate human speech patterns, creating the illusion of a voice being heard.
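The basic idea can be sketched in a few lines of code. This is a simplified, hypothetical illustration (not how the video linked above was actually made): split a signal into short frames, grab the strongest frequencies in each frame with an FFT, then rebuild the audio by playing pure tones at those frequencies. Swap the pure tones for piano samples and you have the piano-cover trick. The frame size, sample rate, and number of peaks below are arbitrary choices for the demo.

```python
import numpy as np

SR = 8000  # sample rate in Hz; kept low so the demo is light

def peak_frequencies(signal, frame_len=512, n_peaks=3):
    """Split the signal into frames; return the strongest frequencies per frame."""
    n_frames = len(signal) // frame_len
    freqs = np.fft.rfftfreq(frame_len, d=1 / SR)
    peaks = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_len)))
        top = np.argsort(spectrum)[-n_peaks:]  # indices of the loudest bins
        peaks.append(freqs[top])
    return peaks

def resynthesize(peaks, frame_len=512):
    """Rebuild audio by playing plain sine tones at each frame's peak frequencies."""
    out = np.zeros(len(peaks) * frame_len)
    t = np.arange(frame_len) / SR
    for i, frame_peaks in enumerate(peaks):
        tone = sum(np.sin(2 * np.pi * f * t) for f in frame_peaks)
        out[i * frame_len:(i + 1) * frame_len] = tone / len(frame_peaks)
    return out

# Demo with a stand-in "voice": two steady tones at 220 Hz and 440 Hz
t = np.arange(SR) / SR  # one second of time samples
voice = np.sin(2 * np.pi * 220 * t) + 0.8 * np.sin(2 * np.pi * 440 * t)
peaks = peak_frequencies(voice)
rebuilt = resynthesize(peaks)
```

Run on real recorded speech instead of the toy signal, the `rebuilt` output sounds nothing like any single instrument in the original, yet the rhythm and pitch contour of the voice survive, which is exactly the cue the brain latches onto.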
Still not enough to convince you that a computer could mimic a human voice? Look up a video of a neural network analyzing human speech. It can get pretty freaky to listen to.