Auto-Timing: Upload a transcript (a simple file with the text of what's said in the video), and we'll use speech recognition technology to turn it into synchronized captions. Timing is the toughest part of creating captions, and this should make it much easier. The technology works best for videos with good sound quality and clear spoken English.
Auto-Captions: We use the same speech recognition technology to create machine-generated captions, or auto-caps (which can then be translated into 51 languages). You can see auto-caps in action right now on a range of educational channels, such as UC Berkeley, Stanford, MIT, Yale, UCLA, Duke, UCTV, Columbia, PBS, National Geographic, Demand Media, UNSW and most Google channels, including YouTube's. To try it, click the menu button at the bottom right of the video player, click CC and then the arrow to its left, and then click the new "Transcribe Audio" button. In time, we hope to expand this feature to many more YouTube videos.
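Both features end up producing the same kind of output: caption text paired with start and end times. As a rough illustration of what "synchronized captions" look like as data, here is a minimal Python sketch that writes timed lines in the widely used SubRip (.srt) layout. The helper names (`format_timestamp`, `to_srt`) and the sample timings are invented for this example; with auto-timing the times would come from speech recognition, and nothing here describes how YouTube itself implements the feature.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as a SubRip timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues) -> str:
    """Turn (start_seconds, end_seconds, text) tuples into SubRip text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}")
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


# Hypothetical timings for two transcript lines (illustrative only).
sample_cues = [
    (0.0, 2.5, "Welcome to the lecture."),
    (2.5, 6.0, "Today we look at captions."),
]

if __name__ == "__main__":
    print(to_srt(sample_cues))
```

Each cue is a sequence number, a timing line, and the caption text. The hard part that auto-timing is meant to take over is choosing the start and end times, which this sketch simply assumes are already known.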
Auto-caps is another step toward YouTube's goal of making video accessible everywhere (web, mobile, TV) and to everyone (other countries, languages, alternative access modes). It's also an example of using technology to enhance the video experience. For more details, please check this post on the Google Blog.
To learn more about how to use auto-caps and auto-timing, check out our help center article and this short video:
Hiroto Tokusei, Senior Product Manager, recently watched "(HD) 夜のゆりかもめ(新橋→豊洲) 01."
Update (11/29): On November 24, we posted a full-length video of this feature's announcement event in Washington, D.C. We have included English captions using the new auto-timing capabilities: