Creating kinetic typography is a fairly complex process. We wanted to simplify it so that speech is the only input.
At the TartanHacks hackathon at CMU, we built an HTML5 prototype that connects Google's speech-to-text API with the HTML5 audio API (for sound processing).
We mapped per-syllable loudness from the audio stream to the words returned by the speech processor, in order to display dynamic, real-time text.
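The mapping step can be sketched as follows. This is a minimal illustration, not the prototype's code. It assumes per-syllable loudness values are already available in order and normalized to 0..1. It estimates each word's syllable count by counting vowel groups, which is a rough heuristic assumption. It then consumes that many loudness values per word and scales a font size linearly. `StyledWord`, `estimateSyllables`, `mapLoudnessToWords`, and the 16 to 64 px range are all hypothetical.

```typescript
// Hypothetical output shape: one styled entry per recognized word.
interface StyledWord {
  text: string;
  loudness: number; // mean loudness of the word's syllable chunks
  fontSizePx: number;
}

// Heuristic (assumption): one syllable per group of consecutive vowels, minimum 1.
function estimateSyllables(word: string): number {
  const groups = word.toLowerCase().match(/[aeiouy]+/g);
  return Math.max(1, groups ? groups.length : 0);
}

// Assign consecutive per-syllable loudness values to words, then map loudness to font size.
function mapLoudnessToWords(
  words: string[],
  syllableLoudness: number[],
  minPx = 16,
  maxPx = 64,
): StyledWord[] {
  let cursor = 0;
  return words.map((text) => {
    const n = estimateSyllables(text);
    const slice = syllableLoudness.slice(cursor, cursor + n);
    cursor += n;
    // If the audio ran out of chunks, treat the word as silent.
    const loudness =
      slice.length > 0 ? slice.reduce((a, b) => a + b, 0) / slice.length : 0;
    const clamped = Math.min(1, Math.max(0, loudness));
    return { text, loudness, fontSizePx: minPx + clamped * (maxPx - minPx) };
  });
}

const styled = mapLoudnessToWords(["hello", "world"], [0.25, 0.75, 1.0]);
console.log(styled.map((w) => `${w.text}:${w.fontSizePx}px`).join(" "));
// → hello:40px world:64px
```

In this sketch "hello" is counted as two syllables and gets the mean of 0.25 and 0.75. A real aligner would use timing information from the recognizer instead of a vowel count.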
I extracted the loudness level from the raw HTML5 audio stream in chunks that roughly map to the length of each syllable.
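A minimal sketch of chunked loudness extraction follows, using root-mean-square (RMS) amplitude as the loudness measure. RMS is an assumption here; the original does not say which measure was used. In a browser, the mono samples in [-1, 1] could come from something like `AnalyserNode.getFloatTimeDomainData` or `AudioBuffer.getChannelData(0)`. The sketch below just takes a `Float32Array`. The 200 ms chunk length is an assumed rough syllable duration, and `rms` and `chunkLoudness` are hypothetical names.

```typescript
// Assumed chunk length: ~200 ms as a rough syllable duration (not from the original).
const CHUNK_SECONDS = 0.2;

// Root-mean-square amplitude of a chunk of samples in [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Split a mono buffer into fixed-length chunks and return one loudness value per chunk.
function chunkLoudness(
  samples: Float32Array,
  sampleRate: number,
  chunkSeconds = CHUNK_SECONDS,
): number[] {
  const chunkSize = Math.max(1, Math.round(sampleRate * chunkSeconds));
  const levels: number[] = [];
  for (let start = 0; start < samples.length; start += chunkSize) {
    // subarray clamps the end index, so the final chunk may be shorter.
    levels.push(rms(samples.subarray(start, start + chunkSize)));
  }
  return levels;
}

// Toy input: a sample rate of 10 Hz gives 2-sample chunks.
const levels = chunkLoudness(new Float32Array([0.5, -0.5, 1, -1, 0, 0]), 10);
console.log(levels); // → [0.5, 1, 0]
```

Fixed-length chunks only approximate syllable boundaries. Detecting energy peaks or valleys would track syllables more closely, at the cost of more processing per frame.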
The demo currently requires Google Chrome Canary (Chrome version 27 or later).