Randomized phases preserve speech content and identity

I decided to make a quick demonstration of the effect of phase information on speech reconstruction. I took the short-time Fourier transform of one of the TIMIT examples (/TRAIN/DR1/FDAW0/SI1406.wav), extracted and randomized its phase values, and then inverted it back into audio using the randomized phases. You can listen to it here.
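The procedure can be sketched in a few lines of Python. This is a minimal illustration, not the exact script I used: it substitutes a placeholder signal for the TIMIT utterance (loading the actual file would need an audio reader), and uses `scipy.signal.stft`/`istft` with an assumed 512-sample window.

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)

# Placeholder for the TIMIT utterance: any mono signal works here.
fs = 16000
x = rng.standard_normal(fs)  # 1 second of audio samples

# Short-time Fourier transform.
f, t, Z = stft(x, fs=fs, nperseg=512)

# Keep the magnitudes, but replace every phase with a uniform random one.
phases = rng.uniform(-np.pi, np.pi, size=Z.shape)
Z_rand = np.abs(Z) * np.exp(1j * phases)

# Invert back into audio using the randomized phases.
_, x_rand = istft(Z_rand, fs=fs, nperseg=512)
```

Note that `istft` uses overlap-add, so the inconsistent random phases partially cancel across overlapping frames, which is part of why the result remains intelligible.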

The result is clearly distorted, but the most important information, namely the content of the speech and the identity of the speaker, is preserved. I think this justifies the use of a magnitude-only time-frequency representation, at least to start with.

Relatedly, I’ve been thinking about the invertibility of the wavelet transform and how that might affect its usefulness as a representation for speech synthesis. An interesting experiment would be something along the lines of what João suggested last class: first, set up a CNN that performs a wavelet transform; then, use the corresponding deconvolutional network (obtained by transposing the kernels) as the inverse transform, and train the network to reconstruct its input. If the network were able to make accurate reconstructions, this would show that it is indeed possible to learn the inverse wavelet transform with convolutional neural networks, which would make it a very appropriate input representation for our speech synthesis task.
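To make the "transposed kernels as the inverse" idea concrete, here is a minimal numpy sketch using fixed orthonormal Haar filters rather than learned ones (an assumption for illustration — the proposed experiment would start from random kernels and train them). One level of the transform is a stride-2 convolution with a lowpass and a highpass filter; because the filter bank is orthonormal, applying the transposed kernels reconstructs the signal exactly, which is the property the CNN would need to learn.

```python
import numpy as np

# Orthonormal Haar analysis filters (lowpass h, highpass g), stride 2.
h = np.array([1.0, 1.0]) / np.sqrt(2)
g = np.array([1.0, -1.0]) / np.sqrt(2)

def analysis(x):
    """One level of the Haar wavelet transform as a strided convolution."""
    windows = x.reshape(-1, 2)           # non-overlapping length-2 windows
    return windows @ h, windows @ g      # approximation and detail coefficients

def synthesis(a, d):
    """Inverse transform built from the *transposed* kernels."""
    return (np.outer(a, h) + np.outer(d, g)).reshape(-1)

x = np.random.default_rng(0).standard_normal(64)
a, d = analysis(x)
x_hat = synthesis(a, d)
print(np.allclose(x, x_hat))  # True: the transposed kernels invert exactly
```

A trained network would replace `h` and `g` with learned kernels and minimize the reconstruction error between `x` and `x_hat`; perfect reconstruction is achievable exactly when the learned filter bank ends up (close to) orthonormal.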


2 thoughts on “Randomized phases preserve speech content and identity”

  1. Hi Jess. I came across your blog after Ian Goodfellow mentioned you made some updates to the GaussianVisLayer class in pylearn2. (pylearn2-users link: https://groups.google.com/forum/#!topic/pylearn-users/gU6szgQIvNE)

    With your most recent changes committed, I am able to also execute the GaussianVisLayer. So first thank you for that update!

    I was wondering if you’d be willing to share some of your more recent YAML files (or equivalent .py files) that define the models you’ve been playing with? There aren’t too many examples of using the DBM class outside the single layer rbm.py and I’d love to see more that use the convolutional layers in particular.

    Thanks!

    Tom
