
How to use dynamic phonation feature with mfcc/fbank feature as the input to feed a DNN. #32

ASR2020Guru opened this issue Apr 8, 2021 · 2 comments

@ASR2020Guru

Hi @jcvasquezc ,

I would like to use the dynamic phonation features together with mfcc/fbank features as the input to a DNN.

The related code is shown below:

phonafeature = phonation.extract_features_file(filename, static=False, plots=False, fmt="npy")
fbankfeature, energies = python_speech_features.fbank(filename, samplerate=16000, nfilt=40, nfft=768, winlen=0.04, winstep=0.02, winfunc=np.hamming)

Because I noticed that the dynamic phonation features use winlen=0.04, winstep=0.02, I set the same parameter values in the fbank function.
However, len(phonafeature) and len(fbankfeature) are not the same for one input file.
For example, filename=demo.wav is 15 s long with a 16000 Hz sample rate.
The shape of phonafeature for this demo.wav is (430, 7), while the shape of fbankfeature is (749, 40).
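As a quick sanity check on the fbank length (a sketch assuming the standard framing formula n_frames = 1 + floor((N - frame_len) / step); the exact rounding inside python_speech_features may differ by a frame):

```python
import math

sr = 16000             # sample rate of demo.wav
duration = 15.0        # length in seconds
winlen, winstep = 0.04, 0.02

n_samples = int(sr * duration)   # 240000 samples
frame_len = int(sr * winlen)     # 640 samples per frame
step_len = int(sr * winstep)     # 320 samples per hop

n_frames = 1 + math.floor((n_samples - frame_len) / step_len)
print(n_frames)  # 749, matching the fbank output length
```

The fbank length matches the total number of frames, so the shorter phonation output is not explained by the framing parameters themselves.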

For concatenation purposes, I had to pad phonafeature with the constant value 0 to match len(fbankfeature), i.e., from (430, 7) to (749, 7). Then I can get the concatenated phonation plus fbank features of shape (749, 47) for demo.wav.

But I don't think that is the correct way to use the dynamic phonation features with mfcc/fbank features as the input to a DNN.

Could you help me with this issue?
Also, why do the phonation features and fbank features have different lengths under the same winlen and winstep?

Many thanks

@jcvasquezc
Owner

Hi @ASR2020Guru

You are right that zero padding is not the correct way to combine those features.

There are two reasons why you are getting different lengths for the phonation and fbank features:

  1. Phonation features are only computed for speech segments where there are F0 values, i.e., only for voiced segments.

Check the following code, which only adds feature vectors when F0 != 0:

Amp, logE, apq, ppq = [], [], [], []
lnz = 0  # counter of voiced (nonzero-F0) frames seen so far
for l in range(nF):
    data_frame = data_audio[int(l*size_stepS):int(l*size_stepS+size_frameS)]
    energy = 10*logEnergy(data_frame)
    if F0[l] != 0:  # only voiced frames contribute feature vectors
        Amp.append(np.max(np.abs(data_frame)))
        logE.append(energy)
        if lnz >= 12:  # APQ needs the 12 most recent voiced amplitudes
            amp_arr = np.asarray([Amp[j] for j in range(lnz-12, lnz)])
            apq.append(APQ(amp_arr))
        if lnz >= 6:   # PPQ needs the 6 most recent voiced F0 values
            f0arr = np.asarray([F0nz[j] for j in range(lnz-6, lnz)])
            ppq.append(PPQ(1/f0arr))
        lnz = lnz+1

In case you want to combine the features, you should add an else: statement that appends zero values to the variables Amp, logE, apq, and ppq.
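The suggested else: branch can be sketched like this (a minimal, self-contained mock: the F0 contour and the appended values are placeholders standing in for the real measurements in the loop above):

```python
import numpy as np

# Toy F0 contour: zeros mark unvoiced frames (placeholder values, not real pitch).
F0 = np.array([0.0, 120.0, 121.0, 0.0, 119.0, 0.0])

Amp, logE = [], []
for l in range(len(F0)):
    if F0[l] != 0:
        # Voiced frame: append the real measurements (placeholders here).
        Amp.append(1.0)
        logE.append(-10.0)
    else:
        # Proposed else-branch: append zeros for unvoiced frames so the
        # phonation time axis stays aligned with the fbank frames.
        Amp.append(0.0)
        logE.append(0.0)

print(len(Amp) == len(F0))  # True: one value per frame
```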

  2. In addition, you should consider that apq is only computed after the 12th frame, because it is a long-term perturbation measure with respect to the 11 previous frames; thus you have to pad 11 zeros at the beginning for this feature.

  The same occurs for ppq, but in this case with the first five frames.

If you add these pads at the beginning of apq and ppq, you should also remove the slicing in this line, which keeps only the frames after the 12th, in order to properly merge apq and ppq with the rest of the features:

feat_mat=np.vstack((DF0[11:], DDF0[10:], Jitter[12:], Shimmer[12:], apq, ppq[6:], logE[12:])).T
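Padding apq and ppq at the front and stacking without the per-feature slicing could then look like this (a sketch with placeholder arrays; the frame counts and the 11/5 leading offsets for apq/ppq are assumptions following the explanation above, not the exact shapes in phonation.py):

```python
import numpy as np

nF = 20  # hypothetical number of frames
# Placeholder per-frame features, all of length nF after the else-branch change.
Jitter = np.ones(nF)
Shimmer = np.ones(nF)
logE = np.ones(nF)
# apq/ppq start later because they need a history of previous frames:
apq = np.ones(nF - 11)  # assumed: first value available at frame 11
ppq = np.ones(nF - 5)   # assumed: first value available at frame 5

# Pad the missing leading frames with zeros instead of slicing the others.
apq = np.pad(apq, (11, 0))
ppq = np.pad(ppq, (5, 0))

feat_mat = np.vstack((Jitter, Shimmer, apq, ppq, logE)).T
print(feat_mat.shape)  # (20, 5): one row per frame, ready to concatenate with fbank
```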

If you have further questions, let me know and I can help you

@ASR2020Guru
Author

Hi @jcvasquezc ,

Thanks for your quick and helpful reply.

Now I managed to combine these features correctly.

I will let you know if I have any further questions.

Cheers
