Hi @jcvasquezc,
I would like to use the dynamic phonation features together with MFCC/fbank features as the input to a DNN.
The related code is shown below:
    phonafeature = phonation.extract_features_file(filename, static=False, plots=False, fmt="npy")
    fbankfeature, energies = python_speech_features.fbank(filename, samplerate=16000, nfilt=40, nfft=768, winlen=0.04, winstep=0.02, winfunc=np.hamming)
I noticed that the dynamic phonation features use winlen=0.04 and winstep=0.02, so I set the same values for the fbank function.
However, phonafeature and fbankfeature do not have the same number of frames for the same input file.
For example, demo.wav is 15 s long with a 16000 Hz sample rate. For this file, phonafeature has shape (430, 7) and fbankfeature has shape (749, 40).
To concatenate them, I have to pad phonafeature with the constant value 0 to match the length of fbankfeature, i.e., from (430, 7) to (749, 7). Then I get the concatenated phonation plus fbank feature of shape (749, 47) for demo.wav.
But I don't think this is the correct way to use the dynamic phonation features with MFCC/fbank features as the input to a DNN.
Could you help me with this issue?
And why do the phonation and fbank outputs differ in length under the same winlen and winstep?
Many thanks
The dynamic phonation features are computed as follows; feature vectors are added only when F0 != 0, i.e., only for voiced frames:
    for l in range(nF):
        data_frame=data_audio[int(l*size_stepS):int(l*size_stepS+size_frameS)]
        energy=10*logEnergy(data_frame)
        if F0[l]!=0:
            Amp.append(np.max(np.abs(data_frame)))
            logE.append(energy)
            if lnz>=12:
                amp_arr=np.asarray([Amp[j] for j in range(lnz-12, lnz)])
                #print(amp_arr)
                apq.append(APQ(amp_arr))
            if lnz>=6: # TODO:
                f0arr=np.asarray([F0nz[j] for j in range(lnz-6, lnz)])
                ppq.append(PPQ(1/f0arr))
            lnz=lnz+1
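To make the length mismatch concrete, here is a minimal sketch of the frame arithmetic. It assumes that demo.wav has exactly 15 * 16000 samples and that the fbank framing counts frames as 1 + ceil((N - frame_len) / frame_step); the helper name `n_frames` and the F0 contour are hypothetical, for illustration only.

```python
import math
import numpy as np

def n_frames(n_samples, fs, winlen, winstep):
    """Number of analysis frames (assumed framing rule, see note above)."""
    frame_len = int(round(winlen * fs))    # 640 samples for 40 ms at 16 kHz
    frame_step = int(round(winstep * fs))  # 320 samples for 20 ms at 16 kHz
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / frame_step)

total = n_frames(15 * 16000, 16000, 0.04, 0.02)
print(total)  # 749

# Hypothetical F0 contour: every other frame is unvoiced (F0 == 0).
f0 = np.zeros(total)
f0[::2] = 120.0

# Only frames with F0 != 0 contribute a phonation feature vector.
voiced = int(np.count_nonzero(f0))
print(voiced)  # 375
```

With the same window and step, the fbank matrix gets one row per frame, while the phonation matrix gets at most one row per voiced frame, so it is shorter whenever the recording contains unvoiced frames or silence.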
If you want to combine the features, you should add an else: statement that appends zero values to the variables Amp, logE, apq, and ppq for unvoiced frames.
In addition, note that apq is only computed from the 12th voiced frame onward, because it is a long-term perturbation measure with respect to the 11 previous frames; you therefore have to pad 11 zeros at the beginning of this feature.
The same occurs for ppq, but in this case with the first five frames.
If you add this padding at the beginning of apq and ppq, you should also remove the line that keeps only the frames after the 12th, in order to properly merge apq and ppq with the rest of the features.
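The padding described above could look roughly like the sketch below. It is not the library's code: `perturbation` is a hypothetical stand-in for the APQ and PPQ helpers, `aligned_tracks` is an illustrative name, the inputs are synthetic, and only four of the phonation columns are shown. It illustrates keeping one row per frame, with zeros for unvoiced frames and for the first voiced frames where apq and ppq are not yet defined.

```python
import numpy as np

def perturbation(x):
    """Hypothetical stand-in for APQ/PPQ: mean absolute deviation in percent."""
    x = np.asarray(x, dtype=float)
    return 100.0 * np.mean(np.abs(x - x.mean())) / x.mean()

def aligned_tracks(f0, amp, log_e, n_apq=12, n_ppq=6):
    """Return one row per frame; columns are amp, logE, apq, ppq."""
    out = np.zeros((len(f0), 4))
    amp_nz, f0_nz = [], []           # history over voiced frames only
    for l in range(len(f0)):
        if f0[l] == 0:
            continue                 # unvoiced frame: row stays all zeros
        amp_nz.append(amp[l])
        f0_nz.append(f0[l])
        out[l, 0] = amp[l]
        out[l, 1] = log_e[l]
        if len(amp_nz) >= n_apq:     # first 11 voiced frames stay zero
            out[l, 2] = perturbation(amp_nz[-n_apq:])
        if len(f0_nz) >= n_ppq:      # first 5 voiced frames stay zero
            out[l, 3] = perturbation(1.0 / np.asarray(f0_nz[-n_ppq:]))
    return out

# Synthetic inputs with 749 frames, matching the fbank length in the question.
rng = np.random.default_rng(0)
n = 749
f0 = np.where(rng.random(n) < 0.6, rng.uniform(100, 200, n), 0.0)
amp = rng.uniform(0.1, 1.0, n)
log_e = rng.uniform(-60.0, 0.0, n)
fbank = rng.standard_normal((n, 40))  # placeholder for the real fbank matrix

tracks = aligned_tracks(f0, amp, log_e)
combined = np.hstack([fbank, tracks])
print(combined.shape)  # (749, 44)
```

Because every frame keeps its own row, the phonation columns stay time-aligned with the fbank rows, which padding zeros only at the end of the shorter matrix would not guarantee.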