org.oc.ocvolume.dsp
Class featureExtraction

java.lang.Object
  extended by org.oc.ocvolume.dsp.featureExtraction

public class featureExtraction
extends java.lang.Object

last updated on June 15, 2002
description: feature extraction class used to extract mel-frequency cepstral coefficients (MFCCs) from an input signal
calls: none
called by: volume, train
input: speech signal
output: mel-frequency cepstral coefficients

Author:
Danny Su

Field Summary
protected  fft FFT
          Fast Fourier Transform
protected static int fftSize
          FFT Size (Must be a power of 2)
protected static int frameLength
          Number of samples per frame
protected  double[][] frames
          All the frames of the input signal
protected  double[] hammingWindow
          Hamming window values
protected static double lowerFilterFreq
          lower limit of filter (or 64 Hz?)
 int numCepstra
          Number of MFCCs per frame. Modified 4/5/06 to be a non-final variable - Daniel McEnnis
protected static int numMelFilters
          number of mel filters (SPHINX-III uses 40)
protected static double preEmphasisAlpha
          Pre-Emphasis Alpha (Set to 0 if no pre-emphasis should be performed)
protected static int shiftInterval
          Number of overlapping samples (usually 50% of frame length)
protected static double upperFilterFreq
          upper limit of filter (or half of sampling freq.?)
 
Constructor Summary
featureExtraction()
           
 
Method Summary
 double[] cepCoefficients(double[] f)
          Cepstral coefficients are calculated from the output of the Non-linear Transformation method
calls: none
called by: featureExtraction
 int[] fftBinIndices(double samplingRate, int frameSize)
          calculates the FFT bin indices
calls: none
called by: featureExtraction
5-3-05 Daniel McEnnis - parameterized sampling rate and frameSize
protected  void framing(double[] inputSignal)
          performs Frame Blocking to break down a speech signal into frames
calls: none
called by: featureExtraction
protected static double freqToMel(double freq)
          convert frequency to mel-frequency
calls: none
called by: featureExtraction
protected static double log10(double value)
          calculates logarithm with base 10
calls: none
called by: featureExtraction
 double[] magnitudeSpectrum(double[] frame)
          computes the magnitude spectrum of the input frame
calls: none
called by: featureExtraction
 double[] melFilter(double[] bin, int[] cbin)
          Calculate the output of the mel filter
calls: none
called by: featureExtraction
 double[] nonLinearTransformation(double[] fbank)
          the output of mel filtering is subjected to a logarithm function (natural logarithm)
calls: none
called by: featureExtraction
protected static double[] preEmphasis(short[] inputSignal)
          performs pre-emphasis to equalize the amplitudes of high- and low-frequency components
calls: none
called by: featureExtraction
 double[][] process(short[] inputSignal, double samplingRate)
          takes a speech signal and returns the Mel-Frequency Cepstral Coefficient (MFCC)
calls: fft
called by: volume, train
5-3-05 Daniel McEnnis - parameterized sampling rate.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

frameLength

protected static final int frameLength
Number of samples per frame

See Also:
Constant Field Values

shiftInterval

protected static final int shiftInterval
Number of overlapping samples (usually 50% of frame length)

See Also:
Constant Field Values

numCepstra

public int numCepstra
Number of MFCCs per frame. Modified 4/5/06 to be a non-final variable - Daniel McEnnis


fftSize

protected static final int fftSize
FFT Size (Must be a power of 2)

See Also:
Constant Field Values

preEmphasisAlpha

protected static final double preEmphasisAlpha
Pre-Emphasis Alpha (Set to 0 if no pre-emphasis should be performed)

See Also:
Constant Field Values

lowerFilterFreq

protected static final double lowerFilterFreq
lower limit of filter (or 64 Hz?)

See Also:
Constant Field Values

upperFilterFreq

protected static final double upperFilterFreq
upper limit of filter (or half of sampling freq.?)

See Also:
Constant Field Values

numMelFilters

protected static final int numMelFilters
number of mel filters (SPHINX-III uses 40)

See Also:
Constant Field Values

frames

protected double[][] frames
All the frames of the input signal


hammingWindow

protected double[] hammingWindow
Hamming window values
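
As an illustration of how a table of window values like this is commonly filled, the sketch below uses the textbook Hamming formula w[n] = 0.54 - 0.46 cos(2*pi*n / (N - 1)). The class name HammingWindowSketch and the method hamming are hypothetical, and the coefficients actually used by featureExtraction are not stated in this documentation.

```java
final class HammingWindowSketch {
    /**
     * Returns `length` Hamming window values.
     * Assumption: the standard 0.54 / 0.46 form; the real class may differ.
     */
    public static double[] hamming(int length) {
        double[] w = new double[length];
        if (length == 1) {
            w[0] = 1.0; // a single-sample window is just unity gain
            return w;
        }
        for (int n = 0; n < length; n++) {
            w[n] = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * n / (length - 1));
        }
        return w;
    }
}
```

Each frame would then be multiplied sample by sample with these values before the FFT is taken.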


FFT

protected fft FFT
Fast Fourier Transform

Constructor Detail

featureExtraction

public featureExtraction()
Method Detail

process

public double[][] process(short[] inputSignal,
                          double samplingRate)
takes a speech signal and returns the Mel-Frequency Cepstral Coefficient (MFCC)
calls: fft
called by: volume, train
5-3-05 Daniel McEnnis - parameterized sampling rate.

Parameters:
inputSignal - Speech Waveform (16 bit integer data)
samplingRate - Sampling rate of the input signal
Returns:
Mel Frequency Cepstral Coefficients (64 bit floating point data)

fftBinIndices

public int[] fftBinIndices(double samplingRate,
                           int frameSize)
calculates the FFT bin indices
calls: none
called by: featureExtraction
5-3-05 Daniel McEnnis - parameterized sampling rate and frameSize

Parameters:
samplingRate - Sampling rate of the input signal
frameSize - Frame size in samples
Returns:
array of FFT bin indices
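
One common way to obtain such indices is to space the filter centres evenly on the mel scale between the lower and upper filter frequencies and then map each centre frequency to its nearest FFT bin. The sketch below does this under stated assumptions: the class FftBinIndicesSketch, the extra parameters lowerHz, upperHz and numFilters (standing in for the static fields lowerFilterFreq, upperFilterFreq and numMelFilters), and the 2595/700 mel formula are all illustrative and not confirmed by this page.

```java
final class FftBinIndicesSketch {
    // Assumed mel mapping; the formula used by featureExtraction is not documented here.
    static double freqToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    static double melToFreq(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /** Returns numFilters + 2 bin indices: lower edge, filter centres, upper edge. */
    public static int[] fftBinIndices(double samplingRate, int frameSize,
                                      double lowerHz, double upperHz, int numFilters) {
        int[] cbin = new int[numFilters + 2];
        // The two edges are taken directly from the filter limits.
        cbin[0] = (int) Math.round(lowerHz / samplingRate * frameSize);
        cbin[numFilters + 1] = (int) Math.round(upperHz / samplingRate * frameSize);

        // Interior centres are equally spaced in mel, then converted back to Hz and to bins.
        double melLow = freqToMel(lowerHz);
        double melStep = (freqToMel(upperHz) - melLow) / (numFilters + 1);
        for (int i = 1; i <= numFilters; i++) {
            double centreHz = melToFreq(melLow + i * melStep);
            cbin[i] = (int) Math.round(centreHz / samplingRate * frameSize);
        }
        return cbin;
    }
}
```

With low filter counts and small frames, neighbouring centres can round to the same bin, so callers of such a routine should tolerate repeated indices.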

melFilter

public double[] melFilter(double[] bin,
                          int[] cbin)
Calculate the output of the mel filter
calls: none
called by: featureExtraction
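
To make the mel filtering step concrete, here is a minimal sketch of a triangular filter bank. It assumes that bin holds the magnitude spectrum and that cbin holds the lower edge, the filter centres and the upper edge as FFT bin indices; neither meaning is spelled out on this page. The class MelFilterSketch and the exact triangle weights are illustrative, not the documented implementation.

```java
final class MelFilterSketch {
    /**
     * Applies cbin.length - 2 triangular filters to the spectrum in `bin`.
     * Filter k peaks (weight 1) at cbin[k] and falls linearly to 0 at
     * cbin[k - 1] and cbin[k + 1]. Assumed layout, see the lead-in.
     */
    public static double[] melFilter(double[] bin, int[] cbin) {
        if (cbin.length < 3) {
            return new double[0]; // not enough edges for even one filter
        }
        int numFilters = cbin.length - 2;
        double[] out = new double[numFilters];
        for (int k = 1; k <= numFilters; k++) {
            int left = cbin[k - 1];
            int centre = cbin[k];
            int right = cbin[k + 1];
            double sum = bin[centre]; // weight 1 at the centre bin
            for (int i = left + 1; i < centre; i++) {
                sum += bin[i] * (i - left) / (double) (centre - left); // rising slope
            }
            for (int i = centre + 1; i < right; i++) {
                sum += bin[i] * (right - i) / (double) (right - centre); // falling slope
            }
            out[k - 1] = sum;
        }
        return out;
    }
}
```

Because adjacent triangles overlap, energy that falls between two centres contributes to both filter outputs.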


cepCoefficients

public double[] cepCoefficients(double[] f)
Cepstral coefficients are calculated from the output of the Non-linear Transformation method
calls: none
called by: featureExtraction

Parameters:
f - Output of the Non-linear Transformation method
Returns:
Cepstral Coefficients
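
Cepstral coefficients are conventionally obtained by applying a discrete cosine transform (DCT-II) to the log filter-bank outputs. The sketch below shows that conventional form. The class CepstrumSketch and the explicit numCepstra parameter (the documented method reads the numCepstra field instead) are hypothetical, and any scaling applied by the real class is unknown.

```java
final class CepstrumSketch {
    /**
     * Unnormalised DCT-II of the log filter-bank outputs:
     * cep[i] = sum_j f[j] * cos(pi * i * (j + 0.5) / M), with M = f.length.
     */
    public static double[] cepCoefficients(double[] f, int numCepstra) {
        int m = f.length;
        double[] cep = new double[numCepstra];
        for (int i = 0; i < numCepstra; i++) {
            for (int j = 0; j < m; j++) {
                cep[i] += f[j] * Math.cos(Math.PI * i * (j + 0.5) / m);
            }
        }
        return cep;
    }
}
```

A flat input puts all of its energy into coefficient 0, which is why the higher coefficients are said to describe the shape of the spectrum rather than its level.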

nonLinearTransformation

public double[] nonLinearTransformation(double[] fbank)
the output of mel filtering is subjected to a logarithm function (natural logarithm)
calls: none
called by: featureExtraction

Parameters:
fbank - Output of mel filtering
Returns:
Natural log of the output of mel filtering
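
A minimal sketch of this log compression follows. Because the logarithm of zero is negative infinity, implementations usually clamp very small results. The floor value of -50 used here, the class LogCompressionSketch, and the choice to return a new array rather than modify the input are assumptions made for illustration.

```java
final class LogCompressionSketch {
    // Assumed floor; the value used by featureExtraction is not documented here.
    static final double FLOOR = -50.0;

    /** Returns the natural log of each filter output, clamped below at FLOOR. */
    public static double[] nonLinearTransformation(double[] fbank) {
        double[] out = new double[fbank.length];
        for (int i = 0; i < fbank.length; i++) {
            double v = Math.log(fbank[i]); // -Infinity for 0, NaN for negative input
            out[i] = (Double.isNaN(v) || v < FLOOR) ? FLOOR : v;
        }
        return out;
    }
}
```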

log10

protected static double log10(double value)
calculates logarithm with base 10
calls: none
called by: featureExtraction

Parameters:
value - Number to take the log of
Returns:
base 10 logarithm of the input value
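
A base-10 logarithm can be derived from the natural logarithm with the change-of-base identity log10(x) = ln(x) / ln(10). The sketch below assumes that this is roughly what the helper does; the class name Log10Sketch is hypothetical. On Java 5 and later, Math.log10 provides the same result directly.

```java
final class Log10Sketch {
    /** Base-10 logarithm via change of base from the natural logarithm. */
    public static double log10(double value) {
        return Math.log(value) / Math.log(10.0);
    }
}
```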

freqToMel

protected static double freqToMel(double freq)
convert frequency to mel-frequency
calls: none
called by: featureExtraction

Parameters:
freq - Frequency
Returns:
Mel-Frequency
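
This page does not state which mel formula is used. A widely used mapping is mel = 2595 * log10(1 + f / 700), and the sketch below assumes it. The class MelScaleSketch and the inverse helper melToFreq are hypothetical additions for illustration.

```java
final class MelScaleSketch {
    /** Hz to mel, using the common 2595 / 700 formula (an assumption). */
    public static double freqToMel(double freqHz) {
        return 2595.0 * Math.log10(1.0 + freqHz / 700.0);
    }

    /** Inverse of freqToMel: mel back to Hz. */
    public static double melToFreq(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }
}
```

Under this formula 1000 Hz maps to roughly 1000 mel, and the scale grows approximately logarithmically above that point.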

magnitudeSpectrum

public double[] magnitudeSpectrum(double[] frame)
computes the magnitude spectrum of the input frame
calls: none
called by: featureExtraction

Parameters:
frame - Input frame signal
Returns:
Magnitude Spectrum array
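
The magnitude spectrum is the absolute value of each complex Fourier coefficient, sqrt(re^2 + im^2). To stay self-contained, the sketch below swaps in a naive O(N^2) discrete Fourier transform instead of the fft class this package uses, and it does not apply a window. The class name MagnitudeSpectrumSketch is hypothetical.

```java
final class MagnitudeSpectrumSketch {
    /** Magnitude of every DFT bin of `frame`, computed with a direct (slow) DFT. */
    public static double[] magnitudeSpectrum(double[] frame) {
        int n = frame.length;
        double[] mag = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                re += frame[t] * Math.cos(angle);
                im += frame[t] * Math.sin(angle);
            }
            mag[k] = Math.sqrt(re * re + im * im);
        }
        return mag;
    }
}
```

For real-valued input the spectrum is symmetric, so only the first half of the bins carries independent information.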

framing

protected void framing(double[] inputSignal)
performs Frame Blocking to break down a speech signal into frames
calls: none
called by: featureExtraction

Parameters:
inputSignal - Speech Signal (passed as double values)
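
Frame blocking cuts the signal into fixed-length frames whose start positions advance by a shift interval, so consecutive frames overlap. The sketch below illustrates the idea with explicit frameLength and shift parameters in place of the static fields. The class FramingSketch, the returned array (the documented method stores its result in the frames field), and the zero padding of the last frame are assumptions.

```java
final class FramingSketch {
    /**
     * Splits `signal` into overlapping frames of `frameLength` samples,
     * starting a new frame every `shift` samples. A trailing partial
     * frame is zero-padded (assumed behaviour).
     */
    public static double[][] frame(double[] signal, int frameLength, int shift) {
        if (frameLength <= 0 || shift <= 0) {
            throw new IllegalArgumentException("frameLength and shift must be positive");
        }
        int count = signal.length <= frameLength
                ? 1
                : 1 + (int) Math.ceil((signal.length - frameLength) / (double) shift);
        double[][] frames = new double[count][frameLength]; // zero-initialised
        for (int i = 0; i < count; i++) {
            int start = i * shift;
            int len = Math.min(frameLength, signal.length - start);
            if (len > 0) {
                System.arraycopy(signal, start, frames[i], 0, len);
            }
        }
        return frames;
    }
}
```

With a shift of half the frame length, every sample except those at the very start and end appears in two frames.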

preEmphasis

protected static double[] preEmphasis(short[] inputSignal)
performs pre-emphasis to equalize the amplitudes of high- and low-frequency components
calls: none
called by: featureExtraction

Parameters:
inputSignal - Speech Signal (16 bit integer data)
Returns:
Speech signal after pre-emphasis (returned as double values)
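
Pre-emphasis is usually a first-order high-pass filter, y[n] = x[n] - alpha * x[n-1], which is consistent with the note that an alpha of 0 disables it. The sketch below assumes that form. The class PreEmphasisSketch and the explicit alpha parameter (standing in for the static preEmphasisAlpha field) are hypothetical.

```java
final class PreEmphasisSketch {
    /**
     * First-order pre-emphasis: out[n] = in[n] - alpha * in[n - 1].
     * The first sample has no predecessor and is copied unchanged.
     */
    public static double[] preEmphasis(short[] inputSignal, double alpha) {
        double[] out = new double[inputSignal.length];
        if (inputSignal.length == 0) {
            return out;
        }
        out[0] = inputSignal[0];
        for (int n = 1; n < inputSignal.length; n++) {
            out[n] = inputSignal[n] - alpha * inputSignal[n - 1];
        }
        return out;
    }
}
```

Slowly varying (low-frequency) content is largely cancelled by the subtraction, while rapid changes pass through, which lifts the high-frequency part of the spectrum relative to the low part.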