jAudioFeatureExtractor.ACE.DataTypes
Class DataSet

java.lang.Object
  extended by jAudioFeatureExtractor.ACE.DataTypes.DataSet
All Implemented Interfaces:
java.io.Serializable

public class DataSet
extends java.lang.Object
implements java.io.Serializable

Objects of this class each hold feature values for an item to be classified. Methods are included for displaying these values as formatted strings, saving them to disk, and loading them from disk. A method is also available for reconciling these objects with FeatureDefinition objects. Methods are also available for extracting feature values in String form.

Author:
Cory McKay
See Also:
Serialized Form

Field Summary
 java.lang.String[] feature_names
          The names of the features in each corresponding (by first index) entry of feature_values.
 double[][] feature_values
          The feature values for this DataSet as a whole.
 java.lang.String identifier
          The name of the data set.
 DataSet parent
          If this object is a sub-set of another DataSet, this field points to that parent dataset.
 double start
          Identifies the start of a sub-set of a DataSet.
 double stop
          Identifies the end of a sub-set of a DataSet.
 DataSet[] sub_sets
          Sub-sets of this DataSet.
 
Constructor Summary
DataSet()
          Generate an empty DataSet.
 
Method Summary
 java.lang.String getDataSetDescription(int depth)
          Generate a formatted string detailing the contents of this DataSet.
static java.lang.String getDataSetDescriptions(DataSet[] dataset)
          Returns a formatted text description of the given DataSet objects.
 java.lang.String[][][] getFeatureValuesOfSubSections(FeatureDefinition[] definitions)
          Returns the feature values stored in the DataSets in the sub_sets field of this object.
 java.lang.String[][] getFeatureValuesOfTopLevel(FeatureDefinition[] definitions)
          Returns the feature values stored in the feature_values field of this object.
 void orderAndCompactFeatures(FeatureDefinition[] definitions, boolean is_top_level)
          Processes this DataSet based on the given definitions parameter.
static DataSet[] parseDataSetFile(java.lang.String data_set_file_path)
          Parses a feature_vector_file XML file and returns an array of DataSet objects holding its contents.
static DataSet[] parseDataSetFile(java.lang.String data_set_file_path, FeatureDefinition[] definitions)
          Parses a feature_vector_file XML file and returns an array of DataSet objects holding its contents.
static DataSet[] parseDataSetFiles(java.lang.String[] data_set_file_paths, FeatureDefinition[] definitions)
          Parses several feature_vector_file XML files and returns an array of DataSet objects holding the combined contents of all of the files.
static void saveDataSets(DataSet[] data_sets, FeatureDefinition[] definitions, java.io.File to_save_to, java.lang.String comments)
          Saves a feature_vector_file XML file with the contents specified in the given DataSet array and the comments specified in the comments parameter.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

identifier

public java.lang.String identifier
The name of the data set. This name should be unique among each group of data sets. Should be null for non-top-level DataSets.


sub_sets

public DataSet[] sub_sets
Sub-sets of this DataSet. Each such sub-set can serve as an instance that is individually classifiable. For example, sub-sets could consist of windows of audio extracted from the recording that makes the overall DataSet. The sub_sets field should be null if there are no sub-sets that can be individually classified.


start

public double start
Identifies the start of a sub-set of a DataSet. Set to NaN if this object is a top-level DataSet.


stop

public double stop
Identifies the end of a sub-set of a DataSet. Set to NaN if this object is a top-level DataSet.


feature_values

public double[][] feature_values
The feature values for this DataSet as a whole. If there are any sub-sets, they store their own feature values, which are not referenced here. The first index identifies the feature and the second index identifies the dimension of the feature, so features of arbitrary dimensionality can be accommodated. Features whose value or values are missing, unknown, or inappropriate are assigned a value of null. This field is itself null if no features have been extracted. It is assumed that the Java class calling the DataSet knows the ordering and identity of the features of the DataSet and its sub-sets. The feature_values may be ordered based on FeatureDefinitions using the orderAndCompactFeatures method.
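
The layout described above can be illustrated with a small stand-alone sketch. It does not use the jAudio classes; the class name FeatureValuesLayoutSketch, its methods, and the sample numbers are hypothetical and exist only to show how a jagged double[][] holds features of different dimensionality, with null marking a missing feature.

```java
// Hypothetical stand-alone sketch; not part of jAudioFeatureExtractor.
class FeatureValuesLayoutSketch {

    // Counts the features (first index) that have no stored values.
    static int countMissing(double[][] featureValues) {
        if (featureValues == null) {
            return 0; // a null array means no features were extracted at all
        }
        int missing = 0;
        for (double[] feature : featureValues) {
            if (feature == null) {
                missing++;
            }
        }
        return missing;
    }

    // Builds an illustrative array: first index = feature, second = dimension.
    static double[][] example() {
        return new double[][] {
            {0.42},            // a one-dimensional feature
            {1.0, 2.5, 3.75},  // a three-dimensional feature
            null               // a feature whose values are missing
        };
    }

    public static void main(String[] args) {
        double[][] values = example();
        System.out.println("features: " + values.length);
        System.out.println("dimensions of feature 1: " + values[1].length);
        System.out.println("missing: " + countMissing(values));
    }
}
```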


feature_names

public java.lang.String[] feature_names
The names of the features in each corresponding (by first index) entry of feature_values. These are often only stored here temporarily until they can be accessed and stored externally in a more efficient fashion. This field is therefore often null, even when the feature_values field is not.


parent

public DataSet parent
If this object is a sub-set of another DataSet, this field points to that parent dataset. Otherwise this field is null.
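
The field conventions above (identifier only on the top level, NaN start and stop on the top level, parent only on sub-sets, sub_sets null when there are none) can be summarised in a stand-alone sketch. The Node class below is a hypothetical stand-in that merely mirrors the documented field names; it is not the DataSet class itself, and the window boundaries are made-up values whose units are not specified by this documentation.

```java
// Hypothetical stand-alone sketch; Node only mirrors the documented field names.
class DataSetHierarchySketch {

    static class Node {
        String identifier;          // null for non-top-level nodes
        Node[] sub_sets;            // null when there are no sub-sets
        double start = Double.NaN;  // NaN for a top-level node
        double stop = Double.NaN;   // NaN for a top-level node
        Node parent;                // null for a top-level node
    }

    // Builds a top-level node with one sub-set per {start, stop} pair.
    static Node buildRecording(String identifier, double[][] windows) {
        Node top = new Node();
        top.identifier = identifier;
        if (windows.length == 0) {
            return top; // sub_sets stays null, as documented
        }
        top.sub_sets = new Node[windows.length];
        for (int i = 0; i < windows.length; i++) {
            Node window = new Node();
            window.start = windows[i][0];
            window.stop = windows[i][1];
            window.parent = top;
            top.sub_sets[i] = window;
        }
        return top;
    }

    public static void main(String[] args) {
        Node top = buildRecording("recording_1", new double[][] {{0.0, 0.5}, {0.5, 1.0}});
        System.out.println("top-level start is NaN: " + Double.isNaN(top.start));
        System.out.println("sub-sets: " + top.sub_sets.length);
    }
}
```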

Constructor Detail

DataSet

public DataSet()
Generate an empty DataSet.

Method Detail

orderAndCompactFeatures

public void orderAndCompactFeatures(FeatureDefinition[] definitions,
                                    boolean is_top_level)
                             throws java.lang.Exception
Processes this DataSet based on the given definitions parameter. The feature values stored in the feature_values field are re-ordered based on the correspondence between the feature_names field and the definitions parameter. All features in feature_names that are not referred to in definitions are deleted. All features referred to in definitions but not present in feature_names are given a null entry in feature_values. The feature_names field is set to null at the end of processing in order to save memory.

This method also processes the sub_sets of this DataSet recursively.

The end result of running this method is that the features in feature_values that are referred to in both feature_names and definitions are given the same order as in definitions. Any features in definitions that are not present in feature_names are set to null in feature_values. Any features in feature_names that are not in definitions are deleted. At the end of running this method, feature_names is null and feature_values has the same number of entries as definitions.

The purpose of running this method is to put this DataSet in a configuration that can be stored and processed more efficiently and to verify the validity of the stored features.

Parameters:
definitions - The feature definitions to order the feature_values field by.
is_top_level - True if this DataSet is a top-level DataSet (i.e. not a sub-set of another DataSet). This parameter should always be true when this method is called externally.
Throws:
java.lang.Exception - An informative exception is thrown if the dimensions of a stored feature do not match the dimensions that it should have according to its definition. An exception is also thrown if a sub-set contains a feature whose corresponding FeatureDefinition has a value of false for its is_sequential field.
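
The re-ordering rule described for this method can be sketched independently of the library. The following is a minimal illustration, not the actual implementation: the class OrderAndCompactSketch and its reorder method are hypothetical, plain String arrays stand in for FeatureDefinition objects, both name and value arrays are assumed non-null and of equal length, and the dimension check, is_sequential check, and recursion into sub_sets are all omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not the jAudio implementation.
class OrderAndCompactSketch {

    // Returns values arranged in the order of definitionNames.
    // Features absent from featureNames become null entries; features
    // not named in definitionNames are dropped.
    static double[][] reorder(String[] featureNames,
                              double[][] featureValues,
                              String[] definitionNames) {
        Map<String, double[]> byName = new HashMap<>();
        for (int i = 0; i < featureNames.length; i++) {
            byName.put(featureNames[i], featureValues[i]);
        }
        double[][] ordered = new double[definitionNames.length][];
        for (int i = 0; i < definitionNames.length; i++) {
            ordered[i] = byName.get(definitionNames[i]); // null when absent
        }
        return ordered;
    }

    public static void main(String[] args) {
        String[] names = {"B", "A", "X"};
        double[][] values = {{2.0}, {1.0, 1.5}, {9.0}};
        String[] definitions = {"A", "B", "C"};
        double[][] ordered = reorder(names, values, definitions);
        for (int i = 0; i < ordered.length; i++) {
            System.out.println(definitions[i] + ": "
                    + (ordered[i] == null ? "null" : ordered[i].length + " value(s)"));
        }
    }
}
```

In this sketch the result always has as many entries as there are definitions, which matches the end state described above for feature_values.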

getFeatureValuesOfTopLevel

public java.lang.String[][] getFeatureValuesOfTopLevel(FeatureDefinition[] definitions)
Returns the feature values stored in the feature_values field of this object. The first index of the returned array denotes the feature and the second index indicates the dimension of the feature (in order to accommodate multi-dimensional features).

The returned array is null if no features have been extracted. If a particular feature value is not available, then a question mark is returned in the appropriate entry.

Parameters:
definitions - Feature definitions that are used to get the dimensions of unknown features.
Returns:
The array of feature values.
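
The conversion described for this method can be sketched as follows. This is a hedged illustration only: the class TopLevelStringsSketch is hypothetical, an int array of dimensions stands in for the FeatureDefinition array, the use of String.valueOf for number formatting is an assumption, and so is the choice to emit one question mark per dimension of a missing feature.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not the jAudio implementation.
class TopLevelStringsSketch {

    // Converts feature values to strings. dimensions[i] gives the expected
    // number of values for feature i and is consulted only when that
    // feature is missing (null).
    static String[][] toStrings(double[][] featureValues, int[] dimensions) {
        if (featureValues == null) {
            return null; // no features have been extracted
        }
        String[][] out = new String[featureValues.length][];
        for (int i = 0; i < featureValues.length; i++) {
            if (featureValues[i] == null) {
                out[i] = new String[dimensions[i]];
                Arrays.fill(out[i], "?"); // assumed: one "?" per dimension
            } else {
                out[i] = new String[featureValues[i].length];
                for (int j = 0; j < featureValues[i].length; j++) {
                    out[i][j] = String.valueOf(featureValues[i][j]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] values = {{0.5}, null};
        int[] dimensions = {1, 2};
        System.out.println(Arrays.deepToString(toStrings(values, dimensions)));
    }
}
```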

getFeatureValuesOfSubSections

public java.lang.String[][][] getFeatureValuesOfSubSections(FeatureDefinition[] definitions)
Returns the feature values stored in the DataSets in the sub_sets field of this object. The first index of the returned array denotes the sub-section. The second index indicates the feature and the third index indicates the dimension of the feature (in order to accommodate multi-dimensional features).

The returned array is null if no sub-sections are available. The first dimension is null if no features have been extracted for a given sub-section. If a particular feature value is not available, then a question mark is returned in the appropriate entry.

Parameters:
definitions - Feature definitions that are used to get the dimensions of unknown features.
Returns:
The array of feature values.

getDataSetDescription

public java.lang.String getDataSetDescription(int depth)
Generate a formatted string detailing the contents of this DataSet.

Parameters:
depth - How deep this DataSet is in a hierarchy of DataSets (i.e. through the sub_sets field). This parameter should generally be 0 when called externally, as this method operates recursively.
Returns:
A formatted string describing this DataSet.

getDataSetDescriptions

public static java.lang.String getDataSetDescriptions(DataSet[] dataset)
Returns a formatted text description of the given DataSet objects.

Parameters:
dataset - The data sets to describe.
Returns:
The formatted description.

parseDataSetFile

public static DataSet[] parseDataSetFile(java.lang.String data_set_file_path)
                                  throws java.lang.Exception
Parses a feature_vector_file XML file and returns an array of DataSet objects holding its contents. An exception is thrown if the file is invalid in some way.

Parameters:
data_set_file_path - The path of the XML file to parse.
Throws:
java.lang.Exception - An informative exception is thrown if an invalid file or file path is specified.

parseDataSetFile

public static DataSet[] parseDataSetFile(java.lang.String data_set_file_path,
                                         FeatureDefinition[] definitions)
                                  throws java.lang.Exception
Parses a feature_vector_file XML file and returns an array of DataSet objects holding its contents. An exception is thrown if the file is invalid in some way.

Also processes each resulting DataSet in order to reconcile it with the given definitions. See the orderAndCompactFeatures method for details.

Parameters:
data_set_file_path - The path of the XML file to parse.
definitions - FeatureDefinitions to use for formatting and validating the contents of the file to be parsed.
Throws:
java.lang.Exception - An informative exception is thrown if an invalid file or file path is specified. An exception is also thrown if the given feature definitions are incompatible with the contents of the file.

parseDataSetFiles

public static DataSet[] parseDataSetFiles(java.lang.String[] data_set_file_paths,
                                          FeatureDefinition[] definitions)
                                   throws java.lang.Exception
Parses several feature_vector_file XML files and returns an array of DataSet objects holding the combined contents of all of the files. An exception is thrown if any of the files is invalid in some way.

Also processes each resulting DataSet in order to reconcile it with the given definitions. See the orderAndCompactFeatures method for details. This will not occur if the definitions parameter is null.

Parameters:
data_set_file_paths - The paths of the XML files to parse.
definitions - FeatureDefinitions to use for formatting and validating the contents of the files to be parsed.
Throws:
java.lang.Exception - An informative exception is thrown if an invalid file or file path is specified. An exception is also thrown if the given feature definitions are incompatible with the contents of a file.

saveDataSets

public static void saveDataSets(DataSet[] data_sets,
                                FeatureDefinition[] definitions,
                                java.io.File to_save_to,
                                java.lang.String comments)
                         throws java.lang.Exception
Saves a feature_vector_file XML file with the contents specified in the given DataSet array and the comments specified in the comments parameter. Uses the feature_names in each of the data_sets if they are present, and uses those in the definitions parameter if they are not present in a given DataSet. If all data_sets contain feature_names, then the passed value of definitions may be null. This method does not apply the orderAndCompactFeatures method.

In general, it is best to have applied the orderAndCompactFeatures method to data_sets before calling this saveDataSets method.

Parameters:
data_sets - The DataSets to save.
definitions - The FeatureDefinitions to base feature names on if they are not present in individual DataSets. May be null.
to_save_to - The file to save to.
comments - Any comments to be saved inside the comments element of the XML file.
Throws:
java.lang.Exception - An informative exception is thrown if the file cannot be saved or if feature names are available in neither individual data_sets nor in definitions.
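
The rule for choosing feature names when saving can be sketched without the library. The class SaveNameResolutionSketch and its resolveNames method are hypothetical, plain String arrays stand in for a DataSet's feature_names and for the names held in the definitions parameter, and IllegalStateException is used here only as a stand-in for the java.lang.Exception that the documented method throws.

```java
// Hypothetical stand-alone sketch; not the jAudio implementation.
class SaveNameResolutionSketch {

    // Prefers the names stored in the data set itself, falls back to the
    // names from the definitions, and fails when neither is available.
    static String[] resolveNames(String[] featureNames, String[] definitionNames) {
        if (featureNames != null) {
            return featureNames;
        }
        if (definitionNames != null) {
            return definitionNames;
        }
        throw new IllegalStateException(
                "Feature names are available in neither the data set nor the definitions.");
    }

    public static void main(String[] args) {
        String[] own = {"Spectral Centroid"};
        String[] fromDefinitions = {"RMS"};
        System.out.println(resolveNames(own, fromDefinitions)[0]);
        System.out.println(resolveNames(null, fromDefinitions)[0]);
    }
}
```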