cern.jet.stat.quantile
Class EquiDepthHistogram

java.lang.Object
  extended by cern.colt.PersistentObject
      extended by cern.jet.stat.quantile.EquiDepthHistogram
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable

public class EquiDepthHistogram
extends PersistentObject

Read-only equi-depth histogram for selectivity estimation. Assume you have collected statistics over a data set, among them a one-dimensional equi-depth histogram (quantiles). Then an applications or DBMS might want to estimate the selectivity of some range query [from,to], i.e. the percentage of data set elements contained in the query range. This class does not collect equi-depth histograms but only space efficiently stores already produced histograms and provides operations for selectivity estimation. Uses linear interpolation.

This class stores a list l of float values for which holds:

  • Let v be a list of values (sorted ascending) an equi-depth histogram has been computed over.
  • Let s=l.length.
  • Let p=(0, 1/s-1), 2/s-1,..., s-1/s-1=1.0) be a list of the s percentages.
  • Then for each i=0..s-1: l[i] = e : v.contains(e) && v[0],..., v[p[i]*v.length] <= e.
  • (In particular: l[0]=min(v)=v[0] and l[s-1]=max(v)=v[s-1].)
  • Author:
    wolfgang.hoschek@cern.ch
    See Also:
    Serialized Form

    Field Summary
    protected  float[] binBoundaries
               
     
    Fields inherited from class cern.colt.PersistentObject
    serialVersionUID
     
    Constructor Summary
    EquiDepthHistogram(float[] quantileElements)
              Constructs an equi-depth histogram with the given quantile elements.
     
    Method Summary
     int binOfElement(float element)
              Returns the bin index of the given element.
     int bins()
              Returns the number of bins.
     float endOfBin(int binIndex)
              Returns the end of the range associated with the given bin.
     double percentFromTo(float from, float to)
              Returns the percentage of elements in the range (from,to].
     double phi(float element)
              Returns how many percent of the elements contained in the receiver are <= element.
     int size()
              Deprecated. Deprecated. Returns the number of bin boundaries.
     float startOfBin(int binIndex)
              Returns the start of the range associated with the given bin.
    static void test(float element)
              Not yet commented.
     
    Methods inherited from class cern.colt.PersistentObject
    clone
     
    Methods inherited from class java.lang.Object
    equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
     

    Field Detail

    binBoundaries

    protected float[] binBoundaries
    Constructor Detail

    EquiDepthHistogram

    public EquiDepthHistogram(float[] quantileElements)
    Constructs an equi-depth histogram with the given quantile elements. Quantile elements must be sorted ascending and have the form specified in the class documentation.

    Method Detail

    binOfElement

    public int binOfElement(float element)
    Returns the bin index of the given element. In other words, returns a handle to the range the element falls into.

    Parameters:
    element - the element to search for.
    Throws:
    java.lang.IllegalArgumentException - if the element is not contained in any bin.

    bins

    public int bins()
    Returns the number of bins. In other words, returns the number of subdomains partitioning the entire value domain.


    endOfBin

    public float endOfBin(int binIndex)
    Returns the end of the range associated with the given bin.

    Throws:
    java.lang.ArrayIndexOutOfBoundsException - if binIndex < 0 || binIndex >= bins().

    percentFromTo

    public double percentFromTo(float from,
                                float to)
    Returns the percentage of elements in the range (from,to]. Does linear interpolation.

    Parameters:
    from - the start point (exclusive).
    to - the end point (inclusive).

    phi

    public double phi(float element)
    Returns how many percent of the elements contained in the receiver are <= element. Does linear interpolation.

    Parameters:
    the - element to search for.

    size

    public int size()
    Deprecated. Deprecated. Returns the number of bin boundaries.


    startOfBin

    public float startOfBin(int binIndex)
    Returns the start of the range associated with the given bin.

    Throws:
    java.lang.ArrayIndexOutOfBoundsException - if binIndex < 0 || binIndex >= bins().

    test

    public static void test(float element)
    Not yet commented.