

java.lang.Object
  cern.colt.PersistentObject
    hep.aida.bin.AbstractBin
      hep.aida.bin.AbstractBin1D
        hep.aida.bin.StaticBin1D
          hep.aida.bin.MightyStaticBin1D
            hep.aida.bin.QuantileBin1D
              hep.aida.bin.DynamicBin1D
public class DynamicBin1D
1-dimensional rebinnable bin holding double elements; efficiently computes advanced statistics of data sequences. Technically speaking, a multiset (or bag) with efficient statistics operations defined on it. First see the package summary and javadoc tree view to get the broad picture.
The data filled into a DynamicBin1D is internally preserved in the bin.
As a consequence this bin can compute more than only basic statistics.
On the other hand, if you add huge amounts of elements, you may run out of memory (each element takes 8 bytes).
If these drawbacks matter, consider using StaticBin1D, which overcomes them at the expense of limited functionality.
This class is fully thread safe (all public methods are synchronized). Thus, you can have one or more threads adding to the bin as well as one or more threads reading and viewing the statistics of the bin while it is filled. For high performance, add data in large chunks (buffers) via method addAllOf rather than piecewise via method add.
If your favourite statistics measure is not directly provided by this class, check out Descriptive in combination with methods elements() and sortedElements().
Implementation: Lazy evaluation, caching, incremental maintenance.
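The implementation note can be illustrated with a minimal plain-Java sketch. This is not the Colt source: the class LazyBinSketch and all of its members are hypothetical names chosen for illustration. The sum is maintained incrementally on every add, while the sorted view is computed lazily on first request and cached until the next modification invalidates it.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the actual DynamicBin1D implementation.
final class LazyBinSketch {
    private double[] elements = new double[8];
    private int size = 0;
    private double sum = 0.0;          // incremental maintenance: updated on each add
    private double[] sortedCache;      // lazy evaluation: built only when requested
    private boolean isSorted = false;  // caching: tells whether sortedCache is valid
    int sortCount = 0;                 // counts real sorts, for demonstration only

    synchronized void add(double element) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size);
        }
        elements[size++] = element;
        sum += element;     // cheap incremental update
        isSorted = false;   // invalidate the cached sorted view
    }

    synchronized double sum() {
        return sum;
    }

    synchronized int size() {
        return size;
    }

    // Returns a copy of the elements, sorted ascending; sorts only if the cache is stale.
    synchronized double[] sortedElements() {
        if (!isSorted) {
            sortedCache = Arrays.copyOf(elements, size);
            Arrays.sort(sortedCache);
            isSorted = true;
            sortCount++;
        }
        return sortedCache.clone();
    }

    public static void main(String[] args) {
        LazyBinSketch bin = new LazyBinSketch();
        bin.add(3);
        bin.add(1);
        bin.add(2);
        System.out.println(bin.sum());                             // prints 6.0
        System.out.println(Arrays.toString(bin.sortedElements())); // prints [1.0, 2.0, 3.0]
        bin.sortedElements();                                      // served from the cache
        System.out.println(bin.sortCount);                         // prints 1
    }
}
```

Returning a copy from sortedElements() keeps callers from corrupting the cache, at the cost of one extra array allocation per call.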
See Also: Descriptive, Serialized Form
Field Summary  

protected DoubleArrayList 
elements
The elements contained in this bin. 
protected boolean 
fixedOrder
Preserve element order under all circumstances? 
protected boolean 
isIncrementalStatValid

protected boolean 
isSorted

protected boolean 
isSumOfInversionsValid

protected boolean 
isSumOfLogarithmsValid

protected DoubleArrayList 
sortedElements
The elements contained in this bin, sorted ascending. 
Fields inherited from class hep.aida.bin.QuantileBin1D 

finder 
Fields inherited from class hep.aida.bin.MightyStaticBin1D 

hasSumOfInversions, hasSumOfLogarithms, sumOfInversions, sumOfLogarithms, sumOfPowers 
Fields inherited from class hep.aida.bin.StaticBin1D 

arguments, max, min, size, sum, sum_xx 
Fields inherited from class cern.colt.PersistentObject 

serialVersionUID 
Constructor Summary  

DynamicBin1D()
Constructs and returns an empty bin; implicitly calls setFixedOrder(false). 
Method Summary  

void 
add(double element)
Adds the specified element to the receiver. 
void 
addAllOfFromTo(DoubleArrayList list,
int from,
int to)
Adds the part of the specified list between indexes from (inclusive) and to (inclusive) to the receiver. 
double 
aggregate(DoubleDoubleFunction aggr,
DoubleFunction f)
Applies a function to each element and aggregates the results. 
void 
clear()
Removes all elements from the receiver. 
protected void 
clearAllMeasures()
Resets the values of all measures. 
java.lang.Object 
clone()
Returns a deep copy of the receiver. 
double 
correlation(DynamicBin1D other)
Returns the correlation of two bins, which is corr(x,y) = covariance(x,y) / (stdDev(x)*stdDev(y)) (Pearson's correlation coefficient). 
double 
covariance(DynamicBin1D other)
Returns the covariance of two bins, which is cov(x,y) = (1/size()) * Sum((x[i]-mean(x)) * (y[i]-mean(y))). 
protected DoubleArrayList 
elements_unsafe()
Returns the currently stored elements; WARNING: not a copy of them. 
DoubleArrayList 
elements()
Returns a copy of the currently stored elements. 
boolean 
equals(java.lang.Object object)
Returns whether two bins are equal. 
void 
frequencies(DoubleArrayList distinctElements,
IntArrayList frequencies)
Computes the frequency (number of occurrences, count) of each distinct element. 
int 
getMaxOrderForSumOfPowers()
Returns Integer.MAX_VALUE, the maximum order k for which sums of powers are retrievable. 
int 
getMinOrderForSumOfPowers()
Returns Integer.MIN_VALUE, the minimum order k for which sums of powers are retrievable. 
protected void 
invalidateAll()

boolean 
isRebinnable()
Returns true. 
double 
max()
Returns the maximum. 
double 
min()
Returns the minimum. 
double 
moment(int k,
double c)
Returns the moment of k-th order with value c, which is Sum( (x[i]-c)^k ) / size(). 
double 
quantile(double phi)
Returns the exact phi-quantile; that is, the smallest contained element elem for which phi percent of the elements are less than elem. 
double 
quantileInverse(double element)
Returns exactly how many percent of the elements contained in the receiver are <= element. 
DoubleArrayList 
quantiles(DoubleArrayList percentages)
Returns the exact quantiles of the specified percentages. 
boolean 
removeAllOf(DoubleArrayList list)
Removes from the receiver all elements that are contained in the specified list. 
void 
sample(int n,
boolean withReplacement,
RandomEngine randomGenerator,
DoubleBuffer buffer)
Uniformly samples (chooses) n random elements with or without replacement from the contained elements and adds them to the given buffer. 
DynamicBin1D 
sampleBootstrap(DynamicBin1D other,
int resamples,
RandomEngine randomGenerator,
BinBinFunction1D function)
Generic bootstrap resampling. 
void 
setFixedOrder(boolean fixedOrder)
Determines whether the receiver's internally preserved elements may be reordered or not. 
int 
size()
Returns the number of elements contained in the receiver. 
protected void 
sort()
Sorts elements if not already sorted. 
protected DoubleArrayList 
sortedElements_unsafe()
Returns the currently stored elements, sorted ascending; WARNING: not a copy of them; Thus, improper usage of the returned list may not only corrupt the receiver's internal state, but also break thread safety! Only provided for performance and memory sensitive applications. 
DoubleArrayList 
sortedElements()
Returns a copy of the currently stored elements, sorted ascending. 
void 
standardize(double mean,
double standardDeviation)
Modifies the receiver to be standardized. 
double 
sum()
Returns the sum of all elements, which is Sum( x[i] ). 
double 
sumOfInversions()
Returns the sum of inversions, which is Sum( 1 / x[i] ). 
double 
sumOfLogarithms()
Returns the sum of logarithms, which is Sum( Log(x[i]) ). 
double 
sumOfPowers(int k)
Returns the k-th order sum of powers, which is Sum( x[i]^k ). 
double 
sumOfSquares()
Returns the sum of squares, which is Sum( x[i] * x[i] ). 
java.lang.String 
toString()
Returns a String representation of the receiver. 
void 
trim(int s,
int l)
Removes the s smallest and l largest elements from the receiver. 
double 
trimmedMean(int s,
int l)
Returns the trimmed mean. 
void 
trimToSize()
Trims the capacity of the receiver to be the receiver's current size. 
protected void 
updateIncrementalStats()
assertion: isBasicParametersValid == false 
protected void 
updateSumOfInversions()
assertion: isBasicParametersValid == false 
protected void 
updateSumOfLogarithms()

protected void 
validateAll()

Methods inherited from class hep.aida.bin.QuantileBin1D 

compareWith, median, sizeOfRange, splitApproximately, splitApproximately 
Methods inherited from class hep.aida.bin.MightyStaticBin1D 

geometricMean, harmonicMean, hasSumOfInversions, hasSumOfLogarithms, hasSumOfPowers, kurtosis, product, setMaxOrderForSumOfPowers, skew, xcheckOrder, xequals, xhasSumOfPowers, xisLegalOrder 
Methods inherited from class hep.aida.bin.AbstractBin1D 

addAllOf, buffered, mean, relError, rms, standardDeviation, standardError, variance 
Methods inherited from class hep.aida.bin.AbstractBin 

center, center, error, error, offset, offset, value, value 
Methods inherited from class java.lang.Object 

finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait 
Field Detail 

protected DoubleArrayList elements
protected DoubleArrayList sortedElements
protected boolean fixedOrder
protected boolean isSorted
protected boolean isIncrementalStatValid
protected boolean isSumOfInversionsValid
protected boolean isSumOfLogarithmsValid
Constructor Detail 

public DynamicBin1D()
Constructs and returns an empty bin; implicitly calls setFixedOrder(false).
Method Detail 

public void add(double element)
add
in class StaticBin1D
element
 element to be appended.
public void addAllOfFromTo(DoubleArrayList list, int from, int to)
addAllOfFromTo
in class QuantileBin1D
list
 the list of which elements shall be added.
from
 the index of the first element to be added (inclusive).
to
 the index of the last element to be added (inclusive).
java.lang.IndexOutOfBoundsException
 if list.size()>0 && (from<0 || from>to || to>=list.size()).
public double aggregate(DoubleDoubleFunction aggr, DoubleFunction f)
Example:
    cern.jet.math.Functions F = cern.jet.math.Functions.functions;
    bin = 0 1 2 3

    // Sum( x[i]*x[i] )
    bin.aggregate(F.plus,F.square);
    --> 14
For further examples, see the package doc.
aggr
 an aggregation function taking as first argument the current aggregation and as second argument the transformed current element.
f
 a function transforming the current element.
Functions
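The aggregation can be sketched in plain Java without the Colt function objects. The class AggregateSketch is hypothetical, and java.util.function interfaces stand in for DoubleDoubleFunction and DoubleFunction. Seeding the fold with the transformed first element and returning NaN for an empty input are assumptions of this sketch, not documented behaviour.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical stand-in for bin.aggregate(aggr, f); not the Colt implementation.
final class AggregateSketch {
    static double aggregate(double[] x, DoubleBinaryOperator aggr, DoubleUnaryOperator f) {
        if (x.length == 0) {
            return Double.NaN; // assumption: nothing to aggregate
        }
        double a = f.applyAsDouble(x[0]);                 // start with the transformed first element
        for (int i = 1; i < x.length; i++) {
            a = aggr.applyAsDouble(a, f.applyAsDouble(x[i])); // fold in each transformed element
        }
        return a;
    }

    public static void main(String[] args) {
        double[] bin = {0, 1, 2, 3};
        // Sum( x[i]*x[i] ), the counterpart of bin.aggregate(F.plus, F.square)
        System.out.println(aggregate(bin, Double::sum, v -> v * v)); // prints 14.0
    }
}
```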
public void clear()
clear
in class QuantileBin1D
protected void clearAllMeasures()
clearAllMeasures
in class MightyStaticBin1D
public java.lang.Object clone()
clone
in class QuantileBin1D
public double correlation(DynamicBin1D other)
other
 the bin to compare with.
java.lang.IllegalArgumentException
 if size() != other.size().
public double covariance(DynamicBin1D other)
other
 the bin to compare with.
java.lang.IllegalArgumentException
 if size() != other.size().
public DoubleArrayList elements()
See setFixedOrder(boolean).
protected DoubleArrayList elements_unsafe()
    ...
    double sinSum = 0;
    synchronized (dynamicBin) { // lock out anybody else
        DoubleArrayList elements = dynamicBin.elements_unsafe();
        // read each element and do something with it, for example
        double[] values = elements.elements(); // zero-copy
        for (int i=dynamicBin.size(); --i >=0; ) {
            sinSum += Math.sin(values[i]);
        }
    }
    System.out.println(sinSum);
    ...
Concerning the order in which elements are returned, see setFixedOrder(boolean).
public boolean equals(java.lang.Object object)
Definition of Equality for multisets: A,B are equal <=> A is a superset of B and B is a superset of A. (Elements must occur the same number of times, order is irrelevant.)
equals
in class AbstractBin1D
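The multiset definition of equality can be checked with a short plain-Java sketch: sort copies of both element sequences and compare them position by position. MultisetEqualsSketch is a hypothetical helper, not part of the library, and says nothing about how the library itself performs the comparison.

```java
import java.util.Arrays;

// Hypothetical illustration of multiset equality on double arrays.
final class MultisetEqualsSketch {
    // True if a and b contain the same values the same number of times, in any order.
    static boolean multisetEquals(double[] a, double[] b) {
        if (a.length != b.length) {
            return false;
        }
        double[] sa = a.clone(); // work on copies so the inputs keep their order
        double[] sb = b.clone();
        Arrays.sort(sa);
        Arrays.sort(sb);
        return Arrays.equals(sa, sb);
    }

    public static void main(String[] args) {
        System.out.println(multisetEquals(new double[] {8, 7, 6, 6, 7},
                                          new double[] {6, 6, 7, 7, 8})); // prints true
        System.out.println(multisetEquals(new double[] {8, 7, 6, 6, 7},
                                          new double[] {6, 7, 7, 7, 8})); // prints false
    }
}
```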
public void frequencies(DoubleArrayList distinctElements, IntArrayList frequencies)
Distinct elements are filled into distinctElements, starting at index 0. The frequency of each distinct element is filled into frequencies, starting at index 0. Further, both distinctElements and frequencies are sorted ascending by "element" (in sync, of course). As a result, the smallest distinct element (and its frequency) can be found at index 0, the second smallest distinct element (and its frequency) at index 1, ..., the largest distinct element (and its frequency) at index distinctElements.size()-1.
Example:
elements = (8,7,6,6,7) --> distinctElements = (6,7,8), frequencies = (2,2,1)
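The behaviour described here can be reproduced with a small plain-Java sketch that sorts a copy of the elements and counts runs of equal values. FrequenciesSketch and its use of java.util.List in place of DoubleArrayList and IntArrayList are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; java.util lists stand in for the Colt list classes.
final class FrequenciesSketch {
    // Fills distinct (ascending) and, unless it is null, freq with the matching counts.
    static void frequencies(double[] elements, List<Double> distinct, List<Integer> freq) {
        distinct.clear();
        if (freq != null) {
            freq.clear();
        }
        double[] sorted = elements.clone();
        Arrays.sort(sorted);
        int i = 0;
        while (i < sorted.length) {
            double value = sorted[i];
            int j = i;
            // advance j past the run of values equal to sorted[i]
            while (j < sorted.length && Double.compare(sorted[j], value) == 0) {
                j++;
            }
            distinct.add(value);
            if (freq != null) {
                freq.add(j - i);
            }
            i = j;
        }
    }

    public static void main(String[] args) {
        List<Double> distinct = new ArrayList<>();
        List<Integer> freq = new ArrayList<>();
        frequencies(new double[] {8, 7, 6, 6, 7}, distinct, freq);
        System.out.println(distinct); // prints [6.0, 7.0, 8.0]
        System.out.println(freq);     // prints [2, 2, 1]
    }
}
```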
distinctElements
 a list to be filled with the distinct elements; can have any size.
frequencies
 a list to be filled with the frequencies; can have any size; set this parameter to null to ignore it.
public int getMaxOrderForSumOfPowers()
getMaxOrderForSumOfPowers
in class MightyStaticBin1D
MightyStaticBin1D.hasSumOfPowers(int)
,
sumOfPowers(int)
public int getMinOrderForSumOfPowers()
getMinOrderForSumOfPowers
in class MightyStaticBin1D
MightyStaticBin1D.hasSumOfPowers(int)
,
sumOfPowers(int)
protected void invalidateAll()
public boolean isRebinnable()
isRebinnable
in class StaticBin1D
public double max()
max
in class StaticBin1D
public double min()
min
in class StaticBin1D
public double moment(int k, double c)
moment
in class MightyStaticBin1D
k
 the order; any number - can be less than zero, zero or greater than zero.
c
 any number.
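As a worked illustration of the formula Sum( (x[i]-c)^k ) / size(), the following plain-Java sketch evaluates it directly over an array. MomentSketch is a hypothetical helper; it recomputes the sum on every call and makes no claim about how the library evaluates or caches moments.

```java
// Hypothetical illustration of the k-th order moment about a value c.
final class MomentSketch {
    static double moment(double[] x, int k, double c) {
        double sum = 0.0;
        for (double v : x) {
            sum += Math.pow(v - c, k); // k may be negative, zero or positive
        }
        return sum / x.length;
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3};
        System.out.println(moment(x, 1, 0.0)); // first moment about 0: the mean
        System.out.println(moment(x, 2, 1.5)); // second moment about the mean
    }
}
```

For the data above the mean is 1.5, and the second moment about the mean is (2.25 + 0.25 + 0.25 + 2.25) / 4 = 1.25.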
public double quantile(double phi)
quantile
in class QuantileBin1D
phi
 must satisfy 0 < phi < 1.
public double quantileInverse(double element)
quantileInverse
in class QuantileBin1D
element
 the element to search for.
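A literal reading of "how many percent of the elements are <= element" can be sketched in plain Java by counting. QuantileInverseSketch is hypothetical. Returning a fraction in [0,1] rather than a value in [0,100] is an assumption, chosen to match the range of phi in quantile(double), and the library may treat values that lie between stored elements differently.

```java
// Hypothetical illustration: fraction of elements that are <= element.
final class QuantileInverseSketch {
    static double quantileInverse(double[] x, double element) {
        int count = 0;
        for (double v : x) {
            if (v <= element) {
                count++;
            }
        }
        return (double) count / x.length;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        System.out.println(quantileInverse(x, 2));  // prints 0.5
        System.out.println(quantileInverse(x, 0));  // prints 0.0
        System.out.println(quantileInverse(x, 10)); // prints 1.0
    }
}
```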
public DoubleArrayList quantiles(DoubleArrayList percentages)
quantiles
in class QuantileBin1D
percentages
 the percentages for which quantiles are to be computed.
Each percentage must be in the interval (0.0,1.0]. percentages must be sorted ascending.
public boolean removeAllOf(DoubleArrayList list)
list
 the elements to be removed.
true if the receiver changed as a result of the call.
public void sample(int n, boolean withReplacement, RandomEngine randomGenerator, DoubleBuffer buffer)
buffered
.
n
 the number of elements to choose.
withReplacement
 true samples with replacement, otherwise samples without replacement.
randomGenerator
 a random number generator. Set this parameter to null to use a default random number generator seeded with the current time.
buffer
 the buffer to which chosen elements will be added.
java.lang.IllegalArgumentException
 if !withReplacement && n > size().
cern.jet.random.sampling
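The difference between the two sampling modes can be sketched in plain Java. SampleSketch is hypothetical: it uses java.util.Random instead of RandomEngine and returns an array instead of filling a DoubleBuffer. The partial Fisher-Yates shuffle used for sampling without replacement is one common technique and is not necessarily what the library does.

```java
import java.util.Random;

// Hypothetical illustration of uniform sampling with and without replacement.
final class SampleSketch {
    static double[] sample(double[] x, int n, boolean withReplacement, Random rnd) {
        if (!withReplacement && n > x.length) {
            throw new IllegalArgumentException("n must not exceed size() without replacement");
        }
        double[] out = new double[n];
        if (withReplacement) {
            // every draw is independent; the same element may be chosen repeatedly
            for (int i = 0; i < n; i++) {
                out[i] = x[rnd.nextInt(x.length)];
            }
        } else {
            // partial Fisher-Yates shuffle on a copy; each element is chosen at most once
            double[] pool = x.clone();
            for (int i = 0; i < n; i++) {
                int j = i + rnd.nextInt(pool.length - i);
                double tmp = pool[i];
                pool[i] = pool[j];
                pool[j] = tmp;
                out[i] = pool[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        Random rnd = new Random(1); // arbitrary seed, for repeatability only
        System.out.println(sample(x, 3, false, rnd).length); // prints 3
        System.out.println(sample(x, 8, true, rnd).length);  // prints 8
    }
}
```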
public DynamicBin1D sampleBootstrap(DynamicBin1D other, int resamples, RandomEngine randomGenerator, BinBinFunction1D function)
Finally returns the auxiliary bootstrap bin b3 from which the measure of interest can be read off.
Background:
Also see a more in-depth discussion on bootstrapping and related randomization methods. The classical statistical test for comparing the means of two samples is the t-test. Unfortunately, this test assumes that the two samples each come from a normal distribution and that these distributions have the same standard deviation. Quite often, however, data has a distribution that is non-normal in many ways. In particular, distributions are often asymmetric. For such data, the t-test may produce misleading results and should thus not be used. Sometimes asymmetric data can be transformed into normally distributed data by taking e.g. the logarithm and the t-test will then produce valid results, but this still requires postulation of a certain distribution underlying the data, which is often not warranted, because too little is known about the data composition.
Bootstrap resampling of differences of means (and other differences) is a robust replacement for the t-test and does not require assumptions about the actual distribution of the data. The idea of bootstrapping is quite simple: simulation. The only assumption required is that the two samples a and b are representative of the underlying distribution with respect to the statistic that is being tested (this assumption is of course implicit in all statistical tests). We can now generate lots of further samples that correspond to the two given ones, by sampling with replacement. This process is called resampling. A resample can (and usually will) have a different mean than the original one, and by drawing hundreds or thousands of such resamples a_r from a and b_r from b we can compute the so-called bootstrap distribution of all the differences "mean of a_r minus mean of b_r". That is, a bootstrap bin filled with the differences. Now we can compute what fraction of these differences is, say, greater than zero. Let's assume we have computed 1000 resamples of both a and b and found that only 8 of the differences were greater than zero. Then 8/1000 or 0.008 is the p-value (probability) for the hypothesis that the mean of the distribution underlying a is actually larger than the mean of the distribution underlying b. From this bootstrap test, we can clearly reject the hypothesis.
Instead of using differences of means, we can also use other differences, for example, differences of medians.
Instead of p-values we can also read arbitrary confidence intervals from the bootstrap bin. For example, 90% of all bootstrap differences are left of the value -3.5, hence a left 90% confidence interval for the difference would be (-3.5,infinity); in other words: the difference is -3.5 or larger with probability 0.9.
Sometimes we would like to compare not only means and medians, but also the variability (spread) of two samples. The conventional method of doing this is the F-test, which compares the standard deviations. It is related to the t-test and, like the latter, assumes the two samples to come from a normal distribution. The F-test is very sensitive to data with deviations from normality. Instead we can again resort to more robust bootstrap resampling and compare a measure of spread, for example the interquartile range. This way we compute a bootstrap resampling of interquartile range differences in order to arrive at a test for inequality of variability.
Example:
    // v1,v2 - the two samples to compare against each other
    double[] v1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9,10,  21,  22,23,24,25,26,27,28,29,30,31};
    double[] v2 = {10,11,12,13,14,15,16,17,18,19,  20,  30,31,32,33,34,35,36,37,38,39};
    hep.aida.bin.DynamicBin1D X = new hep.aida.bin.DynamicBin1D();
    hep.aida.bin.DynamicBin1D Y = new hep.aida.bin.DynamicBin1D();
    X.addAllOf(new cern.colt.list.DoubleArrayList(v1));
    Y.addAllOf(new cern.colt.list.DoubleArrayList(v2));
    cern.jet.random.engine.RandomEngine random = new cern.jet.random.engine.MersenneTwister();

    // the three definitions of diff below are alternatives; use one of them per run

    // bootstrap resampling of differences of means:
    BinBinFunction1D diff = new BinBinFunction1D() {
        public double apply(DynamicBin1D x, DynamicBin1D y) {return x.mean() - y.mean();}
    };

    // bootstrap resampling of differences of medians:
    BinBinFunction1D diff = new BinBinFunction1D() {
        public double apply(DynamicBin1D x, DynamicBin1D y) {return x.median() - y.median();}
    };

    // bootstrap resampling of differences of interquartile ranges:
    BinBinFunction1D diff = new BinBinFunction1D() {
        public double apply(DynamicBin1D x, DynamicBin1D y) {
            return (x.quantile(0.75)-x.quantile(0.25)) - (y.quantile(0.75)-y.quantile(0.25));
        }
    };

    DynamicBin1D boot = X.sampleBootstrap(Y,1000,random,diff);

    cern.jet.math.Functions F = cern.jet.math.Functions.functions;
    System.out.println("p-value="+ (boot.aggregate(F.plus, F.greater(0)) / boot.size()));
    System.out.println("left 90% confidence interval = ("+boot.quantile(0.9) + ",infinity)");

    -->
    // bootstrap resampling of differences of means:
    p-value=0.0080
    left 90% confidence interval = (-3.571428571428573,infinity)

    // bootstrap resampling of differences of medians:
    p-value=0.36
    left 90% confidence interval = (5.0,infinity)

    // bootstrap resampling of differences of interquartile ranges:
    p-value=0.5699
    left 90% confidence interval = (5.0,infinity)
other
 the other bin to compare the receiver against.
resamples
 the number of times resampling shall be done.
randomGenerator
 a random number generator. Set this parameter to null to use a default random number generator seeded with the current time.
function
 a difference function comparing two samples; takes as first argument a sample of this and as second argument a sample of other.
GenericPermuting.permutation(long,int)
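The resampling procedure from the Background section can also be followed in a library-free Java sketch. BootstrapSketch and its method names are hypothetical, java.util.Random replaces RandomEngine, and a plain array plays the role of the bootstrap bin. It assumes each resample has the same size as the sample it is drawn from. The exact figures depend on the random generator and seed, so they will not match the output quoted in the example.

```java
import java.util.Random;

// Hypothetical plain-Java illustration of bootstrap resampling of differences of means.
final class BootstrapSketch {
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Draws x.length elements from x uniformly, with replacement.
    static double[] resample(double[] x, Random rnd) {
        double[] r = new double[x.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = x[rnd.nextInt(x.length)];
        }
        return r;
    }

    // Returns the "bootstrap bin": one value mean(a_r) - mean(b_r) per resample pair.
    static double[] bootstrapMeanDifferences(double[] a, double[] b, int resamples, Random rnd) {
        double[] diffs = new double[resamples];
        for (int i = 0; i < resamples; i++) {
            diffs[i] = mean(resample(a, rnd)) - mean(resample(b, rnd));
        }
        return diffs;
    }

    // Fraction of differences greater than zero, read as the p-value in the text.
    static double fractionGreaterThanZero(double[] diffs) {
        int count = 0;
        for (double d : diffs) {
            if (d > 0) {
                count++;
            }
        }
        return (double) count / diffs.length;
    }

    public static void main(String[] args) {
        double[] v1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9,10, 21, 22,23,24,25,26,27,28,29,30,31};
        double[] v2 = {10,11,12,13,14,15,16,17,18,19, 20, 30,31,32,33,34,35,36,37,38,39};
        Random rnd = new Random(42); // arbitrary seed, for repeatability only
        double[] boot = bootstrapMeanDifferences(v1, v2, 1000, rnd);
        System.out.println("p-value=" + fractionGreaterThanZero(boot));
    }
}
```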
public void setFixedOrder(boolean fixedOrder)
Naturally, if fixedOrder is set to true you should not already have added elements to the receiver; it should be empty.
public int size()
size
in class StaticBin1D
protected void sort()
public DoubleArrayList sortedElements()
See setFixedOrder(boolean).
protected DoubleArrayList sortedElements_unsafe()
    ...
    synchronized (dynamicBin) { // lock out anybody else
        DoubleArrayList elements = dynamicBin.sortedElements_unsafe();
        // read each element and do something with it, e.g.
        double[] values = elements.elements(); // zero-copy
        for (int i=dynamicBin.size(); --i >=0; ) {
            foo(values[i]);
        }
    }
    ...
Concerning the memory required for operations involving sorting, see setFixedOrder(boolean).
public void standardize(double mean, double standardDeviation)
public double sum()
sum
in class StaticBin1D
public double sumOfInversions()
sumOfInversions
in class MightyStaticBin1D
MightyStaticBin1D.hasSumOfInversions()
public double sumOfLogarithms()
sumOfLogarithms
in class MightyStaticBin1D
MightyStaticBin1D.hasSumOfLogarithms()
public double sumOfPowers(int k)
sumOfPowers
in class MightyStaticBin1D
k
 the order of the powers.
MightyStaticBin1D.hasSumOfPowers(int)
public double sumOfSquares()
sumOfSquares
in class StaticBin1D
public java.lang.String toString()
toString
in class QuantileBin1D
public void trim(int s, int l)
s
 the number of smallest elements to trim away (s >= 0).
l
 the number of largest elements to trim away (l >= 0).
public double trimmedMean(int s, int l)
s
 the number of smallest elements to trim away (s >= 0).
l
 the number of largest elements to trim away (l >= 0).
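A trimmed mean with these parameters can be sketched in plain Java: sort a copy, skip the s smallest and l largest values, and average what remains. TrimmedMeanSketch is hypothetical, and rejecting calls where s + l >= size is a choice made for this sketch rather than documented library behaviour.

```java
import java.util.Arrays;

// Hypothetical illustration of a trimmed mean over a double array.
final class TrimmedMeanSketch {
    static double trimmedMean(double[] x, int s, int l) {
        if (s < 0 || l < 0 || s + l >= x.length) {
            throw new IllegalArgumentException("need s >= 0, l >= 0 and s + l < size");
        }
        double[] sorted = x.clone(); // leave the caller's order untouched
        Arrays.sort(sorted);
        double sum = 0.0;
        for (int i = s; i < sorted.length - l; i++) {
            sum += sorted[i]; // only the elements that survive trimming
        }
        return sum / (sorted.length - s - l);
    }

    public static void main(String[] args) {
        // drops the smallest (1) and the largest (100), then averages 2, 3, 4
        System.out.println(trimmedMean(new double[] {100, 1, 3, 2, 4}, 1, 1)); // prints 3.0
    }
}
```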
public void trimToSize()
Releases any superfluous internal memory. An application can use this operation to minimize the storage of the receiver. Does not affect functionality.
trimToSize
in class AbstractBin1D
protected void updateIncrementalStats()
protected void updateSumOfInversions()
protected void updateSumOfLogarithms()
protected void validateAll()

