Class ExpressionExperimentFilter
java.lang.Object
  ubic.gemma.core.analysis.preprocess.filter.ExpressionExperimentFilter
public class ExpressionExperimentFilter extends Object
Methods to handle filtering expression experiments for analysis.
Author:
Paul
Constructor Summary
Constructor
ExpressionExperimentFilter(Collection<ArrayDesign> arrayDesignsUsed, FilterConfig config)
Method Summary
Modifier and Type | Method | Description
static ExpressionDataDoubleMatrix | doNothingFilter(ExpressionDataDoubleMatrix matrix) |
ExpressionDataDoubleMatrix | getFilteredMatrix(Collection<ProcessedExpressionDataVector> dataVectors) | Provides a ready-to-use expression data matrix that is transformed and filtered.
static ExpressionDataDoubleMatrix | lowVarianceFilter(ExpressionDataDoubleMatrix matrix, int quantile) | Remove rows that have a low variance, below the stated quantile.
static ExpressionDataDoubleMatrix | tooFewDistinctValues(ExpressionDataDoubleMatrix matrix, double threshold) | Remove rows that have a low diversity of values (equality judged based on the tolerance set in RowLevelFilter).
static ExpressionDataDoubleMatrix | tooFewDistinctValues(ExpressionDataDoubleMatrix matrix, double threshold, double tolerance) |
static ExpressionDataDoubleMatrix | zeroVarianceFilter(ExpressionDataDoubleMatrix matrix) | Remove rows that have a variance of zero (within a small constant).
Constructor Detail
ExpressionExperimentFilter
public ExpressionExperimentFilter(Collection<ArrayDesign> arrayDesignsUsed, FilterConfig config)
Parameters:
arrayDesignsUsed - collection of array designs (ADs) used
config - configuration used for all filtering. This must be defined at construction and cannot be changed afterwards.
Method Detail
doNothingFilter
public static ExpressionDataDoubleMatrix doNothingFilter(ExpressionDataDoubleMatrix matrix)
lowVarianceFilter
public static ExpressionDataDoubleMatrix lowVarianceFilter(ExpressionDataDoubleMatrix matrix, int quantile)
Remove rows that have a low variance, below the stated quantile.
Parameters:
matrix - the data matrix
quantile - e.g. 10 to remove the 10% of rows with the lowest variance.
Returns:
filtered matrix
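The quantile rule can be illustrated with a minimal sketch that works on a plain double[][] (rows are probes, columns are samples) instead of ExpressionDataDoubleMatrix. LowVarianceSketch and its methods are hypothetical, not Gemma's implementation. The sketch assumes rows contain no missing values and removes exactly floor(n * quantile / 100) rows after ranking them by variance; Gemma may compute the cutoff or break ties differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a quantile-based low-variance row filter; not Gemma's implementation. */
class LowVarianceSketch {

    /** Population variance of one row; assumes the row has no missing (NaN) values. */
    static double variance(double[] row) {
        double mean = 0.0;
        for (double v : row) {
            mean += v;
        }
        mean /= row.length;
        double sumSq = 0.0;
        for (double v : row) {
            sumSq += (v - mean) * (v - mean);
        }
        return sumSq / row.length;
    }

    /** Drops the floor(n * quantile / 100) lowest-variance rows; survivors keep their input order. */
    static double[][] lowVarianceFilter(double[][] matrix, int quantile) {
        if (quantile < 0 || quantile > 100) {
            throw new IllegalArgumentException("quantile must be between 0 and 100");
        }
        int n = matrix.length;
        double[] variances = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            variances[i] = variance(matrix[i]);
            order[i] = i;
        }
        // Rank row indices from lowest to highest variance (stable sort, so ties keep input order).
        Arrays.sort(order, Comparator.comparingDouble(i -> variances[i]));
        int toRemove = (int) Math.floor(n * quantile / 100.0);
        boolean[] removed = new boolean[n];
        for (int k = 0; k < toRemove; k++) {
            removed[order[k]] = true;
        }
        List<double[]> kept = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!removed[i]) {
                kept.add(matrix[i]);
            }
        }
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] m = {{1, 1, 1, 1}, {1, 2, 3, 4}, {0, 10, 0, 10}, {5, 5, 5, 6}};
        // With quantile = 50, the two lowest-variance rows (rows 0 and 3) are removed.
        System.out.println(Arrays.deepToString(lowVarianceFilter(m, 50)));
    }
}
```

Sorting indices rather than comparing against a computed variance cutoff makes the number of removed rows predictable even when many rows share the same variance.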
tooFewDistinctValues
public static ExpressionDataDoubleMatrix tooFewDistinctValues(ExpressionDataDoubleMatrix matrix, double threshold)
Remove rows that have a low diversity of values (equality judged based on the tolerance set in RowLevelFilter). This happens, for example, when values below 10 have been set to 10. It also effectively filters rows that have too many missing values, because all missing values are counted as a single value. The tolerance is set to a default value of Constants.SMALLISH.
Parameters:
matrix - the data matrix
threshold - fraction of values that must be distinct. Thus if set to 0.5, a vector of 10 values must have at least 5 distinct values.
Returns:
updated matrix
tooFewDistinctValues
public static ExpressionDataDoubleMatrix tooFewDistinctValues(ExpressionDataDoubleMatrix matrix, double threshold, double tolerance)
Parameters:
matrix - the data matrix
threshold - fraction of values that must be distinct. Thus if set to 0.5, a vector of 10 values must have at least 5 distinct values.
tolerance - differences smaller than this are counted as "the same value".
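A minimal sketch of the distinct-value rule on a plain double[][], assuming that values are compared after sorting, that a gap of at least tolerance between neighbouring sorted values starts a new distinct value, and that all NaN entries together count as one value (as described above for missing values). DistinctValuesSketch is hypothetical and is not Gemma's RowLevelFilter; in particular, this neighbour-to-neighbour comparison chains, so 1.0, 1.4 and 1.8 with a tolerance of 0.5 count as a single value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a "too few distinct values" row filter; not Gemma's RowLevelFilter. */
class DistinctValuesSketch {

    /**
     * Counts distinct values in a row. Non-missing values are sorted, and a gap of at least
     * tolerance to the previous sorted value starts a new distinct value. All NaNs
     * (missing values) together count as one additional value.
     */
    static int countDistinct(double[] row, double tolerance) {
        double[] present = Arrays.stream(row).filter(v -> !Double.isNaN(v)).sorted().toArray();
        int distinct = 0;
        for (int i = 0; i < present.length; i++) {
            if (i == 0 || present[i] - present[i - 1] >= tolerance) {
                distinct++;
            }
        }
        boolean hasMissing = present.length < row.length;
        return hasMissing ? distinct + 1 : distinct;
    }

    /** Keeps rows in which at least threshold * rowLength values are distinct. */
    static double[][] tooFewDistinctValues(double[][] matrix, double threshold, double tolerance) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : matrix) {
            if (countDistinct(row, tolerance) >= threshold * row.length) {
                kept.add(row);
            }
        }
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        double[][] m = {
            {10, 10, 10, 10, 10, 11, 12, 13, 14, 15},    // 6 distinct: kept at threshold 0.5
            {10, 10, 10, 10, 10, 10, 10, 10, 12, 13},    // 3 distinct ("set to 10" pattern): removed
            {nan, nan, nan, nan, nan, nan, nan, 1, 2, 3} // 3 values + 1 for missing = 4: removed
        };
        System.out.println(tooFewDistinctValues(m, 0.5, 1e-6).length); // prints 1
    }
}
```

The third row shows why this filter also catches rows with many missing values: seven NaNs contribute only one "distinct" value.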
zeroVarianceFilter
public static ExpressionDataDoubleMatrix zeroVarianceFilter(ExpressionDataDoubleMatrix matrix)
Remove rows that have a variance of zero (within a small constant).
Parameters:
matrix - the data matrix
Returns:
filtered matrix
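"Within a small constant" means rows whose variance is not exactly zero but negligibly small are also treated as constant. A minimal sketch on a plain double[][], where ZeroVarianceSketch is hypothetical and EPSILON = 1e-10 is an assumed stand-in for the constant, whose actual value is not given in this documentation:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a zero-variance row filter; not Gemma's implementation. */
class ZeroVarianceSketch {

    /** Hypothetical "small constant"; the value Gemma uses is not stated here. */
    static final double EPSILON = 1e-10;

    /** Population variance of one row; assumes the row has no missing (NaN) values. */
    static double variance(double[] row) {
        double mean = 0.0;
        for (double v : row) {
            mean += v;
        }
        mean /= row.length;
        double sumSq = 0.0;
        for (double v : row) {
            sumSq += (v - mean) * (v - mean);
        }
        return sumSq / row.length;
    }

    /** Keeps only rows whose variance exceeds EPSILON. */
    static double[][] zeroVarianceFilter(double[][] matrix) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : matrix) {
            if (variance(row) > EPSILON) {
                kept.add(row);
            }
        }
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] m = {{7, 7, 7}, {1, 2, 3}, {4, 4, 4 + 1e-12}};
        // Rows 0 and 2 have (near-)zero variance, so only {1, 2, 3} survives.
        System.out.println(zeroVarianceFilter(m).length); // prints 1
    }
}
```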
getFilteredMatrix
public ExpressionDataDoubleMatrix getFilteredMatrix(Collection<ProcessedExpressionDataVector> dataVectors) throws FilteringException
Provides a ready-to-use expression data matrix that is transformed and filtered. The following processes are applied, in this order:
- Log transform, if requested and not already done
- Use the missing-value data to mask the preferred data (ratiometric data only)
- Remove rows that don't have biosequences (always applied)
- Remove Affymetrix control probes (Affymetrix only)
- Remove rows that have too many missing values (as configured)
- Remove rows with low variance (ratiometric) or low CV (one-color) (as configured)
- Remove rows with very high or very low expression (as configured)
Parameters:
dataVectors - data vectors
Returns:
filtered matrix
Throws:
NoRowsLeftAfterFilteringException - if filtering results in no rows left in the expression matrix
FilteringException
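The order of the steps matters because each step only sees the rows that survived the earlier ones. The sketch below chains a few simplified steps over a plain double[][] and stops as soon as a step leaves no rows. Everything in it is hypothetical: FilterPipelineSketch, the step names, the missing-value cutoff, and the use of IllegalStateException in place of NoRowsLeftAfterFilteringException. It covers only the log transform and missing-value steps, not the masking, biosequence, control-probe, variance/CV, or expression-level steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of an ordered filtering pipeline; not Gemma's getFilteredMatrix. */
class FilterPipelineSketch {

    /** Log2-transforms every value; assumes the input is on a linear scale (NaN stays NaN). */
    static double[][] log2Transform(double[][] m) {
        double[][] out = new double[m.length][];
        for (int i = 0; i < m.length; i++) {
            out[i] = new double[m[i].length];
            for (int j = 0; j < m[i].length; j++) {
                out[i][j] = Math.log(m[i][j]) / Math.log(2.0);
            }
        }
        return out;
    }

    /** Returns a step that keeps rows whose fraction of NaN values is at most maxMissingFraction. */
    static UnaryOperator<double[][]> removeTooManyMissing(double maxMissingFraction) {
        return m -> {
            List<double[]> kept = new ArrayList<>();
            for (double[] row : m) {
                long missing = Arrays.stream(row).filter(Double::isNaN).count();
                if (missing <= maxMissingFraction * row.length) {
                    kept.add(row);
                }
            }
            return kept.toArray(new double[0][]);
        };
    }

    /** Applies the steps in insertion order; fails as soon as a step leaves no rows. */
    static double[][] apply(double[][] matrix, Map<String, UnaryOperator<double[][]>> steps) {
        double[][] current = matrix;
        for (Map.Entry<String, UnaryOperator<double[][]>> step : steps.entrySet()) {
            current = step.getValue().apply(current);
            if (current.length == 0) {
                // Stand-in for NoRowsLeftAfterFilteringException.
                throw new IllegalStateException("No rows left after step: " + step.getKey());
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order, so steps run in the order they are added.
        Map<String, UnaryOperator<double[][]>> steps = new LinkedHashMap<>();
        steps.put("log2 transform", FilterPipelineSketch::log2Transform);
        steps.put("too many missing values", removeTooManyMissing(0.5));
        double[][] m = {{2, 4, 8, 16}, {Double.NaN, Double.NaN, Double.NaN, 4}};
        System.out.println(Arrays.deepToString(apply(m, steps)));
    }
}
```

Checking for an empty matrix after every step, rather than only at the end, makes the error message point at the step that removed the last rows.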