Class ExpressionExperimentFilter


  • public class ExpressionExperimentFilter
    extends Object
    Methods to handle filtering expression experiments for analysis.
    Author:
    Paul
    • Constructor Detail

      • ExpressionExperimentFilter

        public ExpressionExperimentFilter​(Collection<ArrayDesign> arrayDesignsUsed,
                                          FilterConfig config)
        Parameters:
        config - configuration used for all filtering. This must be defined at construction and cannot be changed afterwards.
        arrayDesignsUsed - collection of array designs (ADs) used
    • Method Detail

      • lowVarianceFilter

        public static ExpressionDataDoubleMatrix lowVarianceFilter​(ExpressionDataDoubleMatrix matrix,
                                                                   int quantile)
        Remove rows whose variance falls below the stated quantile.
        Parameters:
        quantile - e.g. 10 to remove 10% lowest variance rows.
        matrix - the data matrix
        Returns:
        filtered matrix
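
A minimal sketch of the quantile idea described above, written against a plain double[][] rather than ExpressionDataDoubleMatrix. The class and method names (LowVarianceSketch, dropLowestVariance) are hypothetical. The choice of sample variance and the rule "drop floor(n * quantile / 100) rows" are assumptions made for illustration; the library's exact cutoff rule may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of ExpressionExperimentFilter.
final class LowVarianceSketch {

    /** Sample variance of one row (assumes no missing values and at least two entries). */
    static double variance(double[] row) {
        double mean = Arrays.stream(row).average().orElse(Double.NaN);
        double sumSquares = 0.0;
        for (double v : row) {
            sumSquares += (v - mean) * (v - mean);
        }
        return sumSquares / (row.length - 1);
    }

    /**
     * Drops the floor(n * quantile / 100) rows with the lowest variance.
     * The surviving rows keep their original relative order.
     */
    static double[][] dropLowestVariance(double[][] rows, int quantile) {
        if (quantile < 0 || quantile > 100) {
            throw new IllegalArgumentException("quantile must be between 0 and 100");
        }
        int n = rows.length;
        double[] variances = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            variances[i] = variance(rows[i]);
            order[i] = i;
        }
        // Sort row indices from lowest to highest variance.
        Arrays.sort(order, Comparator.comparingDouble(i -> variances[i]));

        int toDrop = (int) ((long) n * quantile / 100);
        boolean[] drop = new boolean[n];
        for (int k = 0; k < toDrop; k++) {
            drop[order[k]] = true;
        }

        List<double[]> kept = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!drop[i]) {
                kept.add(rows[i]);
            }
        }
        return kept.toArray(new double[0][]);
    }
}
```

Under these assumptions, a quantile of 10 applied to 4 rows drops nothing, because floor(4 * 10 / 100) is 0.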
      • tooFewDistinctValues

        public static ExpressionDataDoubleMatrix tooFewDistinctValues​(ExpressionDataDoubleMatrix matrix,
                                                                      double threshold)
        Remove rows that have a low diversity of values (equality is judged using the tolerance set in RowLevelFilter). This happens, for example, when people "set values less than 10 equal to 10". It also effectively filters rows that have too many missing values, because all missing values are counted as a single value. The tolerance is set to a default value of Constants.SMALLISH.
        Parameters:
        threshold - fraction of values that must be distinct. Thus if set to 0.5, a vector of 10 values must have at least 5 distinct values.
        matrix - the data matrix
        Returns:
        updated matrix
      • tooFewDistinctValues

        public static ExpressionDataDoubleMatrix tooFewDistinctValues​(ExpressionDataDoubleMatrix matrix,
                                                                      double threshold,
                                                                      double tolerance)
        Parameters:
        matrix - the data matrix
        threshold - fraction of values that must be distinct. Thus if set to 0.5, a vector of 10 values must have at least 5 distinct values.
        tolerance - differences smaller than this are counted as "the same value".
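
The interplay of threshold and tolerance can be sketched on a single row as follows. This is an illustration under stated assumptions, not the library's implementation: the names DistinctValuesSketch, countDistinct and hasEnoughDistinctValues are hypothetical, missing values are assumed to be encoded as NaN, and "same value" is decided by comparing neighbours after sorting, which is only one possible reading of the tolerance rule.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of ExpressionExperimentFilter.
final class DistinctValuesSketch {

    /**
     * Counts distinct values in a row. Sorted neighbours that differ by less
     * than the tolerance are treated as the same value, and all missing
     * values (NaN) together count as one value.
     */
    static int countDistinct(double[] row, double tolerance) {
        double[] present = Arrays.stream(row)
                .filter(v -> !Double.isNaN(v))
                .sorted()
                .toArray();
        // If anything was missing, the missing values contribute one "value".
        int distinct = present.length < row.length ? 1 : 0;
        for (int i = 0; i < present.length; i++) {
            if (i == 0 || present[i] - present[i - 1] >= tolerance) {
                distinct++;
            }
        }
        return distinct;
    }

    /** True if at least threshold * row.length values in the row are distinct. */
    static boolean hasEnoughDistinctValues(double[] row, double threshold, double tolerance) {
        if (row.length == 0) {
            return false;
        }
        return countDistinct(row, tolerance) >= threshold * row.length;
    }
}
```

With a threshold of 0.5, a row of 10 values in which seven have been clamped to 10 has only 4 distinct values and would be removed under this sketch, while a row with 5 distinct values would be kept.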
      • getFilteredMatrix

        public ExpressionDataDoubleMatrix getFilteredMatrix​(Collection<ProcessedExpressionDataVector> dataVectors)
                                                     throws FilteringException
        Provides a ready-to-use expression data matrix that is transformed and filtered. The following processes are applied, in this order:
        1. Log transform, if requested and not already done
        2. Use the missing value data to mask the preferred data (ratiometric data only)
        3. Remove rows that don't have biosequences (always applied)
        4. Remove Affymetrix control probes (Affymetrix only)
        5. Remove rows that have too many missing values (as configured)
        6. Remove rows with low variance (ratiometric) or CV (one-color) (as configured)
        7. Remove rows with very high or low expression (as configured)
        Parameters:
        dataVectors - data vectors
        Returns:
        filtered matrix
        Throws:
        NoRowsLeftAfterFilteringException - if filtering leaves no rows in the expression matrix
        FilteringException
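
The ordered, fail-when-empty behaviour described for getFilteredMatrix can be sketched generically as a list of row predicates applied in sequence. Everything named here (FilterPipelineSketch, keepRows, run, NoRowsLeftException) is a hypothetical stand-in operating on double[][]. Checking for an empty result after every step is an assumption made for illustration; the documentation above only states that the exception is thrown if no rows remain.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; not part of ExpressionExperimentFilter.
final class FilterPipelineSketch {

    /** Hypothetical stand-in for NoRowsLeftAfterFilteringException. */
    static final class NoRowsLeftException extends Exception {
        NoRowsLeftException(String message) {
            super(message);
        }
    }

    /** Keeps only the rows accepted by the predicate, preserving order. */
    static double[][] keepRows(double[][] rows, Predicate<double[]> keep) {
        return Arrays.stream(rows).filter(keep).toArray(double[][]::new);
    }

    /**
     * Applies each filtering step in list order and fails as soon as a step
     * leaves the matrix empty.
     */
    static double[][] run(double[][] rows, List<Predicate<double[]>> steps)
            throws NoRowsLeftException {
        double[][] current = rows;
        for (int i = 0; i < steps.size(); i++) {
            current = keepRows(current, steps.get(i));
            if (current.length == 0) {
                throw new NoRowsLeftException("No rows left after step " + (i + 1));
            }
        }
        return current;
    }
}
```

A caller would build the list in the documented order, for example a missing-value predicate before a variance predicate, so that later steps only see rows that survived the earlier ones.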