Class ExpressionDataSVD


  • public class ExpressionDataSVD
    extends Object
    Performs SVD on an expression data matrix, E = U S V'. The rows of the input matrix are probes (genes), following the convention of Alter et al. 2000 (PNAS). Thus the columns of U are the eigensamples (eigenarrays) and the columns of V are the eigengenes. See also http://genome-www.stanford.edu/SVD/. Because SVD cannot be performed on a matrix with missing values, missing values are imputed. Rows with no variance are removed, as are rows with too many missing values (see MIN_PRESENT_FRACTION_FOR_ROW).
    Author:
    paul
    • Constructor Detail

      • ExpressionDataSVD

        public ExpressionDataSVD​(ExpressionDataDoubleMatrix expressionData,
                                 boolean normalizeMatrix)
                          throws SVDException
        Parameters:
        expressionData - Note that this may be modified!
        normalizeMatrix - If true, the data matrix is centred and rescaled to mean zero and variance one for both rows and columns ("double-standardized").
        Throws:
        SVDException
    • Method Detail

      • equalize

        public ExpressionDataDoubleMatrix equalize()
        Implements the method described in the SPELL paper, following the alternative interpretation related by Q. Morris: all components are given equal weight by setting every singular value to 1.
        Returns:
        the reconstructed matrix; values that were missing before are re-masked.
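        As a rough illustration of the idea (not the class's actual implementation), setting every singular value to 1 reduces the reconstruction to U V'. The sketch below assumes U and V are available as plain arrays and that missing values are marked with NaN in the original matrix; the class name EqualizeSketch and its method are hypothetical.

```java
/** Hypothetical helper for illustration; not part of ExpressionDataSVD. */
class EqualizeSketch {

    /**
     * Computes U * V' (every singular value replaced by 1), then puts NaN
     * back wherever the original matrix had a missing value.
     * u is rows x rank, v is cols x rank, original is rows x cols.
     */
    static double[][] equalize(double[][] u, double[][] v, double[][] original) {
        int rows = u.length;
        int cols = v.length;
        int rank = u[0].length;
        double[][] result = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                if (Double.isNaN(original[i][j])) {
                    result[i][j] = Double.NaN; // re-mask the missing value
                    continue;
                }
                double sum = 0.0;
                for (int k = 0; k < rank; k++) {
                    sum += u[i][k] * v[j][k]; // singular value treated as 1
                }
                result[i][j] = sum;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy inputs: identity matrices stand in for real singular vectors.
        double[][] u = {{1, 0}, {0, 1}};
        double[][] v = {{1, 0}, {0, 1}};
        double[][] original = {{3, Double.NaN}, {0, 1}};
        System.out.println(java.util.Arrays.deepToString(equalize(u, v, original)));
    }
}
```

        In real use the singular vectors would come from a decomposition of the imputed matrix; the toy inputs only show the arithmetic and the re-masking step.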
      • getEigenGene

        public Double[] getEigenGene​(int i)
        Parameters:
        i - which eigengene
        Returns:
        the ith eigengene (column of V)
      • getEigenSample

        public Double[] getEigenSample​(int i)
        Parameters:
        i - which eigensample
        Returns:
        the ith eigensample (column of U)
      • getEigenvalues

        public double[] getEigenvalues()
        Returns:
        the eigenvalues, that is, the squares of the singular values.
      • getNumVariables

        public int getNumVariables()
        Returns:
        how many rows the U matrix has.
      • getS

        public DoubleMatrix<Integer,​Integer> getS()
        Returns:
        the matrix of singular values, indexed by the eigenarray (row) and eigengene (column) numbers (starting from 0).
      • getSingularValues

        public double[] getSingularValues()
      • getV

        public DoubleMatrix<Integer,​BioMaterial> getV()
        Returns:
        the right singular vectors. The column indices are of the eigengenes (starting from 0). The row indices are of the original samples in the given ExpressionDataDoubleMatrix.
      • getVarianceFractions

        public Double[] getVarianceFractions()
        Returns:
        fractions of the variance for each singular vector.
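        The page does not state how the fractions are computed. A common definition, assumed here, is the squared singular value of each component divided by the sum of all squared singular values. The class VarianceFractionSketch below is a hypothetical stand-alone sketch of that arithmetic, not the library's code.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of ExpressionDataSVD. */
class VarianceFractionSketch {

    /**
     * Assumed definition: fraction i = s_i^2 / sum_j s_j^2.
     * The input must contain at least one non-zero singular value.
     */
    static double[] varianceFractions(double[] singularValues) {
        double total = 0.0;
        for (double s : singularValues) {
            total += s * s;
        }
        double[] fractions = new double[singularValues.length];
        for (int i = 0; i < singularValues.length; i++) {
            fractions[i] = singularValues[i] * singularValues[i] / total;
        }
        return fractions;
    }

    public static void main(String[] args) {
        double[] singularValues = {3.0, 1.0};
        System.out.println(Arrays.toString(varianceFractions(singularValues)));
    }
}
```

        Under this definition the fractions sum to 1, so a first component with a much larger singular value accounts for most of the variance.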
      • removeHighestComponents

        public ExpressionDataDoubleMatrix removeHighestComponents​(int numComponentsToRemove)
        Provides a reconstructed matrix with the first N components (the most significant ones) removed. If the matrix was normalized first, removing the first component replicates the normalization approach taken by Nielsen et al. (Lancet 359, 2002) and Alter et al. (PNAS 2000). Correction by ANOVA would yield similar results if the nuisance variable is known.
        Parameters:
        numComponentsToRemove - The number of components to remove, starting from the largest eigenvalue.
        Returns:
        the reconstructed matrix; values that were missing before are re-masked.
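        A minimal sketch of the reconstruction step, assuming the singular values are sorted from largest to smallest and that U, S and V are available as plain arrays. The class ComponentRemovalSketch is hypothetical, and the re-masking of missing values is left out for brevity.

```java
/** Hypothetical helper for illustration; not part of ExpressionDataSVD. */
class ComponentRemovalSketch {

    /**
     * Rebuilds U * diag(s) * V' while skipping the first numToRemove
     * components. u is rows x rank, v is cols x rank, and s is assumed
     * to be sorted in descending order.
     */
    static double[][] reconstructWithout(double[][] u, double[] s, double[][] v, int numToRemove) {
        int rows = u.length;
        int cols = v.length;
        double[][] result = new double[rows][cols];
        for (int k = numToRemove; k < s.length; k++) {
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++) {
                    result[i][j] += u[i][k] * s[k] * v[j][k];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy decomposition of the matrix {{3, 0}, {0, 1}}.
        double[][] u = {{1, 0}, {0, 1}};
        double[] s = {3, 1};
        double[][] v = {{1, 0}, {0, 1}};
        double[][] cleaned = reconstructWithout(u, s, v, 1);
        System.out.println(java.util.Arrays.deepToString(cleaned));
    }
}
```

        Skipping a component is equivalent to setting its singular value to zero before multiplying the three matrices back together.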
      • uMatrixAsExpressionData

        public ExpressionDataDoubleMatrix uMatrixAsExpressionData()
        Implements the method described in the SPELL paper. Note that this alters the U matrix of this object.

        We make two assumptions about the method that are not described in the paper: 1) the data are rescaled and centered; 2) the absolute value of the U matrix is used.
        Returns:
        the transformed data. Unlike the original data, the transformed data will have no missing values.

      • winnow

        public ExpressionDataDoubleMatrix winnow​(double thresholdQuantile)
        Implements the method described in Skillicorn et al., "Strategies for winnowing microarray data" (see also section 3.5.5 of Skillicorn's book).
        Parameters:
        thresholdQuantile - The quantile to use as the threshold, e.g. 0.5 for the median. Must be > 0 and < 1.
        Returns:
        a filtered matrix