Class ExpressionDataSVD

java.lang.Object
ubic.gemma.core.analysis.preprocess.svd.ExpressionDataSVD

public class ExpressionDataSVD extends Object
Performs SVD on an expression data matrix, E = U S V'. The rows of the input matrix are probes (genes), following the convention of Alter et al. 2000 (PNAS). Thus the columns of U are the eigensamples (eigenarrays) and the columns of V are the eigengenes. See also http://genome-www.stanford.edu/SVD/. Because SVD cannot be performed on a matrix with missing values, missing values are imputed. Rows with no variance are removed, as are rows with too many missing values (see MIN_PRESENT_FRACTION_FOR_ROW).
Author:
paul
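As a rough illustration of the row filtering mentioned in the class description, the following standalone sketch drops rows that have too few present values or no variance. It does not use any Gemma classes: `RowFilterSketch`, `keepRow`, `keptRows`, and the `minPresentFraction` argument are hypothetical names, missing values are assumed to be encoded as `Double.NaN`, and the actual value of MIN_PRESENT_FRACTION_FOR_ROW is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

final class RowFilterSketch {

    /** True if the row has enough present (non-NaN) values and those values are not all equal. */
    static boolean keepRow(double[] row, double minPresentFraction) {
        int present = 0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : row) {
            if (Double.isNaN(v)) continue; // missing value (assumed encoding)
            present++;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (row.length == 0 || (double) present / row.length < minPresentFraction) {
            return false; // too many missing values
        }
        return max > min; // variance is zero when all present values are identical
    }

    /** Returns the indices of the rows that survive the filter. */
    static List<Integer> keptRows(double[][] matrix, double minPresentFraction) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < matrix.length; i++) {
            if (keepRow(matrix[i], minPresentFraction)) kept.add(i);
        }
        return kept;
    }
}
```

The sketch only covers row removal; how the remaining missing values are imputed is not described here and is left out.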
  • Constructor Details

    • ExpressionDataSVD

      public ExpressionDataSVD(ExpressionDataDoubleMatrix expressionData) throws SVDException
      Normalizes the data matrix before performing the SVD.
      Parameters:
      expressionData - expression data
      Throws:
      SVDException
    • ExpressionDataSVD

      public ExpressionDataSVD(ExpressionDataDoubleMatrix expressionData, boolean normalizeMatrix) throws SVDException
      Parameters:
      expressionData - the expression data; note that this matrix may be modified!
      normalizeMatrix - If true, the data matrix will be rescaled and centred to mean zero, variance one, for both rows and columns ("double-standardized")
      Throws:
      SVDException
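To make "double-standardized" concrete, here is a minimal standalone sketch that alternates row and column standardization for a fixed number of passes. It is an assumption about what such a procedure can look like, not Gemma's implementation: the class and method names are hypothetical, the sample variance (n - 1 denominator) is an arbitrary choice, and the input is assumed to contain no missing values.

```java
final class DoubleStandardizeSketch {

    /** Centers the values to mean zero and scales them to unit sample variance, in place. */
    static void standardize(double[] x) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;
        double ss = 0.0;
        for (int i = 0; i < n; i++) {
            x[i] -= mean;
            ss += x[i] * x[i];
        }
        double sd = n > 1 ? Math.sqrt(ss / (n - 1)) : 0.0;
        if (sd == 0.0) return; // constant vector: leave it centered at zero
        for (int i = 0; i < n; i++) x[i] /= sd;
    }

    /** Alternates row and column standardization for a fixed number of passes, in place. */
    static void doubleStandardize(double[][] m, int passes) {
        int rows = m.length;
        int cols = m[0].length;
        double[] column = new double[rows];
        for (int p = 0; p < passes; p++) {
            for (double[] row : m) standardize(row);
            for (int j = 0; j < cols; j++) {
                for (int i = 0; i < rows; i++) column[i] = m[i][j];
                standardize(column);
                for (int i = 0; i < rows; i++) m[i][j] = column[i];
            }
        }
    }
}
```

Standardizing the columns disturbs the row statistics and vice versa, which is why the sketch repeats the two steps; a real implementation would more likely iterate until the values stop changing rather than for a fixed count.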
  • Method Details

    • equalize

      public ExpressionDataDoubleMatrix equalize()
      Implements the method described in the SPELL paper, following the alternative interpretation related by Q. Morris: all components are given equal weight by setting every singular value to 1.
      Returns:
      the reconstructed matrix; values that were missing before are re-masked.
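Setting every singular value to 1 turns the reconstruction U S V' into U V'. The standalone sketch below shows that arithmetic on plain arrays, including the re-masking of previously missing cells. `EqualizeSketch`, its method, and the `wasMissing` mask are hypothetical names introduced for illustration, and `u` and `v` are assumed to hold the singular vectors as columns.

```java
final class EqualizeSketch {

    /**
     * Reconstructs U * I * V' (all singular values replaced by 1) and restores NaN
     * wherever the original matrix had a missing value.
     */
    static double[][] equalize(double[][] u, double[][] v, boolean[][] wasMissing) {
        int m = u.length;    // rows of the data matrix
        int n = v.length;    // columns of the data matrix
        int k = u[0].length; // number of components
        double[][] result = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int c = 0; c < k; c++) sum += u[i][c] * v[j][c];
                result[i][j] = wasMissing[i][j] ? Double.NaN : sum;
            }
        }
        return result;
    }
}
```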
    • getEigenGene

      public Double[] getEigenGene(int i)
      Parameters:
      i - which eigengene
      Returns:
      the ith eigengene (column of V)
    • getEigenSample

      public Double[] getEigenSample(int i)
      Parameters:
      i - which eigensample
      Returns:
      the ith eigensample (column of U)
    • getEigenvalues

      public double[] getEigenvalues()
      Returns:
      the eigenvalues, which are the squares of the singular values.
    • getNumVariables

      public int getNumVariables()
      Returns:
      how many rows the U matrix has.
    • getS

      public DoubleMatrix<Integer,Integer> getS()
      Returns:
      the matrix of singular values, indexed by the eigenarray (row) and eigengene (column) numbers (starting from 0).
    • getSingularValues

      public double[] getSingularValues()
    • getU

      Returns:
      the left singular vectors. The column indices are of the eigenarrays (starting from 0).
    • getV

      Returns:
      the right singular vectors. The column indices are of the eigengenes (starting from 0). The row indices are of the original samples in the given ExpressionDataDoubleMatrix.
    • getVarianceFractions

      public Double[] getVarianceFractions()
      Returns:
      fractions of the variance for each singular vector.
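One common way to compute such fractions is to divide each squared singular value by the sum of all squared singular values. The sketch below assumes that definition; whether this class computes its fractions in exactly this way is not confirmed by this page, and `VarianceFractionsSketch` is a hypothetical name.

```java
final class VarianceFractionsSketch {

    /** Returns s[i]^2 / sum_j s[j]^2 for each singular value s[i]. */
    static double[] varianceFractions(double[] singularValues) {
        double total = 0.0;
        for (double s : singularValues) total += s * s;
        double[] fractions = new double[singularValues.length];
        for (int i = 0; i < singularValues.length; i++) {
            fractions[i] = singularValues[i] * singularValues[i] / total;
        }
        return fractions;
    }
}
```

For singular values 4 and 3, the squared values are 16 and 9, so the fractions are 16/25 and 9/25.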
    • removeHighestComponents

      public ExpressionDataDoubleMatrix removeHighestComponents(int numComponentsToRemove)
      Provide a reconstructed matrix removing the first N components (the most significant ones). If the matrix was normalized first, removing the first component replicates the normalization approach taken by Nielsen et al. (Lancet 359, 2002) and Alter et al. (PNAS 2000). Correction by ANOVA would yield similar results if the nuisance variable is known.
      Parameters:
      numComponentsToRemove - The number of components to remove, starting from the largest eigenvalue.
      Returns:
      the reconstructed matrix; values that were missing before are re-masked.
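Removing the first N components amounts to rebuilding the matrix from the remaining terms of the sum over components of s[c] * u[c] * v[c]'. The standalone sketch below shows this on plain arrays, assuming the components are ordered from largest to smallest singular value and that singular vectors are stored as columns. `RemoveComponentsSketch`, `removeHighest`, and the `wasMissing` mask are hypothetical names, not part of this class.

```java
final class RemoveComponentsSketch {

    /**
     * Rebuilds the matrix from components numToRemove..k-1 only, then restores NaN
     * wherever the original matrix had a missing value.
     */
    static double[][] removeHighest(double[][] u, double[] s, double[][] v,
                                    int numToRemove, boolean[][] wasMissing) {
        int m = u.length; // rows of the data matrix
        int n = v.length; // columns of the data matrix
        int k = s.length; // number of components
        if (numToRemove < 0 || numToRemove > k) {
            throw new IllegalArgumentException("numToRemove must be between 0 and " + k);
        }
        double[][] result = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int c = numToRemove; c < k; c++) {
                    sum += s[c] * u[i][c] * v[j][c]; // contribution of component c
                }
                result[i][j] = wasMissing[i][j] ? Double.NaN : sum;
            }
        }
        return result;
    }
}
```

With `numToRemove` set to 0 the sketch reproduces the full product U S V', which is a convenient sanity check.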
    • uMatrixAsExpressionData

      public ExpressionDataDoubleMatrix uMatrixAsExpressionData()
      Returns:
      Implements the method described in the SPELL paper. Note that this alters the U matrix of this object.

      We make two assumptions about the method that are not described in the paper: 1) The data are rescaled and centered; 2) the absolute value of the U matrix is used. Note that unlike the original data, the transformed data will have no missing values.

    • winnow

      public ExpressionDataDoubleMatrix winnow(double thresholdQuantile)
      Implements the method described in Skillicorn et al., "Strategies for winnowing microarray data" (see also section 3.5.5 of Skillicorn's book).
      Parameters:
      thresholdQuantile - the quantile to use as the threshold; use 0.5 for the median. The value must be > 0 and < 1.
      Returns:
      a filtered matrix
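The role of the threshold quantile can be sketched as follows: given one score per row, find the score at the requested quantile and keep the rows at or above it. Everything specific here is an assumption made for illustration: how rows are scored, whether rows above or below the threshold are kept, and the nearest-rank quantile rule are not described on this page, and `WinnowSketch` is a hypothetical name.

```java
import java.util.Arrays;

final class WinnowSketch {

    /** Returns the value at the given quantile of the scores (simple nearest-rank rule). */
    static double quantile(double[] scores, double q) {
        if (!(q > 0.0 && q < 1.0)) {
            throw new IllegalArgumentException("Quantile must be > 0 and < 1");
        }
        double[] sorted = scores.clone();
        Arrays.sort(sorted);
        int index = (int) Math.ceil(q * sorted.length) - 1; // nearest rank, zero-based
        return sorted[Math.max(index, 0)];
    }

    /** Marks the rows whose score is at or above the threshold quantile (assumed direction). */
    static boolean[] keepMask(double[] scores, double thresholdQuantile) {
        double threshold = quantile(scores, thresholdQuantile);
        boolean[] keep = new boolean[scores.length];
        for (int i = 0; i < scores.length; i++) keep[i] = scores[i] >= threshold;
        return keep;
    }
}
```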