Class ExpressionDataSVD
java.lang.Object
ubic.gemma.core.analysis.preprocess.svd.ExpressionDataSVD
Perform SVD on an expression data matrix, E = U S V'. The rows of the input matrix are probes (genes), following the
convention of Alter et al. 2000 (PNAS). Thus the U matrix columns are the eigensamples (eigenarrays) and the
V matrix columns are the eigengenes. See also http://genome-www.stanford.edu/SVD/.
Because SVD cannot be performed on a matrix with missing values, missing values are imputed. Rows with no variance are removed, and
rows with too many missing values are also removed (see MIN_PRESENT_FRACTION_FOR_ROW).
- Author:
- paul
-
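The clean-up described above (dropping sparse and constant rows, then imputing) can be sketched in plain Java. This is only an illustration under stated assumptions: the threshold value 0.75, the row-mean imputation, and the names RowFilterSketch and filterAndImpute are hypothetical and are not taken from the class itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a pre-SVD clean-up: drop sparse and constant rows, then impute. */
class RowFilterSketch {

    // Hypothetical threshold; the actual MIN_PRESENT_FRACTION_FOR_ROW value is not stated here.
    static final double MIN_PRESENT_FRACTION = 0.75;

    /** Returns the retained rows, with NaN (missing) entries replaced by the row mean (an assumption). */
    static double[][] filterAndImpute(double[][] data) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : data) {
            int present = 0;
            double sum = 0.0;
            for (double v : row) {
                if (!Double.isNaN(v)) {
                    present++;
                    sum += v;
                }
            }
            if (present == 0 || (double) present / row.length < MIN_PRESENT_FRACTION) {
                continue; // too many missing values
            }
            double mean = sum / present;
            double sumSq = 0.0;
            for (double v : row) {
                if (!Double.isNaN(v)) {
                    sumSq += (v - mean) * (v - mean);
                }
            }
            if (sumSq == 0.0) {
                continue; // no variance
            }
            double[] filled = new double[row.length];
            for (int j = 0; j < row.length; j++) {
                filled[j] = Double.isNaN(row[j]) ? mean : row[j];
            }
            kept.add(filled);
        }
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] data = {
            {1.0, 2.0, Double.NaN, 3.0},        // kept; the missing value is filled with the row mean
            {5.0, 5.0, 5.0, 5.0},               // dropped: constant row
            {1.0, Double.NaN, Double.NaN, 4.0}, // dropped: only half of the values are present
        };
        double[][] out = filterAndImpute(data);
        System.out.println(out.length + " row(s) kept; imputed value = " + out[0][2]);
    }
}
```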
Constructor Summary
Constructors
Constructor: Description
ExpressionDataSVD(ExpressionDataDoubleMatrix expressionData): Does normalization.
ExpressionDataSVD(ExpressionDataDoubleMatrix expressionData, boolean normalizeMatrix)
-
Method Summary
Modifier and Type, Method: Description
equalize(): Implements the method described in the SPELL paper, alternative interpretation as related by Q. Morris.
Double[] getEigenGene(int i)
Double[] getEigenSample(int i)
double[] getEigenvalues()
int getNumVariables()
getS()
double[] getSingularValues()
getU()
getV()
Double[] getVarianceFractions
removeHighestComponents(int numComponentsToRemove): Provide a reconstructed matrix removing the first N components (the most significant ones).
uMatrixAsExpressionData
winnow(double thresholdQuantile): Implements method described in Skillicorn et al., "Strategies for winnowing microarray data" (also section 3.5.5 of his book)
-
Constructor Details
-
ExpressionDataSVD
Does normalization.
- Parameters:
expressionData
- expression data
- Throws:
SVDException
-
ExpressionDataSVD
public ExpressionDataSVD(ExpressionDataDoubleMatrix expressionData, boolean normalizeMatrix) throws SVDException
- Parameters:
expressionData
- Note that this may be modified!
normalizeMatrix
- If true, the data matrix will be rescaled and centred to mean zero, variance one, for both rows and columns ("double-standardized")
- Throws:
SVDException
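One way to picture "double-standardized" is to alternate row and column standardization a number of times. The sketch below assumes that iterative scheme and uses hypothetical names (DoubleStandardizeSketch, doubleStandardize); it is not necessarily how normalizeMatrix is implemented. Unlike the constructor, which may modify its input, this sketch works on a copy.

```java
/** Sketch of double standardization by alternating row and column scaling. */
class DoubleStandardizeSketch {

    /** Rescales each row in place to mean 0 and sample variance 1. */
    static void standardizeRows(double[][] m) {
        for (double[] row : m) {
            double mean = 0.0;
            for (double v : row) {
                mean += v;
            }
            mean /= row.length;
            double ss = 0.0;
            for (double v : row) {
                ss += (v - mean) * (v - mean);
            }
            double sd = Math.sqrt(ss / (row.length - 1));
            for (int j = 0; j < row.length; j++) {
                // A constant row is only centred, to avoid dividing by zero.
                row[j] = sd > 0.0 ? (row[j] - mean) / sd : 0.0;
            }
        }
    }

    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[0].length; j++) {
                t[j][i] = m[i][j];
            }
        }
        return t;
    }

    /** Returns a standardized copy; the input is left untouched. */
    static double[][] doubleStandardize(double[][] input, int iterations) {
        double[][] m = new double[input.length][];
        for (int i = 0; i < input.length; i++) {
            m[i] = input[i].clone();
        }
        for (int it = 0; it < iterations; it++) {
            standardizeRows(m); // rows
            m = transpose(m);
            standardizeRows(m); // columns of the original orientation
            m = transpose(m);
        }
        return m;
    }

    public static void main(String[] args) {
        double[][] data = {
            {1.0, 2.0, 3.5, 4.0, 9.0},
            {2.0, 1.0, 0.0, 7.0, 3.0},
            {5.0, 8.0, 1.0, 2.0, 2.5},
            {0.0, 3.0, 6.0, 1.0, 4.0},
        };
        double[][] z = doubleStandardize(data, 20);
        System.out.println(java.util.Arrays.toString(z[0]));
    }
}
```

Because the column pass runs last, the columns are exactly standardized on return, while the rows are only approximately so; more iterations tighten the row statistics.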
-
-
Method Details
-
equalize
Implements the method described in the SPELL paper, alternative interpretation as related by Q. Morris. Set all components to have equal weight (set all singular values to 1).
- Returns:
- the reconstructed matrix; values that were missing before are re-masked.
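Setting every singular value to 1 amounts to reconstructing U V' in place of U S V'. A minimal sketch on plain arrays follows, assuming U and V are stored with components as columns and that missing positions are tracked in a boolean mask; EqualizeSketch and its parameters are hypothetical names.

```java
/** Sketch of "equalizing" an SVD: reconstruct with all singular values set to 1. */
class EqualizeSketch {

    /**
     * Computes U * V' for U (m x k) and V (n x k), then restores NaN wherever
     * the original matrix had a missing value.
     */
    static double[][] equalize(double[][] u, double[][] v, boolean[][] wasMissing) {
        int m = u.length;
        int n = v.length;
        int k = u[0].length;
        double[][] out = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int c = 0; c < k; c++) {
                    sum += u[i][c] * v[j][c]; // every singular value treated as 1
                }
                out[i][j] = wasMissing[i][j] ? Double.NaN : sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] u = {{1.0, 0.0}, {0.0, 1.0}};
        double[][] v = {{0.6, 0.8}, {0.8, -0.6}}; // columns are orthonormal
        boolean[][] mask = {{false, false}, {true, false}};
        double[][] r = equalize(u, v, mask);
        System.out.println(java.util.Arrays.deepToString(r));
    }
}
```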
-
getEigenGene
- Parameters:
i
- which eigengene
- Returns:
- the ith eigengene (column of V)
-
getEigenSample
- Parameters:
i
- which eigensample
- Returns:
- the ith eigensample (column of U)
-
getEigenvalues
public double[] getEigenvalues()
- Returns:
- the eigenvalues, i.e. the squares of the singular values.
-
getNumVariables
public int getNumVariables()
- Returns:
- how many rows the U matrix has.
-
getS
- Returns:
- the matrix of singular values, indexed by the eigenarray (row) and eigengene (column) numbers (starting from 0).
-
getSingularValues
public double[] getSingularValues()
-
getU
- Returns:
- the left singular vectors. The column indices are of the eigenarrays (starting from 0).
-
getV
- Returns:
- the right singular vectors. The column indices are of the eigengenes (starting from 0). The row indices are of the original samples in the given ExpressionDataDoubleMatrix.
-
getVarianceFractions
- Returns:
- fractions of the variance for each singular vector.
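For orientation, the usual relationship between singular values, eigenvalues and variance fractions can be written out as below. The sketch assumes that the fraction for component i is s_i^2 divided by the sum of all s_j^2; that formula and the name VarianceFractionSketch are assumptions and are not confirmed by this page.

```java
/** Sketch relating singular values to eigenvalues and variance fractions. */
class VarianceFractionSketch {

    /** Eigenvalues taken as the squares of the singular values. */
    static double[] eigenvalues(double[] singularValues) {
        double[] e = new double[singularValues.length];
        for (int i = 0; i < e.length; i++) {
            e[i] = singularValues[i] * singularValues[i];
        }
        return e;
    }

    /** Fraction of total variance per component: s_i^2 / sum_j s_j^2 (assumed definition). */
    static double[] varianceFractions(double[] singularValues) {
        double[] e = eigenvalues(singularValues);
        double total = 0.0;
        for (double x : e) {
            total += x;
        }
        double[] f = new double[e.length];
        for (int i = 0; i < f.length; i++) {
            f[i] = e[i] / total;
        }
        return f;
    }

    public static void main(String[] args) {
        double[] s = {4.0, 3.0};
        // Eigenvalues are 16 and 9, so the fractions are 16/25 and 9/25.
        System.out.println(java.util.Arrays.toString(varianceFractions(s)));
    }
}
```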
-
removeHighestComponents
Provide a reconstructed matrix removing the first N components (the most significant ones). If the matrix was normalized first, removing the first component replicates the normalization approach taken by Nielsen et al. (Lancet 359, 2002) and Alter et al. (PNAS 2000). Correction by ANOVA would yield similar results if the nuisance variable is known.
- Parameters:
numComponentsToRemove
- The number of components to remove, starting from the largest eigenvalue.
- Returns:
- the reconstructed matrix; values that were missing before are re-masked.
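Removing the leading components is equivalent to subtracting the corresponding rank-one terms s_c u_c v_c' from the decomposed matrix. The following sketch does that on plain arrays; RemoveComponentsSketch and removeHighest are hypothetical names, and the re-masking of previously missing values is omitted for brevity.

```java
/** Sketch of removing the most significant SVD components from a matrix. */
class RemoveComponentsSketch {

    /**
     * Returns e minus its first n rank-one terms s[c] * u[:,c] * v[:,c]',
     * where e = U * diag(s) * V' with U (m x k) and V (n x k).
     */
    static double[][] removeHighest(double[][] e, double[][] u, double[] s, double[][] v, int n) {
        double[][] out = new double[e.length][];
        for (int i = 0; i < e.length; i++) {
            out[i] = e[i].clone();
        }
        for (int c = 0; c < n; c++) {
            for (int i = 0; i < out.length; i++) {
                for (int j = 0; j < out[i].length; j++) {
                    out[i][j] -= s[c] * u[i][c] * v[j][c];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy decomposition: U is the identity, V has orthonormal columns.
        double[][] u = {{1.0, 0.0}, {0.0, 1.0}};
        double[] s = {5.0, 2.0};
        double[][] v = {{0.6, 0.8}, {0.8, -0.6}};
        double[][] e = {{3.0, 4.0}, {1.6, -1.2}}; // equals U * diag(s) * V'
        double[][] residual = removeHighest(e, u, s, v, 1);
        // Only the second component remains: the first row is (approximately) zero.
        System.out.println(java.util.Arrays.deepToString(residual));
    }
}
```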
-
uMatrixAsExpressionData
- Returns:
- Implements the method described in the SPELL paper. Note that this alters the U matrix of this object.
We make two assumptions about the method that are not described in the paper: 1) The data are rescaled and centered; 2) the absolute value of the U matrix is used. Note that unlike the original data, the transformed data will have no missing values.
-
winnow
Implements method described in Skillicorn et al., "Strategies for winnowing microarray data" (also section 3.5.5 of his book)
- Parameters:
thresholdQuantile
- Enter 0.5 for median. Value must be > 0 and < 1.
- Returns:
- a filtered matrix
-