Interface BulkExpressionDataMatrix<T>
- All Superinterfaces:
ExpressionDataMatrix<T>
- All Known Subinterfaces:
BulkExpressionDataPrimitiveDoubleMatrix,BulkExpressionDataPrimitiveIntMatrix,MultiAssayBulkExpressionDataMatrix<T>,SingleCellDerivedBulkExpressionDataMatrix<T>
- All Known Implementing Classes:
AbstractBulkExpressionDataMatrix,AbstractMultiAssayExpressionDataMatrix,BulkExpressionDataDoubleMatrix,BulkExpressionDataIntMatrix,EmptyBulkExpressionDataMatrix,EmptyExpressionMatrix,ExpressionDataBooleanMatrix,ExpressionDataDoubleMatrix,ExpressionDataIntegerMatrix,ExpressionDataStringMatrix
In a bulk expression data matrix, each column represents a sample.
Expression data is complex, so some messy cases have to be handled.
The key problem is how to unambiguously identify rows and columns in the matrix. This is greatly complicated by the fact that experiments can combine data from multiple array designs in various ways.
As a result, there can be more than one BioAssay per column, and the same BioMaterial
can be used in multiple columns (supported implicitly). There can also be more than one BioMaterial in one column
(not yet supported). The same BioSequence can be found in multiple rows, and a row can contain
data from more than one CompositeSequence. These cases are handled by the MultiAssayBulkExpressionDataMatrix
interface and its corresponding implementations. This interface assumes the simplest case, where each column is
represented by a single BioAssay and each row by a single CompositeSequence.
There are a few constraints: a particular CompositeSequence can only be used once, in a single row, and a
BioAssay can only appear in a single column. Technical replicates are not directly supported at the moment,
though this should be possible.
For some operations an ExpressionDataMatrixRowElement object is offered, which encapsulates a combination of
a CompositeSequence, a BioSequence, and an index. The list of these can be useful for iterating over
the rows of the matrix.
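The row constraint and the row element described above can be illustrated with a minimal sketch. It is not the Gemma implementation: RowElementSketch and RowIndexSketch are hypothetical names, and plain strings stand in for CompositeSequence and BioSequence.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ExpressionDataMatrixRowElement.
final class RowElementSketch {
    final String designElement; // stands in for CompositeSequence
    final String sequence;      // stands in for BioSequence
    final int index;            // row index in the matrix

    RowElementSketch(String designElement, String sequence, int index) {
        this.designElement = designElement;
        this.sequence = sequence;
        this.index = index;
    }
}

// Hypothetical row registry enforcing "one design element, one row".
final class RowIndexSketch {
    private final Map<String, Integer> rowByDesignElement = new LinkedHashMap<>();
    private final List<RowElementSketch> rowElements = new ArrayList<>();

    // Adds a row; rejects a design element that is already used by another row.
    void addRow(String designElement, String sequence) {
        if (rowByDesignElement.containsKey(designElement)) {
            throw new IllegalArgumentException("Design element already used: " + designElement);
        }
        int index = rowElements.size();
        rowByDesignElement.put(designElement, index);
        rowElements.add(new RowElementSketch(designElement, sequence, index));
    }

    // Row elements in row order, convenient for iterating over the matrix rows.
    List<RowElementSketch> getRowElements() {
        return rowElements;
    }

    public static void main(String[] args) {
        RowIndexSketch rows = new RowIndexSketch();
        rows.addRow("probe_1", "seq_A");
        rows.addRow("probe_2", "seq_B");
        for (RowElementSketch e : rows.getRowElements()) {
            System.out.println(e.index + "\t" + e.designElement + "\t" + e.sequence);
        }
    }
}
```

The same pattern would apply to columns, with a BioAssay allowed in a single column only.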
- Author:
- pavlidis, keshav
- See Also:
-
Method Summary
Modifier and Type, Method, Description
- get(CompositeSequence designElement, BioAssay bioAssay): Access a single value of the matrix.
- BioAssayDimension getBioAssayDimension(): Obtain the dimension for the columns of this matrix.
- getBioAssayForColumn(int index): Obtain an assay corresponding to a given column.
- getBioMaterialForColumn(int index): Obtain a biomaterial corresponding to a column.
- List<BioMaterial> getBioMaterials()
- T[] getColumn: Access a single column of the matrix.
- int getColumnIndex(BioAssay bioAssay)
- int getColumnIndex(BioMaterial bioMaterial)
- static BulkExpressionDataMatrix<?> getMatrix(Collection<? extends BulkExpressionDataVector> vectors): Create a bulk expression data matrix from a collection of vectors.
- T[][] getRawMatrix(): Access the entire matrix.
- boolean hasMissingValues()
- sliceColumns(List<BioMaterial> bioMaterials): Slice the requested samples (columns) from this matrix.
- BulkExpressionDataMatrix<T> sliceColumns(List<BioMaterial> bioMaterials, BioAssayDimension dimension): Slice the requested samples (columns) from this matrix.
Methods inherited from interface ubic.gemma.core.datastructure.matrix.ExpressionDataMatrix
columns, get, getColumn, getDesignElementForRow, getDesignElements, getExpressionExperiment, getQuantitationType, getRow, getRow, getRowElement, getRowElements, getRowIndex, getRowIndices, rows, sliceRows
-
Method Details
-
getMatrix
static BulkExpressionDataMatrix<?> getMatrix(Collection<? extends BulkExpressionDataVector> vectors)
Create a bulk expression data matrix from a collection of vectors. All vectors must share the same QuantitationType.
-
getBioAssayDimension
BioAssayDimension getBioAssayDimension()
Obtain the dimension for the columns of this matrix.
-
hasMissingValues
boolean hasMissingValues()
- Returns:
- true if any value is null, NaN (for doubles and floats), or any other value that is considered missing.
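As an illustration of the null and NaN cases only, a check over a boxed double matrix might look like the sketch below. MissingValuesSketch is a hypothetical helper, not part of this interface, and it does not cover other type-specific markers that an implementation may treat as missing.

```java
// Hypothetical helper; covers only the null and NaN cases for Double values.
final class MissingValuesSketch {
    // Returns true as soon as a null or NaN cell is found.
    static boolean hasMissingValues(Double[][] matrix) {
        for (Double[] row : matrix) {
            for (Double value : row) {
                if (value == null || value.isNaN()) {
                    return true;
                }
            }
        }
        return false;
    }
}
```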
-
get
Access a single value of the matrix. Note that because there can be multiple bioassays per column and multiple design elements per row, it is possible for this method to retrieve data that does not come from the bioassay and/or design element arguments.
- Parameters:
designElement - de
bioAssay - ba
- Returns:
- the value at the given design element and bioassay, or null if the value is missing
-
getRawMatrix
T[][] getRawMatrix()
Access the entire matrix.
- Returns:
- T[][]
-
getBioMaterials
List<BioMaterial> getBioMaterials()
-
getColumn
Access a single column of the matrix.
- Returns:
- a vector for the given column, or null if the column is not present
-
getColumnIndex
- Returns:
- the index of the column for the data for the bioAssay, or -1 if missing
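Callers are expected to check for -1 before using the index. A minimal sketch of that pattern follows, with strings standing in for BioAssay; ColumnIndexSketch and valueAt are hypothetical names introduced only for illustration.

```java
import java.util.List;

// Hypothetical helper; the list holds one assay identifier per column.
final class ColumnIndexSketch {
    // Returns the position of the assay among the columns, or -1 if it is absent.
    static int getColumnIndex(List<String> columnAssays, String bioAssay) {
        return columnAssays.indexOf(bioAssay);
    }

    // Reads one cell of a row, returning null when the assay has no column.
    static Double valueAt(Double[] row, List<String> columnAssays, String bioAssay) {
        int j = getColumnIndex(columnAssays, bioAssay);
        return j < 0 ? null : row[j];
    }
}
```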
-
getColumnIndex
-
sliceColumns
Slice the requested samples (columns) from this matrix.
Dimensions will be altered to reflect only the selected samples.
- Parameters:
bioMaterials - samples to select from the matrix
- Throws:
IllegalArgumentException - if any of the requested biomaterials are not found in the matrix
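The slicing contract can be sketched on a plain array as shown below. This is an illustration under stated assumptions, not the actual implementation: ColumnSliceSketch is a hypothetical name, strings stand in for BioMaterial, and keeping the columns in the requested order is an assumption.

```java
import java.util.List;

// Hypothetical helper; "samples" names the columns of "matrix" in order.
final class ColumnSliceSketch {
    // Returns a new matrix holding only the requested columns, in the requested order.
    static double[][] sliceColumns(double[][] matrix, List<String> samples, List<String> requested) {
        int[] indices = new int[requested.size()];
        for (int k = 0; k < indices.length; k++) {
            int j = samples.indexOf(requested.get(k));
            if (j < 0) {
                // Mirrors the documented behaviour: unknown samples are rejected.
                throw new IllegalArgumentException("Sample not found in matrix: " + requested.get(k));
            }
            indices[k] = j;
        }
        double[][] result = new double[matrix.length][indices.length];
        for (int i = 0; i < matrix.length; i++) {
            for (int k = 0; k < indices.length; k++) {
                result[i][k] = matrix[i][indices[k]];
            }
        }
        return result;
    }
}
```

Resolving every index before copying means an unknown sample fails fast, without building a partial result.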
-
sliceColumns
BulkExpressionDataMatrix<T> sliceColumns(List<BioMaterial> bioMaterials, BioAssayDimension dimension)
Slice the requested samples (columns) from this matrix.
This also allows specifying a new dimension for the columns that will be used for every design element (row).
- Parameters:
bioMaterials - samples to select from the matrix
dimension - the dimension to use
- Throws:
IllegalArgumentException - if any of the requested biomaterials are not found in the matrix
-
getBioAssayForColumn
Obtain an assay corresponding to a given column.
-
getBioMaterialForColumn
Obtain a biomaterial corresponding to a column.
-