Interface BulkExpressionDataMatrix<T>

All Superinterfaces:
ExpressionDataMatrix<T>
All Known Subinterfaces:
BulkExpressionDataPrimitiveDoubleMatrix, BulkExpressionDataPrimitiveIntMatrix, MultiAssayBulkExpressionDataMatrix<T>
All Known Implementing Classes:
AbstractBulkExpressionDataMatrix, AbstractMultiAssayExpressionDataMatrix, BulkExpressionDataDoubleMatrix, BulkExpressionDataIntMatrix, EmptyBulkExpressionDataMatrix, EmptyExpressionMatrix, ExpressionDataBooleanMatrix, ExpressionDataDoubleMatrix, ExpressionDataIntegerMatrix, ExpressionDataStringMatrix

public interface BulkExpressionDataMatrix<T> extends ExpressionDataMatrix<T>
Interface for bulk expression data matrices.

In a bulk expression data matrix, each column represents a sample.

Expression data is rather complex, so we have to handle some messy cases.

The key problem is how to unambiguously identify rows and columns in the matrix. This is greatly complicated by the fact that experiments can combine data from multiple array designs in various ways.

Put together, the result is that there can be more than one BioAssay per column, and the same BioMaterial can be used in multiple columns (supported implicitly). There can also be more than one BioMaterial in one column (we don't support this yet). The same BioSequence can be found in multiple rows, and a row can contain data from more than one CompositeSequence. These cases are handled by the MultiAssayBulkExpressionDataMatrix interface and its corresponding implementations. This interface assumes the simplest case, where each column is represented by a BioAssay and each row is represented by a CompositeSequence.

There are a few constraints: a particular CompositeSequence can only be used once, in a single row. At the moment we do not directly support technical replicates, though this should be possible. A BioAssay can only appear in a single column.
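
The constraints above can be illustrated with a minimal sketch. It does not use the real classes: RowColumnIndexSketch is a hypothetical name, and plain strings stand in for CompositeSequence and BioAssay. It only shows one way the "one row per design element, one column per assay" rule might be enforced when building the row and column indices.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of the documented API. Strings play the role of
// CompositeSequence (rows) and BioAssay (columns).
class RowColumnIndexSketch {
    private final Map<String, Integer> rowIndex = new LinkedHashMap<>();
    private final Map<String, Integer> columnIndex = new LinkedHashMap<>();

    RowColumnIndexSketch(List<String> designElements, List<String> bioAssays) {
        for (String de : designElements) {
            // putIfAbsent returns the previous mapping, so non-null means a repeat.
            if (rowIndex.putIfAbsent(de, rowIndex.size()) != null) {
                throw new IllegalArgumentException("Design element used in more than one row: " + de);
            }
        }
        for (String ba : bioAssays) {
            if (columnIndex.putIfAbsent(ba, columnIndex.size()) != null) {
                throw new IllegalArgumentException("Assay used in more than one column: " + ba);
            }
        }
    }

    // Row index of the design element, or -1 if it is not in the matrix.
    int rowOf(String designElement) {
        return rowIndex.getOrDefault(designElement, -1);
    }

    // Column index of the assay, or -1 if it is not in the matrix.
    int columnOf(String bioAssay) {
        return columnIndex.getOrDefault(bioAssay, -1);
    }
}
```

Rejecting duplicates at construction time is a design choice of this sketch; the documented implementations may enforce the constraints differently.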

For some operations an ExpressionDataMatrixRowElement object is offered, which encapsulates a CompositeSequence, a BioSequence, and an index. The list of these can be useful for iterating over the rows of the matrix.
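
As a rough illustration of how such row elements might be used for iteration, the sketch below models one with a hypothetical RowElementSketch class (strings stand in for CompositeSequence and BioSequence). The real ExpressionDataMatrixRowElement API is not shown here and may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a row element: a design element, its sequence, and a row index.
final class RowElementSketch {
    final String designElement; // stands in for CompositeSequence
    final String bioSequence;   // stands in for BioSequence
    final int index;            // row position in the matrix

    RowElementSketch(String designElement, String bioSequence, int index) {
        this.designElement = designElement;
        this.bioSequence = bioSequence;
        this.index = index;
    }
}

class RowIterationSketch {
    // Builds one row element per row, in matrix order.
    static List<RowElementSketch> rowElements(String[] designElements, String[] bioSequences) {
        List<RowElementSketch> rows = new ArrayList<>();
        for (int i = 0; i < designElements.length; i++) {
            rows.add(new RowElementSketch(designElements[i], bioSequences[i], i));
        }
        return rows;
    }

    // Uses the index carried by the element to sum its row of a raw matrix.
    static double rowSum(RowElementSketch row, double[][] rawMatrix) {
        double sum = 0.0;
        for (double value : rawMatrix[row.index]) {
            sum += value;
        }
        return sum;
    }
}
```

Two rows may carry the same sequence, as the description above allows; the index is what ties each element back to exactly one row.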

Author:
pavlidis, keshav
  • Method Details

    • getMatrix

      static BulkExpressionDataMatrix<?> getMatrix(Collection<? extends BulkExpressionDataVector> vectors)
      Create a bulk expression data matrix from a collection of vectors. All vectors must share the same QuantitationType.
    • getBioAssayDimension

      BioAssayDimension getBioAssayDimension()
      Obtain the dimension for the columns of this matrix.
    • hasMissingValues

      boolean hasMissingValues()
      Returns:
      true if any value is null, NaN (for doubles and floats), or otherwise considered missing
    • get

      @Nullable T get(CompositeSequence designElement, BioAssay bioAssay)
      Access a single value of the matrix. Note that because there can be multiple bioassays per column and multiple design elements per row, it is possible for this method to retrieve data that does not come from the bioAssay and/or designElement arguments.
      Parameters:
      designElement - the design element identifying the row
      bioAssay - the bioassay identifying the column
      Returns:
      the value at the given design element and bioassay, or null if the value is missing
    • getRawMatrix

      T[][] getRawMatrix()
      Access the entire matrix.
      Returns:
      the entire matrix as a two-dimensional array
    • getColumn

      @Nullable T[] getColumn(BioAssay bioAssay)
      Access a single column of the matrix.
      Returns:
      a vector for the given column, or null if the column is not present
    • getColumnIndex

      int getColumnIndex(BioAssay bioAssay)
      Returns:
      the index of the column holding the data for the given bioAssay, or -1 if there is no such column
    • getColumnIndex

      int getColumnIndex(BioMaterial bioMaterial)
    • getBioAssayForColumn

      BioAssay getBioAssayForColumn(int index)
      Obtain an assay corresponding to a given column.
    • getBioMaterialForColumn

      BioMaterial getBioMaterialForColumn(int index)
      Obtain a biomaterial corresponding to a column.
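
The return conventions documented for get, getColumn, getColumnIndex and hasMissingValues can be summarised in a small sketch. BulkMatrixSketch is a hypothetical class, not one of the implementing classes listed above, and strings stand in for CompositeSequence and BioAssay. Returning null from get for an unknown row or column is an assumption of this sketch; the documentation only states that null is returned when the value is missing.

```java
import java.util.List;

// Hypothetical stand-in mirroring the documented return conventions.
class BulkMatrixSketch {
    private final List<String> designElements; // rows
    private final List<String> bioAssays;      // columns
    private final Double[][] data;             // data[row][column]; null marks a missing value

    BulkMatrixSketch(List<String> designElements, List<String> bioAssays, Double[][] data) {
        this.designElements = designElements;
        this.bioAssays = bioAssays;
        this.data = data;
    }

    // Column index for the assay, or -1 if missing (List.indexOf already returns -1).
    int getColumnIndex(String bioAssay) {
        return bioAssays.indexOf(bioAssay);
    }

    // Copy of the column for the assay, or null if the column is not present.
    Double[] getColumn(String bioAssay) {
        int j = getColumnIndex(bioAssay);
        if (j < 0) {
            return null;
        }
        Double[] column = new Double[data.length];
        for (int i = 0; i < data.length; i++) {
            column[i] = data[i][j];
        }
        return column;
    }

    // Single value, or null if it is missing.
    Double get(String designElement, String bioAssay) {
        int i = designElements.indexOf(designElement);
        int j = getColumnIndex(bioAssay);
        if (i < 0 || j < 0) {
            return null; // assumption: unknown row or column is treated like a missing value
        }
        return data[i][j];
    }

    // True if any value is null or NaN.
    boolean hasMissingValues() {
        for (Double[] row : data) {
            for (Double value : row) {
                if (value == null || value.isNaN()) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Callers of the real interface would need the same null and -1 checks before using a returned value, column, or index.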