Class GeoValues

  • All Implemented Interfaces:
    Serializable

    public class GeoValues
    extends Object
    implements Serializable
    Stores the expression data prior to conversion. The data are read from series files sample by sample; within each sample, designElement by designElement; and within each designElement, quantitationType by quantitationType. Values are stored in vectors, roughly equivalent to DesignElementDataVectors. This class is important because it defines how GEO sample data are converted into vectors. The conversion rests on two assumptions. First, all samples present their quantitation types in the same order. Second, all samples have the same quantitation types or, at worst, some samples are missing quantitation types at the 'end' (in which case the vectors are padded). We do not assume that quantitation types have the same names in all samples (quantitation types correspond to column names in the GEO files). Two counterexamples found so far push or violate these assumptions: GSE360 and GSE4345 (which is really broken). Loading GSE4345 results in a cast exception because the quantitation types are 'mixed up' across the samples.
    Author:
    pavlidis
    See Also:
    Serialized Form
    • Constructor Detail

      • GeoValues

        public GeoValues()
    • Method Detail

      • addQuantitationType

        public void addQuantitationType​(GeoPlatform platform,
                                        String columnName,
                                        Integer index)
        Parameters:
        columnName - column name
        index - the actual index of the data in the final data structure, not necessarily the column where the data are found in the data file (as that can vary from sample to sample).
        platform - platform
      • addSample

        public void addSample​(GeoSample sample)
        Only call this to add a sample for which there are no data.
        Parameters:
        sample - geo sample
      • addValue

        public void addValue​(GeoSample sample,
                             Integer quantitationTypeIndex,
                             String designElement,
                             Object value)
        Store a value. It is assumed that designElements have unique names. Implementation note: the first time we see a sample, we associate it with a 'dimension' that is connected to the platform and quantitation type. In parallel, we add the data to a 'vector' for the designElement that is likewise connected to the platform the sample uses and to the quantitation type. Because samples in GEO files are seen one at a time, the vectors for each designElement are built up incrementally. It is therefore important to add a value for each design element for each sample. Consider what happens if data are MISSING for a given designElement/quantitationType/sample combination, which can happen (typically for all the quantitation types of a designElement in a given sample). This method will NOT be called for that combination. When the next sample is processed, its data will be appended to the end of the vector, in the wrong position, and the data in the vectors stored here will be incorrect. The GEO parser therefore has to ensure that each vector is 'completed' before moving to the next sample.
        Parameters:
        sample - sample
        quantitationTypeIndex - The column number for the quantitation type, needed because the names of the quantitation types don't always match across samples (but hopefully the columns do). Even though the first column contains the design element name (ID_REF), the first quantitation type should be numbered 0. This is almost always a good way to match values across samples, but there ARE cases where the order isn't the same for two samples in the same series.
        designElement - design element
        value - The data point to be stored.
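        The misalignment risk described above can be illustrated with a minimal, self-contained sketch. It is not the GeoValues implementation: the class VectorPaddingSketch, its completeSample method, and the probe names are hypothetical, and it assumes a missing value is represented by null. It shows one way a caller could 'complete' each vector before moving to the next sample.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the GeoValues implementation. */
public class VectorPaddingSketch {
    // One growing vector per design element, in the order samples are read.
    private final Map<String, List<Object>> vectors = new LinkedHashMap<>();
    // Number of samples that have been fully processed so far.
    private int samplesSeen = 0;

    /** Append a value for the sample currently being read. */
    public void addValue(String designElement, Object value) {
        List<Object> vector = vectors.computeIfAbsent(designElement, k -> {
            // A design element first seen in a later sample gets nulls
            // for all earlier samples so that positions stay aligned.
            List<Object> fresh = new ArrayList<>();
            for (int i = 0; i < samplesSeen; i++) {
                fresh.add(null);
            }
            return fresh;
        });
        vector.add(value);
    }

    /** Call once after the last value of each sample: pad short vectors with null. */
    public void completeSample() {
        samplesSeen++;
        for (List<Object> vector : vectors.values()) {
            while (vector.size() < samplesSeen) {
                vector.add(null);
            }
        }
    }

    public List<Object> vector(String designElement) {
        return vectors.get(designElement);
    }

    public static void main(String[] args) {
        VectorPaddingSketch sketch = new VectorPaddingSketch();
        sketch.addValue("probe1", 1.0);
        sketch.addValue("probe2", 2.0);
        sketch.completeSample();
        sketch.addValue("probe1", 3.0); // probe2 is missing in the second sample
        sketch.completeSample();
        sketch.addValue("probe1", 5.0);
        sketch.addValue("probe2", 6.0);
        sketch.completeSample();
        System.out.println(sketch.vector("probe2")); // prints [2.0, null, 6.0]
    }
}
```

        Without the padding step, the vector for probe2 would hold only two values, and the third sample's value would sit in the second sample's position.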
      • clear

        public void clear​(GeoPlatform geoPlatform)
        Remove the data for a given platform (use this to save memory).
        Parameters:
        geoPlatform - geo platform
      • clear

        public void clear​(GeoPlatform platform,
                          List<GeoSample> datasetSamples,
                          Integer quantitationTypeIndex)
        If possible, null out the data for a quantitation type on a given platform.
        Parameters:
        platform - platform
        datasetSamples - dataset samples
        quantitationTypeIndex - QT index
      • getIndices

        public Integer[] getIndices​(GeoPlatform platform,
                                    List<GeoSample> neededSamples,
                                    Integer quantitationType)
        Get the indices of the data for a set of samples - this can be used to get a slice of the data. This is inefficient but shouldn't need to be called all that frequently.
        Parameters:
        platform - platform
        neededSamples - needed samples; must all be from the same platform. If we don't have data for a given sample, the index returned will be null. This can happen when some samples don't have all the quantitation types (GSE360 for example).
        quantitationType - quantitation type
        Returns:
        integer array
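        As a rough illustration of the contract described for getIndices (a null entry for each sample that has no data), the following sketch looks up sample positions with a linear scan. The class SampleIndexSketch, the method indicesOf, and the use of plain strings as sample identifiers are assumptions made for this example; the actual lookup in GeoValues may work differently.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the GeoValues implementation. */
public class SampleIndexSketch {

    /**
     * Returns, for each needed sample, its position among the stored samples,
     * or null if the sample is not stored.
     */
    public static Integer[] indicesOf(List<String> storedSamples, List<String> neededSamples) {
        Integer[] result = new Integer[neededSamples.size()];
        for (int i = 0; i < neededSamples.size(); i++) {
            // Linear scan per sample: simple but inefficient for large inputs.
            int position = storedSamples.indexOf(neededSamples.get(i));
            if (position >= 0) {
                result[i] = position;
            } else {
                result[i] = null; // no data for this sample
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList("GSM1", "GSM2", "GSM3");
        List<String> needed = Arrays.asList("GSM3", "GSM9", "GSM1");
        System.out.println(Arrays.toString(indicesOf(stored, needed))); // prints [2, null, 0]
    }
}
```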
      • getQuantitationTypes

        public Collection<Integer> getQuantitationTypes​(GeoPlatform samplePlatform)
        Parameters:
        samplePlatform - sample platform
        Returns:
        Collection of Integers representing the quantitation types for the given platform.
      • getValues

        public List<Object> getValues​(GeoPlatform platform,
                                      Integer quantitationType,
                                      String designElement,
                                      Integer[] indices)
        Parameters:
        quantitationType - QT
        designElement - design element
        indices - indices
        platform - platform
        Returns:
        a 'slice' of the data corresponding to the indices provided.
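        To make the idea of a 'slice' concrete, here is a small sketch that picks the values at the given indices from one stored vector. The class SliceSketch and the method slice are hypothetical. How GeoValues itself treats a null index is not documented here, so this sketch simply assumes that a null or out-of-range index yields a null value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the GeoValues implementation. */
public class SliceSketch {

    /** Picks vector entries at the given indices; assumes null means "no data". */
    public static List<Object> slice(List<Object> vector, Integer[] indices) {
        List<Object> result = new ArrayList<>(indices.length);
        for (Integer index : indices) {
            if (index == null || index < 0 || index >= vector.size()) {
                result.add(null); // assumption: missing data are represented as null
            } else {
                result.add(vector.get(index));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> vector = Arrays.asList(10.5, 11.0, 12.25);
        Integer[] indices = {2, null, 0};
        System.out.println(slice(vector, indices)); // prints [12.25, null, 10.5]
    }
}
```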
      • hasData

        public boolean hasData()
      • isWantedQuantitationType

        public boolean isWantedQuantitationType​(String quantitationTypeName)
        Some quantitation types are 'skippable' - they are easily recomputed from other values, or are not necessary in the system. Skipping these makes loading the data more manageable for some data sets that are very large.
        Parameters:
        quantitationTypeName - QT name
        Returns:
        true if the name is NOT on the 'skippable' list.
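        The 'skippable' check can be pictured as a lookup against a fixed set of names. The sketch below is illustrative only: the class SkippableNamesSketch is hypothetical, the names in its set are placeholders rather than the real list, and exact, case-sensitive matching is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; not the GeoValues implementation. */
public class SkippableNamesSketch {
    // Placeholder names; the actual skippable list is not shown in this documentation.
    private static final Set<String> SKIPPABLE =
            new HashSet<>(Arrays.asList("EXAMPLE_BACKGROUND", "EXAMPLE_FLAG"));

    /** Returns true if the name is NOT in the skippable set. */
    public static boolean isWanted(String quantitationTypeName) {
        return !SKIPPABLE.contains(quantitationTypeName);
    }

    public static void main(String[] args) {
        System.out.println(isWanted("VALUE"));        // prints true
        System.out.println(isWanted("EXAMPLE_FLAG")); // prints false
    }
}
```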
      • subset

        public GeoValues subset​(Collection<GeoSample> samples)
        This creates a new GeoValues that has data only for the selected samples. The quantitation type information will be semi-deep copies. This is only needed when we are splitting a series apart, especially when it is not along Platform lines.
        Parameters:
        samples - samples
        Returns:
        geo values
      • validate

        public void validate()
        This method can only be called once a sample has been completely processed, and before a new sample has been started.