Class GeoValues
java.lang.Object
  ubic.gemma.core.loader.expression.geo.model.GeoValues
- All Implemented Interfaces:
Serializable
public class GeoValues extends Object implements Serializable
Class to store the expression data prior to conversion. The data are read from series files sample by sample; within each sample, design element by design element; and within each design element, quantitation type by quantitation type. Values are stored in vectors, roughly equivalent to DesignElementDataVectors. This is an important class, as it encompasses how we convert GEO sample data into vectors.

This approach is predicated on a couple of assumptions. First, we assume that all samples present their quantitation types in the same order. Second, we assume that all samples have the same quantitation types or, at worst, that some samples are missing quantitation types off the 'end' (in which case the vectors are padded). We do not assume that all samples have quantitation types with the same names (quantitation types correspond to column names in the GEO files).

We have found two counterexamples (so far) that push or violate these assumptions: GSE360 and GSE4345 (which is really broken). Loading GSE4345 results in a cast exception because the quantitation types are 'mixed up' across the samples.
- Author:
- pavlidis
- See Also:
- Serialized Form
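The storage model described above (a per-platform, per-quantitation-type 'dimension' of samples, plus one vector of values per design element) can be sketched with nested maps. This is a minimal illustration under stated assumptions, not Gemma's actual fields: the class name `GeoValuesLayoutSketch` is hypothetical, and plain strings stand in for `GeoPlatform` and `GeoSample`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the layout described in the class comment; not Gemma's code.
// Strings stand in for GeoPlatform (e.g. "GPL1") and GeoSample (e.g. "GSM1").
class GeoValuesLayoutSketch {
    // platform -> quantitation type index -> samples in the order first seen (the 'dimension')
    final Map<String, Map<Integer, List<String>>> dimensions = new HashMap<>();
    // platform -> quantitation type index -> design element -> one value per sample
    final Map<String, Map<Integer, Map<String, List<Object>>>> vectors = new HashMap<>();

    void addValue(String platform, String sample, int qtIndex, String designElement, Object value) {
        List<String> dim = dimensions
                .computeIfAbsent(platform, p -> new HashMap<>())
                .computeIfAbsent(qtIndex, q -> new ArrayList<>());
        // The first value seen for a sample registers it in the dimension.
        if (!dim.contains(sample)) {
            dim.add(sample);
        }
        // The value is appended, so position i in the vector refers to dim.get(i).
        vectors.computeIfAbsent(platform, p -> new HashMap<>())
                .computeIfAbsent(qtIndex, q -> new HashMap<>())
                .computeIfAbsent(designElement, d -> new ArrayList<>())
                .add(value);
    }

    List<Object> getValues(String platform, int qtIndex, String designElement) {
        // Empty list if nothing has been stored for this combination.
        return vectors.getOrDefault(platform, Map.of())
                .getOrDefault(qtIndex, Map.of())
                .getOrDefault(designElement, List.of());
    }
}
```

Because values are appended in the order samples are read, the position of a value in its vector is only meaningful if every sample contributes exactly one value to every vector.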
Constructor Summary
Constructor | Description
GeoValues() |
Method Summary
Modifier and Type | Method | Description
void | addQuantitationType(GeoPlatform platform, String columnName, Integer index) |
void | addSample(GeoSample sample) | Only call this to add a sample for which there are no data.
void | addValue(GeoSample sample, Integer quantitationTypeIndex, String designElement, Object value) | Store a value.
void | clear(GeoPlatform geoPlatform) | Remove the data for a given platform (use to save memory).
void | clear(GeoPlatform platform, List<GeoSample> datasetSamples, Integer quantitationTypeIndex) | If possible, null out the data for a quantitation type on a given platform.
Integer[] | getIndices(GeoPlatform platform, List<GeoSample> neededSamples, Integer quantitationType) | Get the indices of the data for a set of samples; this can be used to get a slice of the data.
Integer | getQuantitationTypeIndex(GeoPlatform platform, String columnName) |
Collection<Integer> | getQuantitationTypes(GeoPlatform samplePlatform) |
List<Object> | getValues(GeoPlatform platform, Integer quantitationType, String designElement) |
List<Object> | getValues(GeoPlatform platform, Integer quantitationType, String designElement, Integer[] indices) |
boolean | hasData() |
boolean | isWantedQuantitationType(String quantitationTypeName) | Some quantitation types are 'skippable': they are easily recomputed from other values, or are not necessary in the system.
GeoValues | subset(Collection<GeoSample> samples) | This creates a new GeoValues that has data only for the selected samples.
String | toString() |
void | validate() | This method can only be called once a sample has been completely processed, and before a new sample has been started.
Method Detail
-
addQuantitationType
public void addQuantitationType(GeoPlatform platform, String columnName, Integer index)
- Parameters:
platform - platform
columnName - column name
index - the actual index of the data in the final data structure, not necessarily the column where the data are found in the data file (as that can vary from sample to sample).
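Since the `index` is decoupled from the file column, and column names may differ between samples, one way to picture `addQuantitationType` together with `getQuantitationTypeIndex` is a per-platform map from column name to index, where several names can share an index. This is a sketch under assumptions: the class name is hypothetical, strings stand in for `GeoPlatform`, and Gemma's internal structure may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-platform registry: column name -> quantitation type index.
// A String stands in for GeoPlatform; this is not Gemma's implementation.
class QuantitationTypeRegistrySketch {
    private final Map<String, Map<String, Integer>> nameToIndex = new HashMap<>();

    void addQuantitationType(String platform, String columnName, int index) {
        // Different samples may name the same quantitation type differently,
        // so more than one column name may map to the same index.
        nameToIndex.computeIfAbsent(platform, p -> new HashMap<>()).put(columnName, index);
    }

    Integer getQuantitationTypeIndex(String platform, String columnName) {
        Map<String, Integer> forPlatform = nameToIndex.get(platform);
        // null means the column was never registered for this platform.
        return forPlatform == null ? null : forPlatform.get(columnName);
    }
}
```

For example, if one sample labels its first quantitation type `VALUE` and another labels it `LOG_RATIO`, registering both at index 0 lets their values land in the same vectors.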
-
addSample
public void addSample(GeoSample sample)
Only call this to add a sample for which there are no data.
- Parameters:
sample - GEO sample
-
addValue
public void addValue(GeoSample sample, Integer quantitationTypeIndex, String designElement, Object value)
Store a value. It is assumed that design elements have unique names.
Implementation note: The first time we see a sample, we associate it with a 'dimension' that is connected to the platform and quantitation type. In parallel, we add the data to a 'vector' for the design element that is likewise connected to the platform the sample uses and to the quantitation type. Because samples are seen one at a time in GEO files, the vectors for each design element are built up incrementally. Thus it is important that we add a value for each sample for each design element.
Note what happens if data are MISSING for a given design element/quantitation type/sample combination. This can happen (typically, all the quantitation types for a design element are missing in a given sample). In that case this method will NOT be called, so when the next sample is processed, its data will be added onto the end of the vector in the wrong place, and the data in the vectors stored here will be incorrect. Thus the GEO parser has to ensure that each vector is 'completed' before moving on to the next sample.
- Parameters:
sample - sample
quantitationTypeIndex - The column number for the quantitation type, needed because the names of the quantitation types don't always match across samples (but hopefully the columns do). Even though the first column contains the design element name (ID_REF), the first quantitation type should be numbered 0. This is almost always a good way to match values across samples, but there ARE cases where the order isn't the same for two samples in the same series.
designElement - design element
value - The data point to be stored.
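The missing-data hazard described above can be reproduced with a small model of a single platform and quantitation type. This is a hypothetical sketch, not Gemma's parser: `VectorCompletionSketch` and its `completeSample()` method are invented names, and padding with `null` is an assumed way of 'completing' vectors.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one platform and one quantitation type; not Gemma's code.
class VectorCompletionSketch {
    final List<String> dimension = new ArrayList<>();          // samples, in the order seen
    final Map<String, List<Object>> vectors = new HashMap<>(); // design element -> values

    void addValue(String sample, String designElement, Object value) {
        if (!dimension.contains(sample)) {
            dimension.add(sample);
        }
        // Appending means position i is assumed to belong to dimension.get(i).
        vectors.computeIfAbsent(designElement, d -> new ArrayList<>()).add(value);
    }

    // Assumed parser step at the end of each sample: pad any vector that did not
    // receive a value for this sample, so positions stay aligned with the dimension.
    void completeSample() {
        for (List<Object> vector : vectors.values()) {
            while (vector.size() < dimension.size()) {
                vector.add(null);
            }
        }
    }
}
```

If `probeB` has no value in the second of three samples and `completeSample()` is never called, the third sample's `probeB` value ends up at position 1, which the dimension attributes to the second sample. Calling `completeSample()` after each sample keeps it at position 2 with a `null` in between. The padding shown here only works for design elements already seen in the first sample.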
-
clear
public void clear(GeoPlatform geoPlatform)
Remove the data for a given platform (use to save memory).
- Parameters:
geoPlatform - GEO platform
-
clear
public void clear(GeoPlatform platform, List<GeoSample> datasetSamples, Integer quantitationTypeIndex)
If possible, null out the data for a quantitation type on a given platform.
- Parameters:
platform - platform
datasetSamples - dataset samples
quantitationTypeIndex - quantitation type index
-
getIndices
public Integer[] getIndices(GeoPlatform platform, List<GeoSample> neededSamples, Integer quantitationType)
Get the indices of the data for a set of samples; this can be used to get a slice of the data. This is inefficient but shouldn't need to be called very frequently.
- Parameters:
platform - platform
neededSamples - samples, which must be from the same platform. If we don't have data for a given sample, the index returned will be null. This can happen when some samples don't have all the quantitation types (GSE360, for example).
quantitationType - quantitation type
- Returns:
integer array of indices
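A minimal sketch of such an index lookup, assuming the dimension for the platform and quantitation type is an ordered list of samples. The class and method names are hypothetical and strings stand in for `GeoSample`. The linear `indexOf` scan for each sample is one reason a lookup like this is inefficient.

```java
import java.util.List;

// Hypothetical sketch: position of each needed sample within a dimension.
// Strings stand in for GeoSample; not Gemma's implementation.
class IndexLookupSketch {
    static Integer[] getIndices(List<String> dimension, List<String> neededSamples) {
        Integer[] indices = new Integer[neededSamples.size()];
        for (int i = 0; i < neededSamples.size(); i++) {
            // Linear scan of the dimension for every requested sample.
            int pos = dimension.indexOf(neededSamples.get(i));
            // A sample with no data for this quantitation type yields null.
            indices[i] = pos >= 0 ? pos : null;
        }
        return indices;
    }
}
```

The result keeps the order of `neededSamples`, so it can be used directly to pull a slice out of each vector.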
-
getQuantitationTypeIndex
public Integer getQuantitationTypeIndex(GeoPlatform platform, String columnName)
-
getQuantitationTypes
public Collection<Integer> getQuantitationTypes(GeoPlatform samplePlatform)
- Parameters:
samplePlatform - sample platform
- Returns:
Collection of the indices representing the quantitation types for the given platform.
-
getValues
public List<Object> getValues(GeoPlatform platform, Integer quantitationType, String designElement)
-
getValues
public List<Object> getValues(GeoPlatform platform, Integer quantitationType, String designElement, Integer[] indices)
- Parameters:
platform - platform
quantitationType - quantitation type
designElement - design element
indices - indices
- Returns:
a 'slice' of the data corresponding to the indices provided.
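Given an index array such as the one `getIndices` returns, taking a slice of a single vector can be sketched as follows. This is an assumption-laden illustration: `SliceSketch` is a hypothetical name, and mapping a `null` index to a `null` value is an assumed convention for samples that have no data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: select the values at the given positions of one vector.
class SliceSketch {
    static List<Object> slice(List<Object> vector, Integer[] indices) {
        List<Object> result = new ArrayList<>(indices.length);
        for (Integer index : indices) {
            // Assumed convention: a null index (sample without data) gives a null value.
            result.add(index == null ? null : vector.get(index));
        }
        return result;
    }
}
```

The slice follows the order of `indices`, not the order of the stored vector.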
-
hasData
public boolean hasData()
-
isWantedQuantitationType
public boolean isWantedQuantitationType(String quantitationTypeName)
Some quantitation types are 'skippable': they are easily recomputed from other values, or are not necessary in the system. Skipping these makes loading the data more manageable for some very large data sets.
- Parameters:
quantitationTypeName - quantitation type name
- Returns:
true if the name is NOT on the 'skippable' list.
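The check itself amounts to testing membership in a skip list. In this sketch, the class name is hypothetical and the column names in the set are illustrative examples chosen for the demonstration, not Gemma's actual skip list. Exact, case-sensitive matching is also an assumption.

```java
import java.util.Set;

// Hypothetical skip-list check. The names below are illustrative only,
// not the quantitation types Gemma actually skips.
class SkippableQuantitationTypesSketch {
    private static final Set<String> SKIPPABLE = Set.of("F635 SD", "B635 SD", "F532 SD", "B532 SD");

    static boolean isWantedQuantitationType(String quantitationTypeName) {
        // Wanted unless the name appears (exact match) on the skip list.
        return !SKIPPABLE.contains(quantitationTypeName);
    }
}
```

A parser can call this once per column header and simply not register columns that return false, so their values are never stored.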
-
subset
public GeoValues subset(Collection<GeoSample> samples)
This creates a new GeoValues that has data only for the selected samples. The quantitation type information will be semi-deep copies. This is only needed when we are splitting a series apart, especially when the split is not along platform lines.
- Parameters:
samples - samples
- Returns:
the new GeoValues
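Subsetting means filtering a dimension and every vector attached to it with the same set of positions. The sketch below illustrates this for one platform and quantitation type. It is hypothetical (`SubsetSketch` is an invented name, and strings stand in for `GeoSample`). The new lists are fresh, but the value objects are shared with the original, which is only loosely analogous to the 'semi-deep' copies mentioned above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: restrict one dimension and its vectors to selected samples.
class SubsetSketch {
    final List<String> dimension;                // samples in stored order
    final Map<String, List<Object>> vectors;     // design element -> one value per sample

    SubsetSketch(List<String> dimension, Map<String, List<Object>> vectors) {
        this.dimension = dimension;
        this.vectors = vectors;
    }

    SubsetSketch subset(Collection<String> samples) {
        // Positions to keep, in the original dimension order.
        List<Integer> keep = new ArrayList<>();
        List<String> newDimension = new ArrayList<>();
        for (int i = 0; i < dimension.size(); i++) {
            if (samples.contains(dimension.get(i))) {
                keep.add(i);
                newDimension.add(dimension.get(i));
            }
        }
        // New lists for each vector; the value objects themselves are shared.
        Map<String, List<Object>> newVectors = new HashMap<>();
        for (Map.Entry<String, List<Object>> e : vectors.entrySet()) {
            List<Object> sliced = new ArrayList<>(keep.size());
            for (int i : keep) {
                sliced.add(e.getValue().get(i));
            }
            newVectors.put(e.getKey(), sliced);
        }
        return new SubsetSketch(newDimension, newVectors);
    }
}
```

The subset keeps the stored sample order regardless of the order of the `samples` argument, so dimension positions and vector positions stay aligned.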
-
validate
public void validate()
This method can only be called once a sample has been completely processed, and before a new sample has been started.
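This page does not say what `validate()` checks. One invariant consistent with its timing restriction is that every vector holds exactly one value per sample in its dimension. Partway through a sample, some vectors are legitimately one value short, so such a check only makes sense between samples. The sketch below is hypothetical (`ValidateSketch` is an invented name) and is not Gemma's implementation.

```java
import java.util.List;
import java.util.Map;

// Hypothetical consistency check, not Gemma's validate(): every vector must have
// exactly one value per sample in the dimension.
class ValidateSketch {
    static void validate(List<String> dimension, Map<String, List<Object>> vectors) {
        for (Map.Entry<String, List<Object>> e : vectors.entrySet()) {
            if (e.getValue().size() != dimension.size()) {
                throw new IllegalStateException("Vector for " + e.getKey() + " has "
                        + e.getValue().size() + " values but the dimension has "
                        + dimension.size() + " samples");
            }
        }
    }
}
```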