Class DataUpdaterImpl

  • All Implemented Interfaces:
    DataUpdater

    @Service
    public class DataUpdaterImpl
    extends Object
    implements DataUpdater
    Updates or fills in the data associated with an experiment. Cases include reprocessing data from CEL files (Affymetrix, GEO only), inserting data for RNA-seq data sets, and generic cases where the data did not come from GEO and we need to add or replace data. For loading experiments from flat files, see SimpleExpressionDataLoaderService.
    Author:
    paul
    • Constructor Detail

      • DataUpdaterImpl

        public DataUpdaterImpl()
    • Method Detail

      • addAffyDataFromAPTOutput

        @Transactional(propagation=NEVER)
        public void addAffyDataFromAPTOutput​(ExpressionExperiment ee,
                                             String pathToAptOutputFile)
                                      throws IOException
        Affymetrix: Use to bypass the automated running of apt-probeset-summarize, for example if GEO doesn't have the CEL files and we ran apt-probeset-summarize ourselves, or if some GEO files were corrupted (in which case the file used here must have blank columns added, with headers, for the unused samples). The experiment must be single-platform. This method will switch the data set to use the "right" platform when the one originally used was an alt CDF or exon-level, so be sure never to use an alt CDF for processing raw data.
        Specified by:
        addAffyDataFromAPTOutput in interface DataUpdater
        Parameters:
        ee - the experiment
        pathToAptOutputFile - file, presumed to be analyzed using the "right" platform (not an alt CDF or exon-level)
        Throws:
        IOException - when IO problems occur.
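
        The blank-column requirement described above can be met by padding the summary file before it is loaded. The following is a minimal sketch, assuming a tab-delimited table whose first column holds the probe set ID, whose first non-comment row is a header naming the samples, and whose comment lines start with '#'. The class AptOutputPadder, the method padMissingSamples and the sample names are hypothetical; they are not part of Gemma or of Affymetrix Power Tools.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Gemma or APT.
final class AptOutputPadder {

    /**
     * Appends one empty column (with a header) for every expected sample
     * that is absent from the header row. Comment and empty lines are
     * passed through unchanged.
     */
    static List<String> padMissingSamples(List<String> lines, List<String> expectedSamples) {
        List<String> out = new ArrayList<>();
        List<String> missing = null; // stays null until the header row is seen
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) {
                out.add(line);
                continue;
            }
            StringBuilder padded = new StringBuilder(line);
            if (missing == null) {
                // Header row: work out which samples have no column yet.
                List<String> header = Arrays.asList(line.split("\t", -1));
                List<String> present = header.subList(1, header.size());
                missing = new ArrayList<>();
                for (String sample : expectedSamples) {
                    if (!present.contains(sample)) {
                        missing.add(sample);
                    }
                }
                for (String sample : missing) {
                    padded.append('\t').append(sample);
                }
            } else {
                // Data row: one blank value per added column.
                for (int i = 0; i < missing.size(); i++) {
                    padded.append('\t');
                }
            }
            out.add(padded.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "#%example comment line",
                "probeset_id\tsampleA\tsampleC",
                "100_at\t5.1\t6.2");
        List<String> expected = Arrays.asList("sampleA", "sampleB", "sampleC");
        for (String line : padMissingSamples(lines, expected)) {
            System.out.println(line.replace("\t", " | "));
        }
    }
}
```

        In this sketch the added columns are appended at the end of each row rather than inserted in sample order; whether column order matters to the loader is not stated here, so treat that as an open assumption.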
      • addCountData

        @Transactional(propagation=NEVER)
        public void addCountData​(ExpressionExperiment ee,
                                 ArrayDesign targetArrayDesign,
                                 DoubleMatrix<String,​String> countMatrix,
                                 DoubleMatrix<String,​String> rpkmMatrix,
                                 @Nullable
                                 Integer readLength,
                                 @Nullable
                                 Boolean isPairedReads,
                                 boolean allowMissingSamples)
        RNA-seq: Replaces data. Starting with the count data, we compute log2cpm, which is the preferred quantitation type we use internally. The counts and the RPKM/FPKM values (if provided) are stored as well. Rows (genes) that have all zero counts are ignored entirely.
        Specified by:
        addCountData in interface DataUpdater
        Parameters:
        ee - the experiment
        targetArrayDesign - this should be one of the "Generic" gene-based platforms. The data set will be switched to use it.
        countMatrix - the 'raw' counts (added after the RPKM data, if provided).
        rpkmMatrix - per-gene normalized data (RPKM or FPKM); optional.
        allowMissingSamples - if true, samples that are missing data will be deleted from the experiment.
        isPairedReads - whether the reads are paired; may be null
        readLength - the read length; may be null
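
        The count-to-log2cpm step described above can be illustrated with a short sketch. It assumes the library size of a sample is the sum of its column, and it adds offsets of 0.5 to each count and 1.0 to each library size so that zero counts have a finite logarithm; those offsets are an assumption of this example and may differ from what DataUpdaterImpl actually does. The class Log2CpmSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the Gemma implementation.
final class Log2CpmSketch {

    /** Drops rows (genes) whose count is zero in every sample. */
    static double[][] dropAllZeroRows(double[][] counts) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : counts) {
            boolean anyNonZero = false;
            for (double value : row) {
                if (value != 0.0) {
                    anyNonZero = true;
                    break;
                }
            }
            if (anyNonZero) {
                kept.add(row);
            }
        }
        return kept.toArray(new double[0][]);
    }

    /**
     * Computes log2 counts per million for a genes-by-samples matrix.
     * The 0.5 and 1.0 offsets are assumptions made for this sketch.
     */
    static double[][] log2Cpm(double[][] counts) {
        int rows = counts.length;
        int cols = rows == 0 ? 0 : counts[0].length;
        double[] librarySize = new double[cols];
        for (double[] row : counts) {
            for (int j = 0; j < cols; j++) {
                librarySize[j] += row[j];
            }
        }
        double[][] result = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double cpm = (counts[i][j] + 0.5) / (librarySize[j] + 1.0) * 1e6;
                result[i][j] = Math.log(cpm) / Math.log(2.0);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] counts = {
                {0, 0},   // all-zero gene, dropped
                {10, 20},
                {30, 40}
        };
        for (double[] row : log2Cpm(dropAllZeroRows(counts))) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```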
      • replaceData

        @Transactional(propagation=NEVER)
        public void replaceData​(ExpressionExperiment ee,
                                ArrayDesign targetPlatform,
                                QuantitationType qt,
                                DoubleMatrix<String,​String> data)
        Replace the data associated with the experiment (or add it if there is none). These data become the 'preferred' quantitation type. Note that this replaces the "raw" data. Similar to AffyPowerToolsProbesetSummarize.convertDesignElementDataVectors and code in SimpleExpressionDataLoaderService. This method exists in addition to the other replaceData overload to allow more direct reading of data from files, so that sample- and element-matching happens here.
        Specified by:
        replaceData in interface DataUpdater
        Parameters:
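        ee - the experiment to be modified
        targetPlatform - the platform for the new data (this only works for a single-platform data set)
        qt - the quantitation type for the data
        data - the data to be used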
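
        The sample- and element-matching mentioned above can be pictured as aligning the matrix columns with the experiment's samples and the matrix rows with the platform's elements. The following is a rough sketch using plain Java collections in place of DoubleMatrix, ExpressionExperiment and ArrayDesign. The class and method names are hypothetical, and the choices to throw on an unmatched sample and to skip an unmatched row are assumptions of this example, not documented behavior of replaceData.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not the Gemma implementation.
final class MatrixMatchingSketch {

    /** For each experiment sample, in order, finds the matrix column holding its data. */
    static int[] matchSamples(List<String> matrixColumnNames, List<String> experimentSamples) {
        int[] columnOrder = new int[experimentSamples.size()];
        for (int i = 0; i < experimentSamples.size(); i++) {
            String sample = experimentSamples.get(i);
            int column = matrixColumnNames.indexOf(sample);
            if (column < 0) {
                // Assumption: a sample without data is an error in this sketch.
                throw new IllegalArgumentException("No data column for sample: " + sample);
            }
            columnOrder[i] = column;
        }
        return columnOrder;
    }

    /** Keeps rows named after a known platform element, with values reordered to sample order. */
    static Map<String, double[]> matchElements(List<String> rowNames, double[][] values,
            Set<String> platformElements, int[] columnOrder) {
        Map<String, double[]> vectors = new LinkedHashMap<>();
        for (int r = 0; r < rowNames.size(); r++) {
            String name = rowNames.get(r);
            if (!platformElements.contains(name)) {
                continue; // Assumption: unmatched rows are skipped in this sketch.
            }
            double[] vector = new double[columnOrder.length];
            for (int c = 0; c < columnOrder.length; c++) {
                vector[c] = values[r][columnOrder[c]];
            }
            vectors.put(name, vector);
        }
        return vectors;
    }

    public static void main(String[] args) {
        int[] order = matchSamples(Arrays.asList("S2", "S1"), Arrays.asList("S1", "S2"));
        Map<String, double[]> vectors = matchElements(
                Arrays.asList("geneA", "unknown"),
                new double[][] { {1.0, 2.0}, {3.0, 4.0} },
                new HashSet<>(Arrays.asList("geneA")),
                order);
        for (Map.Entry<String, double[]> e : vectors.entrySet()) {
            System.out.println(e.getKey() + " " + Arrays.toString(e.getValue()));
        }
    }
}
```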
      • reprocessAffyDataFromCel

        @Transactional(propagation=NEVER)
        public void reprocessAffyDataFromCel​(ExpressionExperiment ee)
        Affymetrix only: Provide or replace data for an Affymetrix-based experiment, using CEL files. CEL files are downloaded from GEO, apt-probeset-summarize is executed to get the data, and then the experiment is updated. One side effect is that the data set may end up on a different platform than it was on originally. A complication is that the CEL file type may not match the platform we want the experiment to end up on. A further complication arises when this is re-run on a data set, or when the data set is on a merged platform. Therefore, some of the steps involve inspecting the CEL files to determine the chip type used, so that we can run apt-probeset-summarize correctly, and replacing the vectors. Exceptions will be thrown if CEL files can't be located, or if the experiment is set up in a way we can't support.
        Specified by:
        reprocessAffyDataFromCel in interface DataUpdater
        Parameters:
        ee - the experiment (already lightly thawed)
      • addData

        @Transactional(propagation=NEVER)
        public ExpressionExperiment addData​(ExpressionExperiment ee,
                                            ArrayDesign targetPlatform,
                                            ExpressionDataDoubleMatrix data)
        Generic, but in practice used for RNA-seq. Adds additional data (with an associated quantitation type) to the selected experiment. Postprocessing is performed if the data's quantitation type is 'preferred', but if the experiment already has a preferred quantitation type, an error will be thrown.
        Specified by:
        addData in interface DataUpdater
        Parameters:
        ee - the experiment
        targetPlatform - optional; if null, uses the platform already used (if there is just one; you can't use this for a multi-platform dataset)
        data - the data to slot in
        Returns:
        ee
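
        The rule about 'preferred' quantitation types described for addData can be sketched as a guard that runs before the data are added. The QType class below is a hypothetical stand-in for QuantitationType, and the use of IllegalStateException is an assumption; the documentation above says only that an error will be thrown.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the Gemma implementation.
final class PreferredCheckSketch {

    /** Hypothetical stand-in for a quantitation type: a name plus the 'preferred' flag. */
    static final class QType {
        final String name;
        final boolean preferred;

        QType(String name, boolean preferred) {
            this.name = name;
            this.preferred = preferred;
        }
    }

    /**
     * Returns true when postprocessing should follow the addition, that is,
     * when the incoming type is preferred and no existing type is.
     * Throws if adding the data would yield two preferred types.
     */
    static boolean checkBeforeAdding(List<QType> existing, QType incoming) {
        if (!incoming.preferred) {
            return false; // nothing to postprocess
        }
        for (QType qt : existing) {
            if (qt.preferred) {
                // Assumption: the exception type is chosen for this sketch only.
                throw new IllegalStateException(
                        "Experiment already has a preferred quantitation type: " + qt.name);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<QType> existing = Arrays.asList(new QType("counts", false));
        boolean postprocess = checkBeforeAdding(existing, new QType("log2cpm", true));
        System.out.println("postprocess = " + postprocess);
    }
}
```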
      • replaceData

        @Transactional(propagation=NEVER)
        public ExpressionExperiment replaceData​(ExpressionExperiment ee,
                                                ArrayDesign targetPlatform,
                                                ExpressionDataDoubleMatrix data)
        Replace the data associated with the experiment (or add it if there is none). These data become the 'preferred' quantitation type. Note that this replaces the "raw" data. Similar to AffyPowerToolsProbesetSummarize.convertDesignElementDataVectors and code in SimpleExpressionDataLoaderService.
        Specified by:
        replaceData in interface DataUpdater
        Parameters:
        ee - the experiment to be modified
        targetPlatform - the platform for the new data (this can only be used for single-platform data sets). The experiment will be switched to it if necessary.
        data - the data to be used
        Returns:
        ee