Class VectorMergingServiceImpl

  • All Implemented Interfaces:
    VectorMergingService

    @Service
    public class VectorMergingServiceImpl
    extends ExpressionExperimentVectorManipulatingService
    implements VectorMergingService
    Tackles the problem of concatenating DesignElementDataVectors for a single experiment. This is necessary when a study uses two or more similar array designs without 'replication'. A typical example is GSE60 ("Diffuse large B-cell lymphoma"), with 31 BioAssays on GPL174, 35 BioAssays on GPL175, and 66 biomaterials. A more complex example is GSE3500, with 13 ArrayDesigns. In that case (and others like it), some quantitation types do not appear on all array designs, leaving gaps in the vectors that have to be filled in. The algorithm for dealing with this runs as a preprocessing step:
    1. Generate a merged set of vectors for each of the (important) quantitation types.
    2. Create a merged BioAssayDimension.
    3. Persist the new vectors, which are now tied to a single DesignElement. This is, strictly speaking, incorrect, but because the design elements used in the vector all point to the same sequence, there is no major problem in analyzing this. However, there is a potential loss of information.
    4. Cleanup: remove old vectors, analyses, and BioAssayDimensions.
    5. Postprocess: recreate the processed data vectors, including masking missing values if necessary.
    Vectors which are empty (all missing values) are not persisted. If problems are found during merging, an exception will be thrown, though this may leave things in a bad state requiring a reload of the data.
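    The gap-filling idea in step 1 can be illustrated with a minimal sketch. It is not this class's actual implementation: the class VectorMergeSketch, its methods merge and isAllMissing, and the use of plain double arrays and NaN as the missing-value marker are all assumptions made for illustration. Each array design contributes a segment of known length to the merged dimension; a design on which the quantitation type is absent leaves its segment filled with missing values, and a merged vector that ends up entirely missing would be skipped rather than persisted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; names and data layout are assumptions,
// not part of the VectorMergingServiceImpl API.
final class VectorMergeSketch {

    /**
     * Concatenates per-design segments into one vector over the merged dimension.
     *
     * @param dimSizes      number of assays contributed by each array design, in merged order
     * @param segmentsByDim values keyed by design index; a missing key means the
     *                      quantitation type was not measured on that design
     */
    static double[] merge(int[] dimSizes, Map<Integer, double[]> segmentsByDim) {
        int total = 0;
        for (int size : dimSizes) {
            total += size;
        }
        double[] merged = new double[total];
        Arrays.fill(merged, Double.NaN); // start with every position missing

        int offset = 0;
        for (int d = 0; d < dimSizes.length; d++) {
            double[] segment = segmentsByDim.get(d);
            if (segment != null) {
                if (segment.length != dimSizes[d]) {
                    // Mirrors the documented behaviour of failing when merging hits a problem.
                    throw new IllegalArgumentException(
                            "Segment " + d + " has length " + segment.length
                                    + ", expected " + dimSizes[d]);
                }
                System.arraycopy(segment, 0, merged, offset, segment.length);
            }
            offset += dimSizes[d]; // gaps stay NaN
        }
        return merged;
    }

    /** True if every value is missing; such a vector would not be persisted. */
    static boolean isAllMissing(double[] vector) {
        for (double value : vector) {
            if (!Double.isNaN(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two designs with 2 and 3 assays; only the first has this quantitation type.
        Map<Integer, double[]> segments = new HashMap<>();
        segments.put(0, new double[] {1.0, 2.0});

        double[] merged = merge(new int[] {2, 3}, segments);
        if (!isAllMissing(merged)) {
            System.out.println(Arrays.toString(merged));
        }
    }
}
```

    In this sketch the order of dimSizes plays the role of the merged BioAssayDimension from step 2: it fixes where each design's assays sit in every merged vector.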
    Author:
    pavlidis
    See Also:
    ExpressionDataMatrixBuilder