Class AnnDataSingleCellDataLoader
java.lang.Object
ubic.gemma.core.loader.expression.singleCell.AnnDataSingleCellDataLoader
- All Implemented Interfaces:
Closeable, AutoCloseable, DataLoader, SequencingDataLoader, SingleCellDataLoader
Reads single-cell vectors from the AnnData on-disk HDF5 format.
- Author:
- poirigui
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description:
- void close(): Free any resources that the loader has set up.
- getCellTypeAssignments(SingleCellDimension dimension): Load single-cell type assignments present in the data.
- getFactors(Collection<BioAssay> samples, Map<BioMaterial, Set<FactorValue>> factorValueAssignments): Load experimental factors present in the data.
- getGenes(): Load gene identifiers present in the data.
- getGenes: Obtain the genes for a specific QT.
- getOtherCellLevelCharacteristics(SingleCellDimension dimension): Load cell-level characteristics that are not cell type assignments present in the data.
- getQuantitationTypes(): Load quantitation types present in the data.
- getSampleNames: Obtain the sample names present in the data.
- getSamplesCharacteristics(Collection<BioAssay> samples): Load samples characteristics present in the data.
- getSequencingMetadata(Collection<BioAssay> samples): Retrieve various sequencing metadata if counting data is present.
- getSequencingMetadata(SingleCellDimension dimension)
- getSingleCellDimension(Collection<BioAssay> bioAssays): Load the single-cell dimension present in the data.
- loadVectors(Collection<CompositeSequence> designElements, SingleCellDimension dimension, QuantitationType quantitationType): Produces a stream of single-cell expression data vectors for the given QuantitationType.
- void setBioAssayToSampleNameMapper(BioAssayMapper bioAssayToSampleNameMapper): Set the strategy used for mapping BioAssay to sample names from the data.
- void setCellTypeFactorName(String cellTypeFactorName): The name of the cell type factor under /var.
- void setDesignElementToGeneMapper(DesignElementMapper designElementToGeneMapper): Set the strategy used for mapping CompositeSequence to gene identifiers from the data.
- void setIgnoreCellTypeFactor(boolean ignoreCellTypeFactor): Ignore cell type assignment.
- void setIgnoreUnmatchedDesignElements(boolean ignoreUnmatchedDesignElements): Ignore unmatched design elements from the data when creating vectors.
- void setIgnoreUnmatchedSamples(boolean ignoreUnmatchedSamples): Ignore unmatched samples from the data.
- void setMaxCharacteristics(int maxCharacteristics): Maximum number of characteristics to consider when loading a cell-level characteristic.
- void setSampleFactorName(String sampleFactorName): The name of the sample factor under /var.
- void setTranspose(boolean transpose): Transpose obs/var dataframes.
- void setUnknownCellTypeIndicator(String unknownCellTypeIndicator): An indicator for unknown cell type if the dataset uses something other than the -1 code.
- void setUseRawX(Boolean useRawX): Whether to use the raw.X layer.
-
Constructor Details
-
AnnDataSingleCellDataLoader
-
-
Method Details
-
getSampleNames
Description copied from interface: DataLoader
Obtain the sample names present in the data.
- Specified by:
getSampleNames in interface DataLoader
- Throws:
IOException
-
getSingleCellDimension
public SingleCellDimension getSingleCellDimension(Collection<BioAssay> bioAssays) throws IOException
Description copied from interface: SingleCellDataLoader
Load the single-cell dimension present in the data.
Not all samples might be present, so the returned SingleCellDimension may have expression data for only a subset of the data.
- Specified by:
getSingleCellDimension in interface SingleCellDataLoader
- Parameters:
bioAssays - a set of bioassays to use when populating the dimension; not all bioassays may be used
- Throws:
IOException
-
getQuantitationTypes
Description copied from interface: DataLoader
Load quantitation types present in the data.
- Specified by:
getQuantitationTypes in interface DataLoader
- Throws:
IOException
-
getCellTypeAssignments
public Set<CellTypeAssignment> getCellTypeAssignments(SingleCellDimension dimension) throws IOException
Description copied from interface: SingleCellDataLoader
Load single-cell type assignments present in the data.
- Specified by:
getCellTypeAssignments in interface SingleCellDataLoader
- Throws:
IOException
-
getOtherCellLevelCharacteristics
public Set<CellLevelCharacteristics> getOtherCellLevelCharacteristics(SingleCellDimension dimension) throws IOException
Description copied from interface: SingleCellDataLoader
Load cell-level characteristics that are not cell type assignments present in the data.
- Specified by:
getOtherCellLevelCharacteristics in interface SingleCellDataLoader
- Throws:
IOException
-
getFactors
public Set<ExperimentalFactor> getFactors(Collection<BioAssay> samples, @Nullable Map<BioMaterial, Set<FactorValue>> factorValueAssignments) throws IOException
Description copied from interface: DataLoader
Load experimental factors present in the data.
- Specified by:
getFactors in interface DataLoader
- Parameters:
samples - samples to use when determining which factors to load
factorValueAssignments - if non-null, the proposed assignments of factor values to samples are populated in this mapping
- Returns:
a set of factors present in the data
- Throws:
IOException
-
getSamplesCharacteristics
public Map<BioMaterial,Set<Characteristic>> getSamplesCharacteristics(Collection<BioAssay> samples) throws IOException
Description copied from interface: DataLoader
Load samples characteristics present in the data.
- Specified by:
getSamplesCharacteristics in interface DataLoader
- Parameters:
samples - samples to use when determining which characteristics to load
- Returns:
proposed characteristics grouped by sample
- Throws:
IOException
-
getGenes
Description copied from interface: DataLoader
Load gene identifiers present in the data.
- Specified by:
getGenes in interface DataLoader
- Throws:
IOException
-
getSequencingMetadata
public Map<BioAssay,SequencingMetadata> getSequencingMetadata(Collection<BioAssay> samples) throws IOException
Description copied from interface: SequencingDataLoader
Retrieve various sequencing metadata if counting data is present.
- Specified by:
getSequencingMetadata in interface SequencingDataLoader
- Throws:
IOException
-
getSequencingMetadata
public Map<BioAssay,SequencingMetadata> getSequencingMetadata(SingleCellDimension dimension) throws IOException
- Specified by:
getSequencingMetadata in interface SingleCellDataLoader
- Throws:
IOException
-
getGenes
Obtain the genes for a specific QT.
- Throws:
IOException
-
loadVectors
public Stream<SingleCellExpressionDataVector> loadVectors(Collection<CompositeSequence> designElements, SingleCellDimension dimension, QuantitationType quantitationType) throws IOException, IllegalArgumentException
Description copied from interface: SingleCellDataLoader
Produces a stream of single-cell expression data vectors for the given QuantitationType.
- Specified by:
loadVectors in interface SingleCellDataLoader
- Parameters:
designElements - a collection of design elements for mapping the element names used in the dataset to CompositeSequence
dimension - a dimension to use for creating vectors; may be loaded from the single-cell data with SingleCellDataLoader.getSingleCellDimension(Collection)
quantitationType - a quantitation type to extract from the data; may be loaded from the single-cell data with DataLoader.getQuantitationTypes()
- Returns:
a stream of single-cell expression data vectors that must be closed when done, preferably using a try-with-resources block
- Throws:
IllegalArgumentException - if a design element present in the data cannot be matched to one of the supplied elements; this requires DataLoader.setIgnoreUnmatchedDesignElements(boolean) to be set to false
IOException
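The requirement to close the returned stream can be illustrated with a minimal, self-contained sketch that uses only java.util.stream. The openVectors helper and its string elements are hypothetical stand-ins for loadVectors and SingleCellExpressionDataVector; they are not part of this class, and the close handler only simulates releasing a resource.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamClosingSketch {

    // Hypothetical stand-in for loadVectors(...): the returned stream carries a
    // close handler that releases a (simulated) underlying resource.
    static Stream<String> openVectors(AtomicBoolean released) {
        return Stream.of("vector1", "vector2", "vector3")
                .onClose(() -> released.set(true));
    }

    // Consumes the stream inside try-with-resources, so close() is called and
    // the close handler runs even if the terminal operation throws.
    static List<String> consume(AtomicBoolean released) {
        try (Stream<String> vectors = openVectors(released)) {
            return vectors.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        AtomicBoolean released = new AtomicBoolean(false);
        List<String> vectors = consume(released);
        System.out.println(vectors.size() + " vectors read, released: " + released.get());
    }
}
```

Terminal operations such as collect do not close a stream on their own, which is why the try-with-resources block matters here.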
-
close
Description copied from interface: DataLoader
Free any resources that the loader has set up.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in interface DataLoader
- Throws:
IOException
-
setBioAssayToSampleNameMapper
Description copied from interface: DataLoader
Set the strategy used for mapping BioAssay to sample names from the data.
- Specified by:
setBioAssayToSampleNameMapper in interface DataLoader
-
setIgnoreUnmatchedSamples
public void setIgnoreUnmatchedSamples(boolean ignoreUnmatchedSamples)
Description copied from interface: DataLoader
Ignore unmatched samples from the data.
This defaults to true.
- Specified by:
setIgnoreUnmatchedSamples in interface DataLoader
-
setDesignElementToGeneMapper
Description copied from interface: DataLoader
Set the strategy used for mapping CompositeSequence to gene identifiers from the data.
- Specified by:
setDesignElementToGeneMapper in interface DataLoader
-
setIgnoreUnmatchedDesignElements
public void setIgnoreUnmatchedDesignElements(boolean ignoreUnmatchedDesignElements)
Description copied from interface: DataLoader
Ignore unmatched design elements from the data when creating vectors.
This defaults to true.
There is an ongoing discussion about making this default to false in general for sequencing data.
- Specified by:
setIgnoreUnmatchedDesignElements in interface DataLoader
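The effect of this flag on matching can be sketched as follows. This is an illustration only: the match helper, the string-based gene and element names, and the map lookup are assumptions made for the example and do not reflect how the loader is implemented. The only behavior taken from the documentation is that an unmatched element is skipped when the flag is true and reported with an IllegalArgumentException when it is false.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UnmatchedElementsSketch {

    // Hypothetical matching step: looks up each gene identifier found in the data
    // in a map of known design elements (keyed by gene identifier).
    static List<String> match(List<String> genesInData, Map<String, String> elementsByGene,
            boolean ignoreUnmatched) {
        List<String> matched = new ArrayList<>();
        for (String gene : genesInData) {
            String element = elementsByGene.get(gene);
            if (element != null) {
                matched.add(element);
            } else if (!ignoreUnmatched) {
                // Strict mode: an unmatched identifier is an error.
                throw new IllegalArgumentException("No design element matches " + gene);
            }
            // Lenient mode: the unmatched identifier is silently skipped.
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<String, String> elements = new HashMap<>();
        elements.put("GENE1", "probe1");
        List<String> genes = Arrays.asList("GENE1", "GENE2");
        System.out.println(match(genes, elements, true));
    }
}
```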
-
setSampleFactorName
The name of the sample factor under /var.
-
setCellTypeFactorName
The name of the cell type factor under /var.
-
setIgnoreCellTypeFactor
public void setIgnoreCellTypeFactor(boolean ignoreCellTypeFactor)
Ignore cell type assignment.
-
setUnknownCellTypeIndicator
An indicator for unknown cell type if the dataset uses something other than the -1 code.
-
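As a rough illustration of what an unknown cell type indicator is for, the sketch below encodes per-cell labels as integer indices and maps a dataset-specific marker to -1. The encode helper, the "UNK" marker, and the list-based encoding are all assumptions made for this example; the documentation above only states that -1 is the default code for an unknown cell type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UnknownCellTypeSketch {

    static final int UNKNOWN_CELL_TYPE = -1;

    // Hypothetical encoding step: each label becomes the index of its cell type in
    // cellTypes (appended on first use), except the indicator, which becomes -1.
    static int[] encode(List<String> labels, List<String> cellTypes, String unknownIndicator) {
        int[] indices = new int[labels.size()];
        for (int i = 0; i < labels.size(); i++) {
            String label = labels.get(i);
            if (label.equals(unknownIndicator)) {
                indices[i] = UNKNOWN_CELL_TYPE;
                continue;
            }
            int index = cellTypes.indexOf(label);
            if (index == -1) {
                cellTypes.add(label);
                index = cellTypes.size() - 1;
            }
            indices[i] = index;
        }
        return indices;
    }

    public static void main(String[] args) {
        List<String> cellTypes = new ArrayList<>();
        int[] indices = encode(Arrays.asList("T cell", "UNK", "B cell", "T cell"), cellTypes, "UNK");
        System.out.println(Arrays.toString(indices) + " " + cellTypes);
    }
}
```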
setUseRawX
Whether to use the raw.X layer.
The default is to use X if raw.X is not present. If raw.X is present and no value is specified, an exception will be raised.
-
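The three-state behavior described for setUseRawX can be sketched as a small decision function. The selectLayer helper, the string return values, and the exception type are assumptions for illustration. The documentation also does not say what happens when raw.X is requested but absent, so that branch is a guess and is marked as such.

```java
class RawXSelectionSketch {

    // Hypothetical resolution of which layer to read, given the (nullable) setting
    // and whether the file contains raw.X.
    static String selectLayer(Boolean useRawX, boolean rawXPresent) {
        if (useRawX == null) {
            if (rawXPresent) {
                // Documented: an exception is raised when raw.X exists and no choice was made.
                throw new IllegalStateException("raw.X is present; specify whether to use it.");
            }
            // Documented: X is the default when raw.X is absent.
            return "X";
        }
        if (useRawX && !rawXPresent) {
            // Assumption: not covered by the documentation above.
            throw new IllegalStateException("raw.X was requested but is not present.");
        }
        return useRawX ? "raw.X" : "X";
    }

    public static void main(String[] args) {
        System.out.println(selectLayer(null, false));
        System.out.println(selectLayer(Boolean.TRUE, true));
    }
}
```

Taking a nullable Boolean rather than a primitive boolean is what allows "no value specified" to be told apart from an explicit false.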
setTranspose
public void setTranspose(boolean transpose)
Transpose obs/var dataframes.
-
setMaxCharacteristics
public void setMaxCharacteristics(int maxCharacteristics)
Maximum number of characteristics to consider when loading a cell-level characteristic.
-