Class MexSingleCellDataLoader
java.lang.Object
ubic.gemma.core.loader.expression.singleCell.MexSingleCellDataLoader
- All Implemented Interfaces:
Closeable, AutoCloseable, DataLoader, SequencingDataLoader, SingleCellDataLoader
Load single-cell data from the 10x Genomics MEX format.
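A MEX bundle is typically a directory holding three files:

- barcodes.tsv.gz: one cell barcode per line.
- features.tsv.gz: called genes.tsv in older releases. The first column is the gene ID and the second is the symbol.
- matrix.mtx.gz: a Matrix Market coordinate file with genes as rows and cells as columns.

The sketch below reads such a matrix from uncompressed text to illustrate the layout. It is not the parser used by this class. The names MexSketch, Entry, Matrix and readMtx are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, minimal reader for the matrix.mtx part of a MEX bundle (not Gemma code). */
public class MexSketch {

    /** One non-zero entry with 0-based gene (row) and cell (column) indices. */
    record Entry(int gene, int cell, double value) {}

    record Matrix(int genes, int cells, List<Entry> entries) {}

    static Matrix readMtx(BufferedReader reader) throws IOException {
        String line;
        // Skip the "%%MatrixMarket ..." banner and any "%" comment lines.
        do {
            line = reader.readLine();
        } while (line != null && line.startsWith("%"));
        if (line == null) {
            throw new IOException("Missing size line");
        }
        // Size line: <rows (genes)> <columns (cells)> <number of non-zero entries>
        String[] size = line.trim().split("\\s+");
        int genes = Integer.parseInt(size[0]);
        int cells = Integer.parseInt(size[1]);
        int nnz = Integer.parseInt(size[2]);
        List<Entry> entries = new ArrayList<>(nnz);
        while ((line = reader.readLine()) != null) {
            if (line.isBlank()) {
                continue;
            }
            String[] f = line.trim().split("\\s+");
            // MTX indices are 1-based; convert them to 0-based.
            entries.add(new Entry(Integer.parseInt(f[0]) - 1, Integer.parseInt(f[1]) - 1,
                    Double.parseDouble(f[2])));
        }
        if (entries.size() != nnz) {
            throw new IOException("Expected " + nnz + " entries, got " + entries.size());
        }
        return new Matrix(genes, cells, entries);
    }

    public static void main(String[] args) throws IOException {
        String mtx = """
                %%MatrixMarket matrix coordinate integer general
                3 2 3
                1 1 5
                3 1 2
                2 2 7
                """;
        Matrix m = readMtx(new BufferedReader(new StringReader(mtx)));
        System.out.println(m.genes() + " genes x " + m.cells() + " cells, "
                + m.entries().size() + " non-zero entries");
    }
}
```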
- Author:
- poirigui
-
Constructor Summary
Constructors
Method Summary
- void close(): Free any resources that the loader has set up.
- float[] double2float(double[] vec)
- int[] double2int(double[] vec)
- long[] double2long(double[] vec)
- getCellTypeAssignments(SingleCellDimension dimension): MEX does not provide cell type labels.
- Set<ExperimentalFactor> getFactors(Collection<BioAssay> samples, Map<BioMaterial, Set<FactorValue>> factorValueAssignments): MEX does not provide experimental factors.
- getGenes(): Load gene identifiers present in the data.
- Set<CellLevelCharacteristics> getOtherCellLevelCharacteristics(SingleCellDimension dimension): Load cell-level characteristics present in the data that are not cell type assignments.
- getQuantitationTypes(): Load quantitation types present in the data.
- getSampleNames(): Obtain the sample names present in the data.
- getSamplesCharacteristics(Collection<BioAssay> samples): MEX does not provide sample characteristics.
- Map<BioAssay,SequencingMetadata> getSequencingMetadata(Collection<BioAssay> samples): Retrieve various sequencing metadata if counting data is present.
- Map<BioAssay,SequencingMetadata> getSequencingMetadata(SingleCellDimension dimension)
- SingleCellDimension getSingleCellDimension(Collection<BioAssay> bioAssays): Load the single-cell dimension present in the data.
- Stream<SingleCellExpressionDataVector> loadVectors(Collection<CompositeSequence> designElements, SingleCellDimension scd, QuantitationType quantitationType): Produces a stream of single-cell expression data vectors for the given QuantitationType.
- void setAllowMappingDesignElementsToGeneSymbols(boolean allowMappingDesignElementsToGeneSymbols): Allow mapping probes to gene symbols.
- void setBioAssayToSampleNameMapper(BioAssayMapper bioAssayToSampleNameMapper): Set the strategy used for mapping BioAssay to sample names from the data.
- void setDesignElementToGeneMapper(DesignElementMapper designElementToGeneMapper): Set the strategy used for mapping CompositeSequence to gene identifiers from the data.
- void setDiscardEmptyCells(boolean discardEmptyCells): Discard empty cells by removing columns from the MTX file that have no gene-associated counts.
- void setIgnoreUnmatchedDesignElements(boolean ignoreUnmatchedDesignElements): Ignore unmatched design elements from the data when creating vectors.
- void setIgnoreUnmatchedSamples(boolean ignoreUnmatchedSamples): Ignore unmatched samples from the data.
- void setUseDoublePrecision(boolean useDoublePrecision): Use double precision (i.e. PrimitiveType.DOUBLE for real numbers and PrimitiveType.LONG for integers).
-
Constructor Details
-
MexSingleCellDataLoader
-
-
Method Details
-
getSampleNames
Description copied from interface: DataLoader
Obtain the sample names present in the data.
- Specified by:
getSampleNames in interface DataLoader
-
getSingleCellDimension
public SingleCellDimension getSingleCellDimension(Collection<BioAssay> bioAssays) throws IOException
Description copied from interface: SingleCellDataLoader
Load the single-cell dimension present in the data. Not all samples might be present, so the returned SingleCellDimension may hold expression data for only a subset of the data.
- Specified by:
getSingleCellDimension in interface SingleCellDataLoader
- Parameters:
bioAssays - a set of bioassays to use when populating the dimension; not all bioassays may be used
- Throws:
IOException
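One way to picture a dimension built from a subset of the bioassays is to concatenate the cells of each matched sample and record, for each sample, the index of its first cell. The sketch below assumes that layout. DimensionSketch, build and the barcodesBySample map are hypothetical, and bioassays are represented by plain names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of laying out the cells of several samples in one dimension (not Gemma code). */
public class DimensionSketch {

    /** Concatenated cell IDs, the bioassays actually used, and the offset of each one's first cell. */
    record Dimension(List<String> cellIds, List<String> bioAssays, int[] offsets) {}

    static Dimension build(Collection<String> bioAssays, Map<String, List<String>> barcodesBySample) {
        List<String> cellIds = new ArrayList<>();
        List<String> used = new ArrayList<>();
        List<Integer> offsets = new ArrayList<>();
        for (String bioAssay : bioAssays) {
            List<String> barcodes = barcodesBySample.get(bioAssay);
            if (barcodes == null) {
                continue; // no data for this bioassay, so it is left out of the dimension
            }
            used.add(bioAssay);
            offsets.add(cellIds.size()); // index of this sample's first cell
            cellIds.addAll(barcodes);
        }
        return new Dimension(cellIds, used, offsets.stream().mapToInt(Integer::intValue).toArray());
    }
}
```

With bioassays A, B and C, where only A (two cells) and C (one cell) have data, the dimension holds three cells. It uses bioassays A and C, with offsets 0 and 2.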
-
getQuantitationTypes
Description copied from interface: DataLoader
Load quantitation types present in the data.
- Specified by:
getQuantitationTypes in interface DataLoader
- Throws:
IOException
-
getCellTypeAssignments
MEX does not provide cell type labels.
- Specified by:
getCellTypeAssignments in interface SingleCellDataLoader
-
getOtherCellLevelCharacteristics
public Set<CellLevelCharacteristics> getOtherCellLevelCharacteristics(SingleCellDimension dimension)
Description copied from interface: SingleCellDataLoader
Load cell-level characteristics present in the data that are not cell type assignments.
- Specified by:
getOtherCellLevelCharacteristics in interface SingleCellDataLoader
-
getFactors
public Set<ExperimentalFactor> getFactors(Collection<BioAssay> samples, @Nullable Map<BioMaterial, Set<FactorValue>> factorValueAssignments)
MEX does not provide experimental factors.
- Specified by:
getFactors in interface DataLoader
- Parameters:
samples - samples to use when determining which factors to load
factorValueAssignments - if non-null, the proposed assignments of factor values to samples are populated in this mapping
- Returns:
- a set of factors present in the data
-
getSamplesCharacteristics
MEX does not provide sample characteristics.
- Specified by:
getSamplesCharacteristics in interface DataLoader
- Parameters:
samples - samples to use when determining which characteristics to load
- Returns:
- proposed characteristics grouped by sample
-
getGenes
Description copied from interface: DataLoader
Load gene identifiers present in the data.
- Specified by:
getGenes in interface DataLoader
- Throws:
IOException
-
getSequencingMetadata
public Map<BioAssay,SequencingMetadata> getSequencingMetadata(SingleCellDimension dimension) throws IOException
- Specified by:
getSequencingMetadata in interface SingleCellDataLoader
- Throws:
IOException
-
getSequencingMetadata
public Map<BioAssay,SequencingMetadata> getSequencingMetadata(Collection<BioAssay> samples) throws IOException
Description copied from interface: SequencingDataLoader
Retrieve various sequencing metadata if counting data is present.
- Specified by:
getSequencingMetadata in interface SequencingDataLoader
- Throws:
IOException
-
loadVectors
public Stream<SingleCellExpressionDataVector> loadVectors(Collection<CompositeSequence> designElements, SingleCellDimension scd, QuantitationType quantitationType) throws IOException
Description copied from interface: SingleCellDataLoader
Produces a stream of single-cell expression data vectors for the given QuantitationType.
- Specified by:
loadVectors in interface SingleCellDataLoader
- Parameters:
designElements - a collection of design elements used to map element names in the dataset to CompositeSequence
scd - a dimension to use for creating vectors; may be loaded from the single-cell data with SingleCellDataLoader.getSingleCellDimension(Collection)
quantitationType - a quantitation type to extract from the data; may be loaded from the single-cell data with DataLoader.getQuantitationTypes()
- Returns:
- a stream of single-cell expression data vectors that must be closed when done, preferably using a try-with-resources block.
- Throws:
IOException
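The returned stream may keep underlying files open, which is why it must be closed. The JDK-only sketch below shows the pattern. A hypothetical openVectors method returns a stream backed by a reader and releases the reader in an onClose handler. The try-with-resources block guarantees that handler runs. With the real loader, the same try block would wrap the call to loadVectors.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

/** Hypothetical illustration of a resource-backed stream closed with try-with-resources. */
public class StreamCloseSketch {

    /** Set once the underlying reader has been released; present only to make the effect observable. */
    static final AtomicBoolean READER_CLOSED = new AtomicBoolean(false);

    /** Stand-in for a loader method returning a resource-backed stream, one line per vector. */
    static Stream<String> openVectors(String data) {
        BufferedReader reader = new BufferedReader(new StringReader(data));
        return reader.lines().onClose(() -> {
            try {
                reader.close();
                READER_CLOSED.set(true);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    static long countVectors(String data) {
        // Leaving the try block closes the stream, which runs the onClose handler above,
        // even if consuming the stream throws.
        try (Stream<String> vectors = openVectors(data)) {
            return vectors.count();
        }
    }
}
```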
-
close
Description copied from interface: DataLoader
Free any resources that the loader has set up.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in interface DataLoader
- Throws:
IOException
-
double2float
public float[] double2float(double[] vec)
-
double2int
public int[] double2int(double[] vec)
-
double2long
public long[] double2long(double[] vec)
-
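The double2float, double2int and double2long helpers are not documented here. Assuming they perform element-wise narrowing casts, they might look like the sketch below. The class name is hypothetical. A Java cast from double to int or long truncates toward zero, maps NaN to 0 and clamps out-of-range values. The real helpers may validate or round differently.

```java
/** Hypothetical element-wise narrowing conversions; the actual helpers may differ. */
public class DoubleConversionSketch {

    static float[] double2float(double[] vec) {
        float[] out = new float[vec.length];
        for (int i = 0; i < vec.length; i++) {
            out[i] = (float) vec[i]; // may lose precision beyond ~7 significant digits
        }
        return out;
    }

    static int[] double2int(double[] vec) {
        int[] out = new int[vec.length];
        for (int i = 0; i < vec.length; i++) {
            out[i] = (int) vec[i]; // truncates toward zero
        }
        return out;
    }

    static long[] double2long(double[] vec) {
        long[] out = new long[vec.length];
        for (int i = 0; i < vec.length; i++) {
            out[i] = (long) vec[i]; // truncates toward zero
        }
        return out;
    }
}
```
-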
setBioAssayToSampleNameMapper
Description copied from interface: DataLoader
Set the strategy used for mapping BioAssay to sample names from the data.
- Specified by:
setBioAssayToSampleNameMapper in interface DataLoader
-
setIgnoreUnmatchedSamples
public void setIgnoreUnmatchedSamples(boolean ignoreUnmatchedSamples)
Description copied from interface: DataLoader
Ignore unmatched samples from the data. This defaults to true.
- Specified by:
setIgnoreUnmatchedSamples in interface DataLoader
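The sketch below shows one reading of what ignoring unmatched samples could mean. Each sample name found in the data is looked up among the bioassays. A sample without a match is either skipped or reported as an error. The case-insensitive name comparison stands in for whatever BioAssayMapper is configured. The class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of matching sample names from the data to bioassays (not Gemma code). */
public class SampleMatchingSketch {

    static Map<String, String> match(Collection<String> bioAssays, Collection<String> sampleNames,
                                     boolean ignoreUnmatchedSamples) {
        Map<String, String> sampleToBioAssay = new LinkedHashMap<>();
        for (String sample : sampleNames) {
            // Stand-in matching strategy: names equal ignoring case.
            Optional<String> bioAssay = bioAssays.stream()
                    .filter(b -> b.equalsIgnoreCase(sample))
                    .findFirst();
            if (bioAssay.isPresent()) {
                sampleToBioAssay.put(sample, bioAssay.get());
            } else if (!ignoreUnmatchedSamples) {
                throw new IllegalStateException("No bioassay matches sample " + sample);
            }
            // Otherwise the unmatched sample is silently skipped.
        }
        return sampleToBioAssay;
    }
}
```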
-
setDesignElementToGeneMapper
Description copied from interface: DataLoader
Set the strategy used for mapping CompositeSequence to gene identifiers from the data.
- Specified by:
setDesignElementToGeneMapper in interface DataLoader
-
setIgnoreUnmatchedDesignElements
public void setIgnoreUnmatchedDesignElements(boolean ignoreUnmatchedDesignElements)
Description copied from interface: DataLoader
Ignore unmatched design elements from the data when creating vectors. This defaults to true.
There is a discussion about making this default to false for sequencing data in general.
- Specified by:
setIgnoreUnmatchedDesignElements in interface DataLoader
-
setAllowMappingDesignElementsToGeneSymbols
public void setAllowMappingDesignElementsToGeneSymbols(boolean allowMappingDesignElementsToGeneSymbols)
Allow mapping probes to gene symbols. This is used as a fallback if the gene ID cannot be found in the supplied platform. If this is set to true, the second column of the genes file will be looked up.
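The ID-then-symbol fallback, together with setIgnoreUnmatchedDesignElements, can be sketched over the lines of a genes/features file. Each line holds the gene ID, then the symbol, separated by tabs. The platform lookups are represented by a set of IDs and a symbol-to-element map. These lookups, like the class and method names, are hypothetical stand-ins for the configured DesignElementMapper.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of mapping genes-file rows to platform elements (not Gemma code). */
public class GeneMappingSketch {

    /** Returns a map from 0-based row index in the genes file to the matched element name. */
    static Map<Integer, String> mapRows(List<String> featureLines, Set<String> platformIds,
                                        Map<String, String> symbolToElement,
                                        boolean allowSymbols, boolean ignoreUnmatched) {
        Map<Integer, String> rowToElement = new LinkedHashMap<>();
        for (int row = 0; row < featureLines.size(); row++) {
            String[] cols = featureLines.get(row).split("\t");
            String geneId = cols[0];
            if (platformIds.contains(geneId)) {
                // Primary lookup: first column (gene ID).
                rowToElement.put(row, geneId);
            } else if (allowSymbols && cols.length > 1 && symbolToElement.containsKey(cols[1])) {
                // Fallback: second column (gene symbol).
                rowToElement.put(row, symbolToElement.get(cols[1]));
            } else if (!ignoreUnmatched) {
                throw new IllegalStateException("Unmatched design element: " + geneId);
            }
            // Otherwise the row is dropped from the vectors.
        }
        return rowToElement;
    }
}
```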
-
setUseDoublePrecision
public void setUseDoublePrecision(boolean useDoublePrecision)
Use double precision (i.e. PrimitiveType.DOUBLE for real numbers and PrimitiveType.LONG for integers).
-
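This switch can be sketched as a choice between single-width and double-width primitives. The enum below is a hypothetical stand-in for PrimitiveType. Whether the data are integral is taken as an input. It might come from the integer/real field of the MTX header, but that is an assumption.

```java
/** Hypothetical sketch of picking a storage type from the precision setting (not Gemma code). */
public class PrecisionSketch {

    /** Stand-in for PrimitiveType, limited to the four relevant constants. */
    enum Primitive { FLOAT, DOUBLE, INT, LONG }

    static Primitive storageType(boolean integerData, boolean useDoublePrecision) {
        if (integerData) {
            return useDoublePrecision ? Primitive.LONG : Primitive.INT;
        }
        return useDoublePrecision ? Primitive.DOUBLE : Primitive.FLOAT;
    }

    /** Bytes per stored value: double precision doubles the memory footprint. */
    static int bytesPerValue(Primitive p) {
        return switch (p) {
            case FLOAT, INT -> 4;
            case DOUBLE, LONG -> 8;
        };
    }
}
```
-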
setDiscardEmptyCells
public void setDiscardEmptyCells(boolean discardEmptyCells)
Discard empty cells by removing columns from the MTX file that have no gene-associated counts. This can be disabled for faster loading.
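Discarding empty cells needs an extra pass over the matrix entries. That pass finds the columns with at least one non-zero count before the remaining columns are renumbered, which is presumably why disabling it speeds up loading. Below is a minimal sketch operating on 0-based {gene, cell, count} triples. The class and method names are hypothetical.

```java
/** Hypothetical sketch of compacting MTX columns by dropping cells without counts (not Gemma code). */
public class EmptyCellSketch {

    /**
     * Returns, for each original cell (column) index, its index after compaction,
     * or -1 if the cell has no non-zero count and is discarded.
     */
    static int[] compactColumns(int numCells, int[][] entries) {
        boolean[] nonEmpty = new boolean[numCells];
        // First pass: mark every column holding at least one non-zero count.
        for (int[] e : entries) {
            if (e[2] != 0) {
                nonEmpty[e[1]] = true;
            }
        }
        // Second pass: renumber the surviving columns contiguously.
        int[] newIndex = new int[numCells];
        int next = 0;
        for (int c = 0; c < numCells; c++) {
            newIndex[c] = nonEmpty[c] ? next++ : -1;
        }
        return newIndex;
    }
}
```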
-