Class GeoSingleCellDetector
- java.lang.Object
-
- ubic.gemma.core.loader.expression.geo.singleCell.GeoSingleCellDetector
-
- All Implemented Interfaces: AutoCloseable, ArchiveBasedSingleCellDetector, SeriesAwareSingleCellDetector, SingleCellDetector
public class GeoSingleCellDetector extends Object implements SingleCellDetector, ArchiveBasedSingleCellDetector, SeriesAwareSingleCellDetector, AutoCloseable
This is the main single-cell data detector that delegates to other more specific detectors.
Samples can be loaded in parallel when retrieving a GEO series with downloadSingleCellData(GeoSeries). The number of threads used is controlled by setNumberOfFetchThreads(int) and defaults to 4.
- Author: poirigui
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
- static int DEFAULT_NUMBER_OF_FETCH_THREADS: Default number of threads to use for fetching data.
Fields inherited from interface ubic.gemma.core.loader.expression.geo.singleCell.ArchiveBasedSingleCellDetector
DEFAULT_MAX_ENTRY_SIZE_IN_ARCHIVE_TO_SKIP, DEFAULT_MAX_NUMBER_OF_ENTRIES_TO_SKIP
-
-
Constructor Summary
Constructors (Constructor, Description):
- GeoSingleCellDetector()
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description):
- void close()
- Path downloadSingleCellData(GeoSample sample): Download single-cell data for the given GEO sample.
- Path downloadSingleCellData(GeoSample sample, SingleCellDataType dataType)
- Path downloadSingleCellData(GeoSeries series): Download single-cell data from a GEO series to disk.
- Path downloadSingleCellData(GeoSeries series, GeoSample sample): Download a sample in the context of a series.
- Path downloadSingleCellData(GeoSeries series, GeoSample sample, SingleCellDataType dataType)
- void downloadSingleCellData(GeoSeries series, SingleCellDataType dataType)
- void downloadSingleCellData(GeoSeries series, SingleCellDataType dataType, String supplementaryFile)
- List<String> getAdditionalSupplementaryFiles(GeoSample sample): Obtain a list of all additional supplementary files.
- List<String> getAdditionalSupplementaryFiles(GeoSeries series): Obtain a list of all additional supplementary files.
- List<String> getAdditionalSupplementaryFiles(GeoSeries series, GeoSample sample)
- Set<SingleCellDataType> getAllSingleCellDataTypes(GeoSeries series): Obtain all single-cell data types a GEO series contains.
- SingleCellDataLoader getSingleCellDataLoader(GeoSeries series, SingleCellDataLoaderConfig config): Obtain a single-cell data loader.
- SingleCellDataType getSingleCellDataType(GeoSeries series): Determine the type of single-cell data a GEO series contains.
- boolean hasSingleCellData(GeoSample sample): Indicate if the given GEO sample has single-cell data.
- boolean hasSingleCellData(GeoSeries series): Detect if a GEO series has single-cell data either at the series level or in individual samples.
- boolean hasSingleCellData(GeoSeries series, GeoSample sample)
- boolean hasSingleCellDataInSeries(GeoSeries series): Check if a GEO series has single-cell data at the series level.
- boolean hasSingleCellDataInSra(GeoSample sample): Check if a GEO sample has single-cell data in SRA.
- boolean hasSingleCellDataInSra(GeoSeries series): Check if a GEO series has single-cell data in SRA.
- boolean hasSingleCellDataInSra(GeoSeries series, Collection<String> sraAccessions)
- boolean isSingleCell(GeoSample sample, boolean hasSingleCellDataInSeries): Check if a GEO sample is single-cell by looking up its metadata.
- boolean isSingleNuclei(GeoSample sample, boolean hasSingleCellDataInSeries): Check if a GEO sample is single-nuclei by looking up its metadata.
- void resetMexFileSuffixes()
- void setDownloadDirectory(Path dir): Directory where single-cell data is downloaded.
- void setFTPClientFactory(FTPClientFactory factory): Set the FTPClient factory used to create FTP connections to retrieve supplementary materials.
- void setMaxEntrySizeInArchiveToSkip(long maxNumberOfEntriesToSkip): Set the maximum size of an archive entry to skip the supplementary file altogether.
- void setMaxNumberOfEntriesToSkip(long maxNumberOfEntriesToSkip): Set the maximum number of archive entries to skip in order to ignore the supplementary file altogether.
- void setMexFileSuffixes(String barcodes, String features, String matrix): Set the suffixes to use to detect MEX metadata.
- void setNumberOfFetchThreads(int numberOfFetchThreads): Number of threads to use for downloading single-cell data.
- void setRetryPolicy(SimpleRetryPolicy retryPolicy): Set the retry policy to use when downloading single-cell data.
- void setSraFetcher(SraFetcher sraFetcher): Set the SraFetcher used to fetch SRA metadata.
-
-
-
Field Detail
-
DEFAULT_NUMBER_OF_FETCH_THREADS
public static final int DEFAULT_NUMBER_OF_FETCH_THREADS
Default number of threads to use for fetching data.
- See Also: Constant Field Values
-
-
Method Detail
-
close
public void close()
- Specified by: close in interface AutoCloseable
-
setNumberOfFetchThreads
public void setNumberOfFetchThreads(int numberOfFetchThreads)
Number of threads to use for downloading single-cell data.
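The effect of this setting can be pictured as a fixed-size worker pool over the samples of a series. The following is a minimal sketch under stated assumptions, not the class's actual implementation: ParallelFetchSketch, fetchSample and fetchAll are hypothetical names, and the per-sample task is a placeholder that downloads nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative stand-in only; none of these names belong to the documented API.
class ParallelFetchSketch {

    // Mirrors the documented default of 4 fetch threads.
    static final int DEFAULT_NUMBER_OF_FETCH_THREADS = 4;

    // Hypothetical per-sample task; a real one would download files to disk.
    static String fetchSample(String sampleAccession) {
        return sampleAccession + ":done";
    }

    // Runs one task per sample on a fixed-size pool and returns results in input order.
    static List<String> fetchAll(List<String> sampleAccessions, int numberOfFetchThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numberOfFetchThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String accession : sampleAccessions) {
                Callable<String> task = () -> fetchSample(accession);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that sample's task completes
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while fetching samples", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A sample fetch failed", e.getCause());
        } finally {
            pool.shutdown(); // release the worker threads
        }
    }
}
```

With a pool of this kind, at most numberOfFetchThreads samples are in flight at once, so a larger value trades more concurrent connections for a shorter overall download time.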
-
setFTPClientFactory
public void setFTPClientFactory(FTPClientFactory factory)
Set the FTPClient factory used to create FTP connections to retrieve supplementary materials.
-
setDownloadDirectory
public void setDownloadDirectory(Path dir)
Directory where single-cell data is downloaded.
Data are organized by GEO series or GEO sample accessions.
For AnnData, Seurat Disk and Loom:
- {downloadDir}/{geoSeriesAccession}.h5ad
- {downloadDir}/{geoSeriesAccession}.h5Seurat
- {downloadDir}/{geoSeriesAccession}.loom
For MEX:
- {downloadDir}/{geoSampleAccession}/barcodes.tsv.gz
- {downloadDir}/{geoSampleAccession}/features.tsv.gz
- {downloadDir}/{geoSampleAccession}/matrix.mtx.gz
- Specified by: setDownloadDirectory in interface SingleCellDetector
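The path templates above can be expressed with java.nio.file.Path. This is a small sketch of how a caller might compute the expected locations; DownloadLayoutSketch and its two methods are hypothetical helpers, not part of the documented API, and the accessions used are placeholders.

```java
import java.nio.file.Path;

// Hypothetical helper that reproduces the documented directory layout.
class DownloadLayoutSketch {

    // Series-level file: {downloadDir}/{geoSeriesAccession}.{extension}
    static Path seriesFile(Path downloadDir, String geoSeriesAccession, String extension) {
        return downloadDir.resolve(geoSeriesAccession + "." + extension);
    }

    // Sample-level file: {downloadDir}/{geoSampleAccession}/{fileName}
    static Path sampleFile(Path downloadDir, String geoSampleAccession, String fileName) {
        return downloadDir.resolve(geoSampleAccession).resolve(fileName);
    }
}
```

For example, seriesFile(dir, "GSE1", "h5ad") points at GSE1.h5ad directly under the download directory, while sampleFile(dir, "GSM1", "matrix.mtx.gz") points inside a GSM1 subdirectory.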
-
setRetryPolicy
public void setRetryPolicy(SimpleRetryPolicy retryPolicy)
Description copied from interface: SingleCellDetector
Set the retry policy to use when downloading single-cell data.
- Specified by: setRetryPolicy in interface SingleCellDetector
-
setMexFileSuffixes
public void setMexFileSuffixes(String barcodes, String features, String matrix)
Set the suffixes to use to detect MEX metadata.
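Suffix-based detection can be illustrated as a plain file-name check. The sketch below is an assumption about how such matching could work, not the detector's actual logic: MexSuffixSketch and MexFileKind are hypothetical names, and the suffix values are taken from the file names in the download layout rather than from any documented default.

```java
// Hypothetical stand-in for suffix-based MEX file detection.
class MexSuffixSketch {

    enum MexFileKind { BARCODES, FEATURES, MATRIX, NONE }

    private final String barcodesSuffix;
    private final String featuresSuffix;
    private final String matrixSuffix;

    // Same argument order as setMexFileSuffixes(barcodes, features, matrix).
    MexSuffixSketch(String barcodes, String features, String matrix) {
        this.barcodesSuffix = barcodes;
        this.featuresSuffix = features;
        this.matrixSuffix = matrix;
    }

    // Classifies a supplementary file name by the suffix it ends with.
    MexFileKind classify(String fileName) {
        if (fileName.endsWith(barcodesSuffix)) {
            return MexFileKind.BARCODES;
        }
        if (fileName.endsWith(featuresSuffix)) {
            return MexFileKind.FEATURES;
        }
        if (fileName.endsWith(matrixSuffix)) {
            return MexFileKind.MATRIX;
        }
        return MexFileKind.NONE;
    }
}
```

Custom suffixes are useful when a submitter used non-standard names, for instance a features file named with a genes.tsv.gz suffix; in this sketch, passing that string as the features argument makes such files match.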
-
resetMexFileSuffixes
public void resetMexFileSuffixes()
-
setMaxEntrySizeInArchiveToSkip
public void setMaxEntrySizeInArchiveToSkip(long maxNumberOfEntriesToSkip)
Description copied from interface: ArchiveBasedSingleCellDetector
Set the maximum size of an archive entry to skip the supplementary file altogether.
Use -1 to indicate no limit.
Note that if a relevant file was previously found in the archive, it will not be skipped.
- Specified by: setMaxEntrySizeInArchiveToSkip in interface ArchiveBasedSingleCellDetector
-
setMaxNumberOfEntriesToSkip
public void setMaxNumberOfEntriesToSkip(long maxNumberOfEntriesToSkip)
Description copied from interface: ArchiveBasedSingleCellDetector
Set the maximum number of archive entries to skip in order to ignore the supplementary file altogether.
Use -1 to indicate no limit.
Note that if a relevant file was previously found in the archive, it will not be ignored.
- Specified by: setMaxNumberOfEntriesToSkip in interface ArchiveBasedSingleCellDetector
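One plausible reading of the entry-count limit is sketched below: entries are scanned in order, and the archive is abandoned once more than the allowed number of irrelevant entries has been skipped without finding a relevant one. This is an interpretation under stated assumptions, not the documented algorithm: ArchiveSkipSketch, containsRelevantEntry and the suffix-based relevance test are hypothetical, and whether the real limit is inclusive or exclusive is not specified by the source.

```java
import java.util.List;

// Hypothetical model of the skip budget; not the detector's actual code.
class ArchiveSkipSketch {

    // Documented sentinel meaning "no limit".
    static final long NO_LIMIT = -1;

    // Returns true if a relevant entry is seen before the skip budget runs out.
    // Relevance is modelled as a simple suffix match (an assumption).
    static boolean containsRelevantEntry(List<String> entryNames, String relevantSuffix,
                                         long maxNumberOfEntriesToSkip) {
        long skipped = 0;
        for (String name : entryNames) {
            if (name.endsWith(relevantSuffix)) {
                return true; // a relevant file was found, so the archive is kept
            }
            skipped++;
            if (maxNumberOfEntriesToSkip != NO_LIMIT && skipped > maxNumberOfEntriesToSkip) {
                return false; // budget exhausted: ignore the supplementary file altogether
            }
        }
        return false; // whole archive scanned without finding anything relevant
    }
}
```

In this model a small limit avoids scanning very large archives of unrelated files, at the cost of missing relevant entries that appear late in the archive; passing -1 always scans to the end.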
-
setSraFetcher
public void setSraFetcher(@Nullable SraFetcher sraFetcher)
Set the SraFetcher used to fetch SRA metadata.
-
hasSingleCellData
public boolean hasSingleCellData(GeoSeries series)
Detect if a GEO series has single-cell data either at the series level or in individual samples.
- Specified by: hasSingleCellData in interface SingleCellDetector
-
hasSingleCellData
public boolean hasSingleCellData(GeoSample sample)
Description copied from interface: SingleCellDetector
Indicate if the given GEO sample has single-cell data.
- Specified by: hasSingleCellData in interface SingleCellDetector
-
hasSingleCellData
public boolean hasSingleCellData(GeoSeries series, GeoSample sample)
- Specified by: hasSingleCellData in interface SeriesAwareSingleCellDetector
-
getSingleCellDataType
public SingleCellDataType getSingleCellDataType(GeoSeries series) throws NoSingleCellDataFoundException
Determine the type of single-cell data a GEO series contains.
- Throws: NoSingleCellDataFoundException
-
getAllSingleCellDataTypes
public Set<SingleCellDataType> getAllSingleCellDataTypes(GeoSeries series)
Obtain all single-cell data types a GEO series contains.
-
downloadSingleCellData
public Path downloadSingleCellData(GeoSeries series) throws NoSingleCellDataFoundException, IOException
Download single-cell data from a GEO series to disk.
This has to be done prior to getSingleCellDataLoader(GeoSeries, SingleCellDataLoaderConfig).
- Specified by: downloadSingleCellData in interface SingleCellDetector
- Returns: a directory or file containing the downloaded series data
- Throws:
NoSingleCellDataFoundException - if no single-cell data is found either at the series level or in individual samples
UnsupportedOperationException - if single-cell data is found at the series level
IOException
-
downloadSingleCellData
public void downloadSingleCellData(GeoSeries series, SingleCellDataType dataType, String supplementaryFile) throws IOException
- Throws:
IOException
-
downloadSingleCellData
public void downloadSingleCellData(GeoSeries series, SingleCellDataType dataType) throws NoSingleCellDataFoundException, IOException
-
downloadSingleCellData
public Path downloadSingleCellData(GeoSeries series, GeoSample sample) throws NoSingleCellDataFoundException, IOException
Download a sample in the context of a series.
This is only applicable to MEX and Loom.
- Specified by: downloadSingleCellData in interface SeriesAwareSingleCellDetector
- Throws:
NoSingleCellDataFoundException
IOException
-
downloadSingleCellData
public Path downloadSingleCellData(GeoSeries series, GeoSample sample, SingleCellDataType dataType) throws NoSingleCellDataFoundException, IOException
-
downloadSingleCellData
public Path downloadSingleCellData(GeoSample sample) throws NoSingleCellDataFoundException, IOException
Description copied from interface: SingleCellDetector
Download single-cell data for the given GEO sample.
- Specified by: downloadSingleCellData in interface SingleCellDetector
- Returns: a directory or file containing the downloaded sample data
- Throws:
NoSingleCellDataFoundException - if there is no single-cell data for the given sample
IOException
-
downloadSingleCellData
public Path downloadSingleCellData(GeoSample sample, SingleCellDataType dataType) throws NoSingleCellDataFoundException, IOException
-
getSingleCellDataLoader
public SingleCellDataLoader getSingleCellDataLoader(GeoSeries series, SingleCellDataLoaderConfig config) throws NoSingleCellDataFoundException
Obtain a single-cell data loader.
Only local files previously retrieved with downloadSingleCellData(GeoSeries) are inspected.
- Specified by: getSingleCellDataLoader in interface SingleCellDetector
- Throws:
NoSingleCellDataFoundException - if no single-cell data was found on disk
UnsupportedOperationException - if single-cell data was found, but cannot be loaded
-
getAdditionalSupplementaryFiles
public List<String> getAdditionalSupplementaryFiles(GeoSeries series)
Description copied from interface: SingleCellDetector
Obtain a list of all additional supplementary files.
- Specified by: getAdditionalSupplementaryFiles in interface SingleCellDetector
-
getAdditionalSupplementaryFiles
public List<String> getAdditionalSupplementaryFiles(GeoSeries series, GeoSample sample)
- Specified by: getAdditionalSupplementaryFiles in interface SeriesAwareSingleCellDetector
-
getAdditionalSupplementaryFiles
public List<String> getAdditionalSupplementaryFiles(GeoSample sample)
Description copied from interface: SingleCellDetector
Obtain a list of all additional supplementary files.
- Specified by: getAdditionalSupplementaryFiles in interface SingleCellDetector
-
hasSingleCellDataInSeries
public boolean hasSingleCellDataInSeries(GeoSeries series)
Check if a GEO series has single-cell data at the series-level.
-
hasSingleCellDataInSra
public boolean hasSingleCellDataInSra(GeoSeries series)
Check if a GEO series has single-cell data in SRA.
-
hasSingleCellDataInSra
public boolean hasSingleCellDataInSra(GeoSeries series, Collection<String> sraAccessions)
-
hasSingleCellDataInSra
public boolean hasSingleCellDataInSra(GeoSample sample)
Check if a GEO sample has single-cell data in SRA.
-
isSingleCell
public boolean isSingleCell(GeoSample sample, boolean hasSingleCellDataInSeries)
Check if a GEO sample is single-cell by looking up its metadata.
- Parameters:
hasSingleCellDataInSeries - indicates whether the series has single-cell data; this is used as a last resort to determine if a given sample is single-cell. Use hasSingleCellDataInSeries(GeoSeries) to compute this value once and reuse it.
-
isSingleNuclei
public boolean isSingleNuclei(GeoSample sample, boolean hasSingleCellDataInSeries)
Check if a GEO sample is single-nuclei by looking up its metadata.
-
-