Class GeoSingleCellDetector
- All Implemented Interfaces:
AutoCloseable, ArchiveBasedSingleCellDetector, SeriesAwareSingleCellDetector, SingleCellDetector
Samples can be loaded in parallel when retrieving a GEO series with downloadSingleCellData(GeoSeries). The
number of threads used is controlled by setNumberOfFetchThreads(int) and defaults to 4.
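How the detector schedules its downloads internally is not documented on this page. As a rough sketch of the general pattern only, the following fans per-sample fetches out over a fixed-size pool. `ParallelFetchSketch`, `fetchAll` and the `fetchOne` callback are hypothetical names, not part of the Gemma API; the pool size of 4 mirrors the documented default.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical illustration; not part of GeoSingleCellDetector. */
final class ParallelFetchSketch {

    /** Mirrors the documented default of 4 fetch threads. */
    static final int DEFAULT_THREADS = 4;

    /**
     * Runs fetchOne for every sample accession on a fixed-size pool and
     * returns the results keyed by accession, in input order.
     */
    static <T> Map<String, T> fetchAll(List<String> sampleAccessions,
                                       Function<String, T> fetchOne,
                                       int numberOfFetchThreads) {
        if (numberOfFetchThreads < 1) {
            throw new IllegalArgumentException("At least one fetch thread is required.");
        }
        ExecutorService pool = Executors.newFixedThreadPool(numberOfFetchThreads);
        try {
            // Submit one task per sample; at most numberOfFetchThreads run at once.
            List<Future<T>> futures = new ArrayList<>();
            for (String accession : sampleAccessions) {
                Callable<T> task = () -> fetchOne.apply(accession);
                futures.add(pool.submit(task));
            }
            // Collect in submission order so the result order is deterministic.
            Map<String, T> results = new LinkedHashMap<>();
            for (int i = 0; i < futures.size(); i++) {
                results.put(sampleAccessions.get(i), futures.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while fetching samples.", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A sample fetch failed.", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With the real class, the equivalent knob is setNumberOfFetchThreads(int), called before downloadSingleCellData(GeoSeries).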
- Author:
- poirigui
-
Field Summary
Fields
- static final int DEFAULT_NUMBER_OF_FETCH_THREADS
  Default number of threads to use for fetching data.
Fields inherited from interface ubic.gemma.core.loader.expression.geo.singleCell.ArchiveBasedSingleCellDetector
DEFAULT_MAX_ENTRY_SIZE_IN_ARCHIVE_TO_SKIP, DEFAULT_MAX_NUMBER_OF_ENTRIES_TO_SKIP
-
Constructor Summary
Constructors
GeoSingleCellDetector()
-
Method Summary
- void close()
- Path downloadSingleCellData(GeoSample sample)
  Download single-cell data for the given GEO sample.
- Path downloadSingleCellData(GeoSample sample, SingleCellDataType dataType)
- Path downloadSingleCellData(GeoSeries series)
  Download single-cell data from a GEO series to disk.
- Path downloadSingleCellData(GeoSeries series, GeoSample sample)
  Download a sample in the context of a series.
- Path downloadSingleCellData(GeoSeries series, GeoSample sample, SingleCellDataType dataType)
- void downloadSingleCellData(GeoSeries series, SingleCellDataType dataType)
- void downloadSingleCellData(GeoSeries series, SingleCellDataType dataType, String supplementaryFile)
- getAdditionalSupplementaryFiles(…)
  Obtain a list of all additional supplementary files.
- getAdditionalSupplementaryFiles(…)
  Obtain a list of all additional supplementary files.
- getAdditionalSupplementaryFiles(GeoSeries series, GeoSample sample)
- getAllSingleCellDataTypes(GeoSeries series)
  Obtain all single-cell data types a GEO series contains.
- SingleCellDataLoader getSingleCellDataLoader(GeoSeries series, SingleCellDataLoaderConfig config)
  Obtain a single-cell data loader.
- SingleCellDataType getSingleCellDataType(GeoSeries series)
  Determine the type of single-cell data a GEO series contains.
- boolean hasSingleCellData(GeoSample sample)
  Indicate if the given GEO sample has single-cell data.
- boolean hasSingleCellData(GeoSeries series)
  Detects if a GEO series has single-cell data either at the series level or in individual samples.
- boolean hasSingleCellData(GeoSeries series, GeoSample sample)
- boolean hasSingleCellDataInSeries(GeoSeries series)
  Check if a GEO series has single-cell data at the series level.
- boolean hasSingleCellDataInSeries(GeoSeries series, SingleCellDataType dataType)
  Check if a GEO series has single-cell data at the series level.
- boolean hasSingleCellDataInSra(GeoSample sample)
  Check if a GEO sample has single-cell data in SRA.
- boolean hasSingleCellDataInSra(GeoSeries series)
  Check if a GEO series has single-cell data in SRA.
- boolean hasSingleCellDataInSra(GeoSeries series, Collection<String> sraAccessions, Collection<String> sraAccessionsWithOtherDataTypes)
- boolean isSingleCell(GeoSample sample, boolean hasSingleCellDataInSeries)
  Check if a GEO sample is single-cell by looking up its metadata.
- boolean isSingleNuclei(GeoSample sample, boolean hasSingleCellDataInSeries)
  Check if a GEO sample is single-nuclei by looking up its metadata.
- void resetMexFileSuffixes()
- void setCellRangerPrefix(Path cellRangerPrefix)
- void setDownloadDirectory(Path dir)
  Directory where single-cell data is downloaded.
- void setFTPClientFactory(FTPClientFactory factory)
  Set the FTPClient factory used to create FTP connections to retrieve supplementary materials.
- void setMaxEntrySizeInArchiveToSkip(long maxNumberOfEntriesToSkip)
  Set the maximum size of an archive entry to skip the supplementary file altogether.
- void setMaxNumberOfEntriesToSkip(long maxNumberOfEntriesToSkip)
  Set the maximum number of archive entries to skip in order to ignore the supplementary file altogether.
- void setMexFileSuffixes(String barcodes, String features, String matrix)
  Set the suffixes to use to detect MEX metadata.
- void setNumberOfFetchThreads(int numberOfFetchThreads)
  Number of threads to use for downloading single-cell data.
- void setRetryPolicy(SimpleRetryPolicy retryPolicy)
  Set the retry policy to use when downloading single-cell data.
- void setSraFetcher(SraFetcher sraFetcher)
  Set the SraFetcher used to fetch SRA metadata.
-
Field Details
-
DEFAULT_NUMBER_OF_FETCH_THREADS
public static final int DEFAULT_NUMBER_OF_FETCH_THREADS
Default number of threads to use for fetching data.
-
-
Constructor Details
-
GeoSingleCellDetector
public GeoSingleCellDetector()
-
-
Method Details
-
close
public void close()
- Specified by:
close in interface AutoCloseable
-
setNumberOfFetchThreads
public void setNumberOfFetchThreads(int numberOfFetchThreads)
Number of threads to use for downloading single-cell data.
-
setFTPClientFactory
public void setFTPClientFactory(FTPClientFactory factory)
Set the FTPClient factory used to create FTP connections to retrieve supplementary materials.
-
setDownloadDirectory
public void setDownloadDirectory(Path dir)
Directory where single-cell data is downloaded.
Data are organized by GEO series or GEO sample accession.
For AnnData, Seurat Disk and Loom:
- {downloadDir}/{geoSeriesAccession}.h5ad
- {downloadDir}/{geoSeriesAccession}.h5Seurat
- {downloadDir}/{geoSeriesAccession}.loom
For MEX:
- {downloadDir}/{geoSampleAccession}/barcodes.tsv.gz
- {downloadDir}/{geoSampleAccession}/features.tsv.gz
- {downloadDir}/{geoSampleAccession}/matrix.mtx.gz
- Specified by:
setDownloadDirectory in interface SingleCellDetector
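The layout listed for setDownloadDirectory can be sketched with plain java.nio paths. This is an illustration of the documented naming scheme only: `DownloadLayoutSketch` and its two methods are hypothetical helpers, not part of the Gemma API, and the accessions used with them are placeholders.

```java
import java.nio.file.Path;

/** Hypothetical helper mirroring the documented layout; not part of the Gemma API. */
final class DownloadLayoutSketch {

    /** Series-level file, e.g. {downloadDir}/{geoSeriesAccession}.h5ad */
    static Path seriesFile(Path downloadDir, String geoSeriesAccession, String extension) {
        return downloadDir.resolve(geoSeriesAccession + "." + extension);
    }

    /** Sample-level file, e.g. {downloadDir}/{geoSampleAccession}/matrix.mtx.gz */
    static Path sampleFile(Path downloadDir, String geoSampleAccession, String fileName) {
        return downloadDir.resolve(geoSampleAccession).resolve(fileName);
    }
}
```

For example, `seriesFile(Paths.get("data"), "GSE1234", "h5ad")` resolves to `data/GSE1234.h5ad`, and `sampleFile(Paths.get("data"), "GSM5678", "matrix.mtx.gz")` resolves to `data/GSM5678/matrix.mtx.gz`.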
-
setRetryPolicy
public void setRetryPolicy(SimpleRetryPolicy retryPolicy)
Description copied from interface: SingleCellDetector
Set the retry policy to use when downloading single-cell data.
- Specified by:
setRetryPolicy in interface SingleCellDetector
-
setMexFileSuffixes
public void setMexFileSuffixes(String barcodes, String features, String matrix)
Set the suffixes to use to detect MEX metadata.
-
resetMexFileSuffixes
public void resetMexFileSuffixes()
-
setMaxEntrySizeInArchiveToSkip
public void setMaxEntrySizeInArchiveToSkip(long maxNumberOfEntriesToSkip)
Description copied from interface: ArchiveBasedSingleCellDetector
Set the maximum size of an archive entry to skip the supplementary file altogether.
Use -1 to indicate no limit.
Note that if a relevant file was previously found in the archive, it will not be skipped.
- Specified by:
setMaxEntrySizeInArchiveToSkip in interface ArchiveBasedSingleCellDetector
-
setMaxNumberOfEntriesToSkip
public void setMaxNumberOfEntriesToSkip(long maxNumberOfEntriesToSkip)
Description copied from interface: ArchiveBasedSingleCellDetector
Set the maximum number of archive entries to skip in order to ignore the supplementary file altogether.
Use -1 to indicate no limit.
Note that if a relevant file was previously found in the archive, it will not be ignored.
- Specified by:
setMaxNumberOfEntriesToSkip in interface ArchiveBasedSingleCellDetector
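The two archive limits above can be read as early give-up rules while scanning a supplementary archive. The sketch below encodes one plausible reading and is an assumption, not the actual implementation: an irrelevant entry larger than the size limit, or more irrelevant entries than the count limit, causes the archive to be abandoned, unless a relevant file has already been seen; -1 disables a limit. `ArchiveSkipSketch`, `Entry` and `givesUpOnArchive` are hypothetical names.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of the skip limits; not the Gemma implementation. */
final class ArchiveSkipSketch {

    /** Minimal stand-in for an archive entry. */
    static final class Entry {
        final String name;
        final long size;

        Entry(String name, long size) {
            this.name = name;
            this.size = size;
        }
    }

    /**
     * Returns true if scanning would abandon the archive under the assumed rules.
     * A limit of -1 means "no limit".
     */
    static boolean givesUpOnArchive(List<Entry> entries, Predicate<String> isRelevant,
                                    long maxEntrySize, long maxEntriesToSkip) {
        boolean relevantFound = false;
        long skipped = 0;
        for (Entry entry : entries) {
            if (isRelevant.test(entry.name)) {
                relevantFound = true;
                continue;
            }
            if (relevantFound) {
                // A relevant file was already found: the archive is never abandoned.
                continue;
            }
            skipped++;
            if (maxEntrySize != -1 && entry.size > maxEntrySize) {
                return true; // an irrelevant entry is too large to skip over
            }
            if (maxEntriesToSkip != -1 && skipped > maxEntriesToSkip) {
                return true; // too many irrelevant entries before anything useful
            }
        }
        return false;
    }
}
```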
-
setSraFetcher
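public void setSraFetcher(SraFetcher sraFetcher)
Set the SraFetcher used to fetch SRA metadata.
-
setCellRangerPrefix
public void setCellRangerPrefix(Path cellRangerPrefix)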
-
hasSingleCellData
public boolean hasSingleCellData(GeoSeries series)
Detects if a GEO series has single-cell data either at the series level or in individual samples.
- Specified by:
hasSingleCellData in interface SingleCellDetector
-
hasSingleCellData
public boolean hasSingleCellData(GeoSample sample)
Description copied from interface: SingleCellDetector
Indicate if the given GEO sample has single-cell data.
- Specified by:
hasSingleCellData in interface SingleCellDetector
-
hasSingleCellData
public boolean hasSingleCellData(GeoSeries series, GeoSample sample)
- Specified by:
hasSingleCellData in interface SeriesAwareSingleCellDetector
-
getSingleCellDataType
public SingleCellDataType getSingleCellDataType(GeoSeries series) throws NoSingleCellDataFoundException
Determine the type of single-cell data a GEO series contains.
- Throws:
NoSingleCellDataFoundException
-
getAllSingleCellDataTypes
Obtain all single-cell data types a GEO series contains.
-
downloadSingleCellData
public Path downloadSingleCellData(GeoSeries series) throws NoSingleCellDataFoundException, IOException
Download single-cell data from a GEO series to disk.
This has to be done prior to getSingleCellDataLoader(GeoSeries, SingleCellDataLoaderConfig).
- Specified by:
downloadSingleCellData in interface SingleCellDetector
- Returns:
- a directory or file containing the downloaded series data
- Throws:
NoSingleCellDataFoundException - if no single-cell data is found either at the series level or in individual samples
UnsupportedOperationException - if single-cell data is found at the series level
IOException
-
downloadSingleCellData
public void downloadSingleCellData(GeoSeries series, SingleCellDataType dataType, String supplementaryFile) throws IOException
- Throws:
IOException
-
downloadSingleCellData
public void downloadSingleCellData(GeoSeries series, SingleCellDataType dataType) throws NoSingleCellDataFoundException, IOException
-
downloadSingleCellData
public Path downloadSingleCellData(GeoSeries series, GeoSample sample) throws NoSingleCellDataFoundException, IOException
Download a sample in the context of a series.
This is only applicable to MEX and Loom.
- Specified by:
downloadSingleCellData in interface SeriesAwareSingleCellDetector
- Throws:
NoSingleCellDataFoundException
IOException
-
downloadSingleCellData
public Path downloadSingleCellData(GeoSeries series, GeoSample sample, SingleCellDataType dataType) throws NoSingleCellDataFoundException, IOException
-
downloadSingleCellData
public Path downloadSingleCellData(GeoSample sample) throws NoSingleCellDataFoundException, IOException
Description copied from interface: SingleCellDetector
Download single-cell data for the given GEO sample.
- Specified by:
downloadSingleCellData in interface SingleCellDetector
- Returns:
- a directory or file containing the downloaded sample data
- Throws:
NoSingleCellDataFoundException - if there is no single-cell data for the given sample
IOException
-
downloadSingleCellData
public Path downloadSingleCellData(GeoSample sample, SingleCellDataType dataType) throws NoSingleCellDataFoundException, IOException
-
getSingleCellDataLoader
public SingleCellDataLoader getSingleCellDataLoader(GeoSeries series, SingleCellDataLoaderConfig config) throws NoSingleCellDataFoundException
Obtain a single-cell data loader.
Only local files previously retrieved with downloadSingleCellData(GeoSeries) are inspected.
- Specified by:
getSingleCellDataLoader in interface SingleCellDetector
- Throws:
NoSingleCellDataFoundException - if no single-cell data was found on disk
UnsupportedOperationException - if single-cell data was found, but cannot be loaded
-
getAdditionalSupplementaryFiles
Description copied from interface: SingleCellDetector
Obtain a list of all additional supplementary files.
- Specified by:
getAdditionalSupplementaryFiles in interface SingleCellDetector
-
getAdditionalSupplementaryFiles
- Specified by:
getAdditionalSupplementaryFiles in interface SeriesAwareSingleCellDetector
-
getAdditionalSupplementaryFiles
Description copied from interface: SingleCellDetector
Obtain a list of all additional supplementary files.
- Specified by:
getAdditionalSupplementaryFiles in interface SingleCellDetector
-
hasSingleCellDataInSeries
Check if a GEO series has single-cell data at the series level.
-
hasSingleCellDataInSeries
Check if a GEO series has single-cell data at the series level.
-
hasSingleCellDataInSra
public boolean hasSingleCellDataInSra(GeoSeries series)
Check if a GEO series has single-cell data in SRA.
-
hasSingleCellDataInSra
public boolean hasSingleCellDataInSra(GeoSeries series, Collection<String> sraAccessions, Collection<String> sraAccessionsWithOtherDataTypes)
-
hasSingleCellDataInSra
public boolean hasSingleCellDataInSra(GeoSample sample)
Check if a GEO sample has single-cell data in SRA.
-
isSingleCell
public boolean isSingleCell(GeoSample sample, boolean hasSingleCellDataInSeries)
Check if a GEO sample is single-cell by looking up its metadata.
- Parameters:
hasSingleCellDataInSeries - indicates whether the series has single-cell data; this is used as a last resort to determine if a given sample is single-cell. Use hasSingleCellDataInSeries(GeoSeries) to compute this value once and reuse it.
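The advice to compute the series-level flag once and reuse it for every sample can be sketched as follows. The types are generic stand-ins so the example compiles on its own: `SeriesFlagReuseSketch` and `selectSingleCell` are hypothetical, the supplier plays the role of hasSingleCellDataInSeries(GeoSeries), and the predicate plays the role of isSingleCell(GeoSample, boolean).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.BooleanSupplier;

/** Hypothetical illustration of reusing the series-level flag; not part of the Gemma API. */
final class SeriesFlagReuseSketch {

    /**
     * Evaluates the (potentially expensive) series-level check exactly once,
     * then passes the cached result to the per-sample check.
     */
    static <S> List<S> selectSingleCell(List<S> samples,
                                        BooleanSupplier seriesHasSingleCellData,
                                        BiPredicate<S, Boolean> isSingleCell) {
        boolean seriesFlag = seriesHasSingleCellData.getAsBoolean(); // computed once
        List<S> selected = new ArrayList<>();
        for (S sample : samples) {
            if (isSingleCell.test(sample, seriesFlag)) {
                selected.add(sample);
            }
        }
        return selected;
    }
}
```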
-
isSingleNuclei
public boolean isSingleNuclei(GeoSample sample, boolean hasSingleCellDataInSeries)
Check if a GEO sample is single-nuclei by looking up its metadata.
-