Class BiomartEnsemblNcbiObjectGenerator
- java.lang.Object
-
- ubic.gemma.core.loader.util.biomart.BiomartEnsemblNcbiObjectGenerator
-
public class BiomartEnsemblNcbiObjectGenerator extends Object
Generates a map of BioMartEnsembleNcbiObject value objects keyed on Ensembl protein id. Each BioMartEnsembleNcbiObject represents a mapping between Ensembl protein ids, Ensembl gene ids and Entrez gene ids. If a bioMartFileName is supplied, the biomart fetcher is not called and the provided file is parsed directly; in this scenario only one taxon can be processed. If bioMartFileName is null, the files for all eligible taxa are downloaded from biomart. Eligible taxa are those that are in Gemma, have usable genes and are species. Once the files have been downloaded or located, they are parsed into BioMartEnsembleNcbi value objects. Note that Gemma now includes Ensembl ids imported for NCBI genes, using the gene2ensembl file provided by NCBI.
- Author:
- ldonnison
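The eligibility rule in the class description (taxa that have usable genes and are species) can be illustrated with a minimal sketch. It is not Gemma's implementation: `TaxonStub` and its `genesUsable` and `species` fields are hypothetical stand-ins for whatever the real `Taxon` exposes, and the taxon names are placeholder data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EligibleTaxaSketch {

    // Hypothetical stand-in for Gemma's Taxon; the field names are assumptions.
    static final class TaxonStub {
        final String commonName;
        final boolean genesUsable;
        final boolean species;

        TaxonStub(String commonName, boolean genesUsable, boolean species) {
            this.commonName = commonName;
            this.genesUsable = genesUsable;
            this.species = species;
        }
    }

    // Keep only the taxa that have usable genes and are species.
    static List<TaxonStub> eligible(List<TaxonStub> known) {
        List<TaxonStub> result = new ArrayList<>();
        for (TaxonStub taxon : known) {
            if (taxon.genesUsable && taxon.species) {
                result.add(taxon);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<TaxonStub> known = Arrays.asList(
                new TaxonStub("human", true, true),
                new TaxonStub("salmonid", true, false),
                new TaxonStub("yeast", false, true));
        for (TaxonStub taxon : eligible(known)) {
            System.out.println(taxon.commonName);
        }
    }
}
```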
-
Field Summary
Fields
Modifier and Type    Field    Description
protected BiomartEnsemblNcbiFetcher
biomartEnsemblNcbiFetcher
Fetcher called to download files if bioMartFileName is null.
protected LineParser<Ensembl2NcbiValueObject>
bioMartEnsemblNcbiParser
A biomart parser, constructed anew for each taxon because the file format differs slightly between taxa.
protected File
bioMartFileName
If this file name is set, the file is local and no remote call should be made to the biomart service.
protected org.apache.commons.logging.Log
log
-
Constructor Summary
Constructors
Constructor    Description
BiomartEnsemblNcbiObjectGenerator()
Constructor ensuring that fetcher is set.
-
Method Summary
All Methods Instance Methods Concrete Methods
Modifier and Type    Method    Description
Map<String,Ensembl2NcbiValueObject>
generate(Collection<Taxon> validTaxa)
Main method to generate a map of biomartEnsembleNcbiIds; optionally fetches files from biomart if no file is provided, then returns the results of the parse method.
Map<String,Ensembl2NcbiValueObject>
generateRemote(Collection<Taxon> validTaxa)
BiomartEnsemblNcbiFetcher
getBioMartEnsemblNcbiFetcher()
Should be set.
File
getBioMartFileName()
Map<String,Ensembl2NcbiValueObject>
parseTaxonBiomartFile(Taxon taxon, File taxonBiomartFile)
Calls the parse method to parse a biomart file.
void
setBioMartEnsemblNcbiFetcher(BiomartEnsemblNcbiFetcher biomartEnsemblNcbiFetcher)
void
setBioMartFileName(File bioMartFileName)
Sets the biomart file name; can be null if retrieving all taxa.
-
Field Detail
-
log
protected final org.apache.commons.logging.Log log
-
biomartEnsemblNcbiFetcher
protected BiomartEnsemblNcbiFetcher biomartEnsemblNcbiFetcher
Fetcher is called to download files if bioMartFileName is null
-
bioMartEnsemblNcbiParser
protected LineParser<Ensembl2NcbiValueObject> bioMartEnsemblNcbiParser
A biomart parser, constructed anew for each taxon because the file format differs slightly between taxa.
-
bioMartFileName
protected File bioMartFileName
If this file name is set, the file is local and no remote call should be made to the biomart service.
-
Method Detail
-
generate
public Map<String,Ensembl2NcbiValueObject> generate(Collection<Taxon> validTaxa) throws IOException
Main method to generate a map of biomartEnsembleNcbiIds; optionally fetches files from biomart if no file is provided, then returns the results of the parse method. If the fetcher is called, all files for eligible taxa are retrieved and the results returned as a map keyed on taxon. This map can be iterated through and each file parsed with the specific parser generated for that particular taxon; currently the parser only differs for human. The results of parsing each taxon are combined into a map of BioMartEnsembleNcbi value objects. If a bioMartFileName file is provided, no iteration is needed and the file is parsed directly.
- Parameters:
validTaxa
- Taxa to retrieve biomart files for.
- Returns:
- Map of BioMartEnsembleNcbi value objects keyed on Ensembl peptide id.
- Throws:
IOException
- if there is a problem while manipulating the file
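The branching described for generate can be sketched as follows. This is a sketch under stated assumptions rather than the actual method body: the `fetcher` and `parser` arguments are hypothetical stand-ins for BiomartEnsemblNcbiFetcher and parseTaxonBiomartFile, taxa are represented by plain strings, and the map values are placeholder strings instead of Ensembl2NcbiValueObject instances.

```java
import java.io.File;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Supplier;

class GenerateFlowSketch {

    /**
     * @param localFile   stand-in for bioMartFileName; may be null
     * @param singleTaxon the one taxon a local file describes (assumed to be supplied by the caller)
     * @param fetcher     hypothetical remote fetch returning one file per taxon
     * @param parser      hypothetical per-taxon parse returning entries keyed on peptide id
     */
    static Map<String, String> generate(File localFile, String singleTaxon,
            Supplier<Map<String, File>> fetcher,
            BiFunction<String, File, Map<String, String>> parser) {
        if (localFile != null) {
            // Local file supplied: no remote call, a single taxon is parsed directly.
            return parser.apply(singleTaxon, localFile);
        }
        // No local file: fetch one file per taxon and merge the parse results.
        Map<String, String> combined = new HashMap<>();
        for (Map.Entry<String, File> entry : fetcher.get().entrySet()) {
            combined.putAll(parser.apply(entry.getKey(), entry.getValue()));
        }
        return combined;
    }

    public static void main(String[] args) {
        // Placeholder parser: one made-up entry per taxon; the files are never opened.
        BiFunction<String, File, Map<String, String>> parser =
                (taxon, file) -> Collections.singletonMap("ENSP-" + taxon, "entrez-" + taxon);
        Map<String, File> fetched = new LinkedHashMap<>();
        fetched.put("human", new File("human.txt"));
        fetched.put("mouse", new File("mouse.txt"));

        System.out.println(generate(null, null, () -> fetched, parser).size());
        System.out.println(generate(new File("local.txt"), "rat", () -> fetched, parser).size());
    }
}
```

Passing the fetcher as a supplier makes it easy to check that the local-file path never triggers a remote call.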
-
generateRemote
public Map<String,Ensembl2NcbiValueObject> generateRemote(Collection<Taxon> validTaxa) throws IOException
- Parameters:
validTaxa
- valid taxa
- Returns:
- Map generated from files fetched from the remote biomart location
- Throws:
IOException
- if there is a problem while manipulating the file
-
parseTaxonBiomartFile
public Map<String,Ensembl2NcbiValueObject> parseTaxonBiomartFile(Taxon taxon, File taxonBiomartFile)
Calls the parse method to parse a biomart file. The parser is configurable based on the taxon.
- Parameters:
taxon
- Taxon the file is for.
taxonBiomartFile
- The biomart file for the given taxon.
- Returns:
- Map of BioMartEnsembleNcbi value objects for the given taxon, keyed on Ensembl peptide id.
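To make "keyed on Ensembl peptide id" concrete, here is a minimal parsing sketch. The column layout (Ensembl gene id, Ensembl peptide id, Entrez gene id, tab-separated) is an assumption made for illustration only; the real biomart file layout and the LineParser implementation are not described on this page. `Mapping` is a hypothetical stand-in for Ensembl2NcbiValueObject, and the ids in the demo are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BiomartParseSketch {

    // Hypothetical stand-in for Ensembl2NcbiValueObject.
    static final class Mapping {
        final String ensemblGeneId;
        final String ensemblPeptideId;
        final String entrezGeneId;

        Mapping(String ensemblGeneId, String ensemblPeptideId, String entrezGeneId) {
            this.ensemblGeneId = ensemblGeneId;
            this.ensemblPeptideId = ensemblPeptideId;
            this.entrezGeneId = entrezGeneId;
        }
    }

    // Assumed columns: gene id, peptide id, Entrez gene id (tab-separated).
    static Map<String, Mapping> parse(List<String> lines) {
        Map<String, Mapping> byPeptideId = new LinkedHashMap<>();
        for (String line : lines) {
            // A limit of -1 keeps trailing empty fields, so the column count stays stable.
            String[] fields = line.split("\t", -1);
            if (fields.length < 3 || fields[1].isEmpty()) {
                continue; // skip malformed rows and rows without a peptide id
            }
            byPeptideId.put(fields[1], new Mapping(fields[0], fields[1], fields[2]));
        }
        return byPeptideId;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "GENE_A\tPEP_A\t101",
                "GENE_B\t\t102"); // no peptide id, so this row is skipped
        System.out.println(parse(lines).keySet());
    }
}
```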
-
getBioMartFileName
public File getBioMartFileName()
- Returns:
- The biomart file name; can be null if retrieving all taxa.
-
setBioMartFileName
public void setBioMartFileName(File bioMartFileName)
Sets the biomart file name; can be null if retrieving all taxa.
- Parameters:
bioMartFileName
- biomart file name
-
getBioMartEnsemblNcbiFetcher
public BiomartEnsemblNcbiFetcher getBioMartEnsemblNcbiFetcher()
Should be set.
- Returns:
- the bioMartEnsemblNcbiFetcher
-
setBioMartEnsemblNcbiFetcher
public void setBioMartEnsemblNcbiFetcher(BiomartEnsemblNcbiFetcher biomartEnsemblNcbiFetcher)
- Parameters:
biomartEnsemblNcbiFetcher
- the bioMartEnsemblNcbiFetcher to set
-