Class BiomartEnsemblNcbiObjectGenerator
java.lang.Object
ubic.gemma.core.loader.util.biomart.BiomartEnsemblNcbiObjectGenerator
Generates a map of BioMartEnsembleNcbiObject value objects keyed on Ensembl protein id. Each
BioMartEnsembleNcbiObject represents a mapping between Ensembl protein ids, Ensembl gene ids and
Entrez gene ids.
If a bioMartFileName is supplied, the biomart fetcher is not called and the provided file is parsed instead; in
this scenario only one taxon can be processed. If bioMartFileName is null, files for all eligible taxa are
downloaded from biomart. Eligible taxa are those that are in Gemma, have usable genes and are species.
Once the files have been downloaded or located, they are parsed into BioMartEnsembleNcbi value objects.
Note that Gemma now includes Ensembl ids imported for NCBI genes, using the gene2ensembl file provided by NCBI.
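The mapping described above can be pictured with a minimal sketch. Everything in it is a hypothetical stand-in rather than Gemma code: the class EnsemblNcbiMappingSketch, the nested Mapping type, the addRow helper and the placeholder ids are all invented for illustration, and the assumption that one protein id may carry several Entrez gene ids is not stated by the source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the value object and its map; not the Gemma classes.
class EnsemblNcbiMappingSketch {

    /** One entry: an Ensembl protein id, its Ensembl gene id and its Entrez gene ids. */
    static final class Mapping {
        final String ensemblProteinId;
        final String ensemblGeneId;
        final List<String> entrezGeneIds = new ArrayList<>();

        Mapping(String ensemblProteinId, String ensemblGeneId) {
            this.ensemblProteinId = ensemblProteinId;
            this.ensemblGeneId = ensemblGeneId;
        }
    }

    /** Adds one (protein, gene, entrez) row to a map keyed on Ensembl protein id. */
    static void addRow(Map<String, Mapping> byProteinId, String proteinId, String geneId, String entrezId) {
        Mapping mapping = byProteinId.get(proteinId);
        if (mapping == null) {
            mapping = new Mapping(proteinId, geneId);
            byProteinId.put(proteinId, mapping);
        }
        // Assumption: one protein id may be associated with several Entrez gene ids.
        if (!mapping.entrezGeneIds.contains(entrezId)) {
            mapping.entrezGeneIds.add(entrezId);
        }
    }

    /** Builds a small map from made-up placeholder ids (not real identifiers). */
    static Map<String, Mapping> example() {
        Map<String, Mapping> byProteinId = new LinkedHashMap<>();
        addRow(byProteinId, "ENSP_EXAMPLE_1", "ENSG_EXAMPLE_1", "1001");
        addRow(byProteinId, "ENSP_EXAMPLE_1", "ENSG_EXAMPLE_1", "1002");
        addRow(byProteinId, "ENSP_EXAMPLE_2", "ENSG_EXAMPLE_2", "2001");
        return byProteinId;
    }

    public static void main(String[] args) {
        for (Mapping m : example().values()) {
            System.out.println(m.ensemblProteinId + " -> " + m.ensemblGeneId + " " + m.entrezGeneIds);
        }
    }
}
```

Keying on the protein id means that repeated rows for the same protein collapse into a single entry, which is the lookup shape the class description implies.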
- Author:
- ldonnison
-
Field Summary
Fields
Modifier and Type, Field, Description
protected BiomartEnsemblNcbiFetcher biomartEnsemblNcbiFetcher
  Fetcher is called to download files if bioMartFileName is null
protected LineParser<Ensembl2NcbiValueObject> bioMartEnsemblNcbiParser
  A biomart parser which is constructed anew for each taxon due to slight differences between taxon files
protected File bioMartFileName
  If this file name is set, the file is assumed to be local and no remote call should be made to the biomart service
protected final org.apache.commons.logging.Log log
-
Constructor Summary
Constructors
Constructor, Description
BiomartEnsemblNcbiObjectGenerator()
  Constructor ensuring that fetcher is set.
-
Method Summary
Modifier and Type, Method, Description
generate(Collection<Taxon> validTaxa)
  Main method to generate a map of biomartEnsembleNcbiIds; optionally fetches from biomart if no file is provided, then returns the results of the parse method.
generateRemote(Collection<Taxon> validTaxa)
getBioMartEnsemblNcbiFetcher()
  Should be set
getBioMartFileName()
parseTaxonBiomartFile(Taxon taxon, File taxonBiomartFile)
  Calls the parse method to parse a biomart file.
void setBioMartEnsemblNcbiFetcher(BiomartEnsemblNcbiFetcher biomartEnsemblNcbiFetcher)
void setBioMartFileName(File bioMartFileName)
  Sets a biomart file name; can be null if retrieving all taxa
-
Field Details
-
log
protected final org.apache.commons.logging.Log log
-
biomartEnsemblNcbiFetcher
Fetcher that is called to download files if bioMartFileName is null.
-
bioMartEnsemblNcbiParser
A biomart parser that is constructed anew for each taxon because of slight differences between taxon files.
-
bioMartFileName
If this file name is set, the file is assumed to be local and no remote call should be made to the biomart service.
-
-
Constructor Details
-
BiomartEnsemblNcbiObjectGenerator
public BiomartEnsemblNcbiObjectGenerator()
Constructor ensuring that the fetcher is set. Even when files are not being fetched, the fetcher should be set so that attribute header information can be obtained for that particular taxon.
-
-
Method Details
-
generate
Main method to generate a map of biomartEnsembleNcbiIds. Optionally fetches from biomart if no file is provided, then returns the results of the parse method.
If the fetcher is called, all files for eligible taxa are retrieved and the results are returned as a map keyed on taxon. This map can be iterated through and each file parsed with the specific parser generated for that particular taxon (currently only different for human). The results of parsing each taxon are combined into a map of BioMartEnsembleNcbi value objects.
If a bioMartFileName file is provided, no iteration is needed and the file is parsed directly.
- Parameters:
validTaxa
- Taxa to retrieve biomart files for.
- Returns:
- Map of BioMartEnsembleNcbi value objects keyed on Ensembl peptide id.
- Throws:
IOException
- if there is a problem while manipulating the file
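The control flow described for generate can be sketched as follows. This is a simplified illustration under stated assumptions, not the Gemma implementation: GenerateFlowSketch, its Fetcher interface and its parse helper are hypothetical, taxa are plain strings, file contents are held in memory, and throwing when a local file is combined with more than one taxon is an assumed way of enforcing the one-taxon restriction.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the local-file versus remote-fetch decision; not Gemma code.
class GenerateFlowSketch {

    /** Hypothetical fetcher: returns file content per taxon (stand-in for downloaded files). */
    interface Fetcher {
        Map<String, String> fetch(Collection<String> taxa);
    }

    /**
     * Hypothetical parser: one "peptideId<TAB>entrezId" pair per line.
     * The taxon argument is unused here; in a real parser it could select a file layout.
     */
    static Map<String, String> parse(String taxon, String content) {
        Map<String, String> byPeptideId = new LinkedHashMap<>();
        for (String line : content.split("\n")) {
            String[] fields = line.split("\t");
            if (fields.length == 2) {
                byPeptideId.put(fields[0], fields[1]);
            }
        }
        return byPeptideId;
    }

    /** Parses the local content if given; otherwise fetches per taxon and combines the results. */
    static Map<String, String> generate(String localContent, Collection<String> taxa, Fetcher fetcher) {
        Map<String, String> combined = new LinkedHashMap<>();
        if (localContent != null) {
            // Assumption: a local file is rejected when more than one taxon is requested.
            if (taxa.size() != 1) {
                throw new IllegalArgumentException("A local file can serve only one taxon");
            }
            combined.putAll(parse(taxa.iterator().next(), localContent));
            return combined; // the fetcher is never called on this path
        }
        for (Map.Entry<String, String> perTaxon : fetcher.fetch(taxa).entrySet()) {
            combined.putAll(parse(perTaxon.getKey(), perTaxon.getValue()));
        }
        return combined;
    }

    public static void main(String[] args) {
        Fetcher fake = taxa -> {
            Map<String, String> files = new LinkedHashMap<>();
            for (String t : taxa) {
                files.put(t, "ENSP_" + t + "\t1\n");
            }
            return files;
        };
        System.out.println(generate(null, Arrays.asList("taxonA", "taxonB"), fake));
        System.out.println(generate("ENSP_LOCAL\t7", Collections.singletonList("taxonA"), fake));
    }
}
```

Because all per-taxon results land in one map, a peptide id that appeared in two taxon files would be overwritten by the later one in this sketch; whether that can happen in practice is not stated by the source.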
-
generateRemote
public Map<String,Ensembl2NcbiValueObject> generateRemote(Collection<Taxon> validTaxa) throws IOException
- Parameters:
validTaxa
- valid taxa
- Returns:
- Map generated from the remote biomart location
- Throws:
IOException
- if there is a problem while manipulating the file
-
parseTaxonBiomartFile
public Map<String,Ensembl2NcbiValueObject> parseTaxonBiomartFile(Taxon taxon, File taxonBiomartFile)
Calls the parse method to parse a biomart file. The parser is configurable based on the taxon.
- Parameters:
taxon
- Taxon that the file is for.
taxonBiomartFile
- The biomart file for the given taxon
- Returns:
- Map of BioMartEnsembleNcbi value objects for the given taxon, keyed on Ensembl peptide id
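A taxon-configurable parse step might look like the sketch below. The file layout is an assumption made purely for illustration (tab-separated, one header row, column positions chosen per taxon); TaxonFileParseSketch, Columns, parse and parseText are hypothetical names, and the sketch reads from a Reader rather than a File so that it stays self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a parser whose column layout is configured per taxon; not Gemma code.
class TaxonFileParseSketch {

    /** Hypothetical per-taxon configuration: which column holds which id. */
    static final class Columns {
        final int geneId;
        final int peptideId;
        final int entrezId;

        Columns(int geneId, int peptideId, int entrezId) {
            this.geneId = geneId;
            this.peptideId = peptideId;
            this.entrezId = entrezId;
        }

        int maxIndex() {
            return Math.max(geneId, Math.max(peptideId, entrezId));
        }
    }

    /** Returns {geneId, entrezId} pairs keyed on peptide id; rows without a peptide id are skipped. */
    static Map<String, String[]> parse(Reader in, Columns columns) throws IOException {
        Map<String, String[]> byPeptideId = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(in);
        reader.readLine(); // assumption: the first row is a header and is skipped
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
            if (fields.length <= columns.maxIndex()) {
                continue; // too few columns for this layout
            }
            String peptideId = fields[columns.peptideId].trim();
            if (peptideId.isEmpty()) {
                continue; // no key to store the row under
            }
            byPeptideId.put(peptideId,
                    new String[] { fields[columns.geneId].trim(), fields[columns.entrezId].trim() });
        }
        return byPeptideId;
    }

    /** Convenience wrapper for in-memory text that avoids the checked exception. */
    static Map<String, String[]> parseText(String content, Columns columns) {
        try {
            return parse(new StringReader(content), columns);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String content = "gene\tpeptide\tentrez\n" + "ENSG_A\tENSP_A\t1001\n" + "ENSG_B\t\t1002\n";
        for (Map.Entry<String, String[]> e : parseText(content, new Columns(0, 1, 2)).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()[0] + ", " + e.getValue()[1]);
        }
    }
}
```

Passing the layout in as a value keeps the parsing loop identical across taxa, so a taxon with a different file format only needs a different Columns instance.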
-
getBioMartFileName
- Returns:
- the biomart file name; can be null if retrieving all taxa
-
setBioMartFileName
Sets a biomart file name; can be null if retrieving all taxa.
- Parameters:
bioMartFileName
- biomart file name
-
getBioMartEnsemblNcbiFetcher
Should be set.
- Returns:
- the bioMartEnsemblNcbiFetcher
-
setBioMartEnsemblNcbiFetcher
- Parameters:
biomartEnsemblNcbiFetcher
- the bioMartEnsemblNcbiFetcher to set
-