Class BiomartEnsemblNcbiFetcher
java.lang.Object
ubic.gemma.core.loader.util.biomart.BiomartEnsemblNcbiFetcher
BioMart is a query-oriented data management system. In our case we use it to map Ensembl, NCBI and HGNC IDs. To construct a query, we pass the taxon and the attributes we want to retrieve. Note that BioMart formats a taxon as its lowercase Latin name with the genus reduced to its first letter and no period, e.g. 'hsapiens' for Homo sapiens. For more information, visit the BioMart website.
Note that Gemma now includes Ensembl IDs imported for NCBI genes, using the gene2ensembl file provided by NCBI.
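The query construction described above (a taxon plus a list of attributes) can be sketched in Java as follows. This is a minimal sketch, not this class's code: the martservice endpoint URL, the "<taxon>_gene_ensembl" dataset naming, the TSV formatter and the example attribute names are assumptions based on the public Ensembl BioMart XML query interface, and BiomartQuerySketch is a hypothetical name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Sketch only: builds a BioMart XML query for one taxon. The endpoint and the
 * dataset naming convention are assumptions, not taken from this class.
 */
class BiomartQuerySketch {

    // Assumed endpoint; not necessarily the value held in BIOMARTPATH.
    static final String MARTSERVICE = "https://www.ensembl.org/biomart/martservice";

    /** Builds the XML query body requesting the given attributes as tab-separated text. */
    static String buildQueryXml(String biomartTaxonName, List<String> attributes) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>")
           .append("<!DOCTYPE Query>")
           .append("<Query virtualSchemaName=\"default\" formatter=\"TSV\" header=\"0\" uniqueRows=\"1\">")
           // Assumed dataset convention: e.g. "hsapiens" -> "hsapiens_gene_ensembl".
           .append("<Dataset name=\"").append(biomartTaxonName).append("_gene_ensembl\" interface=\"default\">");
        for (String attribute : attributes) {
            xml.append("<Attribute name=\"").append(attribute).append("\"/>");
        }
        xml.append("</Dataset></Query>");
        return xml.toString();
    }

    /** Encodes the XML query as the "query" parameter of a GET request URL. */
    static String buildQueryUrl(String biomartTaxonName, List<String> attributes) {
        String xml = buildQueryXml(biomartTaxonName, attributes);
        return MARTSERVICE + "?query=" + URLEncoder.encode(xml, StandardCharsets.UTF_8);
    }
}
```

For example, buildQueryUrl("hsapiens", List.of("ensembl_gene_id", "entrezgene_id", "hgnc_id")) yields a GET URL whose query parameter carries the URL-encoded XML; keeping the XML builder separate makes the query easy to inspect or log before it is sent.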
Author:
ldonnison
Field Summary
Fields
BIOMARTPATH
Constructor Summary
Constructors
BiomartEnsemblNcbiFetcher()
Method Summary
String[] attributesToRetrieveFromBioMartForProteinQuery(String biomartTaxonName)
    Constructs, based on the supplied taxon, an array of attributes that can be queried on.
fetch(Collection<Taxon> taxa)
    Main method: iterates through each supplied taxon and calls the fetch method for each one.
fetchFileForProteinQuery(String bioMartTaxonName)
    Given a BioMart-formatted taxon name, fetches the file from BioMart and saves it as a local file.
getBiomartTaxonName(Taxon gemmaTaxon)
    BioMart taxon names are formatted as the lowercase scientific name, with the genus shortened to its first letter and prefixed to the species name, e.g. Homo sapiens > hsapiens.
Field Details
BIOMARTPATH
See Also:
Constructor Details
BiomartEnsemblNcbiFetcher
public BiomartEnsemblNcbiFetcher()
Method Details
attributesToRetrieveFromBioMartForProteinQuery
Constructs, based on the supplied taxon, an array of attributes that can be queried on. For example, if hsapiens is supplied, hgnc_id can also be supplied as a query parameter.
Parameters:
biomartTaxonName - BioMart-formatted taxon name
Returns:
An array of strings representing the attributes that can be used to query BioMart.
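The taxon-dependent attribute selection can be sketched as a base list plus taxon-specific extras. This is a hypothetical illustration, not the method's real attribute list: the names ensembl_gene_id, ensembl_transcript_id, entrezgene_id and ensembl_peptide_id are assumed common Ensembl BioMart attributes, and only the hsapiens/hgnc_id rule comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of taxon-dependent attribute selection. The base attribute names are
 * assumptions; only the "hgnc_id for hsapiens" rule comes from the documentation.
 */
class ProteinQueryAttributesSketch {

    static String[] attributesFor(String biomartTaxonName) {
        List<String> attributes = new ArrayList<>();
        // Attributes assumed to be available for every Ensembl gene dataset.
        attributes.add("ensembl_gene_id");
        attributes.add("ensembl_transcript_id");
        attributes.add("entrezgene_id");
        attributes.add("ensembl_peptide_id");
        // HGNC identifiers only exist for human genes, so request them only for hsapiens.
        if ("hsapiens".equals(biomartTaxonName)) {
            attributes.add("hgnc_id");
        }
        return attributes.toArray(new String[0]);
    }
}
```

Returning String[] matches the documented return type; building the array from a List keeps adding further taxon-specific attributes a one-line change.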
fetch
Main method: iterates through each supplied taxon and calls the fetch method for it, which returns a BioMart file for that taxon.
Parameters:
taxa - Collection of taxa to retrieve BioMart files for.
Returns:
A map of BioMart files, as stored on the local file system, keyed on taxon.
Throws:
IOException - if there is a problem while manipulating the file
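The per-taxon loop can be sketched generically, so it does not depend on Gemma's Taxon class. In this hypothetical sketch the naming function stands in for getBiomartTaxonName and the FileFetcher interface stands in for fetchFileForProteinQuery; neither FetchLoopSketch nor FileFetcher exists in Gemma.

```java
import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class FetchLoopSketch {

    /** Hypothetical stand-in for fetchFileForProteinQuery. */
    @FunctionalInterface
    interface FileFetcher {
        File fetch(String biomartTaxonName) throws IOException;
    }

    /**
     * Fetches one file per taxon and returns the files keyed on the taxon.
     * The naming function plays the role of getBiomartTaxonName.
     */
    static <T> Map<T, File> fetchAll(Collection<T> taxa,
                                     Function<T, String> toBiomartName,
                                     FileFetcher fetcher) throws IOException {
        // LinkedHashMap keeps the files in the order the taxa were supplied.
        Map<T, File> filesByTaxon = new LinkedHashMap<>();
        for (T taxon : taxa) {
            String biomartName = toBiomartName.apply(taxon);
            // An IOException from any single fetch aborts the whole loop.
            filesByTaxon.put(taxon, fetcher.fetch(biomartName));
        }
        return filesByTaxon;
    }
}
```

Passing the naming and fetching steps in as functions lets the loop be exercised without network access, for example with a fetcher that only creates placeholder File objects.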
fetchFileForProteinQuery
Given a BioMart-formatted taxon name, fetches the file from BioMart and saves it as a local file.
Parameters:
bioMartTaxonName - BioMart-formatted taxon name
Returns:
The BioMart file
Throws:
IOException - if there is a problem while manipulating the file
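The "save as a local file" step can be sketched with java.nio.file. This is an assumed illustration rather than the method's implementation: the temp-file naming is hypothetical, and openQuery assumes the caller has already built a complete BioMart query URL.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Sketch of fetching a BioMart response and saving it locally. The file naming
 * scheme is an assumption, not taken from this class.
 */
class SaveBiomartFileSketch {

    /** Opens the response for a fully built query URL (requires network access). */
    static InputStream openQuery(String queryUrl) throws IOException {
        return URI.create(queryUrl).toURL().openStream();
    }

    /** Copies the response into a new temp file named after the taxon, e.g. "biomart-hsapiens-....txt". */
    static File saveToLocalFile(String bioMartTaxonName, InputStream response) throws IOException {
        Path target = Files.createTempFile("biomart-" + bioMartTaxonName + "-", ".txt");
        // try-with-resources closes the response stream even if the copy fails.
        try (InputStream in = response) {
            // createTempFile already created the file, so the copy must replace it.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target.toFile();
    }
}
```

Separating the download from the save means the save step can be tested with an in-memory stream, and the caller can combine them as saveToLocalFile(name, openQuery(url)).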
getBiomartTaxonName
BioMart taxon names are formatted as the lowercase scientific name, with the genus shortened to its first letter and prefixed to the species name, e.g. Homo sapiens > hsapiens.
Parameters:
gemmaTaxon - taxon object
Returns:
BioMart-formatted taxon name.
Throws:
RuntimeException - if the taxon does not contain a valid scientific name.
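The naming rule above can be sketched on a plain scientific name instead of Gemma's Taxon object. This is a hypothetical helper: it throws IllegalArgumentException (a subclass of RuntimeException) for invalid names, and it only implements the genus + species case described here, so names with a subspecies part are reduced to their first two words.

```java
import java.util.Locale;

/**
 * Sketch of the BioMart taxon naming rule, working on a scientific name string
 * rather than Gemma's Taxon class.
 */
class BiomartTaxonNameSketch {

    static String toBiomartTaxonName(String scientificName) {
        if (scientificName == null) {
            throw new IllegalArgumentException("Taxon has no scientific name");
        }
        String[] parts = scientificName.trim().toLowerCase(Locale.ROOT).split("\\s+");
        // A valid name needs at least a genus and a species epithet.
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a valid scientific name: " + scientificName);
        }
        // First letter of the genus followed by the species epithet: "Homo sapiens" -> "hsapiens".
        return parts[0].charAt(0) + parts[1];
    }
}
```

Using Locale.ROOT for lowercasing avoids locale-specific surprises (such as the Turkish dotless i) when the code runs on machines with different default locales.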