Class BiomartEnsemblNcbiFetcher


  • public class BiomartEnsemblNcbiFetcher
    extends Object
    BioMart is a query-oriented data management system. Here it is used to map Ensembl, NCBI and HGNC ids. A query is constructed from the taxon and the attributes to retrieve. Note that BioMart formats the taxon as the abbreviated Latin name without the period, e.g. 'hsapiens'. For more information visit the BioMart website. Note that Gemma now includes Ensembl ids imported for NCBI genes, using the gene2ensembl file provided by NCBI.
    Author:
    ldonnison
    • Constructor Detail

      • BiomartEnsemblNcbiFetcher

        public BiomartEnsemblNcbiFetcher()
    • Method Detail

      • attributesToRetrieveFromBioMartForProteinQuery

        public String[] attributesToRetrieveFromBioMartForProteinQuery​(String biomartTaxonName)
        Constructs, based on the taxon supplied, the array of attributes that can be queried. For example, if hsapiens is supplied then hgnc_id can be supplied as a query attribute.
        Parameters:
        biomartTaxonName - BioMart-formatted taxon name
        Returns:
        An array of strings representing the attributes that can be used to query BioMart.
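        The taxon-dependent selection described above might look roughly like the following. This is a minimal sketch, not the class's actual code: the class name, the method name attributesFor and the two base attribute names are assumptions chosen for illustration. Only the rule that hgnc_id applies to hsapiens comes from this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BiomartAttributesSketch {

    /**
     * Hypothetical helper: builds the attribute list for a BioMart-formatted taxon name.
     * The base attributes below are assumed for illustration; the real list is not
     * documented on this page.
     */
    public static String[] attributesFor(String biomartTaxonName) {
        List<String> attributes =
                new ArrayList<>(Arrays.asList("ensembl_gene_id", "ensembl_peptide_id"));
        // HGNC ids exist only for human genes, so add hgnc_id only for hsapiens.
        if ("hsapiens".equals(biomartTaxonName)) {
            attributes.add("hgnc_id");
        }
        return attributes.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(attributesFor("hsapiens")));
        System.out.println(Arrays.toString(attributesFor("mmusculus")));
    }
}
```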
      • fetch

        public Map<Taxon,​File> fetch​(Collection<Taxon> taxa)
                                    throws IOException
        Main method. Iterates through the supplied taxa and fetches a BioMart file for each one.
        Parameters:
        taxa - Collection of taxa to retrieve biomart files for.
        Returns:
        A map, keyed on taxon, of the BioMart files as stored on the local file system.
        Throws:
        IOException - if there is a problem while manipulating the file
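        The iterate-and-fetch pattern described for this method can be sketched as follows. The sketch uses plain String taxon names and a hypothetical TaxonFileFetcher interface in place of Gemma's Taxon type and the real download step, so it shows only the shape of the loop, not the actual implementation.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

public class FetchPerTaxonSketch {

    /** Hypothetical stand-in for the per-taxon fetch step (not part of the documented API). */
    public interface TaxonFileFetcher {
        File fetchFile(String biomartTaxonName) throws IOException;
    }

    /** Calls the fetcher once per taxon name and collects the resulting files, keyed on taxon. */
    public static Map<String, File> fetchAll(Collection<String> biomartTaxonNames,
                                             TaxonFileFetcher fetcher) throws IOException {
        Map<String, File> filesByTaxon = new LinkedHashMap<>();
        for (String name : biomartTaxonNames) {
            filesByTaxon.put(name, fetcher.fetchFile(name));
        }
        return filesByTaxon;
    }

    public static void main(String[] args) throws IOException {
        // Stub fetcher: only builds a File handle; nothing is downloaded or written.
        TaxonFileFetcher stub = name -> new File(name + "_biomart.txt");
        Map<String, File> files = fetchAll(Arrays.asList("hsapiens", "mmusculus"), stub);
        files.forEach((taxon, file) -> System.out.println(taxon + " -> " + file.getName()));
    }
}
```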
      • fetchFileForProteinQuery

        public File fetchFileForProteinQuery​(String bioMartTaxonName)
                                      throws IOException
        Given a BioMart-formatted taxon name, fetches the file from BioMart and saves it as a local file.
        Parameters:
        bioMartTaxonName - BioMart-formatted taxon name
        Returns:
        biomart file
        Throws:
        IOException - when there is a problem while manipulating the file
      • getBiomartTaxonName

        public String getBiomartTaxonName​(Taxon gemmaTaxon)
        BioMart taxon names are formatted as the scientific name in all lowercase, with the genus name shortened to one letter and prepended to the species name, e.g. Homo sapiens > hsapiens.
        Parameters:
        gemmaTaxon - taxon object
        Returns:
        Biomart taxon formatted name.
        Throws:
        RuntimeException - if the taxon does not contain a valid scientific name.
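        The naming rule above can be illustrated with a short sketch. It works on a plain scientific-name string rather than a Gemma Taxon object, and the class and method names are hypothetical. It also assumes a two-word (genus plus species) name and rejects anything else, which may be stricter than the real method.

```java
import java.util.Locale;

public class BiomartTaxonNameSketch {

    /**
     * Hypothetical helper: converts "Genus species" to the BioMart form,
     * i.e. first letter of the genus followed by the species, all lowercase.
     */
    public static String toBiomartName(String scientificName) {
        if (scientificName == null || scientificName.trim().isEmpty()) {
            throw new RuntimeException("Taxon has no scientific name");
        }
        String[] parts = scientificName.trim().split("\\s+");
        // Assumption: only binomial names are handled in this sketch.
        if (parts.length != 2) {
            throw new RuntimeException("Expected 'Genus species' but got: " + scientificName);
        }
        return (parts[0].substring(0, 1) + parts[1]).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toBiomartName("Homo sapiens")); // prints hsapiens
        System.out.println(toBiomartName("Mus musculus")); // prints mmusculus
    }
}
```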