Class BiomartEnsemblNcbiFetcher

java.lang.Object
ubic.gemma.core.loader.util.biomart.BiomartEnsemblNcbiFetcher

public class BiomartEnsemblNcbiFetcher extends Object
BioMart is a query-oriented data management system. Here it is used to map Ensembl, NCBI and HGNC ids. To construct the query, pass the taxon and the attributes to query for. Note that BioMart formats a taxon as the abbreviated Latin name without the period, e.g. 'hsapiens'. For more information, visit the BioMart website. Note that Gemma now includes Ensembl ids imported for NCBI genes, using the gene2ensembl file provided by NCBI.
Author:
ldonnison
  • Field Details

  • Constructor Details

    • BiomartEnsemblNcbiFetcher

      public BiomartEnsemblNcbiFetcher()
  • Method Details

    • attributesToRetrieveFromBioMartForProteinQuery

      public String[] attributesToRetrieveFromBioMartForProteinQuery(String biomartTaxonName)
      Constructs, based on the supplied taxon, the array of attributes that can be queried on. For example, if hsapiens is supplied, then hgnc_id can be supplied as a query parameter.
      Parameters:
      biomartTaxonName - Biomart formatted taxon name
      Returns:
      An array of strings representing the attributes that can be used to query BioMart.
    • fetch

      public Map<Taxon,File> fetch(Collection<Taxon> taxa) throws IOException
      Main method. Iterates through the supplied taxa and calls the per-taxon fetch method for each, which returns a BioMart file for that taxon.
      Parameters:
      taxa - Collection of taxa to retrieve biomart files for.
      Returns:
      A map of biomart files as stored on local file system keyed on taxon.
      Throws:
      IOException - if there is a problem while manipulating the file
    • fetchFileForProteinQuery

      public File fetchFileForProteinQuery(String bioMartTaxonName) throws IOException
      Given a BioMart-formatted taxon name, fetches the file from BioMart and saves it as a local file.
      Parameters:
      bioMartTaxonName - BioMart-formatted taxon name
      Returns:
      biomart file
      Throws:
      IOException - when there is a problem while manipulating the file
    • getBiomartTaxonName

      public String getBiomartTaxonName(Taxon gemmaTaxon)
      BioMart taxon names are formatted as the scientific name in all lowercase, with the genus name shortened to one letter and joined to the front of the species name, e.g. Homo sapiens > hsapiens.
      Parameters:
      gemmaTaxon - taxon object
      Returns:
      Biomart taxon formatted name.
      Throws:
      RuntimeException - if the taxon does not contain a valid scientific name.
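
The naming rule described for getBiomartTaxonName can be illustrated with a small standalone sketch. This is not Gemma's implementation: the class BiomartNameSketch and the method toBiomartTaxonName are hypothetical names, the method takes a plain String rather than a Taxon, and the requirement that the name consist of exactly two words (genus and species) is an assumption made here for illustration.

```java
import java.util.Locale;

public class BiomartNameSketch {

    // Hypothetical helper (not part of Gemma): converts a scientific name
    // such as "Homo sapiens" into the BioMart-style form "hsapiens".
    public static String toBiomartTaxonName(String scientificName) {
        if (scientificName == null || scientificName.trim().isEmpty()) {
            throw new RuntimeException("Taxon has no scientific name");
        }
        // Assumption: a valid name is exactly "Genus species".
        String[] parts = scientificName.trim().split("\\s+");
        if (parts.length != 2) {
            throw new RuntimeException("Expected 'Genus species' but got: " + scientificName);
        }
        // First letter of the genus followed by the species, all lowercase.
        String genusInitial = parts[0].substring(0, 1);
        return (genusInitial + parts[1]).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toBiomartTaxonName("Homo sapiens")); // prints hsapiens
        System.out.println(toBiomartTaxonName("Mus musculus")); // prints mmusculus
    }
}
```

Throwing a RuntimeException for a missing or malformed name mirrors the documented behaviour only loosely; what the real method treats as invalid is not stated in this page.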