Class BiomartEnsemblNcbiObjectGenerator

java.lang.Object
ubic.gemma.core.loader.util.biomart.BiomartEnsemblNcbiObjectGenerator

public class BiomartEnsemblNcbiObjectGenerator extends Object
Generates a map of BioMartEnsembleNcbiObject value objects keyed on Ensembl protein ID. Each BioMartEnsembleNcbiObject represents a mapping between Ensembl protein IDs, Ensembl gene IDs and Entrez gene IDs. If a bioMartFileName is supplied, the biomart fetcher is not called and the supplied file is parsed; in this scenario only one taxon can be processed. If bioMartFileName is null, files for all eligible taxa are downloaded from biomart. Eligible taxa are those that are in Gemma, have usable genes and are species. Once the files have been downloaded or located, they are parsed into BioMartEnsembleNcbi value objects. Note that Gemma now includes Ensembl IDs imported for NCBI genes, using the gene2ensembl file provided by NCBI.
Author:
ldonnison
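The choice between a local file and a remote fetch described above can be sketched roughly as follows. This is a minimal illustration, not the Gemma implementation: the `Fetcher` and `TaxonFileParser` interfaces, the use of plain strings for taxa and values, and the exception thrown when a local file is combined with more than one taxon are all assumptions made for the example.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch only: Fetcher, TaxonFileParser and the String-based taxa and values
// are hypothetical stand-ins, not the Gemma classes.
class GenerateFlowSketch {

    /** Hypothetical fetcher: returns one downloaded file per taxon name. */
    interface Fetcher {
        Map<String, File> fetch(Collection<String> taxa);
    }

    /** Hypothetical parser: turns one taxon file into peptide-id -> value entries. */
    interface TaxonFileParser {
        Map<String, String> parse(String taxon, File file);
    }

    private final Fetcher fetcher;
    private final TaxonFileParser parser;
    private File bioMartFileName; // null means "fetch remotely"

    GenerateFlowSketch(Fetcher fetcher, TaxonFileParser parser) {
        this.fetcher = fetcher;
        this.parser = parser;
    }

    void setBioMartFileName(File bioMartFileName) {
        this.bioMartFileName = bioMartFileName;
    }

    Map<String, String> generate(Collection<String> validTaxa) {
        if (bioMartFileName != null) {
            // Local file: no remote call, and only one taxon can be processed.
            // Throwing here is an assumption of this sketch.
            if (validTaxa.size() != 1) {
                throw new IllegalArgumentException("A local file can be parsed for exactly one taxon");
            }
            return parser.parse(validTaxa.iterator().next(), bioMartFileName);
        }
        // Remote case: fetch one file per taxon, parse each, combine the results.
        Map<String, String> combined = new HashMap<>();
        for (Map.Entry<String, File> entry : fetcher.fetch(validTaxa).entrySet()) {
            combined.putAll(parser.parse(entry.getKey(), entry.getValue()));
        }
        return combined;
    }

    public static void main(String[] args) {
        // Fake collaborators that touch neither the network nor the disk.
        Fetcher fakeFetcher = taxa -> {
            Map<String, File> files = new HashMap<>();
            for (String taxon : taxa) {
                files.put(taxon, new File(taxon + ".txt"));
            }
            return files;
        };
        TaxonFileParser fakeParser =
                (taxon, file) -> Collections.singletonMap("peptide-" + taxon, file.getName());

        GenerateFlowSketch generator = new GenerateFlowSketch(fakeFetcher, fakeParser);
        System.out.println(generator.generate(Arrays.asList("human", "mouse")));

        generator.setBioMartFileName(new File("local.txt"));
        System.out.println(generator.generate(Collections.singletonList("human")));
    }
}
```

Passing the collaborators through the constructor keeps the sketch testable without a biomart connection; the real class manages its fetcher and parser itself.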
  • Field Details

    • log

      protected final org.apache.commons.logging.Log log
    • biomartEnsemblNcbiFetcher

      protected BiomartEnsemblNcbiFetcher biomartEnsemblNcbiFetcher
      The fetcher is called to download files if bioMartFileName is null.
    • bioMartEnsemblNcbiParser

      protected LineParser<Ensembl2NcbiValueObject> bioMartEnsemblNcbiParser
      A biomart parser, constructed anew for each taxon because the file format differs slightly between taxa.
    • bioMartFileName

      protected File bioMartFileName
      If this file name is set, the file is local and no remote call should be made to the biomart service.
  • Constructor Details

    • BiomartEnsemblNcbiObjectGenerator

      public BiomartEnsemblNcbiObjectGenerator()
      Constructor ensuring that the fetcher is set. The fetcher should be set even when no files are fetched, because it supplies the attribute header information for a particular taxon.
  • Method Details

    • generate

      public Map<String,Ensembl2NcbiValueObject> generate(Collection<Taxon> validTaxa) throws IOException
      Main method to generate a map of biomartEnsembleNcbiIds. If no file is provided, files are first fetched from biomart; the results of the parse method are then returned. If the fetcher is called, all files for the eligible taxa are retrieved and returned as a map keyed on taxon. This map is iterated through and each file is parsed with the parser generated for that particular taxon; currently the parser differs only for human. The results of parsing each taxon are combined into a map of BioMartEnsembleNcbi value objects. If a bioMartFileName file is provided, no iteration is needed and the file is parsed directly.
      Parameters:
      validTaxa - Taxa to retrieve biomart files for.
      Returns:
      Map of BioMartEnsembleNcbi value objects keyed on Ensembl peptide ID.
      Throws:
      IOException - if there is a problem while manipulating the file
    • generateRemote

      public Map<String,Ensembl2NcbiValueObject> generateRemote(Collection<Taxon> validTaxa) throws IOException
      Parameters:
      validTaxa - valid taxa
      Returns:
      Map of value objects generated from files fetched from the remote biomart location
      Throws:
      IOException - if there is a problem while manipulating the file
    • parseTaxonBiomartFile

      public Map<String,Ensembl2NcbiValueObject> parseTaxonBiomartFile(Taxon taxon, File taxonBiomartFile)
      Calls the parse method to parse a biomart file. The parser is configured according to the taxon.
      Parameters:
      taxon - Taxon that the file is for.
      taxonBiomartFile - The biomart file for the given taxon
      Returns:
      Map of BioMartEnsembleNcbi value objects for the given taxon, keyed on Ensembl peptide ID
    • getBioMartFileName

      public File getBioMartFileName()
      Returns:
      The biomart file name; can be null if retrieving all taxa
    • setBioMartFileName

      public void setBioMartFileName(File bioMartFileName)
      Sets the biomart file name; can be null if retrieving all taxa.
      Parameters:
      bioMartFileName - biomart file name
    • getBioMartEnsemblNcbiFetcher

      public BiomartEnsemblNcbiFetcher getBioMartEnsemblNcbiFetcher()
      The fetcher should always be set.
      Returns:
      the bioMartEnsemblNcbiFetcher
    • setBioMartEnsemblNcbiFetcher

      public void setBioMartEnsemblNcbiFetcher(BiomartEnsemblNcbiFetcher biomartEnsemblNcbiFetcher)
      Parameters:
      biomartEnsemblNcbiFetcher - the bioMartEnsemblNcbiFetcher to set
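
As an illustration of what parseTaxonBiomartFile produces, the sketch below builds a map keyed on Ensembl peptide ID from the lines of a biomart export. It does not use Gemma's parser: the `Mapping` class, the tab-separated layout, the single header line and the column order (gene ID, peptide ID, Entrez gene ID) are all assumptions for the example, and the IDs in the usage are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: the file layout and the Mapping class are assumptions,
// not Gemma's parser or its Ensembl2NcbiValueObject.
class BiomartParseSketch {

    /** Hypothetical value object holding one row of the mapping. */
    static final class Mapping {
        final String ensemblGeneId;
        final String ensemblPeptideId;
        final String entrezGeneId;

        Mapping(String ensemblGeneId, String ensemblPeptideId, String entrezGeneId) {
            this.ensemblGeneId = ensemblGeneId;
            this.ensemblPeptideId = ensemblPeptideId;
            this.entrezGeneId = entrezGeneId;
        }
    }

    /**
     * Parses lines assumed to be tab separated, in the order
     * gene ID, peptide ID, Entrez gene ID, with one header line first.
     * A real caller could obtain the lines with java.nio.file.Files.readAllLines.
     */
    static Map<String, Mapping> parse(List<String> lines) {
        Map<String, Mapping> byPeptideId = new LinkedHashMap<>();
        // Start at index 1 to skip the assumed header line.
        for (int i = 1; i < lines.size(); i++) {
            // The -1 limit keeps trailing empty columns.
            String[] columns = lines.get(i).split("\t", -1);
            if (columns.length < 3 || columns[1].isEmpty()) {
                continue; // rows without a peptide ID cannot be keyed
            }
            byPeptideId.put(columns[1], new Mapping(columns[0], columns[1], columns[2]));
        }
        return byPeptideId;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Gene ID\tProtein ID\tEntrez Gene ID", // made-up header
                "ENSG1\tENSP1\t101",                   // made-up IDs
                "ENSG2\t\t102");                       // no peptide ID: skipped
        Map<String, Mapping> result = parse(lines);
        System.out.println(result.keySet());
    }
}
```

Skipping rows without a peptide ID is a choice made for this sketch; how the real parser treats such rows is not stated here.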