Class BiomartEnsemblNcbiObjectGenerator


  • public class BiomartEnsemblNcbiObjectGenerator
    extends Object
    Generates a map of BioMartEnsembleNcbiObject value objects keyed on Ensembl protein id. Each value object represents a mapping between Ensembl protein ids, Ensembl gene ids and Entrez gene ids. If a bioMartFileName is supplied, the BioMart fetcher is not called and the supplied file is parsed instead; in this mode only one taxon can be processed. If bioMartFileName is null, files for all eligible taxa are downloaded from BioMart. Eligible taxa are those that are in Gemma, have usable genes and are species. Once the files have been downloaded or located, they are parsed into BioMartEnsembleNcbi value objects. Note that Gemma now includes Ensembl ids imported for NCBI genes, using the gene2ensembl file provided by NCBI.
    Author:
    ldonnison
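    The eligibility rule in the class description (taxa that are in Gemma, have usable genes and are species) can be pictured as a simple filter. This is only a sketch: TaxonInfo, its genesUsable and species flags, and the sample taxa are hypothetical stand-ins, not the Gemma Taxon API.

```java
import java.util.ArrayList;
import java.util.List;

public class EligibleTaxaSketch {

    /** Hypothetical stand-in for a taxon record; NOT the Gemma Taxon class. */
    static final class TaxonInfo {
        final String name;
        final boolean genesUsable; // assumed flag: the taxon has usable genes
        final boolean species;     // assumed flag: the taxon is a species

        TaxonInfo(String name, boolean genesUsable, boolean species) {
            this.name = name;
            this.genesUsable = genesUsable;
            this.species = species;
        }
    }

    /** Keeps only the taxa that have usable genes and are species. */
    static List<String> eligibleNames(List<TaxonInfo> taxaInSystem) {
        List<String> eligible = new ArrayList<>();
        for (TaxonInfo taxon : taxaInSystem) {
            if (taxon.genesUsable && taxon.species) {
                eligible.add(taxon.name);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<TaxonInfo> taxa = new ArrayList<>();
        taxa.add(new TaxonInfo("human", true, true));
        taxa.add(new TaxonInfo("groupOfSpecies", true, false));
        taxa.add(new TaxonInfo("noUsableGenes", false, true));
        System.out.println(eligibleNames(taxa));
    }
}
```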
    • Field Detail

      • log

        protected final org.apache.commons.logging.Log log
      • biomartEnsemblNcbiFetcher

        protected BiomartEnsemblNcbiFetcher biomartEnsemblNcbiFetcher
        The fetcher is called to download files if bioMartFileName is null.
      • bioMartEnsemblNcbiParser

        protected LineParser<Ensembl2NcbiValueObject> bioMartEnsemblNcbiParser
        A BioMart parser that is constructed anew for each taxon, because the file format differs slightly between taxa.
      • bioMartFileName

        protected File bioMartFileName
        If this file is set, the file is assumed to be local and no remote call is made to the BioMart service.
    • Constructor Detail

      • BiomartEnsemblNcbiObjectGenerator

        public BiomartEnsemblNcbiObjectGenerator()
        Constructor ensuring that the fetcher is set. The fetcher should be set even when no files are fetched, because it supplies the attribute header information for the taxon being processed.
    • Method Detail

      • generate

        public Map<String,Ensembl2NcbiValueObject> generate(Collection<Taxon> validTaxa)
                                                           throws IOException
        Main method to generate a map of BioMartEnsembleNcbi value objects. If no file is provided, the fetcher retrieves the files for all eligible taxa and returns them as a map keyed on taxon. That map is iterated and each file is parsed with the parser generated for its taxon (currently only the human parser differs). The per-taxon results are combined into a single map of BioMartEnsembleNcbi value objects. If a bioMartFileName file is provided, no iteration is needed and the file is parsed directly.
        Parameters:
        validTaxa - Taxa to retrieve BioMart files for.
        Returns:
        Map of BioMartEnsembleNcbi value objects keyed on Ensembl peptide id.
        Throws:
        IOException - if there is a problem while manipulating the file
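        The two modes described for generate can be sketched as follows. This is a simplified illustration, not the Gemma implementation: taxa are plain strings, the map values are plain strings instead of value objects, and the fetcher and parser parameters are hypothetical stand-ins for the fetcher and for parseTaxonBiomartFile. Rejecting anything other than exactly one taxon in single-file mode is also an assumption based on the note that only one taxon can be processed in that mode.

```java
import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

public class GenerateFlowSketch {

    /**
     * @param validTaxa taxon names (stand-in for Collection<Taxon>)
     * @param localFile stand-in for bioMartFileName; null means "fetch everything"
     * @param fetcher   hypothetical: downloads one file per taxon, keyed on taxon
     * @param parser    hypothetical: parses one taxon's file into a map keyed on peptide id
     */
    static Map<String, String> generate(List<String> validTaxa, File localFile,
            Function<List<String>, Map<String, File>> fetcher,
            BiFunction<String, File, Map<String, String>> parser) {

        if (localFile != null) {
            // Single-file mode: no fetch and no iteration, exactly one taxon (assumed check).
            if (validTaxa.size() != 1) {
                throw new IllegalArgumentException("A local file can only be used with one taxon");
            }
            return parser.apply(validTaxa.get(0), localFile);
        }

        // Fetch mode: one file per taxon, each parsed with that taxon's parser,
        // and the per-taxon results merged into one map.
        Map<String, String> combined = new HashMap<>();
        for (Map.Entry<String, File> entry : fetcher.apply(validTaxa).entrySet()) {
            combined.putAll(parser.apply(entry.getKey(), entry.getValue()));
        }
        return combined;
    }
}
```

        Passing the fetcher and parser in as functions keeps the sketch self-contained; the documented class holds them as the biomartEnsemblNcbiFetcher and bioMartEnsemblNcbiParser fields instead.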
      • parseTaxonBiomartFile

        public Map<String,Ensembl2NcbiValueObject> parseTaxonBiomartFile(Taxon taxon,
                                                                         File taxonBiomartFile)
        Parses a BioMart file using a parser configured for the given taxon.
        Parameters:
        taxon - Taxon that the file is for.
        taxonBiomartFile - The biomart file for given taxon
        Returns:
        Map of BioMartEnsembleNcbi value objects for the given taxon, keyed on Ensembl peptide id
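        To illustrate what a taxon-configurable parse keyed on peptide id might look like, here is a minimal sketch. Everything about the file layout is assumed for the example: tab-separated lines with no header, columns in the order gene id, transcript id, Entrez gene id, peptide id, and one extra trailing column for human. The Mapping class is a hypothetical stand-in for the value object, and skipping rows that lack a peptide id or an Entrez gene id is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BiomartLineParseSketch {

    /** Hypothetical stand-in for the value object; NOT Ensembl2NcbiValueObject. */
    static final class Mapping {
        final String ensemblGeneId;
        final String ensemblPeptideId;
        final List<String> entrezGeneIds = new ArrayList<>();

        Mapping(String ensemblGeneId, String ensemblPeptideId) {
            this.ensemblGeneId = ensemblGeneId;
            this.ensemblPeptideId = ensemblPeptideId;
        }
    }

    /** Assumed per-taxon difference: human files carry one extra column. */
    static int expectedColumns(String taxonName) {
        return "human".equals(taxonName) ? 5 : 4;
    }

    /** Parses tab-separated lines into mappings keyed on Ensembl peptide id. */
    static Map<String, Mapping> parse(String taxonName, List<String> lines) {
        int columns = expectedColumns(taxonName);
        Map<String, Mapping> byPeptideId = new LinkedHashMap<>();
        for (String line : lines) {
            // A negative limit keeps trailing empty fields, so the column count stays stable.
            String[] fields = line.split("\t", -1);
            if (fields.length != columns) {
                throw new IllegalArgumentException("Expected " + columns + " columns: " + line);
            }
            String geneId = fields[0];
            String entrezId = fields[2];
            String peptideId = fields[3];
            if (peptideId.isEmpty() || entrezId.isEmpty()) {
                continue; // assumed: rows without a protein or NCBI gene carry no mapping
            }
            Mapping mapping = byPeptideId.computeIfAbsent(peptideId, id -> new Mapping(geneId, id));
            if (!mapping.entrezGeneIds.contains(entrezId)) {
                mapping.entrezGeneIds.add(entrezId);
            }
        }
        return byPeptideId;
    }
}
```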
      • getBioMartFileName

        public File getBioMartFileName()
        Returns:
        the BioMart file; can be null when retrieving all taxa
      • setBioMartFileName

        public void setBioMartFileName(File bioMartFileName)
        Sets the BioMart file; can be null when retrieving all taxa.
        Parameters:
        bioMartFileName - biomart file name
      • getBioMartEnsemblNcbiFetcher

        public BiomartEnsemblNcbiFetcher getBioMartEnsemblNcbiFetcher()
        The fetcher should always be set, even when no files are fetched.
        Returns:
        the bioMartEnsemblNcbiFetcher
      • setBioMartEnsemblNcbiFetcher

        public void setBioMartEnsemblNcbiFetcher(BiomartEnsemblNcbiFetcher biomartEnsemblNcbiFetcher)
        Parameters:
        biomartEnsemblNcbiFetcher - the bioMartEnsemblNcbiFetcher to set