Class BiomartEnsembleNcbiParser
- java.lang.Object
-
- ubic.gemma.core.loader.util.parser.BasicLineMapParser<K,T>
-
- ubic.gemma.core.loader.util.parser.LineMapParser<String,Ensembl2NcbiValueObject>
-
- ubic.gemma.core.loader.util.biomart.BiomartEnsembleNcbiParser
-
- All Implemented Interfaces:
LineParser<Ensembl2NcbiValueObject>, Parser<Ensembl2NcbiValueObject>
public class BiomartEnsembleNcbiParser extends LineMapParser<String,Ensembl2NcbiValueObject>
Parser for a BioMart file. The taxon and the attributes in the file are required at construction so that the parser is configured to parse the file correctly for that taxon. The BioMart file is taxon-specific: it is generated from BioMart by providing the taxon as a query parameter. This class is a Gemma LineMapParser, so parsing returns a Map of BioMartEnsembleNcbi value objects (Ensembl2NcbiValueObject) keyed on Ensembl peptide id. Parsing is triggered by calling the superclass method parse, which in turn calls this class's parseOneLine method.
- Author:
- ldonnison
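The parse flow described above (a superclass parse method that delegates each line to the subclass's parseOneLine and stores the result in a map under its key) can be sketched as follows. This is a minimal illustration and not Gemma's code: SketchLineMapParser, SketchPeptideParser and ParseFlowSketch are hypothetical stand-ins, and the "#" comment marker, the tab delimiter, and the peptide id being in the second column are all assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a line-map parser base class; NOT Gemma's LineMapParser. */
abstract class SketchLineMapParser<K, T> {
    private final Map<K, T> results = new LinkedHashMap<>();

    /** Reads each line, skips blanks and comments, and delegates to parseOneLine. */
    public void parse(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            // Assumption: comment lines start with "#".
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            T value = parseOneLine(line);
            if (value != null) {
                results.put(getKey(value), value);
            }
        }
    }

    public Map<K, T> getMap() {
        return results;
    }

    /** Implemented by the subclass: turns one line into one value. */
    protected abstract T parseOneLine(String line);

    /** Implemented by the subclass: extracts the map key from a value. */
    protected abstract K getKey(T value);
}

/** Hypothetical subclass; the "value object" is just the split line. */
class SketchPeptideParser extends SketchLineMapParser<String, String[]> {
    @Override
    protected String[] parseOneLine(String line) {
        // Assumption: tab-delimited columns.
        return line.split("\t", -1);
    }

    @Override
    protected String getKey(String[] fields) {
        // Assumption: the peptide id is the second column.
        return fields[1];
    }
}

class ParseFlowSketch {
    /** Parses in-memory text and returns the map keyed on peptide id. */
    static Map<String, String[]> parseText(String text) {
        SketchPeptideParser parser = new SketchPeptideParser();
        try {
            parser.parse(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parser.getMap();
    }

    public static void main(String[] args) {
        Map<String, String[]> map = parseText("# comment\nENSG01\tENSP01\t100\n");
        System.out.println(map.keySet());
    }
}
```

With the real class, the equivalent steps would be constructing the parser with a Taxon and the attribute array, calling an inherited parse method, and then reading the results through getMap() or getResults().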
-
-
Field Summary
-
Fields inherited from class ubic.gemma.core.loader.util.parser.BasicLineMapParser
COMMENT_MARK, log
-
Fields inherited from interface ubic.gemma.core.loader.util.parser.LineParser
MIN_PARSED_LINES_FOR_UPDATE, PARSE_ALERT_TIME_FREQUENCY_MS
-
Fields inherited from interface ubic.gemma.core.loader.util.parser.Parser
PARSE_ALERT_FREQUENCY
-
-
Constructor Summary
Constructors
Constructor Description
BiomartEnsembleNcbiParser(Taxon taxon, String[] attributesInFile)
The class must be initialised with the taxon and the attributes that were used in the BioMart query, which determine the columns in this file.
-
Method Summary
All Methods Instance Methods Concrete Methods
Modifier and Type Method Description
boolean
containsKey(String key)
Checks whether the map contains a value object for the given peptide id.
Ensembl2NcbiValueObject
createBioMartEnsembleNcbi(String[] fields)
Given an array of strings representing the line to parse, creates a BioMartEnsembleNcbi value object with some validation.
Ensembl2NcbiValueObject
get(String key)
Returns a particular BioMartEnsembleNcbi value object based on a peptide id.
String[]
getBioMartFields()
int
getBioMartFieldsPerRow()
Calculates how many columns the file should contain, based on the attributes that were set for the original file.
Collection<String>
getKeySet()
Getter for the keys in the map, that is, the peptide ids associated with the parsing of this file.
Map<String,Ensembl2NcbiValueObject>
getMap()
Collection<Ensembl2NcbiValueObject>
getResults()
Getter for the values in the map, that is, the BioMartEnsembleNcbi value objects associated with the parsing of this file.
Ensembl2NcbiValueObject
parseOneLine(String line)
Parses one BioMart line. Note that there is a many-to-many relationship between Ensembl ids and Entrez gene ids.
void
setBioMartFields(String[] bioMartFields)
void
setTaxon(Taxon taxon)
-
Methods inherited from class ubic.gemma.core.loader.util.parser.LineMapParser
getKey, parse, put
-
Methods inherited from class ubic.gemma.core.loader.util.parser.BasicLineMapParser
parse, parse
-
-
-
-
Constructor Detail
-
BiomartEnsembleNcbiParser
public BiomartEnsembleNcbiParser(Taxon taxon, String[] attributesInFile)
The class must be initialised with the taxon and the attributes that were used in the BioMart query, which determine the columns in this file.
- Parameters:
taxon
- Taxon for the current file being processed
attributesInFile
- The attributes that were queried for in BioMart
-
-
Method Detail
-
containsKey
public boolean containsKey(String key)
Checks whether the map contains a value object for the given peptide id.
- Specified by:
containsKey
in class BasicLineMapParser<String,Ensembl2NcbiValueObject>
- Returns:
- boolean to indicate whether the map contains the given peptide key.
-
get
public Ensembl2NcbiValueObject get(String key)
Returns a particular BioMartEnsembleNcbi value object based on a peptide id.
- Specified by:
get
in class BasicLineMapParser<String,Ensembl2NcbiValueObject>
- Returns:
- BioMartEnsembleNcbi associated with that peptide id.
-
getKeySet
public Collection<String> getKeySet()
Getter for the keys in the map, that is, the peptide ids associated with the parsing of this file.
- Specified by:
getKeySet
in class BasicLineMapParser<String,Ensembl2NcbiValueObject>
- Returns:
- Collection of Strings representing the peptide ids in the map
-
getResults
public Collection<Ensembl2NcbiValueObject> getResults()
Getter for the values in the map, that is, the BioMartEnsembleNcbi value objects associated with the parsing of this file.
- Specified by:
getResults
in interface Parser<Ensembl2NcbiValueObject>
- Specified by:
getResults
in class BasicLineMapParser<String,Ensembl2NcbiValueObject>
- Returns:
- Collection of BioMartEnsembleNcbi value objects
-
parseOneLine
public Ensembl2NcbiValueObject parseOneLine(String line)
Parses one BioMart line. Note that there is a many-to-many relationship between Ensembl ids and Entrez gene ids.
- Specified by:
parseOneLine
in interface LineParser<Ensembl2NcbiValueObject>
- Specified by:
parseOneLine
in class BasicLineMapParser<String,Ensembl2NcbiValueObject>
- Parameters:
line
- line to parse
- Returns:
- BioMartEnsembleNcbi value object representing the line parsed
-
createBioMartEnsembleNcbi
public Ensembl2NcbiValueObject createBioMartEnsembleNcbi(String[] fields) throws NumberFormatException, FileFormatException
Given an array of strings representing the line to parse, creates a BioMartEnsembleNcbi value object with some validation. If a duplicate record keyed on peptide id is found, the peptide maps to more than one Entrez gene id. In that case, as a sanity check, the duplicate and the record currently being processed must share the same Ensembl gene id. The Entrez gene is then added to the existing collection of Entrez genes.
- Parameters:
fields
- Parsed line split on delimiter
- Returns:
- BioMartEnsembleNcbi value object
- Throws:
NumberFormatException
- If a value expected to be a number cannot be parsed as one
FileFormatException
- If a duplicate record is found whose peptide id matches an existing record but whose Ensembl gene id differs.
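The duplicate handling described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Gemma implementation: Record is a hypothetical stand-in for Ensembl2NcbiValueObject, the column order (Ensembl gene id, peptide id, Entrez gene id) is assumed, the Entrez gene id is assumed to be the numeric field, and IllegalStateException stands in for FileFormatException.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class DuplicatePeptideSketch {

    /** Hypothetical stand-in for Ensembl2NcbiValueObject; NOT the Gemma class. */
    static final class Record {
        final String peptideId;
        final String ensemblGeneId;
        final Set<String> entrezGeneIds = new LinkedHashSet<>();

        Record(String peptideId, String ensemblGeneId) {
            this.peptideId = peptideId;
            this.ensemblGeneId = ensemblGeneId;
        }
    }

    private final Map<String, Record> byPeptideId = new LinkedHashMap<>();

    /**
     * Adds one split line to the map.
     * Assumed column order: Ensembl gene id, peptide id, Entrez gene id.
     */
    Record add(String[] fields) {
        String ensemblGeneId = fields[0];
        String peptideId = fields[1];
        String entrezGeneId = fields[2];

        // Assumption: the Entrez gene id is the numeric field.
        // Integer.parseInt throws NumberFormatException if it is not a number.
        Integer.parseInt(entrezGeneId);

        Record record = byPeptideId.get(peptideId);
        if (record == null) {
            // First time this peptide id is seen: create a new record.
            record = new Record(peptideId, ensemblGeneId);
            byPeptideId.put(peptideId, record);
        } else if (!record.ensemblGeneId.equals(ensemblGeneId)) {
            // Sanity check: same peptide id must mean same Ensembl gene id.
            // IllegalStateException stands in for Gemma's FileFormatException.
            throw new IllegalStateException("Peptide " + peptideId
                    + " maps to Ensembl genes " + record.ensemblGeneId
                    + " and " + ensemblGeneId);
        }
        // New or duplicate, the Entrez gene joins the record's collection.
        record.entrezGeneIds.add(entrezGeneId);
        return record;
    }

    public static void main(String[] args) {
        DuplicatePeptideSketch sketch = new DuplicatePeptideSketch();
        sketch.add(new String[] {"ENSG01", "ENSP01", "100"});
        Record merged = sketch.add(new String[] {"ENSG01", "ENSP01", "200"});
        System.out.println(merged.peptideId + " -> " + merged.entrezGeneIds);
    }
}
```

Keying the map on peptide id and collecting Entrez gene ids in a set is one way to represent a peptide that maps to several Entrez genes without storing duplicate records.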
-
getBioMartFields
public String[] getBioMartFields()
-
setBioMartFields
public void setBioMartFields(String[] bioMartFields)
-
getBioMartFieldsPerRow
public int getBioMartFieldsPerRow()
Calculates how many columns the file should contain, based on the attributes that were set for the original file.
- Returns:
- Number of columns in file.
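As a rough illustration of how an expected column count can be derived from the query attributes and used to check a line, consider the sketch below. It assumes one column per attribute and a tab delimiter, neither of which is confirmed by this page. FieldCountSketch and its methods are hypothetical, and the attribute names are illustrative only.

```java
class FieldCountSketch {

    /** Assumption: each queried attribute produces exactly one column. */
    static int fieldsPerRow(String[] attributes) {
        return attributes == null ? 0 : attributes.length;
    }

    /** Splits a line on tabs (assumed delimiter) and checks the column count. */
    static String[] splitAndCheck(String line, String[] attributes) {
        // A limit of -1 keeps trailing empty columns, so an empty last
        // field still counts towards the number of columns.
        String[] fields = line.split("\t", -1);
        int expected = fieldsPerRow(attributes);
        if (fields.length != expected) {
            throw new IllegalArgumentException("Expected " + expected
                    + " columns but found " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Illustrative attribute names, not necessarily those used by Gemma.
        String[] attributes = {"ensembl_gene_id", "ensembl_peptide_id", "entrezgene"};
        String[] fields = splitAndCheck("ENSG01\tENSP01\t100", attributes);
        System.out.println(fields.length + " columns");
    }
}
```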
-
getMap
public Map<String,Ensembl2NcbiValueObject> getMap()
-
setTaxon
public void setTaxon(Taxon taxon)
-
-