Class BiomartEnsembleNcbiParser
java.lang.Object
ubic.gemma.core.loader.util.parser.BasicLineMapParser<String,Ensembl2NcbiValueObject>
ubic.gemma.core.loader.util.parser.LineMapParser<String,Ensembl2NcbiValueObject>
ubic.gemma.core.loader.util.biomart.BiomartEnsembleNcbiParser
- All Implemented Interfaces:
LineParser<Ensembl2NcbiValueObject>, Parser<Ensembl2NcbiValueObject>
Parser for a BioMart file. The taxon and the attributes in the file are required at construction so that the parser
is configured to parse the file correctly for that taxon. The BioMart file is taxon-specific: it is generated from
BioMart with the taxon supplied as a query parameter. The parser is a Gemma LineMapParser, so parsing returns a Map
of BioMartEnsembleNcbi value objects keyed on Ensembl peptide id.
Parsing is triggered by calling the superclass method parse, which in turn calls the subclass method parseOneLine
for each line.
- Author:
- ldonnison
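The relationship between parse and parseOneLine described in the class comment is a template-method arrangement. The following is a minimal, self-contained sketch of that pattern, not the Gemma implementation: the classes SketchLineMapParser, SketchPeptideRow and SketchPeptideParser, the "#" comment marker, the tab delimiter and the column order (gene id, then peptide id) are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a line-map parser base class (not the Gemma class).
abstract class SketchLineMapParser<V> {
    private final Map<String, V> results = new LinkedHashMap<>();

    // Reads the input line by line and delegates each data line to parseOneLine.
    void parse(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            // Assumption: blank lines and lines starting with '#' are skipped.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            V value = parseOneLine(line);
            if (value != null) {
                results.put(getKey(value), value);
            }
        }
    }

    abstract V parseOneLine(String line);

    abstract String getKey(V value);

    Map<String, V> getMap() {
        return results;
    }
}

// Hypothetical value object holding two columns of one row.
final class SketchPeptideRow {
    final String ensemblGeneId;
    final String peptideId;

    SketchPeptideRow(String ensemblGeneId, String peptideId) {
        this.ensemblGeneId = ensemblGeneId;
        this.peptideId = peptideId;
    }
}

// Subclass that supplies the per-line logic and keys results on peptide id.
final class SketchPeptideParser extends SketchLineMapParser<SketchPeptideRow> {
    @Override
    SketchPeptideRow parseOneLine(String line) {
        // Assumed column order: gene id, peptide id (tab-delimited).
        String[] fields = line.split("\t", -1);
        return new SketchPeptideRow(fields[0], fields[1]);
    }

    @Override
    String getKey(SketchPeptideRow row) {
        return row.peptideId;
    }

    // Convenience helper for the sketch: parse an in-memory string.
    static Map<String, SketchPeptideRow> parseText(String text) {
        SketchPeptideParser parser = new SketchPeptideParser();
        try {
            parser.parse(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parser.getMap();
    }
}
```

In this sketch, calling SketchPeptideParser.parseText with a few tab-delimited lines yields a map keyed on the second column, which mirrors how the real parser is described as keying its results on peptide id.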
-
Field Summary
Fields inherited from class ubic.gemma.core.loader.util.parser.BasicLineMapParser
COMMENT_MARK, log
Fields inherited from interface ubic.gemma.core.loader.util.parser.LineParser
MIN_PARSED_LINES_FOR_UPDATE, PARSE_ALERT_TIME_FREQUENCY_MS
Fields inherited from interface ubic.gemma.core.loader.util.parser.Parser
PARSE_ALERT_FREQUENCY
-
Constructor Summary
Constructors
Constructor, Description
BiomartEnsembleNcbiParser(Taxon taxon, String[] attributesInFile)
The class must be initialised with the taxon and the attributes used in the BioMart query, which determine the columns in this file.
-
Method Summary
Modifier and Type, Method, Description
boolean containsKey(String key)
Returns whether the map contains a particular peptide key.
createBioMartEnsembleNcbi(String[] fields)
Given an array of strings representing the line to parse, creates a BioMartEnsembleNcbi value object with some validation.
get
Returns a particular BioMartEnsembleNcbi based on a peptide id.
String[] getBioMartFields
int getBioMartFieldsPerRow()
Calculates how many columns should be in the file, based on the attributes set for the original file.
getKeySet
Getter for the keys in the map, that is the peptide ids associated with the parsing of this file.
getMap()
getResults
Getter for the values in the map, that is the BioMartEnsembleNcbi value objects associated with the parsing of this file.
parseOneLine(String line)
Parses one BioMart line. Note that there is a many-to-many relationship between Ensembl ids and Entrez gene ids.
void setBioMartFields(String[] bioMartFields)
void setTaxon
Methods inherited from class ubic.gemma.core.loader.util.parser.LineMapParser
getKey, parse, put
Methods inherited from class ubic.gemma.core.loader.util.parser.BasicLineMapParser
parse, parse
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ubic.gemma.core.loader.util.parser.Parser
getUniqueResult
-
Constructor Details
-
BiomartEnsembleNcbiParser
The class must be initialised with the taxon and the attributes used in the BioMart query, which determine the columns in this file.
- Parameters:
taxon
- Taxon for the current file being processed
attributesInFile
- The attributes that were queried for in BioMart
-
-
Method Details
-
containsKey
Checks whether the map contains an entry for a particular peptide id.
- Specified by:
containsKey
in class BasicLineMapParser<String,Ensembl2NcbiValueObject>
- Returns:
- boolean indicating whether the map contains the given peptide key.
-
get
Returns a particular BioMartEnsembleNcbi based on a peptide id.
- Specified by:
get
in class BasicLineMapParser<String,Ensembl2NcbiValueObject>
- Returns:
- BioMartEnsembleNcbi associated with that peptide id.
-
getKeySet
Getter for the keys in the map, that is the peptide ids associated with the parsing of this file.
- Specified by:
getKeySet
in class BasicLineMapParser<String,Ensembl2NcbiValueObject>
- Returns:
- Collection of Strings representing the peptide ids in the map
-
getResults
Getter for the values in the map, that is the BioMartEnsembleNcbi value objects associated with the parsing of this file.
- Specified by:
getResults
in interface Parser<Ensembl2NcbiValueObject>
- Specified by:
getResults
in class BasicLineMapParser<String,Ensembl2NcbiValueObject>
- Returns:
- Collection of BioMartEnsembleNcbi value objects
-
parseOneLine
Parses one BioMart line. Note that there is a many-to-many relationship between Ensembl ids and Entrez gene ids.
- Specified by:
parseOneLine
in interface LineParser<Ensembl2NcbiValueObject>
- Specified by:
parseOneLine
in class BasicLineMapParser<String,Ensembl2NcbiValueObject>
- Parameters:
line
- line to parse
- Returns:
- BioMartEnsembleNcbi value object representing the line parsed
-
createBioMartEnsembleNcbi
public Ensembl2NcbiValueObject createBioMartEnsembleNcbi(String[] fields) throws NumberFormatException, FileFormatException
Given an array of strings representing the line to parse, creates a BioMartEnsembleNcbi value object with some validation. If a duplicate record keyed on peptide id is found, the peptide maps to more than one Entrez gene id. In that case, as a sanity check, the duplicate and the currently processed record must share the same Ensembl gene id; the Entrez gene is then added to the existing collection of Entrez genes.
- Parameters:
fields
- Parsed line split on delimiter
- Returns:
- BioMartEnsembleNcbi value object
- Throws:
NumberFormatException
- If a value expected to be a number cannot be parsed as one
FileFormatException
- If a duplicate record with the same peptide id has a different Ensembl gene id
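The duplicate handling described for createBioMartEnsembleNcbi can be sketched as follows. This is an illustrative sketch under stated assumptions, not the Gemma code: SketchMapping and SketchDuplicateMerger are hypothetical classes, the field order (gene id, peptide id, Entrez id) is assumed, and IllegalStateException stands in for FileFormatException so that the example is self-contained.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical value object: one peptide with all Entrez gene ids seen for it.
final class SketchMapping {
    final String ensemblGeneId;
    final String peptideId;
    final Set<Integer> entrezGeneIds = new LinkedHashSet<>();

    SketchMapping(String ensemblGeneId, String peptideId) {
        this.ensemblGeneId = ensemblGeneId;
        this.peptideId = peptideId;
    }
}

// Hypothetical helper that merges duplicate peptide records.
final class SketchDuplicateMerger {
    private final Map<String, SketchMapping> byPeptide = new HashMap<>();

    // Assumed field order: [ensembl gene id, peptide id, entrez gene id].
    SketchMapping add(String[] fields) {
        String geneId = fields[0];
        String peptideId = fields[1];
        // Throws NumberFormatException if the Entrez id is not numeric.
        int entrezId = Integer.parseInt(fields[2]);

        SketchMapping existing = byPeptide.get(peptideId);
        if (existing == null) {
            // First record for this peptide id.
            existing = new SketchMapping(geneId, peptideId);
            byPeptide.put(peptideId, existing);
        } else if (!existing.ensemblGeneId.equals(geneId)) {
            // Sanity check: a duplicate peptide id must share the gene id.
            // IllegalStateException is a stand-in for FileFormatException.
            throw new IllegalStateException(
                    "Peptide " + peptideId + " maps to genes "
                            + existing.ensemblGeneId + " and " + geneId);
        }
        // Add the Entrez gene to the (possibly existing) collection.
        existing.entrezGeneIds.add(entrezId);
        return existing;
    }
}
```

Adding two rows with the same gene and peptide id but different Entrez ids leaves one SketchMapping holding both Entrez ids, while a row that reuses the peptide id with a different gene id is rejected.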
-
getBioMartFields
-
setBioMartFields
-
getBioMartFieldsPerRow
public int getBioMartFieldsPerRow()
Calculates how many columns should be in the file, based on the attributes set for the original file.
- Returns:
- Number of columns in file.
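An expected column count is typically used to reject malformed rows before they are turned into value objects. The sketch below assumes one column per queried attribute and a tab delimiter; both are assumptions, as the page does not state how the real method derives its count. SketchColumnCheck and the attribute names used with it are hypothetical.

```java
// Hypothetical helper; not part of the Gemma API.
final class SketchColumnCheck {
    // Assumption: each queried attribute produces exactly one column.
    static int fieldsPerRow(String[] attributesInFile) {
        return attributesInFile.length;
    }

    // Returns true when a tab-delimited line has the expected column count.
    static boolean hasExpectedColumns(String line, String[] attributesInFile) {
        // Limit -1 keeps trailing empty columns, which BioMart-style
        // exports may contain when a value is missing.
        return line.split("\t", -1).length == fieldsPerRow(attributesInFile);
    }
}
```

With three attributes, a line with an empty trailing column still counts as three fields under this sketch, whereas a line with only two fields does not.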
-
getMap
-
setTaxon
-