Class NcbiGeneHistoryParser

java.lang.Object
ubic.gemma.core.loader.util.parser.BasicLineMapParser<String,NcbiGeneHistory>
ubic.gemma.core.loader.genome.gene.ncbi.NcbiGeneHistoryParser
All Implemented Interfaces:
LineParser<NcbiGeneHistory>, Parser<NcbiGeneHistory>

public class NcbiGeneHistoryParser extends BasicLineMapParser<String,NcbiGeneHistory>
Parse the NCBI "gene_history" file.
File format: tax_id, GeneID, Discontinued_GeneID, Discontinued_Symbol, Discontinue_Date (tab is used as the separator; a pound sign starts a comment). The file is obtained from ftp.ncbi.nih.gov/gene/DATA. See the NCBI README.
There are two kinds of lines. Lines with a "-" for the GeneID (the majority) seem to be used when the record was withdrawn (the field is defined as "the current unique identifier for a gene"). Lines with an actual value in the GeneID field mean the record was replaced, so far as I can tell.
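The line format described above can be illustrated with a minimal, standalone sketch. This is not the code of NcbiGeneHistoryParser and does not use the Gemma API: the class GeneHistoryLineSketch, its nested Entry type, and the method names are hypothetical, and the replaced/withdrawn reading follows the tentative interpretation given in the description.

```java
import java.util.Optional;

/** Illustrative sketch only; names here are hypothetical, not part of Gemma. */
final class GeneHistoryLineSketch {

    /** Hypothetical holder for the five tab-separated fields of one line. */
    static final class Entry {
        final String taxId;
        final String currentGeneId; // null when the GeneID field is "-"
        final String discontinuedGeneId;
        final String discontinuedSymbol;
        final String discontinueDate;

        Entry(String taxId, String currentGeneId, String discontinuedGeneId,
                String discontinuedSymbol, String discontinueDate) {
            this.taxId = taxId;
            this.currentGeneId = currentGeneId;
            this.discontinuedGeneId = discontinuedGeneId;
            this.discontinuedSymbol = discontinuedSymbol;
            this.discontinueDate = discontinueDate;
        }

        /** True when the line names a current GeneID (assumed to mean "replaced"). */
        boolean wasReplaced() {
            return currentGeneId != null;
        }
    }

    /** Returns an empty Optional for blank lines and for comment lines starting with '#'. */
    static Optional<Entry> parseLine(String line) {
        if (line == null || line.trim().isEmpty() || line.startsWith("#")) {
            return Optional.empty();
        }
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] fields = line.split("\t", -1);
        if (fields.length < 5) {
            throw new IllegalArgumentException("Expected 5 tab-separated fields, got " + fields.length);
        }
        String current = "-".equals(fields[1]) ? null : fields[1];
        return Optional.of(new Entry(fields[0], current, fields[2], fields[3], fields[4]));
    }
}
```

With made-up values, parseLine("9606\t-\t111\tFOO\t20050101") would yield an entry for which wasReplaced() is false, while the same line with "222" in the second field would yield one for which it is true.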
Author:
paul