Package ubic.gemma.core.analysis.service
Interface ArrayDesignAnnotationService
- All Known Implementing Classes:
ArrayDesignAnnotationServiceImpl
public interface ArrayDesignAnnotationService
Methods to generate annotations for array designs, based on information already in the database. This can be used to
generate annotation files used for ermineJ, for example. The file format:
- The file is tab-delimited text. Comma-delimited files or Excel spreadsheets (for example) are not supported.
- There is a one-line header included in the file for readability.
- The first column contains the probe identifier.
- The second column contains the gene symbol(s). Clusters are delimited by '|' and genes within clusters are delimited by ','.
- The third column contains the gene names (or description). Clusters are delimited by '|' and names within clusters are delimited by '$'
- The fourth column contains a delimited list of GO identifiers. These include the "GO:" prefix. Thus they read "GO:00494494" and not "494494". Delimited by '|'.
Note that for backwards compatibility, GO terms are not segregated by gene cluster.
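The delimiter rules above can be illustrated with a minimal parsing sketch. The class AnnotationLineSketch and its members are hypothetical and are not part of Gemma or ermineJ; the sketch assumes a single data line (not the header) with at least four tab-delimited columns, and the sample identifiers used with it are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Gemma): parses one data line of the
// four-column annotation format described above.
final class AnnotationLineSketch {
    final String probeId;
    final List<List<String>> symbolClusters; // outer list: '|' clusters, inner: ',' genes
    final List<List<String>> nameClusters;   // outer list: '|' clusters, inner: '$' names
    final List<String> goIds;                // flat list, not segregated by cluster

    private AnnotationLineSketch(String probeId, List<List<String>> symbolClusters,
            List<List<String>> nameClusters, List<String> goIds) {
        this.probeId = probeId;
        this.symbolClusters = symbolClusters;
        this.nameClusters = nameClusters;
        this.goIds = goIds;
    }

    static AnnotationLineSketch parse(String line) {
        // A limit of -1 keeps trailing empty columns, e.g. when no GO terms are present.
        String[] cols = line.split("\t", -1);
        if (cols.length < 4) {
            throw new IllegalArgumentException(
                    "Expected at least 4 tab-delimited columns, got " + cols.length);
        }
        return new AnnotationLineSketch(cols[0],
                splitClusters(cols[1], ","),
                splitClusters(cols[2], "$"),
                splitFlat(cols[3]));
    }

    // Splits on '|' first, then on the inner delimiter within each cluster.
    private static List<List<String>> splitClusters(String field, String inner) {
        List<List<String>> clusters = new ArrayList<>();
        if (field.isEmpty()) {
            return clusters;
        }
        for (String cluster : field.split(Pattern.quote("|"), -1)) {
            clusters.add(Arrays.asList(cluster.split(Pattern.quote(inner), -1)));
        }
        return clusters;
    }

    // Splits a '|'-delimited field into a flat list; an empty field yields an empty list.
    private static List<String> splitFlat(String field) {
        if (field.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(field.split(Pattern.quote("|"), -1));
    }
}
```

Both '|' and '$' are regular-expression metacharacters, so the sketch quotes them with Pattern.quote before passing them to String.split.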
- Author:
- paul
-
Method Summary
Modifier and Type, Method, Description:
- void create(ArrayDesign inputAd, Boolean useGO, boolean deleteOtherFiles): Create (or update) all the annotation files for the given platform.
- void deleteExistingFiles(ArrayDesign arrayDesign)
- int generateAnnotationFile(Writer writer, Collection<Gene> genes, Boolean useGO): Generate an annotation for a list of genes, instead of probes.
- Path getAnnotDataDir(): Obtain the directory where platform annotations are located.
- readAnnotationFile(ArrayDesign arrayDesign): This tries to read one of the annotation files (noparents, bioprocess or regular) to get the gene information. GO annotations are not part of the result.
-
Field Details
-
ANNOTATION_FILE_SUFFIX
- See Also:
-
BIO_PROCESS_FILE_SUFFIX
- See Also:
-
NO_PARENTS_FILE_SUFFIX
- See Also:
-
STANDARD_FILE_SUFFIX
String included in file names for standard (default) annotation files. These include GO terms and all parents.
- See Also:
-
-
Method Details
-
getAnnotDataDir
Path getAnnotDataDir()
Obtain the directory where platform annotations are located.
-
deleteExistingFiles
-
readAnnotationFile
This tries to read one of the annotation files (noparents, bioprocess or regular) to get the gene information. GO annotations are not part of the result.
- Parameters:
arrayDesign - array design
- Returns:
- Map of composite sequence IDs to an array of delimited strings, [probe name, gene symbol, gene name, Gemma gene id, NCBI id], for a given probe ID. The gene symbol and gene name strings use the same format as in the annotation file.
- Throws:
IOException
-
create
Create (or update) all the annotation files for the given platform. Side effect: any expression experiment data files that use this platform will be deleted. Format details: There is a one-line header. The columns are:
- Probe name
- Gene symbol. Genes located at different genome locations are delimited by "|"; multiple genes at the same location are delimited by ",". Both can happen simultaneously.
- Gene name, delimited as for the symbol except '$' is used instead of ','.
- GO terms (included unless useGO is false), delimited by '|'; multiple genes are not handled specially, for compatibility with ermineJ
- Gemma's gene ids, delimited by '|'
- NCBI gene ids, delimited by '|'
- Ensembl gene ids, delimited by '|'
- Parameters:
inputAd - platform to process
useGO - if true, GO terms will be included
deleteOtherFiles - if true, other files containing the annotations for this platform will be deleted, such as DEA results and data flat files.
- Throws:
IOException
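As a rough illustration of the seven-column layout listed above, the following sketch assembles one data row from already-grouped values. AnnotationRowSketch and its methods are hypothetical and do not reflect how ArrayDesignAnnotationServiceImpl builds its output. In particular, writing an empty GO column when useGO is false is an assumption made here; the description above does not say whether the column is left empty or omitted.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Gemma): formats one row of the
// seven-column layout described for create().
final class AnnotationRowSketch {

    // Joins items within a cluster with the inner delimiter, then clusters with '|'.
    static String joinClusters(List<List<String>> clusters, String inner) {
        return clusters.stream()
                .map(cluster -> String.join(inner, cluster))
                .collect(Collectors.joining("|"));
    }

    static String row(String probeName,
            List<List<String>> symbolClusters,
            List<List<String>> nameClusters,
            List<String> goIds,
            List<String> gemmaIds,
            List<String> ncbiIds,
            List<String> ensemblIds,
            boolean useGO) {
        return String.join("\t",
                probeName,
                joinClusters(symbolClusters, ","),  // genes at the same location: ','
                joinClusters(nameClusters, "$"),    // names use '$' instead of ','
                // Assumption: an empty column when GO terms are not requested.
                useGO ? String.join("|", goIds) : "",
                String.join("|", gemmaIds),
                String.join("|", ncbiIds),
                String.join("|", ensemblIds));
    }
}
```

Gene names can themselves contain commas, which is a plausible reason the name column uses '$' as its inner delimiter.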
-
generateAnnotationFile
Generate an annotation for a list of genes, instead of probes. The second column will contain the NCBI id, if available. Will generate the 'short' version.
- Parameters:
writer - the writer
genes - genes
useGO - if true, GO terms will be included
- Returns:
- code
-