Class BlatAssociationScorer
java.lang.Object
    ubic.gemma.core.analysis.sequence.BlatAssociationScorer
public class BlatAssociationScorer extends Object
Given a set of BlatAssociations that might be redundant, clean them up and score them.
- Author:
- pavlidis
Constructor Summary
Constructor Description
BlatAssociationScorer()
Method Summary
Modifier and Type, Method, Description
static double
computeOverlapFraction(BlatAssociation blatAssociation)
Compute the overlap of the BLAT alignment with the target gene product as a fraction of the query sequence length.
static BlatAssociation
scoreResults(Collection<BlatAssociation> blatAssociations)
From a collection of BlatAssociations from a single BioSequence, reduce redundancy, fill in the specificity and score, and pick the one with the best scoring statistics.
Method Detail
computeOverlapFraction
public static double computeOverlapFraction(BlatAssociation blatAssociation)
Compute the overlap of the BLAT alignment with the target gene product as a fraction of the query sequence length. Assumes that the overlap with a transcript has already been computed.
- Parameters:
blatAssociation
- the BLAT association to evaluate
- Returns:
- the overlap as a fraction of the query sequence length
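The calculation described above can be illustrated with a minimal sketch. It does not use the Gemma classes: OverlapFractionSketch, overlapFraction and both parameters are hypothetical stand-ins, and the formula (precomputed overlap divided by query length) is an assumption based on the description, not the actual implementation.

```java
// Hypothetical illustration only; not the Gemma implementation.
class OverlapFractionSketch {

    /**
     * @param overlapBases number of aligned bases that overlap the gene product
     *                     (assumed to have been computed beforehand)
     * @param queryLength  length of the query sequence in bases
     * @return the overlap as a fraction of the query sequence length
     */
    static double overlapFraction(int overlapBases, long queryLength) {
        if (queryLength <= 0) {
            throw new IllegalArgumentException("Query length must be positive");
        }
        // Cast before dividing so the division is not done in integer arithmetic.
        return (double) overlapBases / queryLength;
    }

    public static void main(String[] args) {
        // 150 overlapping bases out of a 200-base query.
        System.out.println(overlapFraction(150, 200)); // prints 0.75
    }
}
```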
scoreResults
public static BlatAssociation scoreResults(Collection<BlatAssociation> blatAssociations)
From a collection of BlatAssociations from a single BioSequence, reduce redundancy, fill in the specificity and score, and pick the one with the best scoring statistics. This is a little complicated because a single sequence can yield many BlatResults to the same gene and/or gene product. We reduce the results down to a single (best) result for any given gene product. We also score specificity by the gene: if a sequence 'hits' multiple genes, then the specificity of the generated associations will be less than 1.
- Parameters:
blatAssociations
- for a single sequence.
- Returns:
- the highest-scoring result (if there are ties this will be a random one). Note that this return value is not all that useful because it assumes there is a "clear winner". The passed-in blatAssociations will be pruned to remove redundant entries, and will have score information filled in as well. It is intended that these 'refined' BlatAssociations will be used in further analysis.
- Throws:
IllegalArgumentException
- if the blatAssociations are from multiple biosequences.
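The behaviour described for scoreResults (one best result per gene product, specificity reduced when several genes are hit, in-place pruning of the passed-in collection) can be sketched as follows. This is an illustration under stated assumptions, not the Gemma code: the Assoc class and its fields are hypothetical stand-ins for BlatAssociation, the score is taken as given, and specificity is assumed to be 1 divided by the number of distinct genes hit, which the documentation does not specify.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the Gemma implementation.
class ScoreResultsSketch {

    /** Stand-in for BlatAssociation; all field names are assumptions. */
    static class Assoc {
        final String sequenceId;
        final String gene;
        final String geneProduct;
        final double score;
        double specificity; // filled in by scoreResults

        Assoc(String sequenceId, String gene, String geneProduct, double score) {
            this.sequenceId = sequenceId;
            this.gene = gene;
            this.geneProduct = geneProduct;
            this.score = score;
        }
    }

    /** Prunes the (mutable) collection in place and returns the top-scoring entry, or null if empty. */
    static Assoc scoreResults(Collection<Assoc> assocs) {
        // All associations must come from the same sequence.
        Set<String> sequences = new HashSet<>();
        for (Assoc a : assocs) sequences.add(a.sequenceId);
        if (sequences.size() > 1) {
            throw new IllegalArgumentException("Associations come from multiple sequences");
        }

        // Keep only the highest-scoring association for each gene product.
        Map<String, Assoc> bestPerProduct = new HashMap<>();
        for (Assoc a : assocs) {
            bestPerProduct.merge(a.geneProduct, a, (old, cur) -> cur.score > old.score ? cur : old);
        }
        assocs.retainAll(new HashSet<>(bestPerProduct.values()));

        // Assumed specificity rule: 1 / (number of distinct genes hit).
        Set<String> genes = new HashSet<>();
        for (Assoc a : assocs) genes.add(a.gene);

        Assoc best = null;
        for (Assoc a : assocs) {
            a.specificity = 1.0 / genes.size();
            if (best == null || a.score > best.score) best = a;
        }
        return best;
    }

    public static void main(String[] args) {
        List<Assoc> hits = new ArrayList<>();
        hits.add(new Assoc("seq1", "geneA", "prodA1", 0.9));
        hits.add(new Assoc("seq1", "geneA", "prodA1", 0.5)); // redundant hit to the same product
        hits.add(new Assoc("seq1", "geneB", "prodB1", 0.7));

        Assoc best = scoreResults(hits);
        System.out.println(hits.size() + " " + best.geneProduct + " " + best.specificity);
    }
}
```

In this sketch the redundant second hit to prodA1 is dropped, and because two genes remain, each surviving association gets a specificity of 0.5. Callers would typically continue with the pruned collection and treat the returned value only as a convenience.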