Class BlatAssociationScorer
java.lang.Object
    ubic.gemma.core.analysis.sequence.BlatAssociationScorer
public class BlatAssociationScorer extends Object
Given a set of BlatAssociations that might be redundant, clean them up and score them.
Author:
pavlidis
Constructor Summary
Constructor: Description
- BlatAssociationScorer()
Method Summary
Modifier and Type, Method: Description
- static double computeOverlapFraction(BlatAssociation blatAssociation): Compute the overlap of the BLAT alignment with the target gene product as a fraction of the query sequence length.
- static Double identity(BlatResult blatResult): Fraction identity computation, as in psl.c.
- static Double score(BlatResult blatResult): Based on the JKSrc method in psl.c, but without double-penalizing for mismatches.
- static BlatAssociation scoreResults(Collection<BlatAssociation> blatAssociations): From a collection of BlatAssociations from a single BioSequence, reduce redundancy, fill in the specificity and score, and pick the one with the best scoring statistics.
Method Detail
computeOverlapFraction
public static double computeOverlapFraction(BlatAssociation blatAssociation)
Compute the overlap of the BLAT alignment with the target gene product as a fraction of the query sequence length. Assumes that the overlap with a transcript has already been computed.
Parameters:
blatAssociation - the BLAT association to evaluate
Returns:
the overlap as a fraction of the query sequence length
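The ratio described for computeOverlapFraction can be illustrated with a minimal, self-contained sketch. It does not use the Gemma classes: the class OverlapFractionSketch, the method overlapFraction, and its two parameters (a precomputed overlap in bases and the query sequence length) are hypothetical names chosen for illustration, and the real method reads its inputs from a BlatAssociation.

```java
class OverlapFractionSketch {

    /**
     * Hypothetical helper, not the Gemma API: divides an already-computed
     * overlap (in bases) by the length of the query sequence.
     */
    static double overlapFraction(long overlapBases, long queryLength) {
        if (queryLength <= 0) {
            throw new IllegalArgumentException("query length must be positive");
        }
        return (double) overlapBases / queryLength;
    }

    public static void main(String[] args) {
        // 150 of 200 query bases overlap the gene product.
        System.out.println(overlapFraction(150, 200)); // prints 0.75
    }
}
```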
scoreResults
public static BlatAssociation scoreResults(Collection<BlatAssociation> blatAssociations)
From a collection of BlatAssociations from a single BioSequence, reduce redundancy, fill in the specificity and score, and pick the one with the best scoring statistics. This is a little complicated because a single sequence can yield many BlatResults to the same gene and/or gene product. We reduce the results down to a single (best) result for any given gene product. We also score specificity by the gene: if a sequence 'hits' multiple genes, then the specificity of the generated associations will be less than 1.
Parameters:
blatAssociations - the associations for a single sequence
Returns:
the highest-scoring result (if there are ties, this will be a random one). Note that this return value is of limited use because it assumes there is a "clear winner". The passed-in blatAssociations will be pruned to remove redundant entries and will have score information filled in. These 'refined' BlatAssociations are intended for use in further analysis.
Throws:
IllegalArgumentException - if the blatAssociations are from multiple biosequences.
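The pruning behaviour described for scoreResults can be sketched as follows. This is an illustration under stated assumptions, not the Gemma implementation: the Hit class is a hypothetical stand-in for BlatAssociation, and the specificity formula (one divided by the number of distinct genes hit) is an assumed placeholder that only reproduces the documented property that specificity drops below 1 when several genes are hit.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ScoreResultsSketch {

    /** Hypothetical stand-in for a BlatAssociation; not the Gemma class. */
    static final class Hit {
        final String sequence;
        final String gene;
        final String geneProduct;
        final double score;
        double specificity; // filled in by refine()

        Hit(String sequence, String gene, String geneProduct, double score) {
            this.sequence = sequence;
            this.gene = gene;
            this.geneProduct = geneProduct;
            this.score = score;
        }
    }

    /** Prunes hits in place to the best per gene product and returns the top hit. */
    static Hit refine(Collection<Hit> hits) {
        Set<String> sequences = new HashSet<>();
        for (Hit h : hits) {
            sequences.add(h.sequence);
        }
        if (sequences.size() > 1) {
            throw new IllegalArgumentException("hits are from multiple sequences");
        }

        // Keep only the best-scoring hit for each gene product.
        Map<String, Hit> bestPerProduct = new LinkedHashMap<>();
        for (Hit h : hits) {
            Hit current = bestPerProduct.get(h.geneProduct);
            if (current == null || h.score > current.score) {
                bestPerProduct.put(h.geneProduct, h);
            }
        }
        hits.retainAll(bestPerProduct.values());

        // Assumed specificity: 1 / (number of distinct genes hit).
        Set<String> genes = new HashSet<>();
        for (Hit h : hits) {
            genes.add(h.gene);
        }
        Hit best = null;
        for (Hit h : hits) {
            h.specificity = 1.0 / genes.size();
            if (best == null || h.score > best.score) {
                best = h;
            }
        }
        return best; // null when the collection is empty
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("seq1", "geneA", "productA1", 0.80));
        hits.add(new Hit("seq1", "geneA", "productA1", 0.95)); // redundant, higher score
        hits.add(new Hit("seq1", "geneB", "productB1", 0.60));
        Hit best = refine(hits);
        System.out.println(best.geneProduct + " " + hits.size() + " " + best.specificity);
    }
}
```

As in the documented method, the caller's collection is modified in place, so the pruned list is the main result and the returned hit is a convenience.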
score
public static Double score(BlatResult blatResult)
Based on the JKSrc method in psl.c, but without double-penalizing for mismatches. We also consider repeat matches to be the same as regular matches.
Parameters:
blatResult - the BLAT result to score
Returns:
Value between 0 and 1, representing the fraction of matches, minus a gap penalty.
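One plausible reading of this description, sketched without the Gemma classes: count regular and repeat matches equally, subtract one per gap in the query and in the target, and divide by the query length. The class and parameter names are hypothetical, and the exact gap penalty and the clamp at zero are assumptions of this sketch, not confirmed details of the implementation.

```java
class BlatScoreSketch {

    /**
     * Hypothetical score: (matches + repeat matches - gaps) / query length.
     * Mismatches are not subtracted, and repeat matches count in full.
     */
    static double score(long matches, long repMatches,
                        long queryGapCount, long targetGapCount,
                        long queryLength) {
        if (queryLength <= 0) {
            throw new IllegalArgumentException("query length must be positive");
        }
        long penalized = matches + repMatches - queryGapCount - targetGapCount;
        // Clamp at zero so the result stays within [0, 1] (a choice of this sketch).
        return Math.max(0.0, (double) penalized / queryLength);
    }

    public static void main(String[] args) {
        // 90 matches + 5 repeat matches, 2 query gaps, 3 target gaps, query of 100 bases.
        System.out.println(score(90, 5, 2, 3, 100)); // prints 0.9
    }
}
```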
identity
public static Double identity(BlatResult blatResult)
Fraction identity computation, as in psl.c. Modified to INCLUDE repeat matches in the match count. See Blat4 at UCSC.
Parameters:
blatResult - the BLAT result to evaluate
Returns:
Value between 0 and 1.
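For orientation, the percent-identity calculation that UCSC publishes for BLAT (pslCalcMilliBad in psl.c) can be transcribed into Java roughly as below. This is a sketch, not the Gemma code: the class and parameter names are hypothetical, it assumes nucleotide alignments, it uses floating-point rather than the integer arithmetic of the C original, and rejecting empty alignments and clamping to [0, 1] are choices of this sketch. Whether Gemma's identity method follows these steps exactly is an assumption.

```java
class BlatIdentitySketch {

    /**
     * Fraction identity in the style of UCSC's pslCalcMilliBad, with repeat
     * matches included in the total. qAliSize and tAliSize are the aligned
     * spans (end minus start) on query and target.
     */
    static double identity(long matches, long repMatches, long misMatches,
                           long qNumInsert, long tNumInsert,
                           long qAliSize, long tAliSize, boolean isMrna) {
        long total = matches + repMatches + misMatches;
        if (Math.min(qAliSize, tAliSize) <= 0 || total == 0) {
            throw new IllegalArgumentException("alignment is empty");
        }
        long sizeDif = qAliSize - tAliSize;
        if (sizeDif < 0) {
            // For mRNA, a longer target span (introns) is not penalized.
            sizeDif = isMrna ? 0 : -sizeDif;
        }
        long insertFactor = qNumInsert + (isMrna ? 0 : tNumInsert);
        double bad = (misMatches + insertFactor
                + Math.round(3 * Math.log(1 + sizeDif))) / (double) total;
        // Clamp so the result stays within [0, 1].
        return Math.min(1.0, Math.max(0.0, 1.0 - bad));
    }

    public static void main(String[] args) {
        // 95 matches, 5 mismatches, no inserts, equal spans of 100 bases.
        System.out.println(identity(95, 0, 5, 0, 0, 100, 100, true));
    }
}
```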