Class BlatAssociationScorer


  • public class BlatAssociationScorer
    extends Object
    Given a set of BlatAssociations that might be redundant, clean them up and score them.
    Author:
    pavlidis
    • Constructor Detail

      • BlatAssociationScorer

        public BlatAssociationScorer()
    • Method Detail

      • computeOverlapFraction

        public static double computeOverlapFraction(BlatAssociation blatAssociation)
        Compute the overlap of the BLAT alignment with the target gene product as a fraction of the query sequence length. Assumes that the overlap with a transcript has already been computed.
        Parameters:
        blatAssociation - the BLAT association whose overlap fraction is computed
        Returns:
        the overlap with the target gene product as a fraction of the query sequence length
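        The computation described above can be sketched as follows. This is a minimal, self-contained illustration, not Gemma's implementation: `SimpleAssociation`, its `overlap` field (aligned query bases overlapping the transcript, assumed to be precomputed) and its `queryLength` field are hypothetical stand-ins for the data a real BlatAssociation carries, and the exceptions thrown for a missing overlap or a non-positive length are assumptions.

```java
import java.util.Objects;

public class OverlapFractionSketch {

    /** Hypothetical, simplified stand-in for BlatAssociation (not the real class). */
    record SimpleAssociation(Integer overlap, long queryLength) {}

    /** Overlap with the gene product divided by the query sequence length. */
    static double computeOverlapFraction(SimpleAssociation assoc) {
        Objects.requireNonNull(assoc, "association");
        // Mirrors the documented precondition: the overlap must already be computed.
        if (assoc.overlap() == null) {
            throw new IllegalStateException("Overlap has not been computed");
        }
        // Assumption: a zero or negative query length is treated as invalid input.
        if (assoc.queryLength() <= 0) {
            throw new IllegalArgumentException("Query length must be positive");
        }
        // Cast to double so the division is not integer division.
        return assoc.overlap() / (double) assoc.queryLength();
    }

    public static void main(String[] args) {
        SimpleAssociation a = new SimpleAssociation(450, 500L);
        System.out.println(computeOverlapFraction(a)); // prints 0.9
    }
}
```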
      • scoreResults

        public static BlatAssociation scoreResults(Collection<BlatAssociation> blatAssociations)
        From a collection of BlatAssociations for a single BioSequence, reduce redundancy, fill in the specificity and score, and pick the association with the best scoring statistics. This is somewhat complicated because a single sequence can yield many BlatResults to the same gene and/or gene product. The results are reduced to a single (best) result for each gene product. Specificity is scored at the gene level: if a sequence 'hits' multiple genes, the specificity of the generated associations will be less than 1.
        Parameters:
        blatAssociations - associations for a single sequence.
        Returns:
        the highest-scoring result (if there are ties, this will be a random one). Note that this return value is of limited use because it assumes there is a "clear winner". The passed-in blatAssociations will be pruned to remove redundant entries and will have score information filled in. These 'refined' BlatAssociations are intended for use in further analysis.
        Throws:
        IllegalArgumentException - if the blatAssociations are from multiple biosequences.
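        The steps described for scoreResults can be sketched as below. This is a hedged illustration under stated assumptions, not Gemma's code: `Assoc` and its fields (`sequenceId`, `geneProductId`, `geneId`, `score`, `specificity`) are hypothetical stand-ins for BlatAssociation, and the specificity rule used here (1 divided by the number of distinct genes hit) is an assumed simplification that merely satisfies the documented property of being less than 1 when several genes are hit. Rejecting an empty collection is also an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ScoreResultsSketch {

    /** Hypothetical, simplified stand-in for BlatAssociation (not the real class). */
    static final class Assoc {
        final String sequenceId;
        final String geneProductId;
        final String geneId;
        final double score;
        double specificity = Double.NaN; // filled in by scoreResults

        Assoc(String sequenceId, String geneProductId, String geneId, double score) {
            this.sequenceId = sequenceId;
            this.geneProductId = geneProductId;
            this.geneId = geneId;
            this.score = score;
        }
    }

    static Assoc scoreResults(Collection<Assoc> associations) {
        if (associations.isEmpty()) {
            throw new IllegalArgumentException("No associations given"); // assumption
        }
        // 1. All associations must come from the same query sequence.
        Set<String> sequences = new HashSet<>();
        for (Assoc a : associations) sequences.add(a.sequenceId);
        if (sequences.size() > 1) {
            throw new IllegalArgumentException("Associations come from multiple sequences");
        }
        // 2. Keep the best-scoring association per gene product (first one wins on ties).
        Map<String, Assoc> bestPerProduct = new LinkedHashMap<>();
        for (Assoc a : associations) {
            bestPerProduct.merge(a.geneProductId, a, (old, cand) -> cand.score > old.score ? cand : old);
        }
        // 3. Prune the caller's collection in place so only the retained entries remain.
        Set<Assoc> kept = new HashSet<>(bestPerProduct.values());
        associations.removeIf(a -> !kept.contains(a));
        // 4. Score specificity by gene (hypothetical rule: 1 / number of distinct genes hit).
        Set<String> genes = new HashSet<>();
        for (Assoc a : associations) genes.add(a.geneId);
        double specificity = 1.0 / genes.size();
        for (Assoc a : associations) a.specificity = specificity;
        // 5. Return the highest-scoring remaining association (first one wins on ties).
        Assoc best = null;
        for (Assoc a : associations) {
            if (best == null || a.score > best.score) best = a;
        }
        return best;
    }

    public static void main(String[] args) {
        List<Assoc> hits = new ArrayList<>(List.of(
                new Assoc("seq1", "NM_1", "GENE_A", 90),
                new Assoc("seq1", "NM_1", "GENE_A", 70), // redundant hit to the same product
                new Assoc("seq1", "NM_2", "GENE_A", 85),
                new Assoc("seq1", "NM_3", "GENE_B", 60)));
        Assoc best = scoreResults(hits);
        System.out.println(hits.size()); // prints 3
        System.out.println(best.geneProductId + " " + best.specificity); // prints NM_1 0.5
    }
}
```

        The sketch breaks ties deterministically (the first association encountered wins) rather than randomly, which keeps the output reproducible; the documented method makes no such guarantee. The collection passed in must be mutable, because it is pruned in place as the documentation describes.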