Class SequenceManipulation


  • public class SequenceManipulation
    extends Object
    Convenient methods for manipulating BioSequences and PhysicalLocations
    Author:
    pavlidis
    • Constructor Detail

      • SequenceManipulation

        public SequenceManipulation()
    • Method Detail

      • blatFormatChromosomeName

        public static String blatFormatChromosomeName​(String chromosome)
        Adds the "chr" prefix to the chromosome name if it is not already present.
        Parameters:
        chromosome - chromosome name, with or without the "chr" prefix
        Returns:
        formatted name
      • stripPolyAorT

        public static String stripPolyAorT​(String sequence,
                                           int thresholdLength)
        Remove a 3' polyA or 5' polyT tail. The entire tail is removed.
        Parameters:
        sequence - sequence
        thresholdLength - tail length threshold that triggers removal.
        Returns:
        processed sequence
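As a minimal sketch of the tail-stripping behaviour described above, the following helper counts the run of trailing A's and leading T's and removes a run in full once it reaches the threshold. The class and method names are hypothetical, and the inclusive comparison (`>=`), the case-insensitive matching, and the choice to strip both ends when both qualify are assumptions; the library's actual rules may differ.

```java
final class PolyTailSketch {

    // Hypothetical helper, not the library's implementation.
    static String stripPolyTail(String sequence, int thresholdLength) {
        if (sequence == null || sequence.isEmpty()) {
            return sequence;
        }

        // Count the run of A's at the 3' end (end of the string).
        int end = sequence.length();
        while (end > 0 && Character.toUpperCase(sequence.charAt(end - 1)) == 'A') {
            end--;
        }
        int polyALength = sequence.length() - end;
        if (polyALength >= thresholdLength) { // assumption: threshold is inclusive
            sequence = sequence.substring(0, end);
        }

        // Count the run of T's at the 5' end (start of the string).
        int start = 0;
        while (start < sequence.length() && Character.toUpperCase(sequence.charAt(start)) == 'T') {
            start++;
        }
        if (start >= thresholdLength) {
            sequence = sequence.substring(start);
        }
        return sequence;
    }
}
```

A tail shorter than the threshold is left untouched, so `stripPolyTail("ACGTAA", 5)` returns the input unchanged.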
      • blatLocationsToIntArray

        public static int[] blatLocationsToIntArray​(String blatLocations)
        Convert a PSL-formatted, comma-delimited list to an int[].
        Parameters:
        blatLocations - locations
        Returns:
        locations
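PSL block lists are typically written with a trailing comma (for example `10,20,30,`). A minimal sketch of the conversion, assuming that empty fields should be skipped and that a null or blank input yields an empty array (both assumptions, as is the class name), could look like this:

```java
import java.util.Arrays;

final class PslListSketch {

    // Hypothetical helper, not the library's implementation.
    static int[] toIntArray(String pslList) {
        if (pslList == null || pslList.trim().isEmpty()) {
            return new int[0]; // assumption: blank input means "no locations"
        }
        return Arrays.stream(pslList.split(","))
                .map(String::trim)
                .filter(field -> !field.isEmpty()) // tolerate stray empty fields
                .mapToInt(Integer::parseInt)
                .toArray();
    }
}
```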
      • collapse

        public static BioSequence collapse​(Collection<Reporter> sequences)
        Convert a CompositeSequence's immobilizedCharacteristics into a single sequence, using a simple merge-join strategy.
        Parameters:
        sequences - sequences
        Returns:
        A BioSequence. Not all fields are filled in; the remaining fields must be set by the caller.
      • deBlatFormatChromosomeName

        public static String deBlatFormatChromosomeName​(String chromosome)
        Removes the "chr" prefix from the chromosome name, if present.
        Parameters:
        chromosome - chromosome
        Returns:
        formatted name
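The two chromosome-name helpers (blatFormatChromosomeName and deBlatFormatChromosomeName) are inverses of each other. A minimal sketch of both directions, using hypothetical method names and assuming a case-sensitive "chr" prefix and non-null input:

```java
final class ChromosomeNameSketch {

    // Hypothetical counterpart of blatFormatChromosomeName: "X" becomes "chrX".
    static String addChrPrefix(String chromosome) {
        return chromosome.startsWith("chr") ? chromosome : "chr" + chromosome;
    }

    // Hypothetical counterpart of deBlatFormatChromosomeName: "chrX" becomes "X".
    static String stripChrPrefix(String chromosome) {
        return chromosome.startsWith("chr") ? chromosome.substring(3) : chromosome;
    }
}
```

Both functions are idempotent in this sketch, so applying either one twice gives the same result as applying it once.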
      • findCenter

        public static int findCenter​(String starts,
                                     String sizes)
        Find where the center of a query location is in a gene. This is defined as the location of the center base of the query sequence relative to the 3' end of the gene.
        Parameters:
        starts - starts
        sizes - sizes
        Returns:
        center
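One way to locate a center base across several alignment blocks is to take half of the total block size and walk the blocks until that offset is reached. The sketch below does this on already-parsed arrays; the names are hypothetical, and it only returns the coordinate of the middle base in the blocks' own coordinate system. It does not attempt to express the result relative to the 3' end of a gene, and it may not match the library's rounding or its handling of edge cases.

```java
final class BlockCenterSketch {

    // Hypothetical helper, not the library's implementation.
    static int centerOfBlocks(int[] starts, int[] sizes) {
        if (starts.length != sizes.length || starts.length == 0) {
            throw new IllegalArgumentException("starts and sizes must be non-empty and of equal length");
        }

        int total = 0;
        for (int size : sizes) {
            total += size;
        }

        // 0-based offset of the middle base within the concatenated blocks.
        int middle = total / 2;

        int consumed = 0; // bases covered by the blocks visited so far
        for (int i = 0; i < sizes.length; i++) {
            if (middle < consumed + sizes[i]) {
                return starts[i] + (middle - consumed);
            }
            consumed += sizes[i];
        }
        throw new IllegalArgumentException("blocks contain no bases");
    }
}
```

For a single block starting at 100 with size 11, the middle offset is 5, so the sketch returns 105.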
      • getGeneExonOverlaps

        public static int getGeneExonOverlaps​(String chromosome,
                                              String starts,
                                              String sizes,
                                              String strand,
                                              Gene gene)
        Given a gene, compute how much of the query locations (provided as starts and sizes) overlaps with the gene's exons. The overlap may span more than one exon.
        Parameters:
        chromosome - chromosome name, as "chrX" or "X".
        starts - start positions of the locations we are testing.
        sizes - sizes of the locations we are testing.
        strand - to consider. If null, strand is ignored.
        gene - Gene we are testing
        Returns:
        Number of bases which overlap with exons of the gene. A value of zero indicates that the location is entirely within an intron. If multiple GeneProducts are associated with this gene, the best (highest) overlap is reported.
      • getGeneProductExonOverlap

        public static int getGeneProductExonOverlap​(String starts,
                                                    String sizes,
                                                    String strand,
                                                    GeneProduct geneProduct)
        Compute the overlap of a physical location with a transcript (gene product). This assumes that the chromosome is already matched.
        Parameters:
        starts - start positions of the locations we are testing (in target coordinates, i.e. the same coordinate system as the geneProduct's location)
        sizes - sizes of the locations we are testing.
        strand - the strand to look on. If null, strand is ignored.
        geneProduct - GeneProduct we are testing. If strand of PhysicalLocation is null, we ignore strand.
        Returns:
        Total number of bases which overlap with exons of the transcript. A value of zero indicates that the location is entirely within an intron, or the strand is wrong.
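The exon-overlap computations described for getGeneExonOverlaps and getGeneProductExonOverlap can be illustrated with plain interval arithmetic. In this sketch, exons are hypothetical half-open `{start, end}` pairs, the class and method names are made up for illustration, and strand and chromosome matching are left out entirely. It also assumes that the exons of one transcript do not overlap each other.

```java
final class ExonOverlapSketch {

    // Total bases shared between the query blocks and one transcript's exons.
    // Each exon is {start, endExclusive}; blocks are given as parallel arrays.
    static int overlapWithExons(int[] starts, int[] sizes, int[][] exons) {
        int total = 0;
        for (int i = 0; i < starts.length; i++) {
            int blockStart = starts[i];
            int blockEnd = starts[i] + sizes[i];
            for (int[] exon : exons) {
                int lo = Math.max(blockStart, exon[0]);
                int hi = Math.min(blockEnd, exon[1]);
                if (hi > lo) {
                    total += hi - lo;
                }
            }
        }
        return total;
    }

    // Gene-level result: the highest overlap over all transcripts.
    static int bestOverlap(int[] starts, int[] sizes, int[][][] transcripts) {
        int best = 0;
        for (int[][] exons : transcripts) {
            best = Math.max(best, overlapWithExons(starts, sizes, exons));
        }
        return best;
    }
}
```

A result of zero here corresponds to the documented case in which the location falls entirely within introns.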
      • rightHandOverlap

        public static int rightHandOverlap​(BioSequence target,
                                           BioSequence query)
        Compute only the overlap that the query sequence has with the right-hand end of the target.
        Parameters:
        target - target
        query - query
        Returns:
        right overlap
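One plausible reading of a right-hand overlap is the longest suffix of the target that is also a prefix of the query. The sketch below implements that reading on plain strings; the names are hypothetical, and exact (mismatch-free) matching is an assumption rather than documented behaviour.

```java
final class RightOverlapSketch {

    // Length of the longest suffix of target that equals a prefix of query.
    static int rightOverlap(String target, String query) {
        int max = Math.min(target.length(), query.length());
        for (int len = max; len > 0; len--) {
            // Compare the last len characters of target with the first len of query.
            if (target.regionMatches(target.length() - len, query, 0, len)) {
                return len;
            }
        }
        return 0;
    }
}
```

For target `ACGTTGCA` and query `TGCAGGG`, the shared stretch is `TGCA`, giving an overlap of 4.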
      • reverseComplement

        public static String reverseComplement​(String sequence)
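The listing gives no description for reverseComplement. Assuming the conventional meaning (reverse the sequence and swap A with T and C with G), a minimal sketch follows. Leaving N and any other characters unchanged, and preserving case, are assumptions.

```java
final class ReverseComplementSketch {

    // Hypothetical helper, not the library's implementation.
    static String reverseComplement(String sequence) {
        StringBuilder result = new StringBuilder(sequence.length());
        for (int i = sequence.length() - 1; i >= 0; i--) {
            result.append(complement(sequence.charAt(i)));
        }
        return result.toString();
    }

    // Watson-Crick complement for DNA; other characters pass through unchanged.
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default:  return base;
        }
    }
}
```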
      • totalSize

        public static int totalSize​(String sizes)
        Parameters:
        sizes - Blat-formatted string of sizes (comma-delimited)
        Returns:
        total size
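Summing a comma-delimited size list is straightforward. This sketch, with a hypothetical class name, assumes that empty fields (such as the one after a trailing comma) contribute nothing and that the input is non-null:

```java
final class TotalSizeSketch {

    // Sum of the integers in a comma-delimited list such as "10,20,30,".
    static int totalSize(String sizes) {
        int total = 0;
        for (String field : sizes.split(",")) {
            String trimmed = field.trim();
            if (!trimmed.isEmpty()) {
                total += Integer.parseInt(trimmed);
            }
        }
        return total;
    }
}
```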
      • computeOverlap

        public static int computeOverlap​(PhysicalLocation a,
                                         PhysicalLocation b)
        Compute the overlap between two physical locations. If neither location has a length, the overlap is zero unless both point to exactly the same nucleotide position, in which case the overlap is 1.
        Parameters:
        a - a
        b - b
        Returns:
        overlap
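The length-less case can be folded into ordinary interval overlap by treating a missing length as a single base. The sketch below takes that approach with hypothetical parameters (a start position and a nullable length per location) and ignores chromosome and strand. Treating a location as a single base when only one of the two lacks a length is an assumption, since the description does not cover that case.

```java
final class LocationOverlapSketch {

    // Overlap in bases between two half-open intervals [start, start + length).
    // A null length is treated as one base (assumption).
    static int overlap(long startA, Integer lengthA, long startB, Integer lengthB) {
        long endA = startA + (lengthA == null ? 1 : lengthA);
        long endB = startB + (lengthB == null ? 1 : lengthB);
        long shared = Math.min(endA, endB) - Math.max(startA, startB);
        return (int) Math.max(0L, shared);
    }
}
```

With both lengths null, this yields 1 when the two start positions are equal and 0 otherwise, which matches the rule stated in the method description.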