Class SequenceManipulation
- java.lang.Object
-
- ubic.gemma.core.analysis.sequence.SequenceManipulation
-
public class SequenceManipulation extends Object
Convenient methods for manipulating BioSequences and PhysicalLocations.
- Author:
- pavlidis
-
-
Constructor Summary
Constructor, Description
- SequenceManipulation()
-
Method Summary
Modifier and Type, Method, Description
- static String blatFormatChromosomeName(String chromosome)
  Puts "chr" prefix on the chromosome name, if need be.
- static int[] blatLocationsToIntArray(String blatLocations)
  Convert a psl-formatted list (comma-delimited) to an int[].
- static BioSequence collapse(Collection<Reporter> sequences)
  Convert a CompositeSequence's immobilizedCharacteristics into a single sequence, using a simple merge-join strategy.
- static int computeOverlap(PhysicalLocation a, PhysicalLocation b)
  Compute the overlap between two physical locations.
- static String deBlatFormatChromosomeName(String chromosome)
  Removes "chr" prefix from the chromosome name, if it is there.
- static int findCenter(String starts, String sizes)
  Find where the center of a query location is in a gene.
- static int getGeneExonOverlaps(String chromosome, String starts, String sizes, String strand, Gene gene)
  Given a gene, find out how much of it overlaps with exons provided as starts and sizes.
- static int getGeneProductExonOverlap(String starts, String sizes, String strand, GeneProduct geneProduct)
  Compute the overlap of a physical location with a transcript (gene product).
- static String reverseComplement(String sequence)
- static int rightHandOverlap(BioSequence target, BioSequence query)
  Compute just any overlap the compare sequence has with the target on the right side.
- static String stripPolyAorT(String sequence, int thresholdLength)
  Remove a 3' polyA or 5' polyT tail.
- static int totalSize(String sizes)
-
-
-
Method Detail
-
blatFormatChromosomeName
public static String blatFormatChromosomeName(String chromosome)
Puts "chr" prefix on the chromosome name, if need be.
- Parameters:
chromosome - chromosome
- Returns:
- formatted name
-
stripPolyAorT
public static String stripPolyAorT(String sequence, int thresholdLength)
Remove a 3' polyA or 5' polyT tail. The entire tail is removed.
- Parameters:
sequence - sequence
thresholdLength - length threshold that triggers removal.
- Returns:
- processed sequence
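The following is a minimal stand-alone sketch of the behavior described above, not the Gemma implementation. The class name `PolyTailSketch` is hypothetical, and the sketch assumes uppercase input and that a run whose length is at least `thresholdLength` counts as a tail; the real method may treat case and the threshold boundary differently.

```java
// Hypothetical sketch; not the Gemma implementation.
class PolyTailSketch {

    /** Trims a trailing run of 'A' and a leading run of 'T' when the run reaches thresholdLength. */
    static String stripPolyAorT(String sequence, int thresholdLength) {
        int end = sequence.length();
        // Walk left over the run of A at the 3' (right-hand) end.
        while (end > 0 && sequence.charAt(end - 1) == 'A') {
            end--;
        }
        if (sequence.length() - end < thresholdLength) {
            end = sequence.length(); // run too short (assumed rule): keep it
        }
        int start = 0;
        // Walk right over the run of T at the 5' (left-hand) end.
        while (start < end && sequence.charAt(start) == 'T') {
            start++;
        }
        if (start < thresholdLength) {
            start = 0; // run too short (assumed rule): keep it
        }
        return sequence.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(stripPolyAorT("ACGTACGTAAAAAAAA", 5)); // prints ACGTACGT
        System.out.println(stripPolyAorT("TTTTTTACGTACG", 5));    // prints ACGTACG
        System.out.println(stripPolyAorT("ACGTAAA", 5));          // prints ACGTAAA (run of 3 is below the threshold)
    }
}
```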
-
blatLocationsToIntArray
public static int[] blatLocationsToIntArray(String blatLocations)
Convert a psl-formatted list (comma-delimited) to an int[].
- Parameters:
blatLocations - locations
- Returns:
- locations
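As an illustration of the conversion, here is a small stand-alone sketch rather than the library code. PSL block lists usually end with a trailing comma (for example `100,200,300,`), which the sketch tolerates; the class and method names are hypothetical, and returning an empty array for blank input is an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch; not the Gemma implementation.
class PslListSketch {

    /** Parses a comma-delimited list of integers such as "100,200,300,". */
    static int[] toIntArray(String pslList) {
        if (pslList == null || pslList.trim().isEmpty()) {
            return new int[0]; // assumption: blank input yields an empty array
        }
        // String.split drops trailing empty fields, so the final comma is harmless.
        String[] fields = pslList.trim().split(",");
        int[] result = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            result[i] = Integer.parseInt(fields[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toIntArray("100,200,300,"))); // prints [100, 200, 300]
    }
}
```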
-
collapse
public static BioSequence collapse(Collection<Reporter> sequences)
Convert a CompositeSequence's immobilizedCharacteristics into a single sequence, using a simple merge-join strategy.
- Parameters:
sequences - sequences
- Returns:
- BioSequence. Not all fields are filled in and must be set by the caller.
-
deBlatFormatChromosomeName
public static String deBlatFormatChromosomeName(String chromosome)
Removes "chr" prefix from the chromosome name, if it is there.
- Parameters:
chromosome - chromosome
- Returns:
- formatted name
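The two chromosome-name helpers, blatFormatChromosomeName and deBlatFormatChromosomeName, are inverses of each other. The sketch below shows the idea with hypothetical stand-in methods; it assumes a case-sensitive "chr" prefix and non-null input, which may not match the library's handling.

```java
// Hypothetical sketch; not the Gemma implementation.
class ChromosomeNameSketch {

    /** Adds "chr" unless the name already starts with it. */
    static String addChrPrefix(String chromosome) {
        return chromosome.startsWith("chr") ? chromosome : "chr" + chromosome;
    }

    /** Removes a leading "chr" if present. */
    static String removeChrPrefix(String chromosome) {
        return chromosome.startsWith("chr") ? chromosome.substring(3) : chromosome;
    }

    public static void main(String[] args) {
        System.out.println(addChrPrefix("X"));       // prints chrX
        System.out.println(addChrPrefix("chrX"));    // prints chrX
        System.out.println(removeChrPrefix("chr7")); // prints 7
        System.out.println(removeChrPrefix("7"));    // prints 7
    }
}
```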
-
findCenter
public static int findCenter(String starts, String sizes)
Find where the center of a query location is in a gene. This is defined as the location of the center base of the query sequence relative to the 3' end of the gene.
- Parameters:
sizes - sizes
starts - starts
- Returns:
- center
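One plausible reading of "center base" for a multi-block alignment is sketched below. This is an assumption, not the documented algorithm: the sketch counts aligned bases across all blocks, takes the middle one, and returns its coordinate in the same coordinate system as `starts`. It does not express the result relative to a gene end, since the signature carries no gene. The class name is hypothetical.

```java
// Hypothetical sketch of one interpretation; not the Gemma implementation.
class BlockCenterSketch {

    static int[] parse(String list) {
        String[] fields = list.trim().split(",");
        int[] values = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Integer.parseInt(fields[i].trim());
        }
        return values;
    }

    /** Returns the coordinate of the middle aligned base across all blocks. */
    static int findCenter(String starts, String sizes) {
        int[] blockStarts = parse(starts);
        int[] blockSizes = parse(sizes);
        if (blockStarts.length != blockSizes.length) {
            throw new IllegalArgumentException("starts and sizes must have the same number of entries");
        }
        int total = 0;
        for (int size : blockSizes) {
            total += size;
        }
        int middle = total / 2; // zero-based index of the center base among aligned bases
        for (int i = 0; i < blockSizes.length; i++) {
            if (middle < blockSizes[i]) {
                return blockStarts[i] + middle;
            }
            middle -= blockSizes[i];
        }
        throw new IllegalArgumentException("no aligned bases");
    }

    public static void main(String[] args) {
        System.out.println(findCenter("100,200,", "10,30,")); // prints 210
        System.out.println(findCenter("50", "11"));           // prints 55
    }
}
```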
-
getGeneExonOverlaps
public static int getGeneExonOverlaps(String chromosome, String starts, String sizes, String strand, Gene gene)
Given a gene, find out how much of it overlaps with exons provided as starts and sizes. This could involve more than one exon.
- Parameters:
chromosome - chromosome name, as "chrX" or "X".
starts - starts of the locations we are testing.
sizes - sizes of the locations we are testing.
strand - strand to consider. If null, strand is ignored.
gene - Gene we are testing
- Returns:
- Number of bases which overlap with exons of the gene. A value of zero indicates that the location is entirely within an intron. If multiple GeneProducts are associated with this gene, the best (highest) overlap is reported.
-
getGeneProductExonOverlap
public static int getGeneProductExonOverlap(String starts, String sizes, String strand, GeneProduct geneProduct)
Compute the overlap of a physical location with a transcript (gene product). This assumes that the chromosome is already matched.
- Parameters:
starts - starts of the locations we are testing (in the target, so in the same coordinates as the geneProduct location)
sizes - sizes of the locations we are testing.
strand - the strand to look on. If null, strand is ignored.
geneProduct - GeneProduct we are testing. If the strand of its PhysicalLocation is null, strand is ignored.
- Returns:
- Total number of bases which overlap with exons of the transcript. A value of zero indicates that the location is entirely within an intron, or the strand is wrong.
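The exon-overlap computation can be illustrated with plain intervals. The sketch below is a simplified stand-in, not the Gemma code: exons are hypothetical `{start, end}` pairs with an exclusive end, strands are plain strings, and exons within one transcript are assumed not to overlap each other. `bestOverlap` mirrors the rule stated for getGeneExonOverlaps, where the highest overlap across a gene's transcripts is reported.

```java
// Hypothetical sketch; not the Gemma implementation.
class ExonOverlapSketch {

    /** Overlap in bases between half-open intervals [aStart, aEnd) and [bStart, bEnd). */
    static int overlap(int aStart, int aEnd, int bStart, int bEnd) {
        return Math.max(0, Math.min(aEnd, bEnd) - Math.max(aStart, bStart));
    }

    /** Total bases of the query blocks that fall inside the exons of one transcript. */
    static int transcriptOverlap(int[] starts, int[] sizes, String strand,
                                 String transcriptStrand, int[][] exons) {
        // A null strand on either side means strand is ignored.
        if (strand != null && transcriptStrand != null && !strand.equals(transcriptStrand)) {
            return 0;
        }
        int total = 0;
        for (int i = 0; i < starts.length; i++) {
            for (int[] exon : exons) {
                total += overlap(starts[i], starts[i] + sizes[i], exon[0], exon[1]);
            }
        }
        return total;
    }

    /** Highest overlap across several transcripts of a gene, ignoring strand. */
    static int bestOverlap(int[] starts, int[] sizes, int[][][] transcripts) {
        int best = 0;
        for (int[][] exons : transcripts) {
            best = Math.max(best, transcriptOverlap(starts, sizes, null, null, exons));
        }
        return best;
    }

    public static void main(String[] args) {
        int[] starts = {100, 300};
        int[] sizes = {50, 50};
        int[][] transcriptA = {{120, 200}, {340, 400}};
        int[][] transcriptB = {{0, 110}};
        System.out.println(transcriptOverlap(starts, sizes, "+", "+", transcriptA)); // prints 40
        System.out.println(transcriptOverlap(starts, sizes, "+", "-", transcriptA)); // prints 0
        System.out.println(bestOverlap(starts, sizes, new int[][][] {transcriptA, transcriptB})); // prints 40
    }
}
```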
-
rightHandOverlap
public static int rightHandOverlap(BioSequence target, BioSequence query)
Compute any overlap the query sequence has with the right-hand end of the target.
- Parameters:
query - query
target - target
- Returns:
- right overlap
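One way to read "right-hand overlap" is the longest suffix of the target that is also a prefix of the query. The sketch below implements that reading on plain strings; it is an assumption about the semantics, the class name is hypothetical, and the real method operates on BioSequence objects.

```java
// Hypothetical sketch of one interpretation; not the Gemma implementation.
class RightOverlapSketch {

    /** Length of the longest suffix of target that equals a prefix of query. */
    static int rightHandOverlap(String target, String query) {
        int max = Math.min(target.length(), query.length());
        for (int k = max; k > 0; k--) {
            // Compare the last k characters of target with the first k of query.
            if (target.regionMatches(target.length() - k, query, 0, k)) {
                return k;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(rightHandOverlap("ACGTTGCA", "TGCAGGG")); // prints 4
        System.out.println(rightHandOverlap("AAAA", "CCCC"));        // prints 0
    }
}
```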
-
totalSize
public static int totalSize(String sizes)
- Parameters:
sizes - Blat-formatted string of sizes (comma-delimited)
- Returns:
- total size
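Summing a Blat-formatted size list is straightforward; a minimal sketch follows. It is not the library code, the class name is hypothetical, and skipping empty fields (such as the one after a trailing comma) is an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch; not the Gemma implementation.
class TotalSizeSketch {

    /** Sums the integers in a comma-delimited list such as "10,20,30,". */
    static int totalSize(String sizes) {
        return Arrays.stream(sizes.split(","))
                .map(String::trim)
                .filter(field -> !field.isEmpty()) // ignore blank fields
                .mapToInt(Integer::parseInt)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(totalSize("10,20,30,")); // prints 60
    }
}
```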
-
computeOverlap
public static int computeOverlap(PhysicalLocation a, PhysicalLocation b)
Compute the overlap between two physical locations. If both do not have a length the overlap is zero unless they point to exactly the same nucleotide location, in which case the overlap is 1.
- Parameters:
a - a
b - b
- Returns:
- overlap
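The rule above can be sketched with plain numbers in place of PhysicalLocation objects. In this hypothetical stand-in, a missing length is passed as null and treated as a single base, which reproduces the documented case of two length-less locations (overlap 1 at the same position, otherwise 0). How the real method handles a location pair where only one has a length, or differing chromosomes and strands, is not stated in the text and is not modeled here.

```java
// Hypothetical sketch; not the Gemma implementation.
class LocationOverlapSketch {

    /** Overlap in bases between two locations given as start plus optional length. */
    static int computeOverlap(long startA, Integer lengthA, long startB, Integer lengthB) {
        // Assumption: a location without a length covers exactly one base.
        long endA = startA + (lengthA == null ? 1 : lengthA);
        long endB = startB + (lengthB == null ? 1 : lengthB);
        return (int) Math.max(0, Math.min(endA, endB) - Math.max(startA, startB));
    }

    public static void main(String[] args) {
        System.out.println(computeOverlap(100, 50, 120, 100));    // prints 30
        System.out.println(computeOverlap(100, null, 100, null)); // prints 1
        System.out.println(computeOverlap(100, null, 101, null)); // prints 0
    }
}
```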
-
-