Class SequenceManipulation
java.lang.Object
ubic.gemma.core.analysis.sequence.SequenceManipulation
Convenient methods for manipulating BioSequences and PhysicalLocations
- Author:
- pavlidis
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
- static int binFromRange(int start, int end)
- static String blatFormatChromosomeName(String chromosome): Puts "chr" prefix on the chromosome name, if need be.
- static int[] blatLocationsToIntArray(String blatLocations): Convert a psl-formatted list (comma-delimited) to an int[].
- static BioSequence collapse(Collection<Reporter> sequences): Convert a CompositeSequence's immobilizedCharacteristics into a single sequence, using a simple merge-join strategy.
- static int computeOverlap(long starta, long enda, long startb, long endb)
- static int computeOverlap(a, b): Compute the overlap between two physical locations.
- static String deBlatFormatChromosomeName(String chromosome): Removes "chr" prefix from the chromosome name, if it is there.
- static int findCenter(String starts, String sizes): Find where the center of a query location is in a gene.
- static int getGeneExonOverlaps(String chromosome, String starts, String sizes, String strand, Gene gene): Given a gene, find out how much of it overlaps with exons provided as starts and sizes.
- static int getGeneProductExonOverlap(String starts, String sizes, String strand, GeneProduct geneProduct): Compute the overlap of a physical location with a transcript (gene product).
- static String reverseComplement(String sequence)
- static int rightHandOverlap(BioSequence target, BioSequence query): Compute just any overlap the compare sequence has with the target on the right side.
- static String stripPolyAorT(String sequence, int thresholdLength): Remove a 3' polyA or 5' polyT tail.
- static int totalSize(sizes)
-
Constructor Details
-
SequenceManipulation
public SequenceManipulation()
-
-
Method Details
-
blatFormatChromosomeName
Puts "chr" prefix on the chromosome name, if need be.
- Parameters:
- chromosome - chromosome
- Returns:
- formatted name
-
stripPolyAorT
Remove a 3' polyA or 5' polyT tail. The entire tail is removed.
- Parameters:
- sequence - sequence
- thresholdLength - tail length required to trigger removal.
- Returns:
- processed sequence
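The tail-trimming rule can be illustrated with a minimal standalone sketch. It is not the Gemma implementation: the class and method names are hypothetical, and it assumes that a tail is removed when its length is at least thresholdLength and that matching is case-insensitive.

```java
// Hypothetical sketch of polyA/polyT trimming; not the Gemma source.
final class PolyTailSketch {

    /**
     * Removes a trailing run of A's and a leading run of T's, each only
     * if the run is at least thresholdLength long (assumed to be inclusive).
     */
    static String stripTail(String sequence, int thresholdLength) {
        // Walk back over the trailing run of A's.
        int end = sequence.length();
        while (end > 0 && Character.toUpperCase(sequence.charAt(end - 1)) == 'A') {
            end--;
        }
        // Keep the run if it is too short to count as a tail.
        if (sequence.length() - end < thresholdLength) {
            end = sequence.length();
        }

        // Walk forward over the leading run of T's.
        int start = 0;
        while (start < end && Character.toUpperCase(sequence.charAt(start)) == 'T') {
            start++;
        }
        if (start < thresholdLength) {
            start = 0;
        }
        return sequence.substring(start, end);
    }
}
```

With a threshold of 5, a run of seven trailing A's is removed in full, while a run of three is left in place.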
-
blatLocationsToIntArray
Convert a psl-formatted list (comma-delimited) to an int[].
- Parameters:
- blatLocations - locations
- Returns:
- locations
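PSL list fields, such as block starts and block sizes, are comma-delimited and usually end with a trailing comma. The following is a minimal sketch of such a conversion, with hypothetical names and the assumption that empty fields are skipped; it is not taken from the Gemma source.

```java
import java.util.Arrays;

// Hypothetical sketch of parsing a PSL-style comma-delimited list.
final class PslListSketch {

    /** Parses a list such as "10,20,30," into {10, 20, 30}. */
    static int[] toIntArray(String pslList) {
        return Arrays.stream(pslList.split(","))
                .map(String::trim)
                .filter(field -> !field.isEmpty()) // tolerate trailing comma or empty input
                .mapToInt(Integer::parseInt)
                .toArray();
    }
}
```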
-
collapse
Convert a CompositeSequence's immobilizedCharacteristics into a single sequence, using a simple merge-join strategy.
- Parameters:
- sequences - sequences
- Returns:
- BioSequence. Not all fields are filled in and must be set by the caller.
-
deBlatFormatChromosomeName
Removes "chr" prefix from the chromosome name, if it is there.
- Parameters:
- chromosome - chromosome
- Returns:
- formatted name
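The two chromosome-name helpers, blatFormatChromosomeName and deBlatFormatChromosomeName, are described as inverses of each other. A minimal standalone sketch of that behavior follows; the class and method names are hypothetical, and a case-sensitive "chr" prefix is assumed.

```java
// Hypothetical sketch of adding and removing the "chr" prefix.
final class ChromosomeNameSketch {

    /** Adds "chr" unless the name already starts with it. */
    static String addChr(String chromosome) {
        return chromosome.startsWith("chr") ? chromosome : "chr" + chromosome;
    }

    /** Removes a leading "chr" if present. */
    static String stripChr(String chromosome) {
        return chromosome.startsWith("chr") ? chromosome.substring(3) : chromosome;
    }
}
```

Both operations are idempotent in this sketch: applying addChr to "chrX" or stripChr to "X" leaves the name unchanged.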
-
findCenter
Find where the center of a query location is in a gene. This is defined as the location of the center base of the query sequence relative to the 3' end of the gene.
- Parameters:
- starts - starts
- sizes - sizes
- Returns:
- center
-
getGeneExonOverlaps
public static int getGeneExonOverlaps(String chromosome, String starts, String sizes, String strand, Gene gene)
Given a gene, find out how much of it overlaps with exons provided as starts and sizes. This could involve more than one exon.
- Parameters:
- chromosome - as "chrX" or "X".
- starts - of the locations we are testing.
- sizes - of the locations we are testing.
- strand - to consider. If null, strand is ignored.
- gene - Gene we are testing
- Returns:
- Number of bases which overlap with exons of the gene. A value of zero indicates that the location is entirely within an intron. If multiple GeneProducts are associated with this gene, the best (highest) overlap is reported.
-
getGeneProductExonOverlap
public static int getGeneProductExonOverlap(String starts, String sizes, String strand, GeneProduct geneProduct)
Compute the overlap of a physical location with a transcript (gene product). This assumes that the chromosome is already matched.
- Parameters:
- starts - of the locations we are testing (in the target, so in the same coordinates as the geneProduct location)
- sizes - of the locations we are testing.
- strand - the strand to look on. If null, strand is ignored.
- geneProduct - GeneProduct we are testing. If strand of PhysicalLocation is null, we ignore strand.
- Returns:
- Total number of bases which overlap with exons of the transcript. A value of zero indicates that the location is entirely within an intron, or the strand is wrong.
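The exon-overlap calculation can be sketched without the Gemma model classes. In this hypothetical version the transcript is reduced to an array of non-overlapping exon intervals plus a strand string, and all coordinates are assumed to be half-open (start inclusive, end exclusive). Those representation choices are assumptions for illustration, not the actual GeneProduct API.

```java
// Hypothetical sketch: total bases of aligned blocks that fall inside exons.
final class ExonOverlapSketch {

    /**
     * @param starts           comma-delimited block starts, e.g. "100,300,"
     * @param sizes            comma-delimited block sizes, e.g. "50,50,"
     * @param strand           strand of the blocks, or null to ignore strand
     * @param exons            exon intervals as {start, endExclusive}, assumed non-overlapping
     * @param transcriptStrand strand of the transcript, or null to ignore strand
     */
    static int exonOverlap(String starts, String sizes, String strand,
                           long[][] exons, String transcriptStrand) {
        // A strand mismatch yields zero; a null on either side disables the check.
        if (strand != null && transcriptStrand != null && !strand.equals(transcriptStrand)) {
            return 0;
        }
        String[] startFields = starts.split(",");
        String[] sizeFields = sizes.split(",");
        if (startFields.length != sizeFields.length) {
            throw new IllegalArgumentException("starts and sizes must have the same number of entries");
        }
        int total = 0;
        for (int i = 0; i < startFields.length; i++) {
            long blockStart = Long.parseLong(startFields[i].trim());
            long blockEnd = blockStart + Long.parseLong(sizeFields[i].trim());
            for (long[] exon : exons) {
                // Shared length of two half-open intervals; negative means disjoint.
                long shared = Math.min(blockEnd, exon[1]) - Math.max(blockStart, exon[0]);
                if (shared > 0) {
                    total += shared;
                }
            }
        }
        return total;
    }
}
```

A gene-level version in the spirit of getGeneExonOverlaps would run this per transcript and keep the highest value.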
-
computeOverlap
public static int computeOverlap(long starta, long enda, long startb, long endb) -
rightHandOverlap
Compute any overlap the query sequence has with the right-hand side of the target.
- Parameters:
- target - target
- query - query
- Returns:
- right overlap
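One plausible reading of a right-hand overlap is the longest suffix of the target that is also a prefix of the query, which is the quantity a merge-join of adjacent sequences needs. The sketch below implements that reading on plain strings. The interpretation, the names, and the use of String instead of BioSequence are all assumptions.

```java
// Hypothetical sketch: longest suffix of target that matches a prefix of query.
final class RightOverlapSketch {

    static int rightOverlap(String target, String query) {
        int max = Math.min(target.length(), query.length());
        // Try the longest candidate first so the first hit is the best one.
        for (int len = max; len > 0; len--) {
            if (target.regionMatches(target.length() - len, query, 0, len)) {
                return len;
            }
        }
        return 0;
    }
}
```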
-
reverseComplement
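This entry carries no description. For orientation, a reverse complement reads a DNA sequence backwards while swapping A with T and C with G. The sketch below is a generic standalone version with hypothetical names; it preserves case and leaves any other character (such as N) unchanged, which may differ from what the Gemma method does.

```java
// Hypothetical sketch of a DNA reverse complement.
final class ReverseComplementSketch {

    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default:  return base; // e.g. N stays N
        }
    }

    static String reverseComplement(String sequence) {
        StringBuilder result = new StringBuilder(sequence.length());
        // Iterate from the last base to the first, complementing each one.
        for (int i = sequence.length() - 1; i >= 0; i--) {
            result.append(complement(sequence.charAt(i)));
        }
        return result.toString();
    }
}
```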
-
totalSize
- Parameters:
- sizes - Blat-formatted string of sizes (comma-delimited)
- Returns:
- total size
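As a rough illustration, the total is the sum of the comma-delimited block sizes. The names below are hypothetical, and skipping empty fields (for example after a trailing comma) is an assumption.

```java
// Hypothetical sketch: sum the entries of a Blat-formatted sizes string.
final class BlatSizesSketch {

    static int totalSize(String sizes) {
        int total = 0;
        for (String field : sizes.split(",")) {
            String trimmed = field.trim();
            if (!trimmed.isEmpty()) {
                total += Integer.parseInt(trimmed);
            }
        }
        return total;
    }
}
```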
-
computeOverlap
Compute the overlap between two physical locations. If neither location has a length, the overlap is zero unless both point to exactly the same nucleotide location, in which case the overlap is 1.
- Parameters:
- a - a
- b - b
- Returns:
- overlap
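The rule for locations without a length can be made concrete with a small sketch. Here a location is reduced to a start and a nullable length rather than a PhysicalLocation object, and intervals are half-open. When only one of the two locations lacks a length it is treated as a single base; the description does not cover that case, so that choice is an assumption.

```java
// Hypothetical sketch of overlap between two start/length locations.
final class LocationOverlapSketch {

    static int overlap(long startA, Integer lengthA, long startB, Integer lengthB) {
        // Neither location has a length: 1 only if they name the same nucleotide.
        if (lengthA == null && lengthB == null) {
            return startA == startB ? 1 : 0;
        }
        // Assumption: a single missing length is treated as one base.
        long endA = startA + (lengthA == null ? 1 : lengthA); // exclusive end
        long endB = startB + (lengthB == null ? 1 : lengthB); // exclusive end
        long shared = Math.min(endA, endB) - Math.max(startA, startB);
        return (int) Math.max(0L, shared);
    }
}
```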
-
binFromRange
public static int binFromRange(int start, int end)
- Parameters:
- start - start
- end - end
- Returns:
- bin that this start-end segment is in
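The method name matches the routine used by the UCSC Genome Browser binning scheme, in which a range is assigned to the smallest bin that fully contains it. The sketch below follows the standard UCSC scheme (128 kb smallest bins, five levels, for coordinates below 512 Mb). Whether the Gemma method uses the same constants is an assumption.

```java
// Sketch of the standard UCSC binning scheme; assumed, not confirmed, to match Gemma.
final class UcscBinSketch {

    // Offsets of the first bin at each level, from smallest bins (128 kb) to largest (512 Mb).
    private static final int[] BIN_OFFSETS = {512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0};
    private static final int FIRST_SHIFT = 17; // 2^17 = 128 kb
    private static final int NEXT_SHIFT = 3;   // each level is 8 times larger

    /** Returns the bin for the half-open range [start, end). */
    static int binFromRange(int start, int end) {
        int startBin = start >> FIRST_SHIFT;
        int endBin = (end - 1) >> FIRST_SHIFT;
        for (int offset : BIN_OFFSETS) {
            // The range fits in one bin at this level.
            if (startBin == endBin) {
                return offset + startBin;
            }
            startBin >>= NEXT_SHIFT;
            endBin >>= NEXT_SHIFT;
        }
        throw new IllegalArgumentException("start " + start + ", end " + end + " out of range (max is 512M)");
    }
}
```

Storing the bin alongside each feature lets a range query restrict itself to a handful of candidate bins instead of scanning every feature on the chromosome.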
-