Class GoldenPathSequenceAnalysis

java.lang.Object
ubic.gemma.core.goldenpath.GoldenPath
ubic.gemma.core.goldenpath.GoldenPathSequenceAnalysis
All Implemented Interfaces:
AutoCloseable

public class GoldenPathSequenceAnalysis extends GoldenPath
Uses the GoldenPath databases to compare sequence alignments to gene locations.
Author:
pavlidis
  • Constructor Details

    • GoldenPathSequenceAnalysis

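      public GoldenPathSequenceAnalysis(Taxon taxon)
      Creates an analysis object for the given organism; the other methods query that organism's GoldenPath database.
      Parameters:
      taxon - The organism whose GoldenPath database is used.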
  • Method Details

    • findAssociations

      public Collection<BlatAssociation> findAssociations(String chromosome, Long queryStart, Long queryEnd, String starts, String sizes, String strand, ThreePrimeDistanceMethod method, ProbeMapperConfig config)
      Given a physical location, identify overlapping genes or predicted genes.
      Parameters:
      chromosome - The chromosome name (the organism is set by the constructor).
      queryStart - The start base of the region to query (the start of the alignment to the genome).
      queryEnd - The end base of the region to query (the end of the alignment to the genome).
      starts - Start positions of the alignment blocks in the target, as a comma-delimited list from BLAT.
      sizes - Sizes of the alignment blocks, as a comma-delimited list from BLAT.
      strand - Either + or -, indicating the strand to search, or null to search both strands.
      method - The constant representing the method used to compute the 3' distance.
      config - The mapping configuration (a ProbeMapperConfig).
      Returns:
      A collection of BioSequence2GeneProduct objects. The distance stored by a ThreePrimeData will be 0 if the sequence overhangs the found genes (rather than a negative distance). If no genes are found, the result is null. These are transient instances, not from Gemma's database.
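      The starts and sizes parameters are BLAT's comma-delimited block lists (BLAT's PSL output usually ends each list with a trailing comma). The following is a minimal sketch, not Gemma's implementation, of parsing those lists and counting how many aligned bases fall inside an exon. The class BlatBlocks and its methods are hypothetical, and the sketch assumes half-open [start, end) coordinates, as used in PSL.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Gemma) for BLAT-style comma-delimited block lists. */
public class BlatBlocks {

    /** Parses a list such as "1000,1500," into {1000, 1500}; empty fields are skipped. */
    static int[] parseList(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    /** Total aligned bases: the sum of all block sizes. */
    static int alignedBases(String sizes) {
        return Arrays.stream(parseList(sizes)).sum();
    }

    /** Aligned bases that fall inside the half-open exon interval [exonStart, exonEnd). */
    static int overlapWith(String starts, String sizes, int exonStart, int exonEnd) {
        int[] s = parseList(starts);
        int[] z = parseList(sizes);
        if (s.length != z.length) {
            throw new IllegalArgumentException("starts and sizes must list the same number of blocks");
        }
        int total = 0;
        for (int i = 0; i < s.length; i++) {
            // Intersect block [s[i], s[i] + z[i]) with the exon and add the intersection length.
            int lo = Math.max(s[i], exonStart);
            int hi = Math.min(s[i] + z[i], exonEnd);
            total += Math.max(0, hi - lo);
        }
        return total;
    }

    public static void main(String[] args) {
        String starts = "1000,1500,";
        String sizes = "100,50,";
        System.out.println(alignedBases(sizes));                    // 150
        System.out.println(overlapWith(starts, sizes, 1050, 1520)); // 50 + 20 = 70
    }
}
```

Checking block lengths against the parsed start count catches malformed input early, before any overlap arithmetic is done.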
    • findClosestGene

      public Gene findClosestGene(String chromosome, Long queryStart, Long queryEnd, String strand, int maxWindow)
      Given a location, find the nearest gene on the same strand, including only "known", "refseq" or "ensembl" transcripts.
      Parameters:
      chromosome - The chromosome name.
      queryStart - Start of the query region.
      queryEnd - End of the query region.
      strand - Either '+' or '-'.
      maxWindow - The maximum number of bases to search on each side of the region, in addition to searching inside it.
      Returns:
      the Gene closest to the given location. This is a transient instance, not from Gemma's database.
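      A minimal sketch of the same-strand nearest-gene search described above. It assumes a hypothetical GeneLoc record in place of Gemma's Gene and an in-memory list in place of the GoldenPath queries, and it omits the restriction to "known", "refseq" or "ensembl" transcripts. It returns an Optional to make the no-match case explicit; this page does not say what findClosestGene returns when nothing is found.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch (not Gemma code): nearest same-strand gene within a search window. */
public class ClosestGene {

    /** Minimal stand-in for a gene location; not Gemma's Gene class. */
    record GeneLoc(String symbol, String chromosome, long start, long end, String strand) {}

    /** Distance between two intervals; 0 if they overlap. */
    static long gap(long aStart, long aEnd, long bStart, long bEnd) {
        if (aEnd < bStart) return bStart - aEnd;
        if (bEnd < aStart) return aStart - bEnd;
        return 0;
    }

    static Optional<GeneLoc> findClosest(List<GeneLoc> genes, String chromosome, long queryStart,
                                         long queryEnd, String strand, int maxWindow) {
        long windowStart = queryStart - maxWindow;
        long windowEnd = queryEnd + maxWindow;
        GeneLoc best = null;
        long bestGap = Long.MAX_VALUE;
        for (GeneLoc g : genes) {
            // Only genes on the same chromosome and strand are candidates.
            if (!g.chromosome().equals(chromosome) || !g.strand().equals(strand)) continue;
            // Ignore genes lying entirely outside the widened window.
            if (g.end() < windowStart || g.start() > windowEnd) continue;
            long d = gap(queryStart, queryEnd, g.start(), g.end());
            if (d < bestGap) {
                bestGap = d;
                best = g;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<GeneLoc> genes = List.of(
                new GeneLoc("A", "chr1", 100, 200, "+"),
                new GeneLoc("B", "chr1", 5000, 6000, "+"),
                new GeneLoc("C", "chr1", 300, 400, "-"));
        // C is nearer to the query but on the other strand, so A (50 bases away) wins.
        System.out.println(findClosest(genes, "chr1", 250, 260, "+", 1000)
                .map(GeneLoc::symbol).orElse("none"));
    }
}
```

With a window of only 10 bases, the same query finds nothing, because gene A ends 50 bases before the query starts.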
    • findESTs

      public Collection<Gene> findESTs(String chromosome, Long regionStart, Long regionEnd, String strand)
      Check to see if there are ESTs that overlap with this region. We provisionally promote the ESTs to the status of genes for this purpose.
      Parameters:
      chromosome - The chromosome name.
      regionStart - Start of the region to be checked.
      regionEnd - End of the region to be checked.
      strand - The strand to search.
      Returns:
      The ESTs that overlap the query region (taken from the all_est table).
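      The "provisional promotion" of ESTs to genes can be pictured as filtering alignments that overlap the region and wrapping each one as a gene-like object named after its accession. This is a hypothetical sketch: the EstAlignment and ProvisionalGene records stand in for an all_est row and for Gemma's Gene, and the real table has more columns than shown. findRNAs below describes the same idea for mRNAs.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch (not Gemma code): promote overlapping EST alignments to provisional genes. */
public class EstPromotion {

    /** Stand-in for one EST alignment row; not the real all_est schema. */
    record EstAlignment(String accession, String chromosome, long start, long end, String strand) {}

    /** Stand-in for a provisional gene built from an EST. */
    record ProvisionalGene(String name, String chromosome, long start, long end, String strand) {}

    static List<ProvisionalGene> findOverlappingEsts(List<EstAlignment> ests, String chromosome,
                                                     long regionStart, long regionEnd, String strand) {
        return ests.stream()
                .filter(e -> e.chromosome().equals(chromosome))
                .filter(e -> e.strand().equals(strand))
                // Half-open intervals overlap when each one starts before the other ends.
                .filter(e -> e.start() < regionEnd && regionStart < e.end())
                // "Promote" the EST: its accession becomes the provisional gene's name.
                .map(e -> new ProvisionalGene(e.accession(), e.chromosome(), e.start(), e.end(), e.strand()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<EstAlignment> ests = List.of(
                new EstAlignment("EST_1", "chr1", 100, 200, "+"),
                new EstAlignment("EST_2", "chr1", 150, 300, "-"),
                new EstAlignment("EST_3", "chr2", 100, 200, "+"));
        // Only EST_1 matches: EST_2 is on the other strand, EST_3 on another chromosome.
        System.out.println(findOverlappingEsts(ests, "chr1", 180, 250, "+"));
    }
}
```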
    • findKnownGenesByLocation

      public Collection<GeneProduct> findKnownGenesByLocation(String chromosome, Long start, Long end, String strand)
      Find "Known" genes contained in or overlapping a region. Note that the NCBI symbol may be blank when the gene is not a RefSeq gene.
      Parameters:
      chromosome - The chromosome name.
      start - Start of the region.
      end - End of the region.
      strand - The strand to search.
      Returns:
      The matching gene products. These are transient instances, not from Gemma's database.
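      UCSC GoldenPath tables store feature starts 0-based and ends 1-based (half-open intervals), so a "contained in or overlapping" test must agree with that convention. The sketch below is hypothetical: it assumes the query arrives as 1-based closed coordinates, which this page does not state, and classifies a gene as contained in, overlapping, or disjoint from the query. Both findKnownGenesByLocation and findRefGenesByLocation would match the first two cases.

```java
/** Hypothetical sketch (not Gemma code) of a "contained in or overlapping" test. */
public class RegionOverlap {

    enum Relation { NONE, OVERLAPPING, CONTAINED }

    /**
     * Relates a gene stored as UCSC-style half-open [txStart, txEnd) to a query given
     * as 1-based closed [start, end] (an assumption). CONTAINED means the gene lies
     * wholly inside the query; any other shared base counts as OVERLAPPING.
     */
    static Relation relate(long txStart, long txEnd, long start, long end) {
        long qStart = start - 1; // convert the 1-based closed query to 0-based half-open
        long qEnd = end;
        if (txEnd <= qStart || txStart >= qEnd) return Relation.NONE;
        if (txStart >= qStart && txEnd <= qEnd) return Relation.CONTAINED;
        return Relation.OVERLAPPING;
    }

    public static void main(String[] args) {
        // The gene occupies 0-based [99, 200), i.e. 1-based bases 100..200.
        System.out.println(relate(99, 200, 100, 200)); // CONTAINED
        System.out.println(relate(99, 200, 150, 300)); // OVERLAPPING
        System.out.println(relate(99, 200, 201, 300)); // NONE
    }
}
```

Getting the conversion wrong by one base only matters at the edges, as the last case shows: query base 201 is adjacent to, not inside, the gene.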
    • findRefGenesByLocation

      public Collection<GeneProduct> findRefGenesByLocation(String chromosome, Long start, Long end, String strand)
      Find RefSeq genes contained in or overlapping a region.
      Parameters:
      chromosome - The chromosome name.
      start - Start of the region.
      end - End of the region.
      strand - The strand to search.
      Returns:
      The matching gene products. These are transient instances, not from Gemma's database.
    • findRNAs

      public Collection<Gene> findRNAs(String chromosome, Long regionStart, Long regionEnd, String strand)
      Check to see if there are mRNAs that overlap with this region. We promote the mRNAs to the status of genes for this purpose.
      Parameters:
      chromosome - chromosome
      regionStart - Start of the region to be checked.
      regionEnd - End of the region to be checked.
      strand - The strand to search.
      Returns:
      The mRNAs which overlap the query region.
    • findSequenceLocations

      public Collection<BlatResult> findSequenceLocations(String identifier)
      Find where GoldenPath says a sequence aligns to the genome.
      Parameters:
      identifier - A GenBank accession referring to an EST or mRNA. Other kinds of identifiers return no results.
      Returns:
      A collection of BlatResult objects describing where GoldenPath says the sequence referred to by the identifier aligns. If no results are found, the collection will be empty.
    • getThreePrimeDistances

      public Collection<? extends BioSequence2GeneProduct> getThreePrimeDistances(BlatResult br, ThreePrimeDistanceMethod method)
      Given a physical location, find how close it is to the 3' end of a gene it is in, using default mapping settings.
      Parameters:
      br - The BlatResult holding the alignment location.
      method - The constant representing the method used to compute the 3' distance.
      Returns:
      A collection of BioSequence2GeneProduct objects carrying the computed distances.
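      The 3' distance is measured to whichever gene boundary is the 3' end: the end coordinate for a '+' strand gene, the start coordinate for a '-' strand gene. The sketch below is a hypothetical illustration, not Gemma's algorithm. The Anchor enum only stands in for a choice of measurement point in the alignment (the actual ThreePrimeDistanceMethod constants are not listed on this page), and negative distances are reported as 0, matching the overhang behaviour documented for findAssociations.

```java
/** Hypothetical sketch (not Gemma code) of a 3' distance with overhangs reported as 0. */
public class ThreePrimeDistance {

    /** Point of the alignment to measure from; a stand-in, not Gemma's ThreePrimeDistanceMethod. */
    enum Anchor { LEFT, MIDDLE, RIGHT }

    static long anchorPoint(long alignStart, long alignEnd, Anchor anchor) {
        return switch (anchor) {
            case LEFT -> alignStart;
            case RIGHT -> alignEnd;
            case MIDDLE -> (alignStart + alignEnd) / 2;
        };
    }

    /**
     * Distance from the anchor point to the gene's 3' end: the end coordinate on '+',
     * the start coordinate on '-'. A negative value (overhang past the 3' end) becomes 0.
     */
    static long distance(long geneStart, long geneEnd, String strand,
                         long alignStart, long alignEnd, Anchor anchor) {
        long p = anchorPoint(alignStart, alignEnd, anchor);
        long d = "+".equals(strand) ? geneEnd - p : p - geneStart;
        return Math.max(0, d);
    }

    public static void main(String[] args) {
        System.out.println(distance(1000, 5000, "+", 4500, 4800, Anchor.RIGHT)); // 200
        System.out.println(distance(1000, 5000, "+", 4900, 5100, Anchor.RIGHT)); // 0 (overhang)
        System.out.println(distance(1000, 5000, "-", 1200, 1400, Anchor.LEFT));  // 200
    }
}
```

Clamping at 0 loses how far the alignment overhangs; a caller that needs that information would have to keep the signed value separately.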
    • getThreePrimeDistances

      public Collection<BioSequence2GeneProduct> getThreePrimeDistances(String identifier, ThreePrimeDistanceMethod method)
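      Computes 3' distances for the sequence referred to by the identifier, using default mapping settings.
      Parameters:
      identifier - The sequence identifier.
      method - The constant representing the method used to compute the 3' distance.
      Returns:
      A collection of BioSequence2GeneProduct objects.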