Class GoldenPathSequenceAnalysis


  • public class GoldenPathSequenceAnalysis
    extends GoldenPath
    Uses the GoldenPath databases to compare sequence alignments with gene locations.
    Author:
    pavlidis
    • Constructor Detail

      • GoldenPathSequenceAnalysis

        public GoldenPathSequenceAnalysis(Taxon taxon)
    • Method Detail

      • findAssociations

        public Collection<BlatAssociation> findAssociations(String chromosome,
                                                            Long queryStart,
                                                            Long queryEnd,
                                                            String starts,
                                                            String sizes,
                                                            String strand,
                                                            ThreePrimeDistanceMethod method,
                                                            ProbeMapperConfig config)
        Given a physical location, identify overlapping genes or predicted genes.
        Parameters:
        chromosome - The chromosome name (the organism is set by the constructor)
        queryStart - The start base of the region to query (the start of the alignment to the genome)
        queryEnd - The end base of the region to query (the end of the alignment to the genome)
        starts - Locations of the alignment block starts in the target (comma-delimited, from BLAT)
        sizes - Sizes of the alignment blocks (comma-delimited, from BLAT)
        strand - Either + or - indicating the strand to look on, or null to search both strands.
        method - The constant representing the method to use to locate the 3' distance.
        config - The ProbeMapperConfig holding the mapping settings to use
        Returns:
        A collection of BlatAssociation (BioSequence2GeneProduct) objects. The distance stored by a ThreePrimeData will be 0 if the sequence overhangs the found genes, rather than a negative distance. If no genes are found, the result is null. These are transient instances, not from Gemma's database.
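The starts and sizes arguments are BLAT's comma-delimited block lists. As an illustration only (this is not Gemma's implementation), the sketch below parses the two strings into aligned blocks and counts how many aligned bases fall inside a gene interval. It assumes 0-based, half-open coordinates as in BLAT's PSL output and tolerates the trailing comma BLAT writes; the class, record, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BlatBlocks {

    /** One aligned block covering [start, start + size) in target (genome) coordinates. */
    record Block(long start, long size) {
        long end() { return start + size; }
    }

    /** Parse the comma-delimited starts/sizes strings from BLAT; String.split drops the trailing empty token. */
    static List<Block> parseBlocks(String starts, String sizes) {
        String[] s = starts.split(",");
        String[] z = sizes.split(",");
        if (s.length != z.length) {
            throw new IllegalArgumentException("starts and sizes have different block counts");
        }
        List<Block> blocks = new ArrayList<>();
        for (int i = 0; i < s.length; i++) {
            blocks.add(new Block(Long.parseLong(s[i].trim()), Long.parseLong(z[i].trim())));
        }
        return blocks;
    }

    /** Number of aligned bases inside the half-open interval [regionStart, regionEnd). */
    static long overlap(List<Block> blocks, long regionStart, long regionEnd) {
        long total = 0;
        for (Block b : blocks) {
            long lo = Math.max(b.start(), regionStart);
            long hi = Math.min(b.end(), regionEnd);
            if (hi > lo) total += hi - lo; // only count a positive intersection
        }
        return total;
    }

    public static void main(String[] args) {
        List<Block> blocks = parseBlocks("1000,1500,", "100,50,");
        // Blocks cover [1000,1100) and [1500,1550); a gene on [1050,1520) overlaps 50 + 20 bases.
        System.out.println(overlap(blocks, 1050, 1520)); // prints 70
    }
}
```

Counting overlap per block, rather than against the overall queryStart..queryEnd span, avoids crediting a gene for bases that fall in the alignment's gaps (for example, introns skipped by a spliced mRNA).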
      • findClosestGene

        public Gene findClosestGene(String chromosome,
                                    Long queryStart,
                                    Long queryEnd,
                                    String strand,
                                    int maxWindow)
        Given a location, find the nearest gene on the same strand, including only "known", "refseq" or "ensembl" transcripts.
        Parameters:
        chromosome - the chromosome name
        queryStart - the start base of the region to query
        queryEnd - the end base of the region to query
        strand - Either '+' or '-'
        maxWindow - the maximum number of bases to search on each side of the region, in addition to the region itself.
        Returns:
        the Gene closest to the given location. This is a transient instance, not from Gemma's database.
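The search described above can be sketched as a linear scan: skip genes on the other strand, compute the gap between each gene and the query (0 when they overlap), and keep the smallest gap that is no larger than maxWindow. This is a minimal sketch under stated assumptions: GeneLoc is a hypothetical stand-in for Gemma's Gene, coordinates are treated as closed intervals, and the filter on transcript type ("known", "refseq", "ensembl") is left out.

```java
import java.util.List;
import java.util.Optional;

public class ClosestGene {

    /** Hypothetical stand-in for a gene location; not Gemma's Gene class. */
    record GeneLoc(String symbol, long start, long end, String strand) {}

    /** Gap in bases between closed intervals [qs, qe] and [gs, ge]; 0 if they overlap. */
    static long gap(long qs, long qe, long gs, long ge) {
        if (ge < qs) return qs - ge; // gene lies entirely to the left of the query
        if (gs > qe) return gs - qe; // gene lies entirely to the right of the query
        return 0;
    }

    /** Nearest same-strand gene whose gap to the query is at most maxWindow bases. */
    static Optional<GeneLoc> findClosest(List<GeneLoc> genes, long queryStart, long queryEnd,
                                         String strand, int maxWindow) {
        GeneLoc best = null;
        long bestGap = Long.MAX_VALUE;
        for (GeneLoc g : genes) {
            if (!g.strand().equals(strand)) continue;
            long d = gap(queryStart, queryEnd, g.start(), g.end());
            if (d <= maxWindow && d < bestGap) {
                best = g;
                bestGap = d;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<GeneLoc> genes = List.of(
                new GeneLoc("A", 100, 200, "+"),
                new GeneLoc("B", 5000, 6000, "+"),
                new GeneLoc("C", 300, 400, "-"));
        // Query [450, 500] on '+': A is 250 bases away, B is 4500 away, C is on the other strand.
        System.out.println(findClosest(genes, 450, 500, "+", 1000).map(GeneLoc::symbol).orElse("none")); // prints A
        System.out.println(findClosest(genes, 450, 500, "+", 100).map(GeneLoc::symbol).orElse("none"));  // prints none
    }
}
```

Returning Optional rather than null is a choice made for the sketch; the documented method returns a Gene.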
      • findESTs

        public Collection<Gene> findESTs(String chromosome,
                                         Long regionStart,
                                         Long regionEnd,
                                         String strand)
        Check to see if there are ESTs that overlap with this region. We provisionally promote the ESTs to the status of genes for this purpose.
        Parameters:
        chromosome - the chromosome name
        regionStart - the start of the region to be checked
        regionEnd - the end of the region to be checked
        strand - the strand
        Returns:
        The ESTs that overlap the query region (taken from the all_est table).
      • findKnownGenesByLocation

        public Collection<GeneProduct> findKnownGenesByLocation(String chromosome,
                                                                Long start,
                                                                Long end,
                                                                String strand)
        Find "Known" genes contained in or overlapping a region. Note that the NCBI symbol may be blank when the gene is not a RefSeq gene.
        Parameters:
        chromosome - the chromosome name
        start - the start base of the region
        end - the end base of the region
        strand - the strand
        Returns:
        This is a collection of transient instances, not from Gemma's database.
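The "contained in or overlapping" test used here (and by the RefSeq, EST and mRNA lookups in this class) reduces to a single interval comparison, because containment in either direction is a special case of overlap. The sketch below is hypothetical: Transcript is an illustrative record rather than a Gemma type, coordinates are assumed to be closed intervals, and a null strand is assumed to match both strands, mirroring the behaviour documented for findAssociations.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionQuery {

    /** Hypothetical transcript record; coordinates are closed intervals [start, end]. */
    record Transcript(String name, String chromosome, long start, long end, String strand) {}

    /** True if [aStart, aEnd] and [bStart, bEnd] share at least one base (containment included). */
    static boolean overlaps(long aStart, long aEnd, long bStart, long bEnd) {
        return aStart <= bEnd && bStart <= aEnd;
    }

    /** Transcripts on the chromosome that overlap the region; a null strand matches both strands. */
    static List<Transcript> findByLocation(List<Transcript> all, String chromosome,
                                           long start, long end, String strand) {
        List<Transcript> hits = new ArrayList<>();
        for (Transcript t : all) {
            if (!t.chromosome().equals(chromosome)) continue;
            if (strand != null && !t.strand().equals(strand)) continue;
            if (overlaps(start, end, t.start(), t.end())) hits.add(t);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Transcript> all = List.of(
                new Transcript("t1", "chr1", 100, 500, "+"),  // contains the region
                new Transcript("t2", "chr1", 250, 260, "-"),  // contained in the region
                new Transcript("t3", "chr1", 600, 700, "+"),  // outside the region
                new Transcript("t4", "chr2", 100, 500, "+")); // other chromosome
        System.out.println(findByLocation(all, "chr1", 200, 300, "+").size());  // prints 1
        System.out.println(findByLocation(all, "chr1", 200, 300, null).size()); // prints 2
    }
}
```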
      • findRefGenesByLocation

        public Collection<GeneProduct> findRefGenesByLocation(String chromosome,
                                                              Long start,
                                                              Long end,
                                                              String strand)
        Find RefSeq genes contained in or overlapping a region.
        Parameters:
        chromosome - the chromosome name
        start - the start base of the region
        end - the end base of the region
        strand - the strand
        Returns:
        This is a collection of transient instances, not from Gemma's database.
      • findRNAs

        public Collection<Gene> findRNAs(String chromosome,
                                         Long regionStart,
                                         Long regionEnd,
                                         String strand)
        Check to see if there are mRNAs that overlap with this region. We promote the mRNAs to the status of genes for this purpose.
        Parameters:
        chromosome - the chromosome name
        regionStart - the start of the region to be checked
        regionEnd - the end of the region to be checked
        strand - the strand
        Returns:
        The mRNAs that overlap the query region.
      • findSequenceLocations

        public Collection<BlatResult> findSequenceLocations(String identifier)
        Find the places where GoldenPath says the given sequence aligns to the genome.
        Parameters:
        identifier - A GenBank accession referring to an EST or mRNA. Other types of identifier return no results.
        Returns:
        A collection of BlatResult objects representing the places where GoldenPath says the sequence referred to by the identifier aligns. If no results are found, the collection is empty.
      • getThreePrimeDistances

        public Collection<? extends BioSequence2GeneProduct> getThreePrimeDistances(BlatResult br,
                                                                                    ThreePrimeDistanceMethod method)
        Given a physical location, find how close it is to the 3' end of each gene it lies within, using the default mapping settings.
        Parameters:
        br - The BlatResult holding the alignment parameters needed.
        method - The constant representing the method to use to locate the 3' distance.
        Returns:
        A collection of BioSequence2GeneProduct objects carrying the 3' distances.
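The 3' distance calculation can be sketched as follows, assuming the gene's 3' end is its highest coordinate on the '+' strand and its lowest on the '-' strand, and clamping overhangs to 0 as described for findAssociations. The RefPoint enum (LEFT, RIGHT, MIDDLE) is a hypothetical stand-in for ThreePrimeDistanceMethod, and its constants may not match Gemma's. The real method works from a BlatResult and its block structure, which this sketch ignores.

```java
public class ThreePrimeDistance {

    /** Hypothetical choice of reference point on the alignment; not necessarily Gemma's ThreePrimeDistanceMethod. */
    enum RefPoint { LEFT, RIGHT, MIDDLE }

    /** Genome position on the alignment from which the distance is measured. */
    static long refPosition(long alignStart, long alignEnd, RefPoint p) {
        return switch (p) {
            case LEFT -> alignStart;
            case RIGHT -> alignEnd;
            case MIDDLE -> alignStart + (alignEnd - alignStart) / 2;
        };
    }

    /**
     * Distance from the chosen alignment position to the gene's 3' end.
     * On '+' the 3' end is the gene's highest coordinate; on '-' it is the lowest.
     * An alignment that overhangs the 3' end yields 0 rather than a negative number.
     */
    static long threePrimeDistance(long alignStart, long alignEnd, RefPoint p,
                                   long geneStart, long geneEnd, String geneStrand) {
        long pos = refPosition(alignStart, alignEnd, p);
        long d = geneStrand.equals("-") ? pos - geneStart : geneEnd - pos;
        return Math.max(0, d);
    }

    public static void main(String[] args) {
        // Gene [1000, 2000]; alignment [1800, 1900].
        System.out.println(threePrimeDistance(1800, 1900, RefPoint.RIGHT, 1000, 2000, "+")); // prints 100
        System.out.println(threePrimeDistance(1800, 1900, RefPoint.LEFT, 1000, 2000, "-"));  // prints 800
        System.out.println(threePrimeDistance(1950, 2050, RefPoint.RIGHT, 1000, 2000, "+")); // prints 0 (overhang)
    }
}
```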