Class GeoBrowser


  • public class GeoBrowser
    extends Object
    Gets records from GEO and compares them to Gemma. This is used to identify data sets that are new in GEO and not in Gemma.

    See Programmatic access to GEO for some information.

    Author:
    pavlidis
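
The comparison described above (finding data sets that are in GEO but not yet in Gemma) can be pictured as a set difference over accessions. The following is a minimal sketch under stated assumptions, not the class's actual implementation: `NewInGeoSketch` and `notYetLoaded` are hypothetical names, and plain accession strings stand in for `GeoRecord` objects.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class NewInGeoSketch {

    /**
     * Hypothetical helper: returns the accessions seen in GEO that are not
     * in the set of accessions already loaded, preserving GEO order.
     */
    public static Set<String> notYetLoaded(List<String> geoAccessions, Set<String> alreadyLoaded) {
        Set<String> result = new LinkedHashSet<>(geoAccessions);
        result.removeAll(alreadyLoaded);
        return result;
    }

    public static void main(String[] args) {
        // Made-up accessions, for illustration only.
        List<String> fromGeo = Arrays.asList("GSE1", "GSE2", "GSE3");
        Set<String> loaded = new HashSet<>(Arrays.asList("GSE2"));
        System.out.println(notYetLoaded(fromGeo, loaded)); // prints [GSE1, GSE3]
    }
}
```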
    • Constructor Detail

      • GeoBrowser

        public GeoBrowser()
    • Method Detail

      • getAllGEOPlatforms

        public Collection<GeoRecord> getAllGEOPlatforms()
                                                 throws IOException
        A somewhat ad hoc implementation that could be improved. Limited to human, mouse, and rat platforms. It is not guaranteed to retrieve everything, although as of 7/2021 it is sufficient (~8000 platforms).
        Returns:
        all relevant platforms up to single-query limit of NCBI
        Throws:
        IOException
      • getGeoRecordsBySearchTerm

        public List<GeoRecord> getGeoRecordsBySearchTerm(String searchTerms,
                                                         int start,
                                                         int pageSize,
                                                         boolean detailed,
                                                         Collection<String> allowedTaxa,
                                                         Collection<String> limitPlatforms)
                                                  throws IOException
        Provides more details than getRecentGeoRecords. Performs an E-utilities query of the GEO database with the given searchTerms (search terms can be omitted). Returns at most pageSize records. Does some screening of results for expression studies, and (optionally) taxa. This is used for identifying data sets for loading.
        Parameters:
        searchTerms - search terms in NCBI Entrez query format
        start - offset at which to start, used to retrieve records in batches
        pageSize - how many records to retrieve
        detailed - if true, additional information is fetched (slower)
        allowedTaxa - if not null, data sets not containing any of these taxa will be skipped
        limitPlatforms - if not null or empty, platforms to limit the query to (combining with searchTerms is not supported yet)
        Returns:
        list of GeoRecords
        Throws:
        IOException - if there is a problem obtaining or manipulating the file (some exceptions are not thrown and just logged)
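
The `start` and `pageSize` parameters suggest a batched retrieval loop, which can be sketched as follows. This is an illustration under assumptions, not code from the library: `RecordSource` is a hypothetical stand-in that mirrors the signature of `getGeoRecordsBySearchTerm`, the type parameter `R` stands in for `GeoRecord`, and `fetchPages` is an invented helper. It assumes `start` counts records, so each batch advances by `pageSize`. Because results are screened, a batch may contain fewer than `pageSize` records even when more remain, so the sketch bounds the loop by a caller-supplied page count instead of stopping at the first short batch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class GeoPagingSketch {

    /** Hypothetical stand-in mirroring the getGeoRecordsBySearchTerm signature. */
    @FunctionalInterface
    public interface RecordSource<R> {
        List<R> fetch(String searchTerms, int start, int pageSize, boolean detailed,
                      Collection<String> allowedTaxa, Collection<String> limitPlatforms)
                throws IOException;
    }

    /** Fetches up to maxPages batches, advancing the offset by pageSize each time. */
    public static <R> List<R> fetchPages(RecordSource<R> source, String searchTerms,
                                         int pageSize, int maxPages,
                                         Collection<String> allowedTaxa) throws IOException {
        List<R> all = new ArrayList<>();
        for (int page = 0; page < maxPages; page++) {
            int start = page * pageSize;
            // limitPlatforms is left null because combining it with searchTerms
            // is documented as not supported yet.
            all.addAll(source.fetch(searchTerms, start, pageSize, false, allowedTaxa, null));
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        // Stub source holding five made-up records; a real caller would pass
        // a method reference such as geoBrowser::getGeoRecordsBySearchTerm.
        RecordSource<String> stub = (terms, start, size, detailed, taxa, platforms) -> {
            List<String> batch = new ArrayList<>();
            for (int i = start; i < Math.min(start + size, 5); i++) {
                batch.add("GSE" + i);
            }
            return batch;
        };
        System.out.println(fetchPages(stub, "cancer", 2, 4, null));
        // prints [GSE0, GSE1, GSE2, GSE3, GSE4]
    }
}
```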
      • getRecentGeoRecords

        public List<GeoRecord> getRecentGeoRecords(int startPage,
                                                   int pageSize)
                                            throws IOException
        Retrieves and parses a tab-delimited file from GEO. The file contains pageSize GEO records starting from startPage. The retrieved information is minimal.
        Parameters:
        startPage - start page
        pageSize - page size
        Returns:
        list of GeoRecords
        Throws:
        IOException - if there is a problem while manipulating the file