G2C::Software

GeneNomenclatureUtils: Tools for annotating genes and comparing gene lists with community resources.

by Mike DR Croning and Seth GN Grant


Tool categories


NOMENCLATURE FILE DOWNLOAD
--------------------------

Use this script to automatically download all the
nomenclature and other bioinformatic database files
utilised by the GeneNomenclatureUtils package.

    download_nomenclature_files



GENE AND PROTEIN NOMENCLATURE UTILITIES
---------------------------------------

These scripts use the Entrez, HGNC, MGI, and UniProt
resources to do identifier checking, by going from
gene symbol to gene ID (or vice versa), or to fetch
othologous gene IDs when going from mouse <-> human
genomes.

Approved names and synonyms can be added to files of
gene symbols or IDs.

Protein sequences and reported mouse alleles can be
fetched for mouse genes.

   
Entrez Gene: http://www.ncbi.nlm.nih.gov/gene

    check_entrez_gene_ids
    extract_entrez_gene_info_by_tax_id
    extract_from_ncbi_gene2_accession


HGNC: http://www.genenames.org

    add_hgnc_id_by_hgnc_symbol


MGI: http://www.informatics.jax.org

    add_approved_name_by_mgi_id
    add_mgi_id_by_hgnc_id
    add_mgi_id_by_mgi_symbol
    add_mgi_symbol_by_mgi_id
    add_ortho_gene_id_by_mgi_id
    add_uniprot_acc_by_mgi_id
    add_synonyms_by_mgi_id
    combine_by_mgi_id_column
    generate_interpro_report_from_mgi_id
    get_mgi_alleles_by_mgi_id
    get_prot_seqs_by_mgi_id


UniProt: http://www.uniprot.org

    check_uniprot_ids
    parse_uniprot_accs



LIST COMPARISON
---------------

Allows an arbitrary number of lists of genes or
symbols to be compared eaily using a configuation
file system so that 10s or 100s of lists can be 
managed easily.

    list_comparator



MEDLINE CACHE
-------------

A system to locally cache MEDLINE records,
automatically fetched from PubMed, using a MySQL
database to both coordinate the requests for the
MEDLINE records, and subsequently supply them.

A simple object-orientated interface to the
database is provided to utilise the cache:
package MedlineCacheDB;

    medline_cache_create_db_tables
    medline_cache_dump
    medline_cache_pubmed_fetcher
    medline_cache_requester



OMIM DISEASE - http://www.omim.org
------------

Used to add OMIM IDs and OMIM entry titles to a file
of human genes specified by Entrez Gene ID. The
various 'gene' and/or 'phenotype' records can be
added. 
 
    add_omim_by_entrez_gene_id



PubMed
------

Uses the orthologous gene identifiers assembled in a
file to combine the synonyms and human mutation terms 
to generate a PubMed URL search, to look for reports
of association to disease of human gene mutations 

   generate_pubmed_disease_searches_from_gene_nomenclature_ids



TAB-DELIMITED FILE MANIPULATION
-------------------------------

A comprehensive set of utilities to manipulate tab-
delimited text files.

    add_row_numbers
    categorise_by_column
    compare_columns_output_differences
    count_column_occurrences
    count_distinct_values_in_column
    count_row_occurrences
    count_rows_and_columns
    dump_column_as_unique
    extract_from_file_by_id_column
    extract_from_file_by_column_numerical_range
    filter_by_column
    pad_tabs_to_column_width
    rearrange_and_cut_columns
    remove_trailing_whitespace
    sort_file_by_column
    txt_file_format_sniffer
    write_spreadsheet
© G2C 2014. The Genes to Cognition Programme received funding from The Wellcome Trust and the EU FP7 Framework Programmes:
EUROSPIN (FP7-HEALTH-241498), SynSys (FP7-HEALTH-242167) and GENCODYS (FP7-HEALTH-241995).

Cookies Policy | Terms and Conditions. This site is hosted by Edinburgh University and the Genes to Cognition Programme.