Help

Six tools are available to query, retrieve and analyze the information stored in this Bio Resource. These tools provide multiple options for analyzing the results of microarray experiments and developing a biological interpretation of those results.

Genes : Collective information from various public biological resources for genes

  • Input Fields:

    • This search option requires input of data as Genbank Accessions, Unigene Cluster IDs, Gene names or aliases. Please select an organism and the type of input data you are going to provide before submitting the data.

    • For batch search, copy and paste the input data or upload a text file using the Browse button. Please make sure all the accessions are in one column and each line has only one accession.

  • Output Fields:

    • The resource contains information only for genes represented in Unigene or Locus Link. If an input accession is not associated with either of these, the search returns the message "Accession not present in the database".

    • The data are returned as a table. The first column is the user-supplied gene identifier, followed by data collected from Unigene, Locus Link, OMIM, NCBI dbEST, protein domains from NCBI CDD, Gene Ontology, Pathways (Kegg, Genmapp and Biocarta) and Protein interactions (BIND). Each entry is hyperlinked to more detailed information.

    • By default, the data are displayed in the order of the input data. Currently the only available sort option is by chromosome location; more will be added in the future.

    • To download the results in Excel format, use the link at the top of the page.


Go terms : Classify Genes based on Gene Ontology Terms


  • Input Fields:

    • This requires input of data as Genbank accessions in a column with each accession on a separate line. For batch search, copy and paste the input data or upload a text file using the Browse button.

    • Along with the accessions, you have the option to upload the signal to reference (log2) ratios for your experiments. Up to four columns of values are allowed along with the accessions. To upload data from an Excel file, save the file as tab-delimited text, with the column of accessions followed by the experimental values, and then upload it using the Browse button.

    • If you are uploading the experimental values make sure you choose the 'yes' option and the number of columns of experimental values that are being entered (this does not take into account the accession column).
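The input layout described above (one accession per line, followed by up to four tab-separated log2 ratios) can be illustrated with a minimal Python sketch. It is not the resource's own parser: the function name `parse_expression_file`, the sample accessions, and the choice to treat empty cells as missing values are assumptions made for illustration only.

```python
import csv
import io

def parse_expression_file(text, n_value_columns):
    """Parse tab-delimited rows: an accession followed by log2 ratio columns."""
    if not 1 <= n_value_columns <= 4:
        raise ValueError("between one and four value columns are allowed")
    rows = {}
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or not fields[0].strip():
            continue  # skip blank lines
        accession = fields[0].strip()
        raw = fields[1:1 + n_value_columns]
        # Assumption: an empty cell means the value is missing (None).
        values = [float(v) if v.strip() else None for v in raw]
        # Pad short rows so every accession has the same number of columns.
        values += [None] * (n_value_columns - len(values))
        rows[accession] = values
    return rows

# Hypothetical accessions, two columns of experimental values.
sample = "AA001\t1.2\t-0.5\nAA002\t\t2.0\n\nAA003\t0.0\n"
print(parse_expression_file(sample, 2))
```

The number of value columns is passed separately, mirroring the form field that asks for the number of experimental columns without counting the accession column.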

  • Output & Explanation of results :

    • This returns the GO terms associated with each accession, organized by the cellular component, process and function ontologies. The output is returned in three different formats:

    • a) Output 1: The results are organized by GO term. If expression values were supplied, each accession listed under a term is shown with its expression value, colored according to the chosen scheme.

    • Because Gene Ontology is a hierarchical classification, any given gene may be associated with multiple biological functions. The other GO terms associated with a gene are displayed alongside it in a second column. Clicking any of these terms takes you to the accessions grouped under that term.

    • b) Output 2: For each ontology, the GO terms are ordered by the number of accessions in the dataset that are associated with them. This shows the most represented GO terms among the accessions in your dataset. If expression values are uploaded, the GO terms are colored by expression value, which helps identify the upregulated and downregulated GO functions. If more than one set of experimental values is provided, coloring is based on the first set.

    • Both a) and b) can be downloaded in two different Excel formats.
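The ordering used in Output 2 can be sketched as a simple count of accessions per GO term. This is a minimal illustration for a single ontology, assuming the GO annotations are already available as a mapping; the function name `rank_go_terms` and the sample accessions are hypothetical.

```python
from collections import Counter

def rank_go_terms(accession_to_terms):
    """Return (term, accession count) pairs, most represented term first."""
    counts = Counter()
    for terms in accession_to_terms.values():
        counts.update(set(terms))  # count each accession once per term
    # Sort by descending count; break ties alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical accessions annotated with process terms.
annotations = {
    "ACC1": ["apoptosis", "cell cycle"],
    "ACC2": ["apoptosis"],
    "ACC3": ["cell cycle", "apoptosis"],
}
print(rank_go_terms(annotations))  # [('apoptosis', 3), ('cell cycle', 2)]
```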



Tissues : View tissue specific expression for genes

  • Input Fields:

    • This search option requires input of data as Genbank Accessions/Unigene Cluster IDs/Gene names or aliases. Please select an organism and the type of input data you are going to provide before submitting the data.

    • For batch search, copy and paste the input data or upload a text file using the Browse button. Please make sure all the accessions are in one column and each line has only one accession.

  • Output & Explanation of results :

    • The results for tissue/organ specific expression of each gene are shown as a graph. The Unigene cluster and the number of EST sequences (total count) associated with it are shown at the top, followed by the graph. If the accessions you provided are not associated with any Unigene cluster, no results are returned.

    • The data for tissue specific expression have been compiled from NCBI dbEST and the Unigene sets for human and mouse genes. Tissues have been further grouped using the Anatomy concept hierarchy of the Medical Subject Headings (MESH) terms that are used to index citations in the MEDLINE database. The 17 MESH headings are shown in different colors in the graph legend. Details about the specific tissues/organs under a heading, and the count of ESTs that make up the distribution, are shown on the left of the graph.

    • Some ESTs fall into an undecided section because their dbEST reports did not provide proper tissue identification. Such ESTs are counted under Data not available.

    • If an EST belongs to more than one tissue or MESH heading, it is counted under each of them. Thus for some genes, the total count shown at the top (the number of ESTs present in the Unigene cluster) may be less than the sum of the sequences shown across all tissues.

    • The X-axis of the graph consists of the 17 terms that are part of the concept hierarchy Anatomy and the Y axis represents the number of sequences. Y Scale is based upon the max number of ESTs under a Mesh Heading found in that particular gene. This tool is particularly useful for finding genes that are uniquely or maximally expressed in any one given tissue or organ.
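The counting rules above (an EST counted under every heading it belongs to, unidentified ESTs counted under Data not available, and a Y scale set by the largest heading count) can be sketched as follows. The EST identifiers and the function name `count_by_heading` are hypothetical, and the input mapping is assumed to be already compiled.

```python
from collections import Counter

def count_by_heading(est_to_headings):
    """Count ESTs per MESH heading; an EST may contribute to several headings."""
    counts = Counter()
    for headings in est_to_headings.values():
        if not headings:
            counts["Data not available"] += 1
        for heading in set(headings):
            counts[heading] += 1
    return counts

# Hypothetical ESTs for one Unigene cluster.
ests = {
    "EST1": ["Nervous System"],
    "EST2": ["Nervous System", "Sense Organs"],
    "EST3": [],
}
counts = count_by_heading(ests)
print(len(ests))             # total ESTs in the cluster: 3
print(sum(counts.values()))  # sum over headings: 4, because EST2 is counted twice
print(max(counts.values()))  # largest heading count, used for the Y scale: 2
```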



Pathway Miner - Classify and Extract Network of Associated Genes/Proteins based on Kegg, Biocarta and GenMapp Pathways.

  • 1. Input Fields:

    • There are two ways to query the pathway information:

      • a) Search by experimental data sets and analyze gene expression profiles based on pathways. Input requires accession numbers and gene expression values.

      • b) Search by Keyword, which searches pathway names for the user-supplied term and returns the list of genes involved in each matching pathway.

    • Search by accessions requires input of data as Genbank accession number in a column with each number on a separate line. For batch search, copy and paste the input data or upload a text file using the Browse button. A sample file is provided on the page as an example to test.

    • Along with the accessions, there is an option to upload the signal to reference (log2) ratios for the experiments. Up to four columns of values are allowed, with the accession numbers in the first column. Each column of log ratios is treated as an independent experiment. To upload data from an Excel file, save the file as tab-delimited text, with the column of accessions followed by the experimental values, and then upload it using the Browse button.

    • If uploading experimental values, make sure you choose the 'yes' option and enter the number of columns of experimental values being provided, not counting the accession column. Filter by fold is an option to filter genes based on fold change. The chosen fold criterion applies to both increases and decreases in expression. By default this option is set to no filter.

    • Option to perform statistical test: Gene expression profiles can be analyzed for significant pathways representation based on a Fisher exact statistical test. This feature ranks the pathways by a P value. P values reflect the observed change in the pathway with respect to what would have been expected by random chance. This test assumes that pathways are represented in the data set to the same degree as they are in the pathway database. Details of the statistical analysis and a note of caution about interpreting the P values are discussed below in the Output options for analyzing the results. The statistical test is applied for pathways from all three independent resources KEGG, BIOCARTA and GenMAPP. By default the statistical test option is set to No.

      • A yes option should be chosen if there is a large dataset and the goal is to find out which pathways are altered most significantly based on gene expression changes and reduce the data to focus on a few significant pathways. The option to include a fold change cutoff should be selected if yes is chosen to perform the statistical test.

      • A no option is more appropriate for a small list of genes or a dataset that has already been reduced to an interesting set of genes by other types of data analysis.


  • 2. Color Scheme :

    • The color scheme represents fold-based increases or decreases in gene expression. Changes equal to or greater than 1.5, 2, 3 and 4 fold, whether increases or decreases, are represented by a red or green color scheme. No change is represented by black. "No criteria met" applies to genes that show less than a 1.5 fold change or that have been filtered out when a fold change criterion was chosen. Grey represents a missing value for the gene.
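As an illustration of the color scheme, the sketch below converts a log2 ratio to a fold change and assigns one of the categories described above. It is not the tool's code. Which of red or green marks an increase is not stated here, so the common microarray convention (red for increase, green for decrease) is assumed; the function name `color_category` and its `fold_filter` parameter are hypothetical.

```python
def color_category(log2_ratio, fold_filter=None):
    """Map a log2 ratio to a color category (red = increase is an assumption)."""
    if log2_ratio is None:
        return "grey (missing value)"
    if log2_ratio == 0:
        return "black (no change)"
    fold = 2 ** abs(log2_ratio)  # a log2 ratio of 1 is a 2-fold change
    if fold < 1.5 or (fold_filter is not None and fold < fold_filter):
        return "no criteria met"
    direction = "red" if log2_ratio > 0 else "green"
    for threshold in (4, 3, 2, 1.5):
        if fold >= threshold:
            return f"{direction} (>= {threshold}-fold)"

print(color_category(1.0))                 # red (>= 2-fold)
print(color_category(-2.0))                # green (>= 4-fold)
print(color_category(0.5))                 # no criteria met
print(color_category(1.0, fold_filter=3))  # no criteria met
print(color_category(None))                # grey (missing value)
```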



  • 3. Output options for analyzing pathway information after the above choices are submitted:

    • Metabolic, Cellular and Regulatory process pathways for human and mouse gene products from three different open source pathway resources Kegg, BioCarta and GenMapp are stored in the bio resource database. The user supplied dataset is searched against the pathways from these three resources. Accessions belonging to unique genes in the input data set are mapped on to the pathways.

    • Two options are provided to display the pathways found:
      1. Produce a graph that displays gene associations in pathways as gene product association networks from the pathways.
      2. Browse the pathway maps for pathway profiles with gene products highlighted showing the changes in expression.

      There are two types of output based on whether the statistical test was chosen or not:

  • 3.1. Output when the statistical test was not chosen:

    • The output shows genes classified by the participation of their gene products in Metabolic, Regulatory and/or Cellular processes. All genes are shown in the pathway maps and in the lists of classified genes; on a network display, only genes above the fold filter cutoff are shown. Two interactive HTML outputs are provided. The first is a list of pathways from each resource and the genes associated with these pathways. This shows which pathways are represented and by how many genes in the dataset. If experimental values are provided, the color scheme shows the expression behavior of the genes in the dataset for any pathway in a given experiment. The second is a list of genes and the pathways associated with these genes from each resource. Genes are identified by their gene names, and the accessions or UG_ids (Unigene identifiers) are provided. These data show whether a particular gene is involved in one or many pathways.

    • Gene-pathway relationships can be analyzed further using the two options that are provided: browse maps or browse networks.

  • 3.2. Output when the statistical test was chosen:

    • With this option, the pathways are ranked by P value. The P value reflects the probability that the observed changes in the pathway have occurred by chance alone. A default significance level of P=0.05 has been chosen, but this value can be changed using a drop down menu. A link at the bottom of the page leads to a list of all pathways that did not pass the 0.05 or chosen level of significance, along with their P values.

  • 3.2.1. What happens when multiple columns of log2 ratios are entered?

    • Multiple columns in the input data set are treated as independent experiments. In this manner, the results of repeats of the same experiment or multiple samples of the same experiment, e.g. different time points, can be compared. The results are downloadable as Excel spread sheets. All the experiments and their data are provided in multiple sheets.

  • 4. How the ranking of pathways is performed:

    • When a pathway ranking analysis is performed, a unique gene list is created from the user input dataset. If multiple accessions are present for a single gene in the data set i.e. the gene is represented more than once by different probes on the microarray chip and the fold level changes agree, then the largest value is used based on the assumption that this probe is the best one for the gene. If conflicting expression values (expression levels of multiple accessions don't agree) are found, then the gene is removed from the dataset. The user criterion for fold change cutoff is used to filter out genes from this unique gene list. A one sided Fisher exact test is applied to the filtered, unique gene list to measure the probability that a pathway is significantly altered or changed. A link to the genes removed is provided so that corrections can be made and data reentered if needed.
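The construction of the unique gene list can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes that probes "agree" when their log2 ratios point in the same direction and that the "largest value" is the one with the largest absolute magnitude. The function name `collapse_probes` and the sample identifiers are hypothetical.

```python
def collapse_probes(probe_values, accession_to_gene):
    """Collapse multiple accessions per gene into one value per gene.

    Returns (kept, removed): kept maps gene -> chosen log2 ratio,
    removed lists genes whose probes disagree in direction.
    """
    by_gene = {}
    for accession, value in probe_values.items():
        gene = accession_to_gene.get(accession)
        if gene is not None:
            by_gene.setdefault(gene, []).append(value)
    kept, removed = {}, []
    for gene, values in by_gene.items():
        has_up = any(v > 0 for v in values)
        has_down = any(v < 0 for v in values)
        if has_up and has_down:
            removed.append(gene)  # conflicting probes: drop the gene
        else:
            kept[gene] = max(values, key=abs)  # largest change wins
    return kept, removed

# Hypothetical probes and gene assignments.
values = {"P1": 1.0, "P2": 2.5, "P3": -1.2, "P4": 0.8, "P5": -2.0}
genes = {"P1": "GENE_A", "P2": "GENE_A", "P3": "GENE_B", "P4": "GENE_B", "P5": "GENE_C"}
print(collapse_probes(values, genes))
# ({'GENE_A': 2.5, 'GENE_C': -2.0}, ['GENE_B'])
```

The fold change cutoff would then be applied to the `kept` genes before the Fisher test is run.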

  • 4.1. Statistical test:

    • The Fisher exact test is performed by building 2x2 contingency tables for number of genes that are changed in the dataset and belong to a pathway. This is achieved in the same manner as has been done for GO (gene ontology) Miner at http://discover.nci.nih.gov/gominer/. The test is based on the following analysis of the unique gene list derived from the user input data set:
      (1) total no. of genes (N),
      (2) total no. of genes changed in the dataset (Nd) that passed the fold cutoff,
      (3) no. of genes found for a particular pathway in the user dataset (n), and
      (4) no. of genes that have changed in the pathway based on fold cutoff, (nd).

      The 2x2 table for the two exclusive categories is detailed below.

                              Genes that passed the cutoff   Genes that did not pass the cutoff   Total genes
      Genes in pathway        nd                             n-nd                                 n
      Genes not in pathway    Nd-nd                          (N-n)-(Nd-nd)                        (N-n)
      Total genes             Nd                             N-Nd                                 N

    • A JavaScript program by Øyvind Langsrud (at http://www.matforsk.no/ola/fisher.htm) has been integrated into the pathway mining tool and is used to calculate the one-sided Fisher exact test.

    • CAUTIONARY NOTE: The Fisher test is used to rank pathways but, due to assumptions that have to be made in the analysis, e.g. genes on microarray chip are representative of pathways in the pathway database, genes act in only one pathway, etc., the test only provides an approximate P value. For this reason, Fisher test P values should not be taken literally as an accurate statistical evaluation of the involvement of a pathway in the data set.
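For readers who want to reproduce a one-sided P value from the four counts N, Nd, n and nd defined above, the sketch below computes the upper tail of the hypergeometric distribution using only the Python standard library. It is an independent illustration, not the integrated JavaScript code, and the function name `fisher_one_sided` is hypothetical. It returns the probability of finding nd or more changed genes in a pathway of n genes when Nd of the N genes have changed.

```python
from math import comb

def fisher_one_sided(N, Nd, n, nd):
    """One-sided Fisher exact P value: P(X >= nd) for a hypergeometric X.

    N  : total genes in the unique gene list
    Nd : genes that passed the fold cutoff
    n  : genes in the pathway
    nd : genes in the pathway that passed the fold cutoff
    """
    if not (0 <= nd <= n <= N and nd <= Nd <= N):
        raise ValueError("counts are inconsistent with a 2x2 table")
    upper = min(n, Nd)
    # Sum the probabilities of tables at least as extreme as the observed one.
    tail = sum(comb(Nd, k) * comb(N - Nd, n - k) for k in range(nd, upper + 1))
    return tail / comb(N, n)

# Hypothetical counts: 100 genes, 20 changed, pathway of 10 genes, 6 of them changed.
print(fisher_one_sided(N=100, Nd=20, n=10, nd=6))
```

As the cautionary note says, such values are best used for ranking pathways rather than as exact significance levels.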

  • 5. Gene Association Networks:

    • A graph network is drawn to illustrate gene associations in one or more pathways. The network is based on the genes in the user-supplied dataset and shows which gene products are known to function together as part of the same pathway. The gene products-pathway relationships are collected and compiled for each gene and for each representative pathway in the input dataset by mining the pathway data available from each of the pathways resources (Kegg, Biocarta and Genmapp). Gene association networks using gene association information from each resource are built separately for the user data set.

      The network is extracted from the information present in each resource database. Relationships are found between the genes in the user dataset by their co-occurrence in a pathway. For each gene, all pathways are found, and, for each pathway, all genes in the same pathway and present in the user dataset, are found. If genes A and B co-occur together in pathway X, they have a relationship. If genes B and C appear together in pathway Y, then the relationship A-B-C is established. If C and A also appear together in pathway X, then the network shown in the accompanying graph can be produced.


    • In the graph network, nodes represent genes and edges represent gene relationships as found in any pathway. Edges do not have a direction associated with them and edge length has no meaning. Edges are given weights based on the number of pathways in which the associated nodes (gene products) co-occur in the selected pathway resource. Nodes can be moved using the mouse to improve viewing of edge relationships. The nodes are labeled by locus IDs or gene names and are colored based on the user supplied log2 ratio expression values. These values are converted to fold change differences and the matching color scheme is superimposed on the nodes. These graph networks are not scored.

    • Networks are drawn to interpret and analyze global gene expression changes within pathways and among interacting pathways through common gene products. Most importantly, the networks show how a pathway is being altered based on expression changes of participating genes and how a set of genes might be influencing multiple pathways. Multiple ways of filtering and selecting features of the network for analyzing the expression behavior of genes associated with pathways are provided. These filters are described in the network browser help file.
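The co-occurrence rule and the edge weights described in this section can be sketched in a few lines. The example mirrors the A, B, C case given earlier: pathway X contains A, B and C, and pathway Y contains B and C. The function name `build_network` and the extra gene D (present in pathway X but absent from the user dataset) are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_network(pathway_to_genes, user_genes):
    """Return undirected edges weighted by the number of shared pathways."""
    user_genes = set(user_genes)
    edges = Counter()
    for genes in pathway_to_genes.values():
        # Only genes present in the user dataset become nodes.
        present = sorted(set(genes) & user_genes)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1  # one more pathway in which a and b co-occur
    return edges

pathways = {"X": ["A", "B", "C", "D"], "Y": ["B", "C"]}
network = build_network(pathways, user_genes={"A", "B", "C"})
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
# A B 1
# A C 1
# B C 2
```

Sorting each gene pair before counting keeps the edges undirected, so A-B and B-A are treated as the same edge.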

  • 5.1. Network browser Application:

    • The network is drawn as a two-dimensional graph using the Neato program in the Graphviz software (http://www.research.att.com/sw/tools/graphviz/) and is displayed in a network browser that runs as a Java applet. Sub graphs found are displayed separately. Graphviz provides multiple options for analyzing the graph network. For more details on the network browser application and how to analyze a dataset using all the features click here.

  • 5.2. Options that vary the graph display of the gene networks:

    • The user has two options that influence the gene network graph. First, the user can provide a cutoff score for expression values, in which case only genes passing the fold change cutoff are shown on the graph. Second, the user can perform a statistical analysis to rank pathways. If this choice has been made, an option on the network graph allows the user to select a P value so that only edges representing pathways that pass this significance test are highlighted. Edges that pass the test are colored red. If more than one pathway is represented by an edge and not all of the represented pathways pass the significance test, the edge is colored black. The remaining grey edges represent pathways that did not pass the significance test.
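The edge coloring rule can be written as a small function. This is a sketch under one assumption: a pathway "passes" when its P value is less than or equal to the selected threshold. The function name `edge_color` is hypothetical.

```python
def edge_color(pathway_p_values, alpha=0.05):
    """Color an edge from the P values of the pathways it represents."""
    if not pathway_p_values:
        raise ValueError("an edge must represent at least one pathway")
    passed = [p <= alpha for p in pathway_p_values]
    if all(passed):
        return "red"    # every represented pathway is significant
    if any(passed):
        return "black"  # some, but not all, pathways are significant
    return "grey"       # no represented pathway is significant

print(edge_color([0.01, 0.03]))  # red
print(edge_color([0.01, 0.20]))  # black
print(edge_color([0.50]))        # grey
```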

  • 6. Browsing the pathway maps for pathway profiles with genes highlighted on them:

    • In all output lists of pathways, clicking a pathway name opens the pathway map produced by the Kegg, Biocarta or Genmapp resource. The genes from the input dataset are highlighted on the maps. If no expression values are provided, genes are highlighted on the map as a rectangular grey box. If expression values are provided, a colored representation of the gene is produced using the color scheme shown on the input Web form. If only one data set is provided, a single colored box is shown. If more than one data set is provided, colored compartments are shown in the gene box. Up to four different samples can be displayed in this manner.


Disease finder: Search for genes associated with Disease

    • Input Fields:

      • This search option requires input of data as Genbank Accessions/Unigene Cluster IDs/Gene names or aliases.

      • For batch search, copy and paste the input data or upload a text file using the Browse button. Please make sure all the accessions are in one column and each line has only one accession.

    • Output & Explanation of results :

      • The data for genes associated with a disease have been compiled from the OMIM (Online Mendelian Inheritance in Man) database and NCBI dbEST for the human Unigene set. In addition, the Disease concept hierarchy of the Medical Subject Headings (MESH) terms that are used to index citations in the MEDLINE database has been used to group ESTs from Unigene sets under 23 main disease headings.

      • The result displays the information from OMIM and the information collected from the ESTs mapped to Disease MESH terms. The default output displays the result arranged by chromosomal location of the genes. This output can be further sorted by MESH Disease terms or MESH Tissue terms.

      • The 'View nearby gene' option displays the genes that neighbor the given gene on the chromosome. The 'View OMIM gene map' option displays the cytogenetic map location of only disease genes and other expressed genes described in OMIM.

      • The MESH disease terms column shows the total count of ESTs for the given gene and how many of them have been mapped to any disease term based on the tissue information from dbEST (e.g., of 95 ESTs, 37 belong to Retinoblastoma). For the remaining ESTs for that gene, either the dbEST reports did not provide proper tissue identification or the ESTs may be from normal tissues.

      • OMIM Information:

      • This returns results for genes that belong to OMIM and are organized based on the cytogenetic location. This also shows the disorders that are associated with any gene. Related OMIMs for any genes are shown along with their cytogenetic location which is very helpful to find out if the related disease genes are clustered together on the chromosome.

      • This page provides multiple options to analyze the data. You can search for a disease term that will be looked up in the OMIM disorder, Title and Clinical Synopsis. Alternatively you can select a Clinical Synopsis or Chromosome to analyze the data.

      • Clinical Synopses are compiled from the OMIM resource. To start, select "all" from the dropdown menu to see which Clinical Synopses exist for the genes in your dataset. Then go back and choose any specific one that you are interested in.

      • We are not mining the text details for each OMIM ID so for more detail or in depth information about any of these OMIMs click on OMIM to go to the OMIM database.


     

Protein Interactions & Complexes

      This tool is in the process of being updated with new data and features.

    • Input Fields:

      • This search option requires input of data as Genbank Accessions/Unigene Cluster IDs/Gene names or aliases. Please select an organism and the type of input data you are going to provide before submitting the data.

      • For batch search, copy and paste the input data or upload a text file using the Browse button. Please make sure all the accessions are in one column and each line has only one accession.

    • Output & Explanation of results :

      • The submitted genes/proteins are searched against the chosen organism dataset from the BIND source. A list of interactions and complexes is returned. For genes/proteins for which no interactions or complexes are reported, a second attempt is made to find similar genes/proteins in other organisms in BIND (Yeast, Fly, Rat, E. coli).

      • Data is presented first from the chosen organism followed by the data found for similar genes/proteins in other organisms.

      • The Interaction column shows Pair A and Pair B of each pairwise interaction, and the one representing the input gene/protein is highlighted next to the query. If the gene/protein is present in a complex, the details are given in the complex column. All the complex components and their interactions are shown.

      • Graph Interaction network
      • A graph network is drawn for the protein-protein interactions found in your dataset. It is based on the result presented in tabular form. The graph is drawn using the open source software Graphviz. The proteins are drawn as nodes. Each interacting pair is connected by an arc. The nodes are labeled with the names of the interacting pair. Sub graphs are displayed separately. The graph networks are not scored.

      • If one member of the interacting pair is a molecule other than a protein (DNA, complex, other molecule), the arc is shown in a different color.

      • If the interaction was not found in the chosen organism set but was found by looking up similar genes/proteins in other organisms, the interacting pair (nodes) is shown in a different color.

      • A text file showing all the proteins and their one or many pair wise interactions can be downloaded.


For any comments or questions contact Dr. Ritu Pandey or Prof. David Mount.
The BioRag database is maintained by the Bioinformatics group at Arizona Cancer Center. The material presented here is compiled from different public databases. BioRag is hosted by the Biotechnology Computing Facility of the University of Arizona. © 2002 University of Arizona. All Rights Reserved.