Bioinformatics
Written by bioXplorer   
Oct 07, 2007 at 11:39 AM

  • Genome annotation in the presence of insertional RNA editing

    Motivation: Insertional RNA editing renders gene prediction very difficult compared to organisms without such RNA editing. A case in point is the mitochondrial genome of Physarum polycephalum in which only about one-third of the number of genes that are to be expected given its length are annotated. Thus, gene prediction methods that explicitly take into account insertional editing are needed for successful annotation of such genomes.

    Results: We annotate the mitochondrial genome of P.polycephalum using several different approaches for gene prediction in organisms with insertional RNA editing. We computationally validate our annotations by comparing the results from different methods against each other and, as proof of concept, experimentally validate two of the newly predicted genes. We more than double the number of annotated putative genes in this organism and find several intriguing candidate genes that are not expected in a mitochondrial genome.

    Availability: The C source code of the programs described here is available upon request from the corresponding author.



  • Phylogenetic distances are encoded in networks of interacting pathways

    Motivation: Although metabolic reactions are unquestionably shaped by evolutionary processes, the degree to which the overall structure and complexity of their interconnections are linked to the phylogeny of species has not been evaluated in depth. Here, we apply an original metabolome representation, termed Network of Interacting Pathways or NIP, with a combination of graph theoretical and machine learning strategies, to address this question. NIPs compress the information of the metabolic network exhibited by a species into much smaller networks of overlapping metabolic pathways, where nodes are pathways and links are the metabolites they exchange.

    Results: Our analysis shows that a small set of descriptors of the structure and complexity of the NIPs, combined into regression models, reproduces reference phylogenetic distances derived from 16S rRNA sequences very accurately (10-fold cross-validation correlation coefficient higher than 0.9). Our method also scored better than previous work on metabolism-based phylogenetic reconstructions, as assessed by branch distances score, topological similarity and second cousins score. Thus, our metabolome representation as a network of overlapping metabolic pathways captures sufficient information about the underlying evolutionary events leading to the formation of metabolic networks and species phylogeny. It is important to note that precise knowledge of all of the reactions in these pathways is not required for these reconstructions. These observations underscore the potential for the use of abstract, modular representations of metabolic reactions as tools in studying the evolution of species.

    Supplementary information: Supplementary data are available at Bioinformatics online.
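
    The NIP representation described above (nodes are pathways, links are the metabolites they exchange) can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_nip` is hypothetical, the pathway contents are toy data, and "exchange" is approximated here by simple membership of a metabolite in both pathways.

```python
from itertools import combinations


def build_nip(pathways):
    """Build a toy Network of Interacting Pathways.

    pathways: dict mapping pathway name -> set of metabolite names.
    Returns a dict mapping (pathway_a, pathway_b) -> set of shared
    metabolites, with one entry per pair of overlapping pathways.
    """
    edges = {}
    for a, b in combinations(sorted(pathways), 2):
        shared = pathways[a] & pathways[b]
        if shared:  # link two pathways only if they share a metabolite
            edges[(a, b)] = shared
    return edges


# Toy input, invented for illustration only.
toy_pathways = {
    "glycolysis": {"glucose", "pyruvate", "ATP"},
    "krebs_cycle": {"pyruvate", "acetyl-CoA", "citrate"},
    "fatty_acid_oxidation": {"acetyl-CoA", "palmitate"},
}

nip = build_nip(toy_pathways)
for (a, b), shared in sorted(nip.items()):
    print(a, "<->", b, "via", sorted(shared))
```

    Structural descriptors such as node count, edge count or degree distribution could then be computed on `nip` and fed to a regression model, which is the step the abstract describes but this sketch does not attempt.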



  • Gene set enrichment analysis using linear models and diagnostics

    Motivation: Gene-set enrichment analysis (GSEA) can be greatly enhanced by linear model (regression) diagnostic techniques. Diagnostics can be used to identify outlying or influential samples, and also to evaluate model fit and explore model expansion.

    Results: We demonstrate this methodology on an adult acute lymphoblastic leukemia (ALL) dataset, using GSEA based on chromosome-band mapping of genes. Individual residuals, grouped or aggregated by chromosomal loci, indicate problematic samples and potential data-entry errors, and help identify hyperdiploidy as a factor playing a key role in expression for this dataset. Subsequent analysis pinpoints suspected DNA copy number abnormalities of specific samples and chromosomes (most prevalent are chromosomes X, 21 and 14), and also reveals significant expression differences between the hyperdiploid and diploid groups on other chromosomes (most prominently 19, 22, 3 and 13)—differences which are apparently not associated with copy number.

    Availability: Software for the statistical tools demonstrated in this article is available as Bioconductor package GSEAlm.

    Supplementary information: Supplementary data are available at Bioinformatics online.
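
    The idea of aggregating per-gene residuals by chromosomal locus to spot unusual samples can be sketched as follows. This is a toy illustration and not the GSEAlm API: the intercept-only model (each gene's residual is its deviation from the gene mean), the function names, the gene labels and the band assignments are all assumptions made for the example.

```python
from statistics import mean


def gene_residuals(expr):
    """Residuals from an intercept-only linear model fitted per gene.

    expr: dict mapping gene -> list of expression values, one per sample.
    """
    return {g: [v - mean(vals) for v in vals] for g, vals in expr.items()}


def aggregate_by_set(residuals, gene_sets):
    """Average residuals over the genes in each set, separately per sample."""
    out = {}
    for name, genes in gene_sets.items():
        rows = [residuals[g] for g in genes]
        out[name] = [mean(col) for col in zip(*rows)]
    return out


# Toy data: four samples; the last sample is elevated on one band only.
expr = {
    "g1": [1.0, 1.0, 1.0, 5.0],
    "g2": [2.0, 2.0, 2.0, 6.0],
    "g3": [4.0, 4.0, 4.0, 4.0],
}
bands = {"21q22": ["g1", "g2"], "3p21": ["g3"]}  # hypothetical mapping

agg = aggregate_by_set(gene_residuals(expr), bands)
print(agg)
```

    In this toy case the last sample stands out on band 21q22 and nowhere else, which is the kind of pattern that would prompt a closer look at copy number for that sample and chromosome.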



  • Bayesian learning of biological pathways on genomic data assimilation

    Motivation: Mathematical modeling and simulation, based on biochemical rate equations, provide a rigorous tool for unraveling complex mechanisms of biological pathways. Before simulation experiments can proceed, an essential first step is to find effective values of model parameters, which are difficult to measure from in vivo and in vitro experiments. Furthermore, once a set of hypothetical models has been created, a statistical criterion is needed to test the constructed models and to proceed to model revision.

    Results: The aim of our research is to present a new statistical technology for data-driven construction of in silico biological pathways. The method starts with knowledge-based modeling using a hybrid functional Petri net. It then proceeds to the Bayesian learning of model parameters for which experimental data are available. This process exploits quantitative measurements of evolving biochemical reactions, e.g. gene expression data. Another important issue that we consider is statistical evaluation and comparison of the constructed hypothetical pathways. For this purpose, we have developed a new Bayesian information–theoretic measure that assesses the predictability and the biological robustness of in silico pathways.

    Availability: The FORTRAN source codes are available at the URL http://daweb.ism.ac.jp/~yoshidar/GDA/

    Supplementary information: Supplementary data are available at Bioinformatics online.



  • Functional modules integrating essential cellular functions are predictive of the response of leukaemia cells to DNA damage

    Motivation: Childhood B-precursor lymphoblastic leukaemia (ALL) is the most common paediatric malignancy. Despite the fact that 80% of ALL patients respond to anti-cancer drugs, the patho-physiology of this disease is still not fully understood. mRNA expression-profiling studies that have been performed have not yet provided novel insights into the mechanisms behind cellular response to DNA damage. More powerful data analysis techniques may be required for identifying novel functional pathways involved in the cellular responses to DNA damage.

    Results: In order to explore the possibility that unforeseen biological processes may be involved in the response to DNA damage, we have developed and applied a novel procedure for the identification of functional modules in ALL cells. We have discovered that the overall activity of functional modules integrating protein degradation and mRNA processing is predictive of response to DNA damage.

    Availability: Supplementary material, including R code, additional results, experimental datasets and a detailed description of the methodology, is available at http://www.bip.bham.ac.uk/vivo/fumo.html.

    Supplementary information: Supplementary data are available at Bioinformatics online.



  • Physical protein-protein interactions predicted from microarrays

    Motivation: Microarray expression data reveal functionally associated proteins. However, most proteins that are associated are not actually in direct physical contact. Predicting physical interactions directly from microarrays is both a challenging and important task that we addressed by developing a novel machine learning method optimized for this task.

    Results: We validated our support vector machine-based method on several independent datasets. At the same levels of accuracy, our method recovered more experimentally observed physical interactions than a conventional correlation-based approach. Pairs predicted by our method to very likely interact were close in the overall network of interaction, suggesting our method as an aid for functional annotation. We applied the method to predict interactions in yeast (Saccharomyces cerevisiae). A Gene Ontology function annotation analysis and literature search revealed several probable and novel predictions worthy of future experimental validation. We therefore hope our new method will improve the annotation of interactions as one component of multi-source integrated systems.

    Supplementary information: Supplementary data are available at Bioinformatics online.



  • Can sugars be produced from fatty acids? A test case for pathway analysis tools

    Motivation: In recent years, several methods have been proposed for determining metabolic pathways in an automated way based on network topology. The aim of this work is to analyse these methods by tackling a concrete example relevant in biochemistry. It concerns the question whether even-chain fatty acids, being the most important constituents of lipids, can be converted into sugars at steady state. It was proved five decades ago that this conversion using the Krebs cycle is impossible unless the enzymes of the glyoxylate shunt (or alternative bypasses) are present in the system. Using this example, we can compare the various methods in pathway analysis.

    Results: Elementary modes analysis (EMA) of a set of enzymes corresponding to the Krebs cycle, glycolysis and gluconeogenesis supports the scientific evidence showing that there is no pathway capable of converting acetyl-CoA to glucose at steady state. This conversion is possible after the addition of isocitrate lyase and malate synthase (forming the glyoxylate shunt) to the system. Dealing with the same example, we compare EMA with two tools based on graph theory available online, PathFinding and Pathway Hunter Tool. These automated network generating tools do not succeed in predicting the conversions known from experiment. They sometimes generate unbalanced paths and reveal problems identifying side metabolites that are not responsible for the carbon net flux. This shows that, for metabolic pathway analysis, it is important to consider the topology (including bimolecular reactions) and stoichiometry of metabolic systems, as is done in EMA.

    Supplementary information: Supplementary data are available at Bioinformatics online.
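
    The steady-state argument can be illustrated with a small stoichiometric check. The sketch below is not elementary modes analysis and is not any of the tools named above; it is a brute-force search over small non-negative integer fluxes on a heavily simplified network. The lumped reactions, their names, the omission of cofactors and of anaplerotic bypasses, and the use of oxaloacetate export (towards PEP) as a stand-in for gluconeogenesis are all assumptions made for the example. Acetyl-CoA, CO2 and PEP are treated as external and therefore not balanced.

```python
from itertools import product

# Internal metabolites that must be balanced at steady state.
METABOLITES = ["OAA", "Cit", "IsoCit", "Succ", "Mal", "Glyox"]

# Reaction -> stoichiometric coefficients on internal metabolites only.
REACTIONS = {
    "citrate_synthase": {"OAA": -1, "Cit": 1},        # AcCoA + OAA -> Cit
    "aconitase":        {"Cit": -1, "IsoCit": 1},
    "oxidative_branch": {"IsoCit": -1, "Succ": 1},    # lumped, loses 2 CO2
    "succ_to_malate":   {"Succ": -1, "Mal": 1},       # lumped
    "malate_dh":        {"Mal": -1, "OAA": 1},
    "pepck":            {"OAA": -1},                  # OAA -> PEP + CO2
    "isocitrate_lyase": {"IsoCit": -1, "Succ": 1, "Glyox": 1},
    "malate_synthase":  {"Glyox": -1, "Mal": 1},      # Glyox + AcCoA -> Mal
}
SHUNT = {"isocitrate_lyase", "malate_synthase"}


def is_steady_state(fluxes):
    """True if every internal metabolite has zero net production."""
    balance = dict.fromkeys(METABOLITES, 0)
    for name, v in fluxes.items():
        for met, coeff in REACTIONS[name].items():
            balance[met] += coeff * v
    return all(b == 0 for b in balance.values())


def can_export_pep(enabled, max_flux=2):
    """Search integer fluxes 0..max_flux for a steady state with pepck > 0."""
    names = sorted(enabled)
    for values in product(range(max_flux + 1), repeat=len(names)):
        fluxes = dict(zip(names, values))
        if fluxes["pepck"] > 0 and is_steady_state(fluxes):
            return True
    return False


without_shunt = can_export_pep(set(REACTIONS) - SHUNT)
with_shunt = can_export_pep(set(REACTIONS))
print("Krebs cycle only:", without_shunt)
print("With glyoxylate shunt:", with_shunt)
```

    Without the shunt, balancing each cycle intermediate forces all cycle fluxes to be equal, so no oxaloacetate is left over for export. With isocitrate lyase and malate synthase enabled, a balanced flux exists in which two acetyl-CoA enter per exported oxaloacetate. Because stoichiometry is checked explicitly, unbalanced paths of the kind the abstract attributes to purely graph-based tools are rejected.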



  • MPI-LIT: a literature-curated dataset of microbial binary protein–protein interactions

    Prokaryotic protein–protein interactions are underrepresented in currently available databases. Here, we describe a ‘gold standard’ dataset (MPI-LIT) focusing on microbial binary protein–protein interactions and associated experimental evidence that we have manually curated from 813 abstracts and full texts that were selected from an initial set of 36 852 abstracts. The MPI-LIT dataset comprises 1237 experimental descriptions that describe a non-redundant set of 746 interactions of which 659 (88%) are not reported in public databases. To estimate the curation quality, we compared our dataset with a union of microbial interaction data from IntAct, DIP, BIND and MINT. Among common abstracts, we achieve a sensitivity of up to 66% for interactions and 75% for experimental methods. Compared with these other datasets, MPI-LIT has the lowest fraction of interaction experiments per abstract (0.9) and the highest coverage of strains (92) and scientific articles (813). We compared methods that evaluate functional interactions among proteins (such as genomic context or co-expression) which are implemented in the STRING database. Most of these methods discriminate well between functionally relevant protein interactions (MPI-LIT) and high-throughput data.

    Availability: http://www.jcvi.org/mpidb/interaction.php?dbsource=MPI-LIT.

    Supplementary information: Supplementary data are available at Bioinformatics online.



  • GOSLING: a rule-based protein annotator using BLAST and GO

    Summary: GOSLING is a web-based protein function annotator that uses a decision tree-derived rule set to quickly predict Gene Ontology terms for a protein. A score is assigned to each term prediction that is indicative of the accuracy of the prediction. Due to its speed and accuracy GOSLING is ideally suited for high-throughput annotation tasks.

    Availability: https://www.sapac.edu.au/gosling



  • Profile Comparer: a program for scoring and aligning profile hidden Markov models

    Summary: Profile Comparer (PRC) is a stand-alone program for scoring and aligning profile hidden Markov models (HMMs) of protein families. PRC can read models produced by SAM and HMMER, two popular profile HMM packages, as well as PSI-BLAST checkpoint files. This application note provides a brief description of the profile–profile algorithm used by PRC.

    Availability: The C source code licensed under the GNU General Public Licence and Linux and Mac OS X binaries can be downloaded from http://supfam.org/PRC.

    Supplementary information: Supplementary data are available at Bioinformatics online.



 

Last Updated ( Jul 23, 2008 at 05:10 PM )
