Pubs & Presentations
The annual workshops are intended to be a thorough and intensive introduction to eukaryotic pathogen database resources that are part of the EuPathDB Bioinformatics Resource Center (including OrthoMCL DB).
Four-day bioinformatics workshop for thirty scientists, focusing on effective use of EuPathDB and its component sites: AmoebaDB, CryptoDB, GiardiaDB, MicrosporidiaDB, PiroplasmaDB, PlasmoDB, ToxoDB, TrichDB, TriTrypDB and OrthoMCL.
Jun 17-20, 2012
Includes doi, PMID, and PMCID links.
PATRIC 2.0
Ammerman NC, Gillespie JJ, Neuwald AF, Sobral BW, Azad AF. (2009) A typhus group-specific protease defies reductive evolution in rickettsiae. J. Bacteriol. doi: 10.1128/JB.01077-09. PMID: 19820087. PMCID: PMC2786609.
Ananiadou Sophia, Dan Sullivan, William Black, Gina-Anne Levow, Joseph J. Gillespie, Chunhong Mao, Sampo Pyysalo, BalaKrishna Kolluru, Junichi Tsujii, Bruno Sobral. (2011). Named Entity Recognition for Bacterial Type IV Secretion Systems. PLoS ONE. March 29, 2011. doi:10.1371/journal.pone.0014780. PMID: 21468321. PMCID: PMC3066171.
Dreher-Lesnick SM, Ceraul SM, Lesnick SC, Gillespie JJ, Anderson JM, Jochim RC, Valenzuela JG, Azad AF. (2010) Analysis of Rickettsia typhi-infected and uninfected cat flea (Ctenocephalides felis) midgut cDNA libraries: deciphering molecular pathways involved in host response to R. typhi infection. Insect Mol. Biol. doi: 10.1111/j.1365-2583.2009.00978.x. PMID: 20017753. PMCID: PMC3179627.
Driscoll Timothy, Joseph L. Gabbard, Chunhong Mao, Oral Dalay, Maulik Shukla, Clark C. Freifeld, Anne Gatewood Hoen, John S. Brownstein, Bruno W. Sobral. Integration and visualization of host-pathogen data related to infectious diseases. Bioinformatics. 2011 Jun 27. doi: 10.1093/bioinformatics/btr391. PMID: 21712250. PMCID: PMC3150046 [Available on 2012/8/15].
Dyer MD, Neff C, Dufford M, Rivera CG, Shattuck D, et al. (2010) The Human-Bacterial Pathogen Protein Interaction Networks of Bacillus anthracis, Francisella tularensis, and Yersinia pestis. PLoS ONE 5(8): e12089. doi:10.1371/journal.pone.0012089. PMID: 20711500. PMCID: PMC2918508.
Gillespie, J.J.*, Joardar, V.*, Williams, K.P., Driscoll, T., Hostetler, J.B., Nordberg, E.K., Shukla, M., Walenz, B., Hill, C.A., Nene, V.M., Azad, A.F., Sobral, B.W., Caler, E. (2012). A Rickettsia genome overrun by mobile genetic elements provides insight into the acquisition of genes characteristic of an obligate intracellular lifestyle. Journal of Bacteriology 194: 376-394. *equal author contribution. doi: 10.1128/JB.06244-11. PMID: 22056929. PMCID: PMC3256634.
Gillespie Joseph J., Alice R. Wattam, Stephen A. Cammer, Joseph Gabbard, Maulik P. Shukla, Oral Dalay, Timothy Driscoll, Deborah Hix, Shrinivasrao P. Mane, Chunhong Mao, Eric K. Nordberg, Mark Scott, Julie R. Schulman, Eric E. Snyder, Daniel E. Sullivan, Chunxia Wang, Andrew Warren, Kelly P. Williams, Tian Xue, Hyun Seung Yoo, Chengdong Zhang, Yan Zhang, Rebecca Will, Ronald W. Kenyon, and Bruno W. Sobral (2011). “PATRIC: The Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species” Infect. Immun. 79 (11): 4286-98. doi:10.1128/IAI.00207-11. PMC 3257917. PMID 21896772.
Gillespie JJ, Brayton KA, Williams KP, Diaz MA, Brown WC, Azad AF, Sobral BW. (2010) Phylogenomics reveals a diverse Rickettsiales type IV secretion system. Infect. Immun. doi: 10.1128/IAI.01384-09. PMID: 20176788. PMCID: PMC2863512.
Pyysalo Sampo, Tomoko Ohta, Rafal Rak, Dan Sullivan, Chunhong Mao, Chunxia Wang, Bruno Sobral, Jun’ichi Tsujii, Sophia Ananiadou (2011). Overview of the Infectious Diseases (ID) task of BioNLP Shared Task 2011. Proceedings of BioNLP Shared Task 2011 Workshop, pages 26–35, Portland, Oregon, USA, 24 June, 2011. (View Full Paper at Association of Computational Linguistics Website).
Pyysalo Sampo, Tomoko Ohta, Han-Cheol Cho, Dan Sullivan, Chunhong Mao, Bruno Sobral, Jun’ichi Tsujii and Sophia Ananiadou (2010). Towards Event Extraction from Full Texts on Infectious Diseases. Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, ACL 2010, pages 132–140, Uppsala, Sweden, 15 July 2010. (View Full Paper at the Association of Computational Linguistics Website).
Rafal Rak, Andrew Rowley, William Black, and Sophia Ananiadou (2012). “Argo: an integrative, interactive, text mining-based workbench supporting curation” Oxford Journals. Published online 2012 February 13. doi:10.1093/database/bas010. PMID: 22434844. PMC 3308166.
Sobral, B. and A. Wattam (2011). Comparative genomics and phylogenomics of the Brucella. Book chapter in “Brucella: Molecular Microbiology and Genetics”. I. L.-G. D. O’Callaghan, Horizon Scientific Press.
Sullivan DE, Gabbard JL, Shukla M, Sobral B. (2010) Data integration for dynamic and sustainable systems biology resources: challenges and lessons learned. Chem. Biodivers. doi: 10.1002/cbdv.200900317. PMID: 20491070. PMCID: PMC2894471.
Sutten EL, Norimine J, Beare PA, Heinzen RA, Lopez JE, Morse K, Brayton KA, Gillespie JJ, Brown WC. (2010) Anaplasma marginale type IV secretion system proteins VirB2, VirB7, VirB11, and VirD4 are immunogenic components of a protective bacterial membrane vaccine. Infect. Immun. doi: 10.1128/IAI.01207-09. PMID: 20065028. PMCID: PMC2825951.
Williams KP, Gillespie JJ, Sobral BW, Nordberg EK, Snyder EE, Shallom JM, Dickerman AW. (2010) Phylogeny of gammaproteobacteria. J. Bacteriol. doi: 10.1128/JB.01480-09. PMID: 20207755. PMCID: PMC2863478.
PATRIC 1.0
Snyder EE, Kampanya N, Lu J, Nordberg EK, Karur HR, Shukla M, Soneja J, Tian Y, Xue T, Yoo H, Zhang F, Dharmanolla C, Dongre NV, Gillespie JJ, Hamelius J, Hance M, Huntington KI, Jukneliene D, Koziski J, Mackasmiel L, Mane SP, Nguyen V, Purkayastha A, Shallom J, Yu G, Guo Y, Gabbard J, Hix D, Azad AF, Baker SC, Boyle SM, Khudyakov Y, Meng XJ, Rupprecht C, Vinje J, Crasta OR, Czar MJ, Dickerman A, Eckart JD, Kenyon R, Will R, Setubal JC, Sobral BW. (2007) PATRIC: the VBI PathoSystems Resource Integration Center. Nucleic Acids Res. doi: 10.1093/nar/gkl858. PMID: 17142235. PMCID: PMC1669763.
Rafal Rak, Andrew Rowley, William Black, and Sophia Ananiadou (2012). “Argo: an integrative, interactive, text mining-based workbench supporting curation” Oxford Journals. Published online 2012 February 13. doi:10.1093/database/bas010. PMC 3308166.
Abstract
Curation of biomedical literature is often supported by the automatic analysis of textual content that generally involves a sequence of individual processing components. Text mining (TM) has been used to enhance the process of manual biocuration, but has been focused on specific databases and tasks rather than an environment integrating TM tools into the curation pipeline, catering for a variety of tasks, types of information and applications. Processing components usually come from different sources and often lack interoperability. The well established Unstructured Information Management Architecture is a framework that addresses interoperability by defining common data structures and interfaces. However, most of the efforts are targeted towards software developers and are not suitable for curators, or are otherwise inconvenient to use on a higher level of abstraction. To overcome these issues we introduce Argo, an interoperable, integrative, interactive and collaborative system for text analysis with a convenient graphic user interface to ease the development of processing workflows and boost productivity in labour-intensive manual curation. Robust, scalable text analytics follow a modular approach, adopting component modules for distinct levels of text analysis. The user interface is available entirely through a web browser that saves the user from going through often complicated and platform-dependent installation procedures. Argo comes with a predefined set of processing components commonly used in text analysis, while giving the users the ability to deposit their own components. The system accommodates various areas and levels of user expertise, from TM and computational linguistics to ontology-based curation. One of the key functionalities of Argo is its ability to seamlessly incorporate user-interactive components, such as manual annotation editors, into otherwise completely automatic pipelines. 
As a use case, we demonstrate the functionality of an in-built manual annotation editor that is well suited for in-text corpus annotation tasks.
Influenza Sequence Feature Variant Type (Flu-SFVT) analysis: evidence for a role of NS1 in influenza host range restriction. Noronha JM, Liu M, Squires RB, Pickett BE, Hale BG, Air GM, Galloway SE, Takimoto T, Schmolke M, Hunt V, Klem E, García-Sastre A, McGee M, Scheuermann RH. J Virol. 2012 March. doi: 10.1128/JVI.06901-11. PMID: 22398283
Gillespie, J.J.*, Joardar, V.*, Williams, K.P., Driscoll, T., Hostetler, J.B., Nordberg, E.K., Shukla, M., Walenz, B., Hill, C.A., Nene, V.M., Azad, A.F., Sobral, B.W., Caler, E. (2012). A Rickettsia genome overrun by mobile genetic elements provides insight into the acquisition of genes characteristic of an obligate intracellular lifestyle. Journal of Bacteriology 194: 376-394. *equal author contribution. PMID: 22056929
Abstract
We present the draft genome for the Rickettsia endosymbiont of Ixodes scapularis (REIS), a symbiont of the deer tick vector of Lyme disease in North America. Among Rickettsia species (Alphaproteobacteria: Rickettsiales) REIS has the largest genome sequenced to date (>2Mb) and contains 2309 genes across the chromosome and four plasmids (pREIS1-4). The most remarkable finding within the REIS genome is the extraordinary proliferation of mobile genetic elements (MGEs), which contributes to a limited synteny with other Rickettsia genomes. In particular, an integrative conjugative element named RAGE (Rickettsiales amplified genetic element), previously identified in scrub typhus rickettsiae (Orientia tsutsugamushi) genomes, is present on both the REIS chromosome and plasmids. Unlike the pseudogene-laden RAGEs of O. tsutsugamushi, REIS encodes nine conserved RAGEs that include F-like type IV secretion systems similar to the tra genes encoded in the R. bellii and R. massiliae genomes. An unparalleled abundance of encoded transposases (>650) relative to genome size, together with the RAGEs and other MGEs, comprise ∼35% of the total genome, making REIS one of the most plastic and repetitive bacterial genomes sequenced to date. We present evidence that conserved rickettsial genes associated with an intracellular lifestyle were acquired via MGEs, especially the RAGE, through a continuum of genomic invasions. Robust phylogeny estimation suggests REIS is ancestral to the virulent spotted fever group rickettsiae. As REIS is not known to invade vertebrate cells and has no known pathogenic effects on I. scapularis, its genome sequence provides insight on the origin of mechanisms of rickettsial pathogenicity.
View Abstract or download Manuscript on the ASM Website.
BioHealthBase: informatics support in the elucidation of influenza virus host-pathogen interactions and virulence. Squires B, Macken C, Garcia-Sastre A, Godbole S, Noronha J, Hunt V, Chang R, Larsen CN, Klem E, Biersack K, Scheuermann RH. Nucleic Acids Res. (Database issue) 2008:D497-503
Influenza Research Database: an integrated bioinformatics resource for influenza research and surveillance. Squires RB, Noronha J, Hunt V, García-Sastre A, Macken C, Baumgarth N, Suarez D, Pickett BE, Zhang Y, Larsen CN, Ramsey A, Zhou L, Zaremba S, Kumar S, Deitrich J, Klem E, Scheuermann RH. Influenza Other Respi Viruses. 2012 Jan 20. doi: 10.1111/j.1750-2659.2011.00331.x. [Epub ahead of print] PMID: 22260278
Working with Parasite Database Resources
21-26 October 2012 Application deadline: 29 June 2012
This residential workshop, held at the Wellcome Trust Genome Campus, Hinxton, Cambridge, aims to provide experimental biologists with hands-on experience in genomic-scale data analysis, including genome browsers and comparison tools, methods for data integration, and resources for sophisticated data mining. Examples and exercises will be drawn primarily from Plasmodium and kinetoplastida parasites but are likely also to include other organisms contained in EuPathDB and/or GeneDB, thus applicants working on any protozoan parasite from these resources will benefit.
Send questions to: advancedcourses@hinxton.wellcome.ac.uk
EuPathDB will be involved in planning and running the workshop
October 21-26, 2012
The American Society for Tropical Medicine and Hygiene will hold its 60th annual meeting from December 4-8 at the Philadelphia Marriott Downtown, Philadelphia, PA, USA.
This year EuPathDB will join other bioinformatics centers at booth #417. Come visit us during exhibit and poster session hours:
Sunday, December 4
7:30 p.m. to 9:30 p.m. Opening Reception
Monday, December 5
9:30 a.m. to 10:30 a.m. Exhibits
Noon to 1:45 p.m. Poster Session A Presentations and Exhibits
3:15 p.m. to 4:15 p.m. Exhibits
Tuesday, December 6
9:30 a.m. to 10:30 a.m. Exhibits
Noon to 1:45 p.m. Poster Session C Presentations and Exhibits
3:15 p.m. to 4:15 p.m. Exhibits
Wednesday, December 7
9:30 a.m. to 10:30 a.m. Exhibits
Noon to 2:30 p.m. Exhibits
EuPathDB team members will be present throughout the meeting and available at booth #417 during exhibit hours.
Dec. 4-8, 2011
ViPR: an open bioinformatics database and analysis resource for virology research Pickett BE, Sadat EL, Zhang Y, Noronha JM, Squires RB, Hunt V, Liu M, Kumar S, Zaremba S, Gu Z, Zhou L, Larson C, Dietrich J, Klem EB, Scheuermann RH. Nucleic Acids Res. 2011 Oct 17. [Epub ahead of print]. PMID: 22006842
The full-day tutorial entitled "Quantitative and Qualitative Methods for Human-Subject Visualization Experiments" was well attended and well received.
Course Title: Quantitative and Qualitative Methods for Human-Subject Visualization Experiments.
Course Authors: Joseph L. Gabbard (Virginia Tech), J. Edward Swan II (Mississippi State University) & Chris North (Virginia Tech)
Course Description: This tutorial is for researchers and engineers, working in the field of visualization, who wish to conduct user-based visualization experiments with a specific aim of promoting both traditional quantitative human-subject experiments and qualitative methods for assessing usability and insight. This tutorial presents both quantitative and qualitative approaches to human-subject experiments of visualizations. It covers (1) the basic principles of experimental design and analysis, with an emphasis on human-subject experiments in visualization; (2) formative evaluation methods for iteratively assessing and improving visualization user interfaces; and (3) approaches to designing and conducting qualitative studies that aim to measure the degree to which specific visualization designs afford insight formation.
Who should attend: Researchers and engineers working in the visualization fields (Vis, InfoVis, VAST, BioVis) who (1) wish to conduct evaluation experiments with human subjects, and/or (2) wish to gain a better understanding of the basic terminology of experimental design and analysis (e.g., the precise meaning of statements such as [F(2,45) = 5.67, p = .023]), and/or (3) are researching or developing visualizations that can benefit from qualitative user-based assessment (e.g., visualizations that are at or beyond prototyping phases and are readying for potential broader use). Level of expertise: All Levels. This material is useful to attendees with multiple levels of expertise.
Presentation entitled “PATRIC: A resource for infectious disease research. Real-life examples used to drive software development”.
The seminar was held at Virginia Bioinformatics Institute in conjunction with the Genetics, Bioinformatics, and Computational Biology (GBCB) Seminar Series. An associated hands-on workshop followed. Details below.
What PATRIC Offers:
- Consistent annotations across all sequenced bacterial species from GenBank.
- Searches based on taxonomy, gene name, locus tag, protein function/families, pathways, EC numbers, GO terms, and more.
- Data and analysis results are free and typically downloadable.
- Organism summaries; pre-sorted PubMed articles; interactive phylogenetic trees; virulence and disease information; and experimental data including transcriptomics, proteomics, protein structure, and protein-protein interactions.
- Numerous interactive visualizations for genome browsing, multi-genome comparisons, pathway comparisons, protein family comparisons, phylogenetic trees with coupled MSAs, 3D protein structures, and feature group comparisons.
Event includes free pizza dinner. For registration information, agenda, and location details please see PATRIC VT Workshop Flyer.
If you have a suggestion for a future workshop topic and/or location, please send it to us via the “contact us” link at the bottom of this page.
IRD and ViPR Hands-on Workshop (Workshop presentation/booklet, JCVI-GSCID/NIAID Workshop: Empowering Genomics in Southern Africa – Applications to Infectious Disease, University of Limpopo, Turfloop Campus, South Africa) – May 30 – Jun. 1, 2011
Conserved Epitope Regions (Presentation, ASV 2011, the 30th Annual Meeting for the American Society for Virology, Minneapolis MN) – Jul. 16-20, 2011
Presentation entitled “Informatics, Infectious Diseases and Human-Microbe-Environment Interactions”.
Joseph J. Gillespie, Alice R. Wattam, Stephen A. Cammer, Joseph Gabbard, Maulik P. Shukla, Oral Dalay, Timothy Driscoll, Deborah Hix, Shrinivasrao P. Mane, Chunhong Mao, Eric K. Nordberg, Mark Scott, Julie R. Schulman, Eric E. Snyder, Daniel E. Sullivan, Chunxia Wang, Andrew Warren, Kelly P. Williams, Tian Xue, Hyun Seung Yoo, Chengdong Zhang, Yan Zhang, Rebecca Will, Ronald W. Kenyon, and Bruno W. Sobral (2011). “PATRIC: The Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species”
Infect. Immun. 79 (11): 4286-98. doi:10.1128/IAI.00207-11. PMC 3257917. PMID 21896772.
Abstract
Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious disease research. Specifically, PATRIC provides scientists with 1. a comprehensive bacterial genomics database, 2. a plethora of associated data relevant to genomic analysis, and 3. an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary focus of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: ‘Organisms, Genomes and Comparative Genomics’ and ‘Recurrent Integration of Community-Derived Associated Data’. Additionally, we present two experimental designs typical of bacterial genomics research and execute both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary is provided of PATRIC’s outreach activities, collaborative endeavors, and future research directions.
View Abstract or download Manuscript via the ASM Website.
EuPathDB team members will be present throughout the meeting and available at a booth during all poster sessions.
Sept 11-15, 2011
Infectious disease research is generating an increasing amount of disparate data on pathogenic systems. There is a growing need for resources that effectively integrate, analyze, deliver and visualize these data, both to improve our understanding of infectious diseases and to facilitate the development of strategies for disease control and prevention.
Tim Driscoll, Joseph L. Gabbard, Chunhong Mao, Oral Dalay, Maulik Shukla, Clark C. Freifeld, Anne Gatewood Hoen, John S. Brownstein, Bruno W. Sobral. Integration and visualization of host-pathogen data related to infectious diseases. Bioinformatics. 2011 Jun 27. [Epub ahead of print] PubMed PMID: 21712250.
Abstract
Infectious disease research is generating an increasing amount of disparate data on pathogenic systems. There is a growing need for resources that effectively integrate, analyze, deliver and visualize these data, both to improve our understanding of infectious diseases and to facilitate the development of strategies for disease control and prevention.
We have developed Disease View, an online host-pathogen resource that enables infectious disease-centric access, analysis, and visualization of host-pathogen interactions. In this resource, we associate infectious diseases with corresponding pathogens, provide information on pathogens, pathogen virulence genes and the genetic and chemical evidences for the human genes that are associated with the diseases. We also deliver the relationships between pathogens, genes, and diseases in an interactive graph and provide the geolocation reports of associated diseases around the globe in real time. Unlike many other resources, we have applied an iterative, user-centered design process to the entire resource development, including data acquisition, analysis, and visualization. Availability and Implementation: Freely available at http://www.patricbrc.org; all major web browsers supported.
View Full Paper at PubMed
Presentation entitled “Informatics-driven biological research: Infectious diseases as an example”.
Abstract
Infectious disease researchers have to deal with diverse types of data in order to develop hypotheses about candidate macromolecules that can be used to design countermeasures (diagnostics, vaccines and therapeutics). Even considering only molecular data, it is a challenge to access all the public information available to them and implement workflows that support their analysis needs. I will use the example of Brucella spp to illustrate how public, open, freely available resources can be designed, developed, and implemented in support of such infectious disease research and development goals. There are now 40 genomes of Brucella (Alphaproteobacteria; Rhizobiales) sequenced, sampling all known species and biovars of this facultative intracellular pathogen. A phylogenomic analysis of these genomes united all Brucella when they were compared to outgroups including Ochrobactrum, Bartonella, Mesorhizobium and Agrobacterium. Although the Brucella genomes are united, there is some interesting diversity. A well-studied group of Brucella species are united in a clade separated by a long branch. This large clade has little phylogenetic depth, but species within this group are known to have specific host preferences. Sub-branching patterns within this group reflect these host preferences and specific protein families unique to, and absent from, these groups were identified. Two novel strains, Brucella inopinata BO1T and BO2, were recently identified and isolated from human patients. Genomes from these strains, as well as two new isolates isolated from Australian rodents (Brucella spp. NF2653 and 83/13), are quite unique and separated from the rest of the Brucella. Although they share many similarities with the other Brucella species, they are missing many large areas of their genomes that can be seen in other Brucella. 
Analyses of these areas using new bioinformatic tools and data resources have shown that some of these missing areas include previously identified genomic islands and virulence factors, and there are novel findings as well that impact biochemical pathways and unique changes in the synthesis of lipopolysaccharide.
Presentation will be posted following event.
“The American Society for Nutrition (ASN), American Society of Animal Science (ASAS) and American Dairy Science Association (ADSA) are collaborating on a one-day pre-conference event: Agri-Medical Research: Providing Dual Benefit for Agriculture and Human Health, Saturday, July 9, in New Orleans.”
“This ASN-ASAS-ADSA pre-conference to the 2011 ADSA-ASAS Joint Annual Meeting will cover biomedical and agricultural interventions or therapies to improve both human health, companion animal health and farm animal health and production. Themes include metabolism, developmental origin of adult disease and infectious (zoonotic) diseases and nutritional impact of pro-inflammatory response. Each symposium will span the spectrum of mechanistic to applied and include implications for a variety of animal species. Keynotes from outside, related disciplines will bring a unique view to the context of the presented science.”
2011 ASN-ASAS-ADSA Pre-conference
This residential workshop will be held at the Wellcome Trust Genome Campus, Hinxton, Cambridge, UK from 3-7 October 2011. The aim of this workshop is to provide experimental biologists with hands-on experience in genomic-scale data analysis, including genome browsers and comparison tools, methods for data integration, and resources for sophisticated data mining. Examples and exercises will be drawn primarily from Plasmodium and kinetoplastida parasites but are likely also to include other organisms contained in EuPathDB.org and/or GeneDB.org, thus applicants working on any protozoan parasite from these resources will benefit.
Taught by a collaboration between the Parasite Genomics Group (Wellcome Trust Sanger Institute, UK) and the Eukaryotic Pathogen Bioinformatics Resource Center (University of Georgia & University of Pennsylvania, US), the programme will include lectures on genomics and bioinformatics techniques, interspersed with hands-on exercises.
Application deadline: 1 July 2011
EuPathDB Workshop
October 3-7, 2011
Chapter Title: Comparative genomics and phylogenomics of the Brucella.
Sobral, B. and A. Wattam (2011). Comparative genomics and phylogenomics of the Brucella. Book chapter in “Brucella: Molecular Microbiology and Genetics”. I. L.-G. D. O’Callaghan, Horizon Scientific Press.
Introduction
Brucella species are characterized by extremely high levels of nucleotide similarity and yet vary in microbial and disease phenotypes, as well as in pathogenicity and host preference. These variations initially resulted in classification of six species: B. abortus, B. canis, B. melitensis, B. neotomae, B. ovis and B. suis. The lack of sequence diversity has inhibited molecular studies, but the development of new techniques and the recent availability of genome sequences have revealed interesting differences, including the expansion of the known Brucella species.
Recently, the Brucella genus has expanded with the discovery of four new species: B. ceti and B. pinnipedialis from marine mammals, B. inopinata from new human isolates, and B. microti, isolated from a rodent in the Czech Republic. An as yet unnamed isolate from Australian rodents might also be classified as a new species. This review summarizes the current state of knowledge of the genetic diversity within Brucella, including both the classical and new species, with particular emphasis on comparing the genome sequences, the phylogenies produced by a variety of methods and specific families of genes in particular functions and pathways.
Presenters included Dr. Bruno Sobral, Dr. Shrinivasrao Mane, and Eric Nordberg.
Overview
Bruno Sobral, Shrinivasrao Mane, and Eric Nordberg gave invited presentations at a “Next Generation Sequencing: Transformative Technology for Biodiversity Science” workshop in Washington, DC on April 18-20, sponsored by the Smithsonian Institution (SI), American Museum of Natural History (AMNH) and the Food and Drug Administration (FDA). The purpose of the workshop was to invite software programmers who are actively working on NGS analysis software and pipelines to talk about integration of software for comparative genomics and metagenomics, including data input and outputs and future directions, to directly improve the software pipelines used at the FDA, USDA and CDC as well as others. The PATRIC team members are participating in the Bioinformatic Software for Comparative Genomics and Metagenomics sections. The discussion topics for software integration and future development include contig assembly and alignment, annotation and homology determination, phylogenomics, targeted resequencing, population genetics, metagenomics, and analysis of expression data (transcriptomes).
The PATRIC team gave three presentations entitled:
- PATRIC, Bacterial Resources, Web-Enabled Data/Analytical Capabilities and Challenges of Interoperability, Standardization, and Automation by Dr. Bruno Sobral.
- Workflows for Next-gen Sequence Analysis at VBI by Dr. Shrinivasrao Mane
- Phylogenomics at VBI by Eric Nordberg
Regarding the workshop, Dr. Sobral commented, “The opportunity to apply high-throughput data to the analysis of diversity as well as toward the mapping of phenotypic diversity to genetic diversity is enormously exciting scientifically and of great value to our understanding of the ecology of health, whether it be of our planet or our bodies. The leadership of the organizations that brought together this meeting has provided a truly transdisciplinary nexus in which to discuss how to make such data accessible to a variety of communities in the most effective manner possible. In the context of the PATRIC and Pathogen Portal projects, we can easily see how what we are doing can be made of greatest relevance and utility to the constituencies present and are excited to be able to work with them more closely in the future to ensure that everything we build and deploy is also contributing to their interests and needs in the most effective and scalable manner possible.”
Sophia Ananiadou, Dan Sullivan, William Black, Gina-Anne Levow, Joseph J. Gillespie, Chunhong Mao, Sampo Pyysalo, BalaKrishna Kolluru, Junichi Tsujii, Bruno Sobral. (2011). Named Entity Recognition for Bacterial Type IV Secretion Systems. PLoS ONE. March 29, 2011.
Abstract
Research on specialized biological systems is often hampered by a lack of consistent terminology, especially across species. In bacterial Type IV secretion systems genes within one set of orthologs may have over a dozen different names. Classifying research publications based on biological processes, cellular components, molecular functions, and microorganism species should improve the precision and recall of literature searches allowing researchers to keep up with the exponentially growing literature, through resources such as the Pathosystems Resource Integration Center (PATRIC, patricbrc.org). We developed named entity recognition (NER) tools for four entities related to Type IV secretion systems: 1) bacteria names, 2) biological processes, 3) molecular functions, and 4) cellular components. These four entities are important to pathogenesis and virulence research but have received less attention than other entities, e.g., genes and proteins. Based on an annotated corpus, large domain terminological resources, and machine learning techniques, we developed recognizers for these entities. High accuracy rates (>80%) are achieved for bacteria, biological processes, and molecular function. Contrastive experiments highlighted the effectiveness of alternate recognition strategies; results of term extraction on contrasting document sets demonstrated the utility of these classes for identifying T4SS-related documents.
View Full Paper at the PLoS One Website.
Presentation entitled “Informatics Enabled Infectious Disease Research and Development”.
Overview
Dr. Sobral’s presentation, “Informatics Enabled Infectious Disease Research and Development”, focuses on the conceptual framework for development of informatics resources supporting multi-stakeholder infectious disease communities, highlighting the NIAID-funded PATRIC information system, and its applicability to bacterial infectious disease research. The presentation was requested under the Cooperative Biological Engagement Program as part of the Science and Disease Surveillance Review. In attendance are members of the Defense Threat Reduction Agency (DTRA) of the Department of Defense (DoD) as well as US scientists and scientists and representatives from the Russian Federation states. Dr. Sobral also has started his service at this meeting as a science advisor to the Central Public Health Reference Laboratory of the Republic of Georgia.
Presentation entitled “Leveraging Bacterial Bioinformatics Resource Center Data for HMP Research”.
Abstract
Genomics, transcriptomics, and proteomics research in pathogenic bacteria is related to and complementary to research into the human microbiome. The PAThosystems Resource Integration Center (PATRIC, patricbrc.org), for example, aggregates a wide range of organism-specific information on genomes, proteomes, metabolic pathways, diseases, and related literature. Using metagenomic abundance profiles, one can link metagenomes to their constituent organisms and perform analysis on a wide range of data related to those organisms. Important biological questions require both metagenomic and genome-specific data to be resolved. For example: are there potential reservoirs for antibiotic resistance in the human microbiome, to what extent is lateral transfer a factor in the genomics of the microbiome organisms, and which bacteria are commensal in some host sites while pathogenic in other sites? The first step to answering questions such as these is to integrate detailed genomic and metagenomic data. This poster describes functionality under development in the PATRIC resource to link broad sets of organism-specific data to publicly available data in PATRIC and MG-RAST metagenomic analysis results.
International Human Microbiome Congress
The 11th International Congress on Toxoplasmosis will be held in Ottawa, Canada. The meeting site is a spectacular hotel originating in the golden days of the railroad. The Fairmont Chateau Laurier overlooks several national landmarks, including the Rideau Canal, the Ottawa River, the Canadian Parliament and its Library, Major's Hill Park, Gatineau Park, the ByWard Market, and the National Gallery. The meeting site is surrounded on 3 sides by green spaces and easily accessible bike paths, and bike rentals.
The meeting will start with an evening session on Saturday June 25th, and will end in the evening of Tuesday June 28th, with departure on Wednesday June 29th after breakfast.
The registration website is up and running, and we would like to invite all of you to visit http://www.toxomeeting.org, register your participation, and submit your abstract.
Abstracts are due March 31, 2011
Deadline for registration and payment is April 15, 2011
ToxoDB help desk
June 25-29, 2011
Presentation entitled “Informatics-Driven Infectious Disease Research”.
Abstract
Informatics-driven approaches change how research and development are conducted and who participates, and enable systems-oriented views of science and research. The CyberInfrastructure Division of the Virginia Bioinformatics Institute at Virginia Tech is a highly transdisciplinary, informatics-based team that researches, develops, deploys and uses information systems in support of diverse communities, with a strong historical and current focus in infectious diseases (ID). Most life sciences researchers have a very strong desire for the full integration of data and analysis tools delivered through a single interface. Data analysis, visualization, interpretation, and integration from the perspective of a given research community and its interests is best handled through specific and close interaction with that community and interoperation with major comprehensive data resources, such as those at EMBL or NCBI.
ID research and development provides a uniquely challenging and high impact opportunity to develop resources that interoperate (syntax) with comprehensive resources while integrating (semantics) various types of data and analysis systems for the specific needs of a global community. The biological complexity of infectious disease systems, which are composed of multiple scales of interactions between potential pathogens, hosts (and vectors) and the environment, challenges information resources because of the breadth of organism-organism and organism-environment interactions that are needed to understand outcomes such as disease, asymptomatic carrying, and disease resistance. Beyond research, applications of integrated data for ID serve a variety of constituencies, such as clinical, diagnostic, drug and vaccine development, and epidemiological, which are very important applied areas of data utilization. Thus there is a complexity represented by the data users and their needs and workflows, making ID an opportune area in which to develop, deploy and use CyberInfrastructure.
BIOSTEC 2011 Conference Link
The biennial KMCB Meeting provides a forum for discussion of the Molecular Cell Biology of Trypanosomes, Leishmania, and related model organisms, without restrictions on attendance.
On-line registration is open and an outline program is available at http://www.mbl.edu/kmcb/2011/index.php. The abstract deadline is midnight (in New York) on WEDNESDAY February 16th.
The meeting will commence with registration on the afternoon of Friday April 8, with dinner followed by a scientific session and social mixer, and will conclude by noon on Tuesday April 12. Please plan to stay until the end. Because the 2011 meeting runs for 4 days, there will be a free afternoon on Sunday April 10.
TriTrypDB help desk
April 8-12, 2011
Registration is open for the 7th Annual BioMalPar Conference:
Biology and Pathology of the Malaria Parasite, 16 - 18 May 2011
Please visit the conference website: www.embl.de/training/events/2011/BMP11-01.
The conference will take place in Heidelberg, Germany at the EMBL Advanced Training Centre.
Registration and abstract deadline: 23:59 CET on 11 February 2011. Please note that registration without an abstract submission is possible.
The purpose of the BioMalPar annual conference is to bring together malaria researchers from Europe and overseas (including Africa, America, Asia and
Australia) in order to present and share recent groundbreaking findings on fundamental malaria research. New insights will also be featured through the
use of poster sessions. This meeting will also provide an enriched environment for researchers at all stages of their career to interact with international
leaders in the field. The meeting will offer an excellent opportunity for sharing ideas and for potential development of new worldwide collaborations.
PlasmoDB help desk and Workshop
May 16-18, 2011
Presentation entitled “PATRIC, Pathogen Portal, and Infectious Disease Ontology”.
The primary goal of this workshop was to explore the potential benefits of using the Infectious Disease Ontology (IDO) as a controlled vocabulary for promoting consistency in the ways infectious disease data are described.
IDO Workshop Link
Presentation entitled “Comparative Genomics of Brucella spp.”.
Abstract
There are now 40 genomes of Brucella (Alphaproteobacteria; Rhizobiales) sequenced, sampling all known species and biovars of this facultative intracellular pathogen. A phylogenomic analysis of these genomes united all Brucella when they were compared to outgroups including Ochrobactrum, Bartonella, Mesorhizobium and Agrobacterium. Although the Brucella genomes are united, there is some interesting diversity.
A well-studied group of Brucella species are united in a clade separated by a long branch. This large clade has little phylogenetic depth, but species within this group are known to have specific host preferences. Sub-branching patterns within this group reflect these host preferences and specific protein families unique to, and absent from, these groups were identified. Phylogenetic trees built using SNPs identified across all genomes had similar topology compared to the robust, multi-protein trees.
Two novel strains, Brucella inopinata BO1 and BO2, were recently identified and isolated from human patients. Genomes from these strains, as well as two new isolates from Australian rodents (Brucella spp. NF2653 and 83/13), are quite unique and separated from the rest of the Brucella. Although they share many similarities with the other Brucella species, they are missing many large areas of their genomes that can be seen in other Brucella. Some of these missing areas include previously identified genomic islands and virulence factors, and there are novel findings as well.
These four genomes, segregated on two separate long branches, also have genes that are unique, often assembled together in stretches indicative of lateral transfer. Most of the unique proteins on the Brucella spp. 83/13 and NF2653 branch are hypothetical proteins that have not been identified in any other bacterial genome to date. The protein families unique to Brucella spp. BO1 and BO2 are shared among other bacterial genomes and some were probably acquired by lateral transfer. Both BO1 and BO2 have a weak ability, or none at all, to agglutinate the specific antisera for the LPS-O-antigens. Analysis of the protein families shows that BO1 is missing two crucial genes known to be vital for making smooth LPS in Brucella, and BO2 is missing almost all of them. BO2 has four genes necessary for making a rhamnose-based LPS, something that is unique to this genome.
Presentation entitled “PATRIC – PathoSystems Resource Integration Center”.
This presentation introduced the workshop attendees to the PATRIC website and established the connection between PATRIC and RAST.
Presentation entitled “The Impacts of New Sequencing Technologies on Infectious Disease Research”.
Abstract
The first 454 pyrosequencers shipped in 2005 and the first GS FLX shipped in January 2007. By the end of 2007, there were three commercial “next-generation” sequencing systems available and “next-next-generation” systems on the horizon, and Nature Methods had declared next-generation sequencing the Method of the Year 2007. In these few years, new sequencing technologies have transformed the infectious disease research field, moving from offering “simply” genome sequencing for less money and in less time, to being used for epigenetics, transcriptomics, and metagenomics. My talk today will discuss the impacts we have seen on research, bioinformatics resources and collaborations in that time and some considerations about the implications for the future.
Conference Link
Genome sequences of the M & S molecular forms of An. gambiae were published today, showing that speciation is more advanced than previously thought. This greater divergence between the genomes highlights the need to identify those genes critical for initiating this process. A companion paper describes the development of a SNP genotyping platform for investigating gene flow between the incipient species.
Genome browsers for the M & S molecular forms are available through VectorBase.
NIAID Bioinformatics Resource Centers: New Assets for Pathogen Informatics Greene J., Collins F., Lefkowitz E., Roos D., Scheuermann R., Sobral, B., Stevens R., White, O., Di Francesco, V.
The mosquito Culex quinquefasciatus poses a substantial threat to human and veterinary health as a primary vector of West Nile virus (WNV), the filarial worm Wuchereria bancrofti, and an avian malaria parasite. Comparative phylogenomics revealed an expanded canonical C. quinquefasciatus immune gene repertoire compared with those of Aedes aegypti and Anopheles gambiae. Transcriptomic analysis of C. quinquefasciatus genes responsive to WNV, W. bancrofti, and non-native bacteria facilitated an unprecedented meta-analysis of 25 vector-pathogen interactions involving arboviruses, filarial worms, bacteria, and malaria parasites, revealing common and distinct responses to these pathogen types in three mosquito genera. Our findings provide support for the hypothesis that mosquito-borne pathogens have evolved to evade innate immune responses in three vector mosquito species of major medical importance.
Culex quinquefasciatus (the southern house mosquito) is an important mosquito vector of viruses such as West Nile virus and St. Louis encephalitis virus, as well as of nematodes that cause lymphatic filariasis. C. quinquefasciatus is one species within the Culex pipiens species complex and can be found throughout tropical and temperate climates of the world. The ability of C. quinquefasciatus to take blood meals from birds, livestock, and humans contributes to its ability to vector pathogens between species. Here, we describe the genomic sequence of C. quinquefasciatus: Its repertoire of 18,883 protein-coding genes is 22% larger than that of Aedes aegypti and 52% larger than that of Anopheles gambiae with multiple gene-family expansions, including olfactory and gustatory receptors, salivary gland genes, and genes associated with xenobiotic detoxification.
Dr. Sobral’s presentation entitled “Informatics-Driven Infectious Disease Research: PATRIC, An all-bacterial Bioinformatics Resource Center”.
The presentation was followed by a PATRIC workshop with Dr. Stephen Cammer.
Abstract
ID research and development provides a uniquely challenging and high impact opportunity to develop resources that interoperate (syntax) with comprehensive resources while integrating (semantics) various types of data and analysis systems for the specific needs of a global community. The biological complexity of infectious disease systems, which are composed of multiple scales of interactions between potential pathogens, hosts (and vectors) and the environment, challenges information resources because of the breadth of organism-organism and organism-environment interactions that are needed to understand outcomes such as disease, asymptomatic carrying, and disease resistance. Beyond research, applications of integrated data for ID serve a variety of constituencies, such as clinical, diagnostic, drug and vaccine development, and epidemiological, which are very important applied areas of data utilization. Thus there is a complexity represented by the data users and their needs and workflows, making ID an opportune area in which to develop, deploy and use CyberInfrastructure. In this presentation I will showcase an example or two of how taking an informatics approach enabled by PATRIC can help scientists interact with data to develop hypotheses and test them.
Dr Cammer’s workshop showcases PATRIC tools, and demonstrates several types of analyses, which illustrate how PATRIC can become an integral part of current research.
Abstract
PATRIC provides the computational resources for research on all sequenced bacterial genomes, with some emphasis placed on NIAID watchlist genera. Currently, PATRIC enables multifaceted analysis using the Genome Finder, Feature Finder, Protein Family Sorter, Comparative Pathway Tool, and BLAST. In addition, PATRIC facilitates post-genomic data retrieval, literature review, taxonomy browsing, phylogenetic tree viewing, and Google searches. We will utilize these tools to execute several analyses aimed at illustrating how PATRIC can become an integral part of our research. Our workshop will first introduce PATRIC, focusing on available genomes, genome annotation using the subsystems approach, the resulting protein family classification of all coded sequences, and different annotations at PATRIC. PATRIC enables research on the more than 2000 bacterial genomes available by integrating numerous data and meta-data sources. We will describe PATRIC’s data-aware approach that further facilitates research by linking a researcher to many external repositories. We will next explore bacterial genomics using the Genome Finder. We will focus on searching and taxonomy-based browsing, as well as the Watchlist Genera. We will then examine and explore the context-aware tabular browsing provided for each genomic context when using Genome Finder. These tabs connect a researcher to rich sources for relevant literature, web links, taxonomy, phylogenetic trees, genomic features, pathways, and post-genomic experimental data. Next we will utilize the Feature Finder to find specific proteins across different genomes, examine positional clustering of protein coding sequences along a genome, and identify RNA coding genes. In this part PATRIC’s flexible table sorting will be used to manipulate and retrieve data.
Since protein family classification is paramount in understanding orthology, comparative genomics, and genome structure, the Protein Family Sorter will then be employed to compare genomes and investigate pan-genomics across a genus. The use of the Protein Family Sorter enables analysis of the proteomic repertoire of all sequenced bacteria. We will conduct further comparative studies using the Comparative Pathway Tool. The tool will be used to browse the annotated pathways of a specific organism and different annotations PATRIC provides for specific pathways. We will then perform comparative analysis of pathway distributions across different organisms and quality assessment using different annotations available for pathway proteins.
Finally, we will save a few minutes for open questions and discussion on use of the PATRIC resource and development of the infrastructure.
Bacillus anthracis, Francisella tularensis, and Yersinia pestis are bacterial pathogens that can cause anthrax, lethal acute pneumonic disease, and bubonic plague, respectively, and are listed as NIAID Category A priority pathogens for possible use as biological weapons. However, the interactions between human proteins and proteins in these bacteria remain poorly characterized leading to an incomplete understanding of their pathogenesis and mechanisms of immune evasion. In this study, we used a high-throughput yeast two-hybrid assay to identify physical interactions between human proteins and proteins from each of these three pathogens.
Dyer MD, Neff C, Dufford M, Rivera CG, Shattuck D, et al. 2010 The Human-Bacterial Pathogen Protein Interaction Networks of Bacillus anthracis, Francisella tularensis, and Yersinia pestis. PLoS ONE 5(8): e12089. doi:10.1371/journal.pone.0012089.
View Full Paper at the PLoS One Website.
Presentation entitled “Prokaryotic Annotation Status”.
Abstract
Annotation of prokaryotic genomes has changed dramatically with the decrease of costs of genomic sequencing and the increasingly decentralized and democratized manner in which prokaryotic genomes are sequenced. Data generation is outpacing our ability to comprehensively analyze it, rendering valuable scientific knowledge inaccessible. Because of the decentralization of data generation, which is likely to continue as technologies evolve further, data are increasingly distributed, and centralizing and standardizing is both unaffordable and too time consuming to be practical. Furthermore, it is likely that a very significant portion of prokaryotic genomes are not made public in a timely manner. Although this situation is intractable, I believe we have a responsibility as a scientific community to tackle it.
“Annotation” comes from the Latin “annotatio” and the verb “annotare”, which means to “add a mark”. Generally, in genomic databases, this means to add explanations or comments to genomic sequences. These markings can be of various types, including, but not limited to, functional aspects of the genes/proteins known or predicted to be encoded by the sequences. In the current technological environment, prokaryotic annotation is typically done by a series of algorithms that mark up the genome with respect to genes, proteins, RNAs and other such features. Any annotation of course is simply a hypothesis, whether the approach includes computational experiments, wet chemistry experiments, or some mixture of the two. Given this situation, one important feature that bioinformatics resources can provide is the ability to support multiple hypotheses with respect to annotation. One example where this is achieved is at the PAThosystems Resource Integration Center (PATRIC), one of NIAID’s Bioinformatics Resource Centers (BRCs) that deals with prokaryotes from the perspective of infectious diseases.
The American Society for Tropical Medicine and Hygiene will hold its 59th annual meeting from November 3rd-7th at the Marriott Atlanta Marquis Hotel in Atlanta, Georgia, USA.
EuPathDB team members will be present throughout the meeting and available at a booth during all poster sessions.
Nov. 3-7, 2010
Presentation entitled “Dealing with integration and interoperation: Bioinformatics resource center for bacterial infectious disease research”.
Abstract
Public genome-scale data are deposited in globally distributed resources that have varying quality and annotation standards and data models for storage and querying. Often, these public resources are focused on data acquisition from large-scale data generation efforts such as major DNA sequencing centers, protein structure determination centers, and so on. Because of the breadth of most of these repositories, they tend to focus on data acquisition and dissemination to a broad audience in a timely manner. These major repositories play a fundamental role, but they cannot be highly focused on the needs of any specific community of data consumers for the purposes of computer assisted reasoning and research. The strength of these resources is their comprehensiveness. Their challenge is the lack of connectivity to the specific communities that are focused on data utilization (instead of generation).
Most researchers have a very strong desire for the full integration of data and analysis tools through a single interface. Data analysis, visualization, interpretation, and integration from the perspective of a given research community and its interests is best handled through specific and close interaction with that community and interoperation with major comprehensive data resources. Perhaps the historically best example of resources that are closely knit with their communities are those represented by the model organism information resources. Infectious disease research and development provides a uniquely challenging and high impact opportunity to develop resources that interoperate with comprehensive resources while integrating various types of data and analysis systems for the specific needs of a global community. The biological complexity of infectious disease systems, which are composed of interactions between potential pathogens, hosts (and vectors) and the environment, challenges information resources because of the breadth of organism-organism and organism-environment interactions that are needed to understand outcomes such as disease, asymptomatic carrying, and disease resistance. Beyond research, applications of integrated data for infectious diseases could serve a variety of constituencies, such as clinical, diagnostic, drug and vaccine development, and epidemiological, which are very important applied areas of data utilization. Thus there is a complexity represented by the data users and their needs and workflows as well.
In this talk I will discuss interoperability (syntactical) and integration (semantic) aspects of developing and deploying distributed information systems that serve the bacterial infectious disease community through a single interface, using the PAThosystems Resource Integration Center (PATRIC) as an example.
The XIIth International Congress of Parasitology (ICOPA) will be held in Melbourne, Australia, from 15-20th August 2010 at the new Exhibition and Convention Centre.
EuPathDB will present a lunch workshop on August 15th and will operate a booth (#30) in the exhibit hall every day of the conference.
Aug 15-20, 2010
This workshop will take place from October 18-19, 2010.
Workshop and presentation
October 18-19, 2010
The 10th annual International Coccidiosis Conference (ICC-10) will be held in Guangzhou, China from October 8th to the 13th.
EuPathDB team members will be present throughout the meeting and will run a help desk during poster sessions and conduct a full day workshop on October 13th.
October 8-13, 2010
EuPathDB team members will be present throughout the meeting and available at a booth during all poster sessions.
Sept 12-16, 2010
Event extraction approaches based on expressive structured representations of extracted information have been a significant focus of research in recent biomedical natural language processing studies. However, event extraction efforts have so far been limited to publication abstracts, with most studies further considering only the specific transcription factor-related subdomain of molecular biology of the GENIA corpus. To establish the broader relevance of the event extraction approach and proposed methods, it is necessary to expand on these constraints. In this study, we propose an adaptation of the event extraction approach to a subdomain related to infectious diseases and present analysis and initial experiments on the feasibility of event extraction from domain full text publications.
Sampo Pyysalo, Tomoko Ohta, Han-Cheol Cho, Dan Sullivan, Chunhong Mao, Bruno Sobral, Jun’ichi Tsujii and Sophia Ananiadou (2010). Towards Event Extraction from Full Texts on Infectious Diseases. Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, ACL 2010, pages 132–140, Uppsala, Sweden, 15 July 2010. (View Full Paper at the Association for Computational Linguistics Website)
As an obligatory parasite of humans, the body louse (Pediculus humanus humanus) is an important vector for human diseases, including epidemic typhus, relapsing fever, and trench fever. Here, we present genome sequences of the body louse and its primary bacterial endosymbiont Candidatus Riesia pediculicola. The body louse has the smallest known insect genome, spanning 108 Mb. Despite its status as an obligate parasite, it retains a remarkably complete basal insect repertoire of 10,773 protein-coding genes and 57 microRNAs. Representing hemimetabolous insects, the genome of the body louse thus provides a reference for studies of holometabolous insects. Compared with other insect genomes, the body louse genome contains significantly fewer genes associated with environmental sensing and response, including odorant and gustatory receptors and detoxifying enzymes. The unique architecture of the 18 minicircular mitochondrial chromosomes of the body louse may be linked to the loss of the gene encoding the mitochondrial single-stranded DNA binding protein. The genome of the obligatory louse endosymbiont Candidatus Riesia pediculicola encodes less than 600 genes on a short, linear chromosome and a circular plasmid. The plasmid harbors a unique arrangement of genes required for the synthesis of pantothenate, an essential vitamin deficient in the louse diet. The human body louse, its primary endosymbiont, and the bacterial pathogens that it vectors all possess genomes reduced in size compared with their free-living close relatives. Thus, the body louse genome project offers unique information and tools to use in advancing understanding of coevolution among vectors, symbionts, and pathogens.
Background
Transposable elements (TEs) are mobile sequences found in nearly all eukaryotic genomes. They have the ability to move and replicate within a genome, often influencing genome evolution and gene expression. The identification of TEs is an important part of every genome project. The number of sequenced genomes is rapidly rising, and the need to identify TEs within them is also growing. The ability to do this automatically and effectively in a manner similar to the methods used for genes is of increasing importance. There exist many difficulties in identifying TEs, including their tendency to degrade over time and that many do not adhere to a conserved structure. In this work, we describe a homology-based approach for the automatic identification of high-quality consensus TEs, aimed for use in the analysis of newly sequenced genomes.
Results
We describe a homology-based approach for the automatic identification of TEs in genomes. Our modular approach is dependent on a thorough and high-quality library of representative TEs. The implementation of the approach, named TESeeker, is BLAST-based, but also makes use of the CAP3 assembly program and the ClustalW2 multiple sequence alignment tool, as well as numerous BioPerl scripts. We apply our approach to newly sequenced genomes and successfully identify consensus TEs that are up to 99% identical to manually annotated TEs.
Conclusions
While TEs are known to be a major force in the evolution of genomes, the automatic identification of TEs in genomes is far from mature. In particular, there is a lack of automated homology-based approaches that produce high-quality TEs. Our approach is able to generate high-quality consensus TE sequences automatically, requiring the user to only provide a few basic parameters. This approach is intentionally modular, allowing researchers to use components separately or iteratively. Our approach is most effective for TEs with intact reading frames. The implementation, TESeeker, is available for download as a virtual appliance, while the library of representative TEs is available as a separate download.
Dr. Sobral Visits Moscow, Russia to Promote PATRIC's Bioinformatics Tools and Techniques
Dr. Bruno Sobral is invited to attend this first joint NIAID-ISTC conference on “Bioinformatics Tools and Techniques for Allergy and Infectious Disease Research”. The purpose of the conference is to showcase the initiatives of each organization and identify touch points for future collaborations.
With an obligate intracellular lifestyle, Alphaproteobacteria of the order Rickettsiales have inextricably coevolved with their various eukaryotic hosts, resulting in small, reductive genomes and strict dependency on host resources. Unsurprisingly, large portions of Rickettsiales genomes encode proteins involved in transport and secretion. One particular transporter that has garnered recent attention from researchers is the type IV secretion system (T4SS).
Gillespie JJ, Brayton KA, Williams KP, Diaz MA, Brown WC, Azad AF, Sobral BW. (2010) Phylogenomics reveals a diverse Rickettsiales type IV secretion system. Infect. Immun. (View Abstract at PubMed)
Systems-biology and infectious-disease (host-pathogen-environment) research and development is becoming increasingly dependent on integrating data from diverse and dynamic sources. Maintaining integrated resources over long periods of time presents distinct challenges. This review describes experiences and lessons learned from integrating data in two five-year projects focused on pathosystems biology.
Sullivan DE, Gabbard JL, Shukla M, Sobral B. (2010) Data integration for dynamic and sustainable systems biology resources: challenges and lessons learned. Chem. Biodivers. (View Abstract at PubMed)
Murine typhus is a flea-borne febrile illness that is caused by the obligate intracellular bacterium, Rickettsia typhi. The cat flea, Ctenocephalides felis, acquires R. typhi by imbibing a bloodmeal from a rickettsemic vertebrate host. To explore which transcripts are expressed in the midgut in response to challenge with R. typhi, cDNA libraries of R. typhi-infected and uninfected midguts of C. felis were constructed.
Dreher-Lesnick SM, Ceraul SM, Lesnick SC, Gillespie JJ, Anderson JM, Jochim RC, Valenzuela JG, Azad AF. (2010) Analysis of Rickettsia typhi-infected and uninfected cat flea (Ctenocephalides felis) midgut cDNA libraries: deciphering molecular pathways involved in host response to R. typhi infection. Insect Mol. Biol. (View Abstract at PubMed)
Dr. Bruno Sobral travels to Europe to feature PATRIC and Pathogen Portal resources and to discuss opportunities for surveillance and epidemiological information and data exchange with PATRIC.
National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands, April 14-25, 2010
Dr. Bruno Sobral makes a presentation featuring the PATRIC and Pathogen Portal resources and discussing opportunities for surveillance and epidemiological information and data exchange with PATRIC.
We are developing a set of ontologies dealing with vector-borne diseases as well as the arthropod vectors that transmit them. After building ontologies for mosquito and tick anatomy we continued this project with an ontology of insecticide resistance followed by a series of ontologies that describe malaria as well as physiological processes of mosquitoes that are relevant to, and involved in, disease transmission. These will later be expanded to encompass other vector-borne diseases as well as non-mosquito vectors. The aim of the whole undertaking, which is worked out in the frame of the international IDO (Infectious Disease Ontology) project, is to provide the community with a set of ontological tools that can be used both in the development of specific databases and, most importantly, in the construction of decision support systems (DSS) to control these diseases.
TriTrypDB: a functional genomic resource for the Trypanosomatidae
Nucleic Acids Research 2010 38(Database issue):D457-D462; doi:10.1093/nar/gkp851
Aslett M, Aurrecoechea C, Berriman M, Brestelli J, Brunk BP, Carrington M,
Depledge DP, Fischer S, Gajria B, Gao X, Gardner MJ, Gingle A, Grant G,
Harb OS, Heiges M, Hertz-Fowler C, Houston R, Innamorato F, Iodice J,
Kissinger JC, Kraemer E, Li W, Logan FJ, Miller JA, Mitra S, Myler PJ,
Nayak V, Pennington C, Phan I, Pinney DF, Ramasamy G, Rogers MB, Roos DS,
Ross C, Sivam D, Smith DF, Srinivasamoorthy G, Stoeckert CJ Jr.,
Subramanian S, Thibodeau R, Tivey A, Treatman C, Velarde G, Wang H.
TriTrypDB (http://tritrypdb.org) is an integrated database
providing access to genome-scale datasets for kinetoplastid
parasites, and supporting a variety of complex queries
driven by research and development needs. TriTrypDB is a
collaborative project, utilizing the GUS/WDK computational
infrastructure developed by the Eukaryotic Pathogen
Bioinformatics Resource Center (EuPathDB.org) to integrate
genome annotation and analyses from GeneDB and elsewhere
with a wide variety of functional genomics datasets made
available by members of the global research community, often
pre-publication. Currently, TriTrypDB integrates datasets
from Leishmania braziliensis, L. infantum, L. major, L.
tarentolae, Trypanosoma brucei and T. cruzi. Users may
examine individual genes or chromosomal spans in their
genomic context, including syntenic alignments with other
kinetoplastid organisms. Data within TriTrypDB can be
interrogated utilizing a sophisticated search strategy
system that enables a user to construct complex queries
combining multiple data types. All search strategies are
stored, allowing future access and integrated searches.
'User Comments' may be added to any gene page, enhancing
available annotation; such comments become immediately
searchable via the text search, and are forwarded to
curators for incorporation into the reference annotation
when appropriate.
PMID: 19843604
Anaplasma and related Ehrlichia spp. are important tick-borne, Gram-negative bacterial pathogens of livestock and humans that cause acute infection and disease and can persist. Immunization of cattle with an Anaplasma marginale fraction enriched in outer membranes (OM) can provide complete protection against disease and persistent infection. Serological responses of OM vaccinees to the OM proteome previously identified over 20 antigenic proteins, including three type IV secretion system (T4SS) proteins, VirB9-1, VirB9-2, and VirB10.
Sutten EL, Norimine J, Beare PA, Heinzen RA, Lopez JE, Morse K, Brayton KA, Gillespie JJ, Brown WC. (2010) Anaplasma marginale type IV secretion system proteins VirB2, VirB7, VirB11, and VirD4 are immunogenic components of a protective bacterial membrane vaccine. Infect. Immun. (View Abstract at PubMed)
Dr. Stephen Cammer makes a presentation at the 3rd Center for Structural Genomics of Infectious Diseases (CSGID) Annual Meeting in Chicago, IL.
Dr. Cammer’s presentation focused on the PATRIC resource and its integration of protein structural data.
The phylogeny of the large bacterial class Gammaproteobacteria has been difficult to resolve. Here we apply a telescoping multiprotein approach to the problem for 104 diverse gammaproteobacterial genomes, based on a set of 356 protein families for the whole class and even larger sets for each of four cohesive subregions of the tree. Although the deepest divergences were resistant to full resolution, some surprising patterns were strongly supported.
Williams KP, Gillespie JJ, Sobral BW, Nordberg EK, Snyder EE, Shallom JM, Dickerman AW. (2010) Phylogeny of gammaproteobacteria. J. Bacteriol. (View Abstract at PubMed)
EuPathDB is organizing a workshop in Montevideo, Uruguay entitled "Working with Pathogen Genomes" (March 16-19), to be held right after the International Society for Computational Biology Latin-America Conference. The workshop will include hands-on training sessions on the use of EuPathDB databases (e.g., PlasmoDB, TriTrypDB, OrthoMCL), metabolic pathway reconstruction (sponsored by NIAID), TDRtargets and SchistoDB. The application deadline for this workshop is January 20th. Application forms may be downloaded and emailed to help@eupathdb.org.
EuPathDB instructors will teach the effective use of EuPathDB and its component sites CryptoDB, GiardiaDB, PlasmoDB, ToxoDB, TrichDB and TriTrypDB
March 16-19, 2010
Working with Parasite Database Resources Workshop -- note: this workshop was postponed due to volcanic activity.
EuPathDB instructors will teach the effective use of EuPathDB and its component sites CryptoDB, GiardiaDB, PlasmoDB, ToxoDB, TrichDB and TriTrypDB
October 22-26, 2010
Phylogenomics reveals extreme gene loss in typhus group (TG) rickettsiae relative to the levels for other rickettsial lineages. We report here a curious protease-encoding gene (ppcE) that is conserved only in TG rickettsiae. As a possible determinant of host pathogenicity, ppcE warrants consideration in the development of therapeutics against epidemic and murine typhus.
Ammerman NC, Gillespie JJ, Neuwald AF, Sobral BW, Azad AF. (2009) A typhus group-specific protease defies reductive evolution in rickettsiae. J. Bacteriol. (View Abstract at PubMed)
EuPathDB members will be at the 58th annual American Society for Tropical Medicine and Hygiene (ASTMH) meeting that will be held in Washington D.C., USA. Come see us at our booth in the exhibit hall.
Exhibit Hall Hours:
Nov 18, 2009 7:30pm - 9:30pm
Nov 19, 2009 9:30am - 10:30am
Nov 19, 2009 12:00pm - 1:30pm
Nov 19, 2009 3:00pm - 4:00pm
Nov 20, 2009 9:30am - 10:30am
Nov 20, 2009 12:00pm - 1:30pm
Nov 20, 2009 3:00pm - 4:00pm
Nov 21, 2009 9:30am - 10:30am
Nov 21, 2009 12:00pm - 2:30pm
EuPathDB team members will be present throughout the meeting and available at a booth during all poster sessions.
Nov. 18-22, 2009
The annual workshops are intended to be a thorough and intensive introduction to eukaryotic pathogen database resources that are part of the EuPathDB Bioinformatics Resource Center (AmoebaDB, CryptoDB, GiardiaDB, MicrosporidiaDB, PlasmoDB, ToxoDB, TrichDB, TriTrypDB and OrthoMCL DB).
Four-day bioinformatics workshop for thirty scientists, focusing on effective use of EuPathDB and its component sites CryptoDB, GiardiaDB, PlasmoDB, ToxoDB, TrichDB and TriTrypDB
Jun 6-9, 2010
Dr. Rebecca Wattam gives a presentation entitled, "Pathosystems Resource Integration Center for Bacterial Diseases" and presents a poster with the same title.
17th Annual Microbial Genomics Conference, Rocky Gap State Park, MD, October 11, 2009.
The 20th annual molecular parasitology meeting will be held in Woods Hole, MA from September 13th to the 17th.
EuPathDB team members will be present throughout the meeting and available at a booth during all poster sessions.
Sept 13-17, 2009
We are developing a set of ontologies that deal with vector-borne diseases and the arthropod vectors that transmit them. For practical reasons (application priorities), we initiated this project with an ontology of insecticide resistance followed by a series of ontologies that describe malaria as well as physiological processes of mosquitoes that are relevant to, and involved in, disease transmission. These will be expanded to encompass other vector-borne diseases as well as non-mosquito vectors. The aim of the whole undertaking, which is worked out in the frame of the international IDO (Infectious Disease Ontology) project, is to provide the community with a set of ontological tools that can be used both in the development of specific databases and, most importantly, in the construction of decision support systems to control these diseases.
PlasmoDB: a functional genomic database for malaria parasites
Nucleic Acids Res. 2009. 37:D539-D543
Aurrecoechea, C., J. Brestelli, B. P. Brunk, J. Dommer, S. Fischer,
B. Gajria, X. Gao, A. Gingle, G. Grant, O. S. Harb, M. Heiges, F. Innamorato,
J. Iodice, J. C. Kissinger, E. Kraemer, W. Li, J. A. Miller, V. Nayak,
C. Pennington, D. F. Pinney, D. S. Roos, C. Ross, C. J. Stoeckert, Jr.,
C. Treatman, and H. Wang
PlasmoDB (http://PlasmoDB.org) is a functional genomic
database for Plasmodium spp. that provides a resource for
data analysis and visualization in a gene-by-gene or
genome-wide scale. PlasmoDB belongs to a family of genomic
resources that are housed under the EuPathDB
(http://EuPathDB.org) Bioinformatics Resource Center (BRC)
umbrella. The latest release, PlasmoDB 5.5, contains
numerous new data types from several broad
categories--annotated genomes, evidence of transcription,
proteomics evidence, protein function evidence, population
biology and evolution. Data in PlasmoDB can be queried by
selecting the data of interest from a query grid or drop
down menus. Various results can then be combined with each
other on the query history page. Search results can be
downloaded with associated functional data and registered
users can store their query history for future retrieval or
analysis.
PMID: 18957442
GiardiaDB and TrichDB: integrated genomic resources for the eukaryotic protist pathogens Giardia lamblia and Trichomonas vaginalis
Nucleic Acids Res. 2009. 37:D526-D530
Aurrecoechea, C., J. Brestelli, B. P. Brunk, J. M. Carlton, J. Dommer,
S. Fischer, B. Gajria, X. Gao, A. Gingle, G. Grant, O. S. Harb, M. Heiges,
F. Innamorato, J. Iodice, J. C. Kissinger, E. Kraemer, W. Li, J. A. Miller,
H. G. Morrison, V. Nayak, C. Pennington, D. F. Pinney, D. S. Roos,
C. Ross, C. J. Stoeckert, Jr., S. Sullivan, C. Treatman, and H. Wang
GiardiaDB (http://GiardiaDB.org) and TrichDB
(http://TrichDB.org) house the genome databases for Giardia
lamblia and Trichomonas vaginalis, respectively, and
represent the latest additions to the EuPathDB
(http://EuPathDB.org) family of functional genomic
databases. GiardiaDB and TrichDB employ the same framework
as other EuPathDB sites (CryptoDB, PlasmoDB and ToxoDB),
supporting fully integrated and searchable databases.
Genomic-scale data available via these resources may be
queried based on BLAST searches, annotation keywords and
gene ID searches, GO terms, sequence motifs and other
protein characteristics. Functional queries may also be
formulated, based on transcript and protein expression data
from a variety of platforms. Phylogenetic relationships may
also be interrogated. The ability to combine the results
from independent queries, and to store queries and query
results for future use facilitates complex, genome-wide
mining of functional genomic data.
PMID: 18824479
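The ability to combine results from independent queries, as described in the abstract above, can be pictured as set operations over gene identifiers. The following is a minimal sketch under that assumption, not the EuPathDB implementation; the gene IDs, the query names and the `combine` helper are all hypothetical.

```python
# Hypothetical sketch: combining independent query results as sets of gene IDs.
# The IDs, query names and the combine() helper are invented for illustration;
# they are not part of any EuPathDB interface.

def combine(left, right, op):
    """Combine two query result sets with a boolean-style operator."""
    if op == "intersect":
        return left & right
    if op == "union":
        return left | right
    if op == "minus":
        return left - right
    raise ValueError(f"unknown operator: {op}")

# Two made-up result sets, e.g. a keyword search and an expression-based search.
keyword_hits = {"GENE_0001", "GENE_0002", "GENE_0003"}
expression_hits = {"GENE_0002", "GENE_0003", "GENE_0004"}

both = combine(keyword_hits, expression_hits, "intersect")
either = combine(keyword_hits, expression_hits, "union")
keyword_only = combine(keyword_hits, expression_hits, "minus")

print(sorted(both))
print(sorted(either))
print(sorted(keyword_only))
```

Storing each intermediate set under a name is what would let a later query reuse it, which is the spirit of the stored queries mentioned in the abstract.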
PlasmoDB Workshop
May 18-20, 2009
Presentation -- Introduction to the EuPathDB Bioinformatics Resource
Feb 24-28, 2009
A forum for anyone working on or interested in the Molecular Cell Biology of Trypanosomes and Leishmania and related model organisms.
TriTrypDB help desk
April 26-29, 2009
PlasmoDB Presentation and EuPathDB Workshop
May 24-28, 2009
The annual workshops are intended to be a thorough and intensive introduction to eukaryotic pathogen database resources that are part of the ApiDB/EuPathDB Bioinformatics Resource Center (CryptoDB, GiardiaDB, PlasmoDB, ToxoDB, TrichDB, TriTrypDB and OrthoMCL DB).
Four-day bioinformatics workshop for thirty scientists, focusing on effective use of EuPathDB and its component sites CryptoDB, GiardiaDB, PlasmoDB, ToxoDB, TrichDB and TriTrypDB
Jun 7-10, 2009
PlasmoDB Help Desk
May 28-29, 2009
Presentation and ToxoDB Help Desk
Jun 19-23, 2009
PlasmoDB workshop
Apr 5-7, 2006
Discussion of bioinformatics and annotation standards for microbial genomes
Aug 27-29, 2006
Booth with computers for hands-on experience with EuPathDB/CryptoDB/GiardiaDB/PlasmoDB/ToxoDB/TrichDB
Dec 7-11, 2008
Keynote lecture on parasite genome databases and comparative genomics; ApiDB/CryptoDB/PlasmoDB/ToxoDB help desk
Nov 6-9, 2006
Booth with computers for hands-on experience with ApiDB/CryptoDB/PlasmoDB/ToxoDB
Nov 4-8, 2007
Four-day bioinformatics workshop for thirty apicomplexan researchers, focusing on effective use of ApiDB/CryptoDB/PlasmoDB/ToxoDB.
Jun 3-6, 2007
ToxoDB: an integrated Toxoplasma gondii database resource
Nucleic Acids Res. 2007. 36:D553-6
B. Gajria, A. Bahl, J. Brestelli, J. Dommer, S. Fischer,
X. Gao, M. Heiges, J. Iodice, J. C. Kissinger, A. J. Mackey, et al.
ToxoDB (http://ToxoDB.org) is a genome and functional
genomic database for the protozoan parasite Toxoplasma
gondii. It incorporates the sequence and annotation of the
T. gondii ME49 strain, as well as genome sequences for the
GT1, VEG and RH (Chr Ia, Chr Ib) strains. Sequence
information is integrated with various other genomic-scale
data, including community annotation, ESTs, gene expression
and proteomics data. ToxoDB has matured significantly since
its initial release. Here we outline the numerous updates
with respect to the data and increased functionality
available on the website.
PMID: 18003657
ApiDB: integrated resources for the apicomplexan bioinformatics resource center
Nucleic Acids Research. 2007. 35:D427-30
Cristina Aurrecoechea, Mark Heiges, Haiming Wang, Zhiming Wang,
Steve Fischer, Philippa Rhodes, John Miller, Eileen Kraemer,
Christian J. Stoeckert, Jr., David S. Roos and Jessica C. Kissinger
ApiDB (http://ApiDB.org) represents a unified entry point
for the NIH-funded Apicomplexan Bioinformatics Resource
Center (BRC) that integrates numerous database resources and
multiple data types. The phylum Apicomplexa comprises
numerous veterinary and medically important parasitic
protozoa including human pathogenic species of the genera
Cryptosporidium, Plasmodium and Toxoplasma. ApiDB serves not
only as a database in its own right, but as a single
web-based point of entry that unifies access to three major
existing individual organism databases (PlasmoDB.org,
ToxoDB.org and CryptoDB.org), and integrates these databases
with data available from additional sources. Through the
ApiDB site, users may pose queries and search all available
apicomplexan data and tools, or they may visit individual
component organism databases.
PMID: 17098930
The PathoSystems Resource Integration Center (PATRIC) is one of eight Bioinformatics Resource Centers (BRCs) funded by the National Institute of Allergy and Infectious Diseases (NIAID) to create a data and analysis resource for selected NIAID priority pathogens, specifically proteobacteria of the genera Brucella, Rickettsia and Coxiella, and corona-, calici- and lyssaviruses and viruses associated with hepatitis A and E.
Snyder EE, Kampanya N, Lu J, Nordberg EK, Karur HR, Shukla M, Soneja J, Tian Y, Xue T, Yoo H, Zhang F, Dharmanolla C, Dongre NV, Gillespie JJ, Hamelius J, Hance M, Huntington KI, Jukneliene D, Koziski J, Mackasmiel L, Mane SP, Nguyen V, Purkayastha A, Shallom J, Yu G, Guo Y, Gabbard J, Hix D, Azad AF, Baker SC, Boyle SM, Khudyakov Y, Meng XJ, Rupprecht C, Vinje J, Crasta OR, Czar MJ, Dickerman A, Eckart JD, Kenyon R, Will R, Setubal JC, Sobral BW. (2007) PATRIC: the VBI PathoSystems Resource Integration Center. Nucleic Acids Res. (View Abstract at PubMed)
PlasmoDB v5: New Looks, New Genomes
Trends in Parasitology. 2006. 22(12): 543-546
Christian J. Stoeckert Jr., Steve Fischer, Jessica Kissinger,
Mark Heiges, Cristina Aurrecoechea, Bindu Gajria, David S. Roos
Version 5.1 of PlasmoDB, a resource for malaria parasite
genomic and functional genomics datasets, was released in
August 2006. This new release includes additional Plasmodium
genomes and a newly designed website. The new site reflects
the status of PlasmoDB as a member of a linked family of
Apicomplexan databases.
PMID: 17029963
PlasmoDB: The Plasmodium Genomics and Functional Genomics Resource
In silico Genomics and Proteomics: Functional Annotation of Genomes and Proteins. Nicola Mulder and Rolf Apweiler (eds.). Nova Science Publishers. 2006
Patricia L. Whetzel, Shailesh V. Date, Kobby Essien, Martin J. Fraunholz,
Bindu Gajria, Gregory R. Grant, John Iodice, Jessica C. Kissinger,
Philip T. Labo, Arthur J. Milgram, David S. Roos, and Christian J. Stoeckert Jr.
ISBN: 1-59454-995-8
SynView: A GBrowse-compatible Approach to Visualizing Comparative Genome Data
Bioinformatics. 2006. 22(18):2308-2309
Haiming Wang, Yanqi Su, Aaron J. Mackey, Eileen T. Kraemer and Jessica C. Kissinger
SUMMARY: We present SynView, a simple and generic approach
to dynamically visualize multi-species comparative genome
data. It is a light-weight application based on the popular
and configurable web-based GBrowse framework. It can be used
with a variety of databases and provides the user with a
high degree of interactivity. The tool is written in Perl
and runs on top of the GBrowse framework. It is in use in
the PlasmoDB (http://www.PlasmoDB.org) and the CryptoDB
(http://www.CryptoDB.org) projects and can be easily
integrated into other cross-species comparative genome
projects. AVAILABILITY: The program and instructions are
freely available at http://www.ApiDB.org/apps/SynView/
CONTACT: jkissing@uga.edu.
PMID: 16844709
CryptoDB: a Cryptosporidium bioinformatics resource update
Nucleic Acids Res. 2006 Jan 1;34:D419-22
Heiges M, Wang H, Robinson E, Aurrecoechea C, Gao X, Kaluskar N,
Rhodes P, Wang S, He CZ, Su Y, Miller J, Kraemer E, Kissinger JC.
The database, CryptoDB (http://CryptoDB.org), is a community
bioinformatics resource for the AIDS-related
apicomplexan-parasite, Cryptosporidium. CryptoDB integrates
whole genome sequence and annotation with expressed sequence
tag and genome survey sequence data and provides
supplemental bioinformatics analyses and data-mining tools.
A simple, yet comprehensive web interface is available for
mining and visualizing the data. CryptoDB is allied with the
databases PlasmoDB and ToxoDB via ApiDB, an
NIH/NIAID-funded Bioinformatics Resource Center. Recent
updates to CryptoDB include the deposition of annotated
genome sequences for Cryptosporidium parvum and
Cryptosporidium hominis, migration to a relational database
(GUS), a new query and visualization interface and the
introduction of Web services.
PMID: 16381902
OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups
Nucleic Acids Res. 2006 Jan 1;34:D363-8
Feng Chen, Aaron J. Mackey, Christian J. Stoeckert Jr. and David S. Roos
The OrthoMCL database (http://orthomcl.cbil.upenn.edu)
houses ortholog group predictions for 55 species, including
16 bacterial and 4 archaeal genomes representing
phylogenetically diverse lineages, and most currently
available complete eukaryotic genomes: 24 unikonts (12
animals, 9 fungi, microsporidium, Dictyostelium, Entamoeba),
4 plants/algae and 7 apicomplexan parasites. OrthoMCL
software was used to cluster proteins based on sequence
similarity, using an all-against-all BLAST search of each
species' proteome, followed by normalization of
inter-species differences, and Markov clustering. A total of
511,797 proteins (81.6% of the total dataset) were clustered
into 70,388 ortholog groups. The ortholog database may be
queried based on protein or group accession numbers, keyword
descriptions or BLAST similarity. Ortholog groups exhibiting
specific phyletic patterns may also be identified, using
either a graphical interface or a text-based Phyletic
Pattern Expression grammar. Information for ortholog groups
includes the phyletic profile, the list of member proteins
and a multiple sequence alignment, a statistical summary and
graphical view of similarities, and a graphical
representation of domain architecture. OrthoMCL software,
the entire FASTA dataset employed and clustering results are
available for download. OrthoMCL-DB provides a centralized
warehouse for orthology prediction among multiple species,
and will be updated and expanded as additional genome
sequence data become available.
PMID: 16381887
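The Markov clustering step named in the abstract above can be illustrated with a small, self-contained sketch. This is not the OrthoMCL software: it is a generic, pure-Python version of the Markov clustering idea (alternating expansion and inflation of a column-stochastic matrix), and the similarity matrix, the self-loop weight, and the parameter values are assumptions chosen only for illustration. The BLAST search and the inter-species normalization described in the abstract are not modeled here.

```python
# Minimal Markov clustering (MCL) sketch in pure Python.
# Assumption: the input is a symmetric similarity matrix, e.g. derived from
# pairwise BLAST scores. The toy values below are invented.

def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def expand(m):
    """Expansion step: square the matrix (two-step random-walk flow)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, power):
    """Inflation step: raise entries to a power, then renormalize columns."""
    return normalize_columns([[x ** power for x in row] for row in m])

def mcl(similarity, inflation=2.0, iterations=50, eps=1e-6):
    """Return clusters as sorted lists of node indices."""
    n = len(similarity)
    # Add self-loops (weight 1.0 is an arbitrary choice for this sketch).
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each row with non-negligible entries defines one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Toy input: proteins 0-2 are mutually similar, as are proteins 3-4,
# with no similarity between the two groups.
toy = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
clusters = mcl(toy)
print(clusters)
```

On this toy input the two disconnected groups come out as separate clusters. Real similarity graphs are sparse and far larger, so a practical implementation would use sparse matrices and pruning rather than the dense lists used here.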
Plasmodium research in the postgenomic era
Trends Parasitol. 2006 22(1):1-4
Duraisingh M, Ferdig MT, Stoeckert CJ, Volkman SK, McGovern VP
The complete genomic sequence of Plasmodium falciparum
strain 3D7 was published in October 2002. At the Next Steps
in Malaria Research meeting in April 2005, the next
practical steps were considered and the priorities ranked
for postgenomic research in Plasmodium. The high-throughput
approaches that will help to answer the major biological
questions regarding Plasmodium should, like the genome
project itself, build community-shared resources, and
efforts must be made to help researchers ready themselves to
use the tools that will become available.
PMID: 16311071
Getting the most out of bioinformatics resources
In: Malaria Parasites, A.P. Waters & C.J. Janse, editors. Horizon, Norfolk UK. 2004
J.C. Kissinger and D.S. Roos
ISBN: 0-9542464-6-2
Composite genome map and recombination parameters derived from three archetypal lineages of Toxoplasma gondii
Nucl. Acids Res. 2005 33:2980-2992
Khan, A., S. Taylor, C. Su, A.J. Mackey, J. Boyle, R. Cole, D. Glover, K. Tang,
I.T. Paulsen, M. Berriman, J.C. Boothroyd, E.R. Pfefferkorn, J.P. Dubey, J.W.
Ajioka, D.S. Roos, J.C. Wootton and L.D. Sibley
Toxoplasma gondii is a highly successful protozoan parasite
in the phylum Apicomplexa, which contains numerous animal
and human pathogens. T.gondii is amenable to cellular,
biochemical, molecular and genetic studies, making it a
model for the biology of this important group of parasites.
To facilitate forward genetic analysis, we have developed a
high-resolution genetic linkage map for T.gondii. The
genetic map was used to assemble the scaffolds from a 10X
shotgun whole genome sequence, thus defining 14 chromosomes
with markers spaced at approximately 300 kb intervals across
the genome. Fourteen chromosomes were identified comprising
a total genetic size of approximately 592 cM and an average
map unit of approximately 104 kb/cM. Analysis of the genetic
parameters in T.gondii revealed a high frequency of closely
adjacent, apparent double crossover events that may
represent gene conversions. In addition, we detected large
regions of genetic homogeneity among the archetypal clonal
lineages, reflecting the relatively few genetic outbreeding
events that have occurred since their recent origin. Despite
these unusual features, linkage analysis proved to be
effective in mapping the loci determining several drug
resistances. The resulting genome map provides a framework
for analysis of complex traits such as virulence and
transmission, and for comparative population genetic
studies.
PMID: 15911631
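The map figures quoted in the abstract above can be cross-checked with simple arithmetic. The sketch below uses only the approximate values stated there (592 cM, 104 kb/cM, 300 kb marker spacing); the derived physical size and interval count are rough estimates implied by those numbers, not figures reported in the abstract.

```python
# Back-of-the-envelope check using only figures quoted in the abstract.
# The derived values are estimates, not results reported by the authors.
total_genetic_size_cm = 592   # approximate total genetic size (cM)
avg_kb_per_cm = 104           # approximate average map unit (kb/cM)
marker_spacing_kb = 300       # approximate marker spacing (kb)

# Physical size implied by genetic size times the average map unit.
implied_physical_size_kb = total_genetic_size_cm * avg_kb_per_cm
implied_physical_size_mb = implied_physical_size_kb / 1000

# Number of marker intervals implied by that size and the marker spacing.
implied_marker_intervals = implied_physical_size_kb / marker_spacing_kb

print(f"Implied physical size: {implied_physical_size_mb:.1f} Mb")
print(f"Implied marker intervals: {implied_marker_intervals:.0f}")
```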
Themes and variations in apicomplexan parasite biology
Science. 2005 309:72-73
Roos, D.S.
PMID: 15994520
Protozoan genomics for drug discovery
Nature Biotechnol. 2005 23:1089-1091
Chaudhary, K., and D.S. Roos
PMID: 16151400
The transcriptome of Toxoplasma gondii
BMC Biology 2005 3:26
Radke, J.R., M.S. Behnke, A.J. Mackey, J.B. Radke, D.S. Roos and M.W. White
BACKGROUND: Toxoplasma gondii gives rise to toxoplasmosis,
among the most prevalent parasitic diseases of animals and
man. Transformation of the tachyzoite stage into the latent
bradyzoite-cyst form underlies chronic disease and leads to
a lifetime risk of recrudescence in individuals whose immune
system becomes compromised. Given the importance of tissue
cyst formation, there has been intensive focus on the
development of methods to study bradyzoite differentiation,
although the molecular basis for the developmental switch is
still largely unknown. RESULTS: We have used serial analysis
of gene expression (SAGE) to define the Toxoplasma gondii
transcriptome of the intermediate-host life cycle that leads
to the formation of the bradyzoite/tissue cyst. A broad view
of gene expression is provided by >4-fold coverage from nine
distinct libraries (approximately 300,000 SAGE tags)
representing key developmental transitions in primary
parasite populations and in laboratory strains representing
the three canonical genotypes. SAGE tags, and their
corresponding mRNAs, were analyzed with respect to
abundance, uniqueness, and antisense/sense polarity and
chromosome distribution and developmental specificity.
CONCLUSION: This study demonstrates that phenotypic
transitions during parasite development were marked by
unique stage-specific mRNAs that accounted for 18% of the
total SAGE tags and varied from 1-5% of the tags in each
developmental stage. We have also found that Toxoplasma mRNA
pools have a unique parasite-specific composition with 1 in
5 transcripts encoding Apicomplexa-specific genes
functioning in parasite invasion and transmission.
Developmentally co-regulated genes were dispersed across all
Toxoplasma chromosomes, as were tags representing each
abundance class, and a variety of biochemical pathways
indicating that trans-acting mechanisms likely control gene
expression in this parasite. We observed distinct
similarities in the specificity and expression levels of
mRNAs in primary populations (Day-6 post-sporozoite
infection) that occur prior to the onset of bradyzoite
development that were uniquely shared with the virulent Type
I-RH laboratory strain suggesting that development of RH may
be arrested. By contrast, strains from Type II-Me49B7 and
Type III-VEGmsj contain SAGE tags corresponding to
bradyzoite genes, which suggests that priming of
developmental expression likely plays a role in the greater
capacity of these strains to complete bradyzoite
development.
PMID: 16324218
Functional genomics databases on the web
Cell Microbiol. 2005 7(8), 1053-9
Stoeckert CJ Jr.
Experiments involving high-throughput methods for measuring
transcripts, proteins and metabolites constitute the area of
functional genomics. These experiments are highly context
dependent and require much more detail about the
experimental design, sample and protocols used than in
genomics. Functional genomics databases are needed that
follow established and emerging standards. Functional
genomic databases are not yet very common; however, there
are a few focused on microbial genomes and a couple
integrative systems are available for setting up functional
genomics databases.
PMID: 16008573