[leish-l] Fwd: Leishmania genome project sequencing update

Jeffrey Shaw jeffreyj at usp.br
Fri May 23 12:46:15 BRT 2003


>Sender: csp at sanger.ac.uk
>Date: Fri, 23 May 2003 15:37:13 +0100
>From: Chris Peacock <csp at sanger.ac.uk>
>Organization: The Sanger Institute
>
>Subject: Leishmania genome project sequencing update
>
>
>Dear Colleagues,
>Below is an update on the progress of the Leishmania genome sequencing
>project and the generic genome database GeneDB (http://www.genedb.org)
>that houses the sequence, annotation and curation. This is also attached
>as a Word file.
>
>Sequencing progress.
>Leishmania major Friedlin strain (MHOM/IL/1980/Friedlin) was the
>reference strain chosen as the first Leishmania spp. to be completely
>sequenced by the Leishmania Genome Consortium. Currently, sequence is
>provided by the Sanger Institute and the Seattle Biomedical Research
>Institute (SBRI); however, many other groups and people have contributed
>to the data provided to the community.
>Of the 36 chromosomes that make up the haploid genome, all are either
>complete or in finishing. Eight chromosomes are now considered to be
>complete to the point that all the unique sequence is present and
>contiguous (there are some instances where the exact number of duplicate
>repeat units or telomeric hexamer repeats is only estimated).
>The finished chromosomes are Chr 1, 2, 3, 4, 5, 15, 24 and 25.
>The rest of the chromosomes are in the finishing stage. Fourteen of
>these are undergoing gap closure of a small number of ordered contigs
>(chr 6, 8, 10-15, 19, 27, 29, 34, 35 and 36) and the remainder exist as
>unordered large contigs that are undergoing ordering and gap closure
>using skims of BAC clones, primer-based PCR reactions and optical maps.
>Data from the project can be accessed in three main ways:
>
>1)    The GeneDB database for Leishmania
>(http://www.genedb.org/genedb/leish)
>This database houses annotated and curated data together with a
>user-friendly interface and a collection of facilities to aid in
>searching and defining the data. The database is under constant
>development to improve and add to the features available within the
>database. Community feedback is positively encouraged not only to
>improve the features available but also to add community information for
>the benefit of all users. Feedback forms and email links to the
>appropriate member of the GeneDB team are present on every page.
>
>Data currently available in GeneDB is initially annotated
>semi-automatically. Gene predictions made at the Sanger Institute are
>assigned manually using codon usage and the Hexamer gene prediction
>program. Genes on finished chromosomes are being given a systematic name
>of the form LmjFXX.nnnn, where XX is the chromosome number and nnnn is the
>gene number in increments of 10. Putative functional assignments are
>initially made using a number of automatic prediction programs including
>FASTA and BLASTP similarity searches against Swiss-Prot and TrEMBL
>databases, InterPro mapping of protein domains and motifs and
>InterPro2GO mapping for Gene Ontology (GO) annotation. This
>semi-automatic annotation is then replaced with manual annotation and
>curation with publicly available material. Gene predictions made at SBRI
>use a semi-automatic combination of Glimmer, Testcode, GeneScan and
>CodonUsage algorithms, followed by manual editing. New datasets added in
>the last couple of months include the finished chromosomes 15, 25 and
>31.
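
The systematic naming scheme described above (LmjFXX.nnnn) can be
handled with a few lines of code. The following is only a minimal sketch
in Python, not part of GeneDB. It assumes that the chromosome number is
zero-padded to two digits, that the gene number is zero-padded to four
digits, and that numbering starts at 0010; none of these details is
stated in the message. The function names and the example identifiers
are hypothetical.

```python
import re

# "LmjF", a chromosome number, a dot, and a gene number.
# Assumption: two-digit and four-digit zero-padded fields.
SYSTEMATIC_ID = re.compile(r"^LmjF(\d{2})\.(\d{4})$")

# The message states that the haploid genome has 36 chromosomes.
CHROMOSOME_COUNT = 36


def parse_systematic_id(gene_id):
    """Split a systematic name into (chromosome, gene_number)."""
    match = SYSTEMATIC_ID.match(gene_id)
    if match is None:
        raise ValueError("not a systematic name: %r" % gene_id)
    chromosome = int(match.group(1))
    gene_number = int(match.group(2))
    if not 1 <= chromosome <= CHROMOSOME_COUNT:
        raise ValueError("chromosome out of range: %d" % chromosome)
    return chromosome, gene_number


def make_systematic_id(chromosome, ordinal):
    """Build the name of the ordinal-th gene on a chromosome.

    Gene numbers go up in increments of 10, so the third gene
    (ordinal 3) gets gene number 30. Assumption: the first gene
    is numbered 0010.
    """
    return "LmjF%02d.%04d" % (chromosome, ordinal * 10)
```

For instance, parse_systematic_id("LmjF01.0010") gives the chromosome
and gene number as a pair of integers, and make_systematic_id(15, 3)
builds a name on chromosome 15. Numbering in steps of 10 leaves room to
insert newly predicted genes between existing ones without renaming
their neighbours.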
>
>Features that are currently available include:
>
>i)   Simple search querying by gene name/ID, description, product or
>keywords.
>ii)   Complex Boolean querying of all the data within GeneDB. As well as
>the Leishmania data, users may select query terms and search against any
>combination of the organisms housed within GeneDB
>(http://www.genedb.org/gusapp/serlet?page=boolq&organism=leish).
>iii)   Bulk data downloads using the complex querying to select a list
>of genes and features for downloading in FASTA format. Both DNA and
>protein sequences can be selected for download. A detailed help/guidance
>page explaining the use of both the boolean querying and bulk downloads
>is available at http://genedb.org/genedb/boolean.jsp.
>iv)   Contiguous annotated sequence from any of the contigs or
>chromosomes can be viewed and downloaded from any gene page by selecting
>the "Graphical Display (in Artemis)" button next to the contig maps.
>Users can select how much of the sequence to view and/or download based
>on either contig co-ordinates or the number of bases either side of a
>gene. The Artemis graphical display tool allows users to view intergenic
>features, sequence and annotation.
>v)   BLAST servers. All the Leishmania data, including the unassembled
>shotgun reads, EST database and publicly submitted proteins in the
>Swiss-Prot and TrEMBL databases, can be searched using the BLAST server
>accessible from the GeneDB front page and all gene pages. A combination
>of more than one dataset, including those from the other organisms
>within GeneDB, can be searched using the OMNIBLAST server
>(http://www.genedb.org/genedb/seqSearch.jsp?organism=leish).
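
Sequences obtained through the bulk download feature (item iii above)
arrive in FASTA format. As a rough illustration of how such a file can
be read, here is a small Python sketch that uses only the standard
library. The exact layout of GeneDB header lines is not described in the
message, so the two records below are invented for the example and the
function name read_fasta is hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    A record starts at a line beginning with ">" and runs until the
    next such line; sequence lines are joined into one string.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Invented records, for illustration only (not real GeneDB data).
example = """>LmjF01.0010 hypothetical example record
ATGGCGACC
GTTTGA
>LmjF01.0020 hypothetical example record
ATGAAATAA
"""

records = list(read_fasta(example.splitlines()))
for header, sequence in records:
    print(header.split()[0], len(sequence))
```

To read a downloaded file instead of the inline text, pass an open file
object to read_fasta in place of example.splitlines().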
>
>2)    FTP site (http://ftp.sanger.ac.uk/pub/databases/L.major_sequences)
>The Leishmania major FTP site contains a comprehensive list of all the
>data currently available for the sequencing project. These datasets
>include finished sequence and "sequence in progress" contigs from both
>the Sanger Institute and the SBRI. Genome survey sequences kindly
>provided by Steve Beverly and EST sequences from Jennie Blackwell are
>also available. All the data from GeneDB, in the form of DNA sequence
>and protein sequence databases, are placed here after each weekly update.
>More detailed information about the data on the FTP site is available in
>the README.txt file
>(ftp://ftp.sanger.ac.uk/pub/databases/L_major_sequences/README-.txt).
>
>3)      Leishmania major Genome Project Pages
>More information on the Leishmania major genome sequencing project,
>including detailed information on the progress of the sequencing of each
>of the chromosomes, can be accessed at the project pages
>(http://www.sanger.ac.uk/Projects/L_major/).
>
>Please note that the data for the Trypanosoma brucei genome sequencing
>project are also available in GeneDB
>(http://www.genedb.org/genedb/tryp).
>We would welcome any comments regarding the annotation or functionality
>of GeneDB. We look forward to receiving input into this project.
>
>Please contact:
>Al Ivens (alicat at sanger.ac.uk) for any Sanger Institute
>sequencing-related problems, Peter Myler (mylerpj at sbri.org) for issues
>relating to SBRI data (chr 1, 2, 3, 27, 29 and 35), Christopher Peacock
>(csp at sanger.ac.uk) or Al Ivens for any annotation or curation-related
>issues, and either Christopher Peacock or Martin Aslett
>(maa at sanger.ac.uk) for technical questions about the database.
>
>Best Wishes
>
>The Leishmania Team
>
>Ref: Introducing GeneDB: a generic database. Trends in Parasitology, Vol
>18 (10) 465-67
>--
>Dr Christopher Peacock                  tel +44 (0)1223 494851
>Senior Computer Biologist               email csp at sanger.ac.uk
>Pathogen Sequencing Unit (PSU)
>The Wellcome Trust Sanger Institute
>Hinxton, Cambridge CB10 1SA, UK



