[leish-l] Tri Tryp sequencing and annotation update

Chris Peacock csp at sanger.ac.uk
Fri Oct 31 10:51:26 BRST 2003


Dear colleagues,
        
Please find below an update on the progress of the sequencing and
annotation of the genomes of the three Kinetoplastid species,
Trypanosoma brucei, Leishmania major and Trypanosoma cruzi, which are
now nearing completion.
Briefly:
1. L. major:
The L. major project is essentially complete; manual finishing is
on-going to precipitate closure of the ca. 150 remaining gaps
(predominantly found in repeated regions and co-migrating chromosomes).
The first "official" release of the L. major genome has been available
since August this year. 

2. T. brucei: 
The sequencing of the T. brucei genome should be completed by the end of
the year. Efforts are currently centred on finishing chr XI and closing
two 100kb sequencing gaps on chrVII. Contigs/BACs from the remaining
chromosomes have been ordered and, wherever possible, pseudomolecules
have been assembled. 

3. T. cruzi:
The sequencing of the T. cruzi genome is essentially complete with 19X
whole genome shotgun coverage obtained as of August 2003. Efforts
are now focused on optimizing the assembly, including the
development of new algorithms to deal with tandem and dispersed repeats
as well as allele variation (i.e. a diploid with polymorphisms between
numerous alleles).  These data are currently being assembled into large
scaffolds that constitute a "pseudo-genome" as a reference set with
additional contigs and repeats representing variations.  Gene finding
as well as annotation will be highly automated with emphasis placed on
annotation by orthology to T. brucei and L. major. While we expect gene
content to be well determined and characterized by the end of this year,
closure and assembly efforts will continue for several months as we work
diligently at separating the haplotypes and refining the structure of
the genome. 

Intended progress:      
Following the TriTryp meeting at Hinxton in June 2003 and during the
recent meeting at TIGR, the TriTryp sequencing consortium (Karolinska
Institute, Seattle Biomedical Research Institute, The Institute for
Genomic Research and The Wellcome Trust Sanger Institute) and
representatives of the funding bodies have discussed plans for the
annotation and joint publication of these three genomes.
        
After extensive discussions on standardising annotation protocols and
data exchange, annotators at the sequencing centres are currently
completing the initial phase of annotation. All centres will not only
annotate and analyse data using similar techniques and tools but will
also exchange datasets and share results. During efforts to finish and
annotate the sequence, data from the related genomes will (wherever
possible) aid assembly and annotation, avoiding duplication of efforts.

To accommodate the need for stable data releases, against the backdrop
of on-going finishing and annotation efforts, the sequencing centres
have agreed upon a timeline for releasing data in a series of stages.
The centres will share data in mid November.  Public release of the
first major versions (v1.0 in the case of T. brucei and T. cruzi and
v2.0 in the case of L. major) of the data is planned for the beginning
of December. Subsequent changes to only the annotation will be
reflected in incremental version numbers (e.g. v1.1, etc.). During this
time, sequencing will continue behind the scenes and, prior to publication,
another major release of the annotated sequence will be announced.

As previously, data can be accessed and downloaded via TIGR's T. brucei
and T. cruzi databases (http://www.tigr.org/tdb/e2k1/tba1/ and
http://www.tigr.org/tdb/e2k1/tca1/) and GeneDB (http://www.genedb.org/)
in accordance with the respective sequencing centre's data release
policy for unpublished data. The databases will maintain and update
annotation frequently and the stable data releases of all three
organisms will be available through GeneDB. The sequence and annotation
releases will be clearly labelled as such. 

As mentioned above, the data releases will consist of first pass,
consistent annotation across the three genomes. We would very much
welcome comments, corrections and updates to gene predictions and
annotations at this point, as this will be vital in maintaining
accurate, up-to-date and comprehensive annotation across the
genomes.

To make the feedback process as efficient as possible, please use the
forms available via either the TIGR or GeneDB databases. Comments will
automatically be sent to annotators at all the sequencing centres. In
addition, the sequencing consortium would like to invite some members of
the T. brucei, T. cruzi and L. major research communities to participate
in the analysis and publication of the genome sequences of these three
organisms. Invitations will be sent out shortly. 

The sequencing centres and funding agencies are committed to the
long-term maintenance of a Kinetoplastid database. GeneDB will remain as
a centralised resource and as such will continue to update genome
annotation, integrating datasets from other public sources and providing
tools for database querying and cross-species comparisons. 
        
If you have any further questions, please contact Najib El-Sayed
(nelsayed at tigr.org), Matt Berriman (mb4 at sanger.ac.uk), Al Ivens
(alicat at sanger.ac.uk), Peter Myler (mylerpj at sbri.org) or Bjorn Andersson
(Bjorn.Andersson at cgb.ki.se).

Regards,

Christiane Hertz-Fowler and Christopher Peacock
-- 
Dr Christopher Peacock                  tel +44 (0)1223 494851
Senior Computer Biologist               email csp at sanger.ac.uk
Pathogen Sequencing Unit (PSU)
The Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK


