Aniseed Release Notes
Gene expression data:
Thanks to a collaborative agreement with Yutaka Satou, the system now includes the annotated ISH data from the transcription factor and signalling molecule screen carried out in Yutaka Satou and Nori Satoh's lab. Pictures are not yet available, but will be in the coming months. For now, if you want to see the pictures from this screen, click on the link in the result page: it will bring you to the home page of the Ghost database, where you will find the original images. When referring to any gene from this screen, please cite the original publication in Development, NOT Aniseed. The same applies when referring to the Halocynthia data found on Aniseed: cite the original paper, not Aniseed. In collaboration with Yutaka, we have also started to reannotate the expression profiles more precisely, making use of the more detailed Aniseed anatomical dictionaries. This reannotation will be put online progressively.
Most gene annotation, in particular Gene Ontology terms and gene names, was derived from orthology relationships, which only exist for about 50% of gene models. We have now added GO terms (localisation and molecular function) and names ("highly similar to...", "similar to...") on the basis of the best BLAST hit.
During the past months, we have corrected some earlier problems in the Ciona anatomical dictionaries and linked them horizontally by introducing lineage information. You can now trace the lineage of a structure and use this to query for genes expressed in a precise lineage across developmental stages. We have also defined 17 keywords describing the final fates of given blastomeres or structures in Ciona intestinalis. Thanks to the 3D data we obtained and quantified using the 3D Embryo Handler software and 3D embryo reconstructions, we provide some information on neighbourhood relationships, blastomere shape, etc. This is currently only available for two stages of Ciona embryos, the 16- and 32-cell stages, but we are building up a collection of reconstructed embryos whose biometrical analysis will gradually be put online. Some embryological information (inductions, competence of cells) is available for the late 32-cell stage. This is only very preliminary and subject to change without notice, so please use it with caution. It is just there to give a flavour of what we would like to develop in the future.
Upon selection of a dictionary, it is possible to ask the system to display the structures that will eventually adopt one or several fates of interest. They are marked by a green dot on the results page. At the top of the results page, all buttons are now active. Upon selection of a structure:
. The "In situ data" button gives access to all genes expressed in this structure.
. The "lineage" button shows all progenitors and progeny (the interface will improve soon).
. The "biometry" button only works with cells, and gives access to the volume, surface, shape and neighbours of the selected cell, with a quantitation of the surface of contacts (only for the 16- and 32-cell stages so far).
. The "fates" button gives the fates of the selected structure (works only for cells).
. The "induction" button indicates whether the cell is subject to an induction process at this stage.
This is still work in progress and very few data have been entered, but please give us feedback on what you like or dislike there.
Select one or several stages, one or several fates, and the Boolean syntax you want to use; the results page will indicate all relevant cells and structures. The format of the results page is acceptable but not yet great; it will improve with time (and with the help of your suggestions).
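The fate query amounts to a set test per cell. A minimal sketch, assuming fates are stored per (stage, cell) as sets of fate keywords; the cell names and fate assignments below are hypothetical placeholders, not real Ciona lineage data, and this is not Aniseed's actual query code:

```python
# Hypothetical fate table: (stage, cell) -> set of final-fate keywords.
# Cell names and fates are placeholders, not real Ciona data.
FATES = {
    ("16-cell", "cell_a"): {"endoderm"},
    ("16-cell", "cell_b"): {"endoderm", "notochord"},
    ("32-cell", "cell_c"): {"muscle", "mesenchyme"},
    ("32-cell", "cell_d"): {"epidermis", "nervous system"},
}


def cells_with_fates(fates, stages, wanted, mode="AND"):
    """Return the (stage, cell) pairs at the selected stages matching the fates.

    mode="AND": the cell must give rise to every wanted fate.
    mode="OR":  the cell must give rise to at least one wanted fate.
    """
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    wanted = set(wanted)
    hits = []
    for (stage, cell), cell_fates in fates.items():
        if stage not in stages:
            continue
        if mode == "AND":
            match = wanted <= cell_fates          # every wanted fate present
        else:
            match = bool(wanted & cell_fates)     # at least one wanted fate present
        if match:
            hits.append((stage, cell))
    return sorted(hits)


print(cells_with_fates(FATES, {"16-cell", "32-cell"}, {"notochord", "muscle"}, mode="OR"))
# [('16-cell', 'cell_b'), ('32-cell', 'cell_c')]
```

With "AND", a cell is only returned if it contributes to all selected fates, which is usually the stricter and smaller result.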
Select your favourite cells within the anatomical dictionary (note that this only works with cells, not with higher-order structures), a maximal (or minimal) distance for cells of interest, and a minimal or maximal surface of contact, and the system will retrieve a list of relevant cells. Clicking on a cell opens a new page showing its position in the anatomical dictionary.
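A rough sketch of this kind of neighbour search, assuming each cell is represented by its centroid (distance measured between centroids) and each touching pair by a measured contact surface. The cell names, coordinates and surfaces are invented for illustration, and only the maximal-distance and minimal-contact variants are shown:

```python
import math

# Hypothetical centroids (x, y, z, in micrometres) for cells at one stage.
CENTROIDS = {
    "cell_a": (0.0, 0.0, 0.0),
    "cell_b": (10.0, 0.0, 0.0),
    "cell_c": (0.0, 25.0, 0.0),
    "cell_d": (40.0, 0.0, 0.0),
}
# Hypothetical contact surfaces (square micrometres) between touching cells.
CONTACTS = {
    frozenset({"cell_a", "cell_b"}): 120.0,
    frozenset({"cell_a", "cell_c"}): 15.0,
}


def neighbour_query(selected, max_distance=None, min_contact=None):
    """Cells outside `selected` that meet both criteria for at least one selected cell.

    A criterion left as None is not applied. Pairs with no recorded contact
    are treated as having a contact surface of 0.
    """
    hits = set()
    for ref in selected:
        for cell, xyz in CENTROIDS.items():
            if cell in selected:
                continue
            if max_distance is not None and math.dist(CENTROIDS[ref], xyz) > max_distance:
                continue
            contact = CONTACTS.get(frozenset({ref, cell}), 0.0)
            if min_contact is not None and contact < min_contact:
                continue
            hits.add(cell)
    return sorted(hits)


print(neighbour_query({"cell_a"}, max_distance=30.0))  # ['cell_b', 'cell_c']
print(neighbour_query({"cell_a"}, min_contact=100.0))  # ['cell_b']
```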
Select a species and a stage (this only works for the 16-cell and late 32-cell stages of Ciona intestinalis), and some properties of interest for your cells (entropy (meaning compactness), sphericity, elongation, flatness, large or small volume, etc.). The results page lists all cells satisfying the criteria, together with all their shape descriptor values. Click on a cell to see its position in the anatomical tree.
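The exact definitions of the shape descriptors used by Aniseed are not given here. As one common example, Wadell's sphericity compares a cell's surface with that of a sphere of the same volume, psi = pi^(1/3) (6V)^(2/3) / A, which equals 1 for a perfect sphere and decreases for less compact shapes. A minimal sketch, assuming volumes in cubic micrometres and surfaces in square micrometres, with hypothetical cells:

```python
import math


def sphericity(volume, surface):
    """Wadell sphericity: surface of the equal-volume sphere divided by the actual surface."""
    return math.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / surface


# Hypothetical cells: name -> (volume in um^3, surface in um^2).
CELLS = {
    "cell_round": (4 / 3 * math.pi * 10**3, 4 * math.pi * 10**2),  # sphere, r = 10 um
    "cell_cubic": (20.0**3, 6 * 20.0**2),                          # cube, side = 20 um
}


def cells_by_sphericity(cells, minimum):
    """Return (name, sphericity rounded to 3 decimals) for cells at least as spherical as `minimum`."""
    scores = {name: sphericity(v, s) for name, (v, s) in cells.items()}
    return sorted((name, round(x, 3)) for name, x in scores.items() if x >= minimum)


print(cells_by_sphericity(CELLS, 0.9))  # [('cell_round', 1.0)]
```

The cube scores (pi/6)^(1/3), about 0.806, so it is excluded by a 0.9 threshold.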
Select a species, a stage and a structure, then tick the boxes according to whether you want to see the progenitors, the progeny or both of the selected cells. The system will return a list of cells with an indication of the stage at which they are present. A tree-based representation will soon replace the current interface.
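Progenitor and progeny queries are walks up and down the cell-division tree. A minimal sketch over a small fragment of the standard ascidian (Conklin) cell nomenclature, in which cell An.m divides into A(n+1).(2m-1) and A(n+1).(2m); the generation number n then doubles as a stage indicator (5 at the 16-cell stage, 6 at the 32-cell stage). The dictionary-based data structure is an assumption, not Aniseed's schema:

```python
# Fragment of the ascidian cell-division tree: mother cell -> daughter cells.
CHILDREN = {
    "A4.1": ["A5.1", "A5.2"],
    "A5.1": ["A6.1", "A6.2"],
    "A5.2": ["A6.3", "A6.4"],
}
# Invert the tree so each daughter points to its mother.
PARENT = {child: mother for mother, kids in CHILDREN.items() for child in kids}


def generation(cell):
    """Generation number read from the cell name (e.g. 'A6.2' -> 6)."""
    return int(cell[1:].split(".")[0])


def progenitors(cell):
    """Ancestors, from the mother cell back to the oldest recorded one."""
    lineage = []
    while cell in PARENT:
        cell = PARENT[cell]
        lineage.append(cell)
    return lineage


def progeny(cell):
    """All recorded descendants, ordered by generation then name."""
    found, stack = [], list(CHILDREN.get(cell, []))
    while stack:
        child = stack.pop()
        found.append(child)
        stack.extend(CHILDREN.get(child, []))
    return sorted(found, key=lambda c: (generation(c), c))


print(progenitors("A6.2"))  # ['A5.1', 'A4.1']
print([(c, generation(c)) for c in progeny("A4.1")])
```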
DDD (Digital Differential Display):
This interface allows you to find genes that have differential EST representation between sequenced cDNA libraries. It relies on the idea (put forward in Ciona by Yutaka Satou and N. Satoh) that, as the libraries were not normalised, the number of ESTs for a gene in a library reflects the abundance of the transcript in the starting population. The DDD interface allows you to find genes expressed at statistically higher levels in one set of libraries than in another. This is a very powerful tool, with ONE MAJOR LIMITATION: each request mobilises a large proportion of the resources of our current server. So please, for the sake of others, do not play with it; only submit requests that make sense for your research. If the use of this interface significantly reduces the speed of the system, we will remove it until we have migrated to a new, more powerful server (see below).
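The statistic behind the DDD interface is not specified here. As an illustration only, assume a one-sided Fisher exact test, a common choice for comparing EST counts between two pools of libraries: it asks how likely it is to see at least the observed number of ESTs for a gene in library set A, given the total number of ESTs sequenced from each set. The function names and counts below are hypothetical:

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    whose top-left cell is at least `a`.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1))
    return tail / comb(n, row1)


def ddd_enrichment(gene_in_a, total_a, gene_in_b, total_b):
    """p-value that a gene is over-represented in library set A relative to set B.

    gene_in_a / gene_in_b: ESTs of the gene in each set of libraries
    total_a / total_b:     all ESTs sequenced from each set
    """
    return fisher_greater(gene_in_a, total_a - gene_in_a, gene_in_b, total_b - gene_in_b)


# Hypothetical counts: 50 of 10,000 ESTs in set A versus 2 of 10,000 in set B.
print(ddd_enrichment(50, 10_000, 2, 10_000))
```

In a real screen every gene model is tested, so the resulting p-values would also need a multiple-testing correction before a gene is called differentially represented.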
The last tool is the tiny "refine" button that appears at the bottom of most results pages (except the ISH results pages). By selecting another interface and pressing refine, you tell the system that you want to use the results on your page as the search space for the next query. In short, it allows you to place sequential queries, such as finding the zinc finger genes (InterPro search) that are annotated as transcription factors (refine with a Gene Ontology search), etc. Play with it and you will see how powerful this small addition makes the system.
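Conceptually, refining restricts the next query to the genes already on the results page, which for filter-type searches yields the same genes as intersecting the two result lists. A minimal sketch with hypothetical gene identifiers and hypothetical per-search result sets (not real Aniseed output):

```python
def refine(current_results, next_query_hits):
    """Keep the genes of the current results page that also satisfy the next query,
    preserving the order of the current page."""
    hits = set(next_query_hits)
    return [gene for gene in current_results if gene in hits]


# Hypothetical result sets for two searches.
zinc_finger_genes = ["gene_01", "gene_07", "gene_12", "gene_30"]  # InterPro search
transcription_factor_go = {"gene_07", "gene_30", "gene_44"}       # GO search

print(refine(zinc_finger_genes, transcription_factor_go))  # ['gene_07', 'gene_30']
```

Because the output is again a list of genes, refinements can be chained as many times as needed.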
From Oracle to PostgreSQL:
Aniseed version 1.0 used a commercial SQL database system, Oracle, as does the released version 2.0. However, in the next few weeks, version 2.0 will migrate to a different, non-commercial SQL engine: PostgreSQL. The advantage is that it will then be possible to install mirror sites of Aniseed away from Marseille, making the system less sensitive to server troubles. It will also make the system truly generic and easier to use for other model systems.
Migration to a new server:
While PostgreSQL opens the way to mirror sites, we will also be migrating within a month to a more powerful, independent server within an IBM cluster. This should allow faster processing of more complex queries. No change of URL is anticipated.
At present we estimate that the development of new interfaces, besides the ones mentioned above, will slow down and that the focus will be placed on:
Getting as many data into the system as possible.
This includes more ISH data (currently a bit over 10,000 for Ciona and Halocynthia combined) and promoter expression data (currently fewer than 10...), more anatomical data (for Halocynthia, for instance), more embryological data (inductions, competence, etc.) and more 3D data and reconstructed embryos. We would like the whole community to contribute to this aim and will therefore release the loader software for remote submission of ISH and cis-regulatory element data, probably during the Santa Barbara meeting. We will make every effort to ensure that submitted data are traceable and properly attributed to their contributing authors. We are currently reconstructing a larger set of embryos in 3D, and expect to release the 3D Embryo Handler software, with the associated reconstructed embryos, at the beginning of autumn. Again, this should allow the community to contribute additional models for use by other labs.
Making all data downloadable as flat or XML files.
We have started to put some flat files in the download section, but this is still incomplete. We plan to release the rest of the data by summer, so that people can download them and use them for large- or small-scale bioinformatics analyses.
Community decisions over the future of the system.
The system is now getting rather large, and it may be time for it to be steered by the community rather than by my lab. Maybe it would be a good idea to include a discussion of its future during the Monday July 11 evening round table in Santa Barbara? For instance, should we nominate a steering committee to oversee future developments?
The Aniseed (Ascidian Network of In Situ Expression and Embryological
Data) system is a community resource for ascidian developmental studies.
It allows one to mine and download available embryological, anatomical,
genomic and gene expression data.
It is an Oracle database organised in six main parts:
For each of the 22 stages we defined, the anatomical field describes, using a controlled hierarchical dictionary, the different biological structures and blastomeres present in each ascidian species. This ontology is represented as a directed graph, which allows one to organise terms as the nodes of a tree and to link them according to the characteristics they share. We thus described each organ and structure hierarchically: terms at the top of the hierarchy represent a global structure (e.g. Mesoderm), while child terms correspond to more precise parts (e.g. Secondary notochord lineage). This description has single-cell resolution up to the beginning of gastrulation, the stage up to which the lineage has been completely worked out. After this stage, the ontology follows the germ layers and was designed to be as compatible as possible with vertebrate ontologies.
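Because the dictionary is a directed graph, a query on a general term such as Mesoderm can be made to include every more precise term below it. A minimal sketch of that expansion, assuming each term lists its parent terms; the fragment below, including the placeholder cell name and gene annotations, is hypothetical and not taken from the Aniseed dictionaries:

```python
from collections import deque

# Hypothetical fragment: term -> parent terms. A term may have several parents.
PARENTS = {
    "Notochord lineage": ["Mesoderm"],
    "Primary notochord lineage": ["Notochord lineage"],
    "Secondary notochord lineage": ["Notochord lineage"],
    "Muscle lineage": ["Mesoderm"],
    "cell_x": ["Secondary notochord lineage", "Muscle lineage"],
}

# Invert the parent links once so the graph can be walked downwards.
TERM_CHILDREN = {}
for term, parents in PARENTS.items():
    for parent in parents:
        TERM_CHILDREN.setdefault(parent, []).append(term)


def descendants(term):
    """All terms below `term` (breadth-first walk; each term visited once)."""
    seen, queue = set(), deque([term])
    while queue:
        for child in TERM_CHILDREN.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def genes_expressed_in(term, annotations):
    """Genes annotated to `term` itself or to any of its descendants."""
    targets = descendants(term) | {term}
    return sorted(gene for gene, terms in annotations.items() if terms & targets)


# Hypothetical expression annotations: gene -> dictionary terms.
ANNOTATIONS = {
    "gene_1": {"cell_x"},
    "gene_2": {"Epidermis"},
    "gene_3": {"Primary notochord lineage"},
}
print(genes_expressed_in("Mesoderm", ANNOTATIONS))  # ['gene_1', 'gene_3']
```

The `seen` set matters because, in a graph, a term such as cell_x can be reached through more than one parent.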
The system hosts all available ESTs, cDNAs and gene models for two species: Ciona intestinalis (mainly the very large Kyoto set and the small Marseille set) and Halocynthia roretzi (the set contributed by the Halocynthia consortium via Kaz Makabe and Takeshi Kawashima). 400,000 Ciona EST and cDNA clones are clustered according to the predicted gene model they correspond to. The remaining 80,000 clones could not be matched, either because of the draft quality of the assembly (5% of genes are estimated to be missing) or because of the gene predictions (these predictions do not at present take the EST data into account and frequently miss the 5' and 3' ends of genes). The proportion of correctly clustered clones will increase with the accuracy of the assembled, annotated genome.
Functional annotation of proteins:
Functional annotation of the predicted proteins was achieved by three methods. We first ran a programme, Inparanoid (Remm et al., 2001), which identifies orthologues by comparing proteomes from completely sequenced organisms in a pairwise fashion. Clear fly, human or mouse orthologues are detected this way for approximately 50% of predicted Ciona genes (8119/15592). The relatively small proportion of detected orthologues is again probably due to the incompleteness of the JGI gene models. The orthologues are then used to name the Ciona gene, but also to attribute it a Gene Ontology classification. In parallel, we ran InterProScan (Zdobnov et al., 2001) on each Ciona protein to detect functional motifs. These were in turn used to attribute GO terms to proteins without clear orthologues. Finally, a BlastP search against TrEMBL and SwissProt, with an E-value cutoff of 1e-06, will soon be used to complement the GO information for proteins without clear orthologues or motifs but with similarity to proteins previously assigned a function. The identification of orthologues also opens the way to a comparison of expression profiles among metazoans.
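The three annotation sources form an order of precedence: orthology first, then InterPro motifs, then similarity. A minimal sketch of that fallback logic for one predicted protein, assuming the evidence has already been computed; the GO identifiers are placeholders, and taking the lowest E-value as the "best" BlastP hit is our assumption:

```python
def assign_go(orthologue_go, interpro_go, blast_hits, evalue_cutoff=1e-6):
    """Return (GO terms, evidence source) from the first tier that yields terms.

    orthologue_go: GO terms carried by clear orthologues (e.g. from Inparanoid)
    interpro_go:   GO terms mapped from the InterPro motifs found in the protein
    blast_hits:    (E-value, GO terms) pairs for BlastP hits against SwissProt/TrEMBL
    """
    if orthologue_go:
        return set(orthologue_go), "orthology"
    if interpro_go:
        return set(interpro_go), "InterPro"
    significant = [hit for hit in blast_hits if hit[0] <= evalue_cutoff]
    if significant:
        _, go_terms = min(significant, key=lambda hit: hit[0])  # lowest E-value
        if go_terms:
            return set(go_terms), "BlastP"
    return set(), "none"


# Hypothetical protein with no orthologue or motif, but two BlastP hits.
hits = [(1e-3, {"GO:placeholder_1"}), (1e-20, {"GO:placeholder_2"})]
print(assign_go(set(), set(), hits))  # ({'GO:placeholder_2'}, 'BlastP')
```

Recording the evidence source alongside the terms lets users judge how much weight to give each annotation.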
Additional tables were included in the design of the database to more precisely characterise the function of proteins. These tables include for example protein interaction data, and DNA binding specificity of transcription factors. At present, however, they remain empty.
Two types of expression data are currently supported.
The ESTs generated in the Ciona genome projects originate from a collection of non-normalised cDNA libraries from different stages and adult tissues. Clustering the ESTs according to the gene model they correspond to allows one to calculate the abundance of the clones corresponding to each gene in the different sequenced libraries. This EST count has proven to be a reliable measure of the level of expression of a gene at a given time or in a given tissue (Satou et al., 2003).
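Turning clustered ESTs into comparable expression levels mainly requires correcting for the number of ESTs sequenced from each library. A minimal sketch, assuming per-EST mappings to a gene model (or None when no model matched) and to a library; the identifiers and library names are hypothetical, and scaling to ESTs per 10,000 is our choice for illustration:

```python
from collections import Counter


def est_abundance(est_to_gene, est_to_library, per=10_000):
    """ESTs per (gene model, library), scaled to `per` ESTs sequenced from that library.

    est_to_gene:    EST id -> gene model id, or None if the EST matched no model
    est_to_library: EST id -> name of the cDNA library it was sequenced from
    """
    # Library size counts every sequenced EST, matched or not.
    library_size = Counter(est_to_library.values())
    counts = Counter(
        (gene, est_to_library[est]) for est, gene in est_to_gene.items() if gene is not None
    )
    return {key: n * per / library_size[key[1]] for key, n in counts.items()}


# Hypothetical data: four ESTs from two libraries, one EST unmatched.
EST_TO_GENE = {"est1": "gene_1", "est2": "gene_1", "est3": "gene_2", "est4": None}
EST_TO_LIBRARY = {"est1": "library_1", "est2": "library_1", "est3": "library_2", "est4": "library_1"}
print(est_abundance(EST_TO_GENE, EST_TO_LIBRARY))
```

Unmatched ESTs are kept in the library totals because they were still sequenced from that library; excluding them would inflate the scaled counts of libraries with many unclustered clones.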
In addition, Aniseed currently hosts In situ hybridisation data for around 200 Ciona intestinalis genes with a restricted expression pattern, mainly coming from the in situ screen carried out in the Lemaire lab (Marseille, France). In situ hybridisation patterns are illustrated by standardised pictures (orientation, format) and described using the controlled vocabulary anatomical dictionary for the relevant stage. In addition to In situ data, Aniseed supports the description of promoter analyses and immunohistochemistry.
A unique feature of Aniseed is that it supports both wild-type expression patterns and expression patterns in manipulated embryos. Supported manipulations include both embryological (blastomere explantation or ablation) and genetic (over-expression, morpholino knock-down, treatment with pharmacological inhibitors or recombinant signalling proteins) treatments. This type of information is of crucial importance for the reconstruction of genetic cascades.
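Storing wild-type and manipulated patterns with the same anatomical terms makes them directly comparable. A minimal sketch, assuming each condition is recorded as gene -> set of territories; genes absent from the perturbed record are treated as not examined rather than not expressed. All names are hypothetical, and reading a lost territory as evidence that the manipulated gene acts upstream (directly or not) is an interpretation left to the user:

```python
def compare_conditions(wild_type, perturbed):
    """Territories lost or gained by each gene examined in both conditions.

    wild_type, perturbed: gene -> set of territories where expression is detected.
    Returns gene -> (lost territories, gained territories), for changed genes only.
    """
    changes = {}
    for gene in sorted(wild_type.keys() & perturbed.keys()):
        lost = wild_type[gene] - perturbed[gene]
        gained = perturbed[gene] - wild_type[gene]
        if lost or gained:
            changes[gene] = (lost, gained)
    return changes


# Hypothetical data: target genes in wild type and after knock-down of "gene_X".
WILD_TYPE = {"gene_Y": {"notochord", "endoderm"}, "gene_Z": {"muscle"}}
GENE_X_KNOCKDOWN = {"gene_Y": {"endoderm"}, "gene_Z": {"muscle"}}
print(compare_conditions(WILD_TYPE, GENE_X_KNOCKDOWN))  # {'gene_Y': ({'notochord'}, set())}
```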
This part describes the source of the data, whether published or unpublished.
How to query Aniseed:
The web interface allows one to search Aniseed by
. In situ data
. Gene ontology
. InterPro domains
In situ data:
Following the selection of a species and a developmental stage, this page allows one to search for genes that are expressed in individual or multiple structures from the anatomical dictionary. Conversely, the expression data for a given gene model can be obtained. In addition to wild type embryos, it is possible to search for expression patterns in deregulated contexts. It is also possible to search for pictures showing co-expression of two genes.
The result page displays the species and stage, thumbnails of the relevant in situ pictures, a brief description of the staining, the identity of the stained molecule according to the controlled dictionary, the corresponding gene model(s) and the labelled territories. All these fields can be clicked for further information. Clicking on "more" opens a second page showing a larger picture, a summary of all expression domains at this stage, the name of the annotator with a direct e-mail link, the experimental conditions and references. This page also makes it possible to search for other genes expressed in the same territories, to perform an expression clustering analysis, and to search for expression data for the same gene in deregulated contexts (i.e. overexpression, morpholino injections, mutant background, ablation of embryo parts, explants, etc.).
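The expression clustering method is not described here. One simple way to compare expression profiles annotated with the same dictionary is the Jaccard index of their territories (shared territories divided by all territories of either gene); 1 minus this value can serve as a distance for clustering. A minimal sketch under that assumption, with hypothetical genes and territories:

```python
def jaccard(a, b):
    """Shared territories over all territories of either gene (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_genes(query, profiles, min_similarity=0.5):
    """Genes whose territories overlap the query gene's by at least `min_similarity`,
    most similar first."""
    ref = profiles[query]
    scored = [(gene, jaccard(ref, terr)) for gene, terr in profiles.items() if gene != query]
    return sorted((s for s in scored if s[1] >= min_similarity), key=lambda s: (-s[1], s[0]))


# Hypothetical expression profiles at one stage: gene -> expressing territories.
PROFILES = {
    "gene_1": {"notochord", "endoderm", "muscle"},
    "gene_2": {"notochord", "endoderm"},
    "gene_3": {"epidermis"},
}
print(similar_genes("gene_1", PROFILES))
```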
This page displays the anatomical dictionary at a given stage in a given species. The displayed anatomical dictionary can be used as an alternative interface to look for genes expressed in selected structures. The "get lineage", "get position" and "get fates" buttons are currently being implemented.
This page allows one to search for molecules by species, clone name, clone sequence name (Genbank accession number), and also gene name (biological name). The results window gives access to all genes matching the query. Selection of one gene leads to a detailed description of its features: link to the JGI genome project page, display of Interpro domains, link to EST counts and in situ data, prediction of orthologues in other complete genomes (Inparanoid predictions) or of paralogues in Ciona, and best BlastP hits in the Swissprot database. These pieces of information form the basis for the Gene Ontology classifications of the genes.
This function allows one to search for Ciona molecules showing similarity to a sequence of interest.
Gene Ontology/InterPro search pages:
Allows one to search for genes according to their associated Gene Ontology or InterPro terms (search for proteins involved in a given process, with a given molecular function or subcellular localisation, or with given protein motifs).
How can you contribute to Aniseed?
The aim of Aniseed is to form a community tool that will help us all in our research, but that may also, in the future, allow one to start some modelling work on Ciona embryogenesis. The more labs participate in the project, the more satisfying the tool will be for all. Key in our mind is that the future of the tool will be determined by the participating labs. You can participate at many levels.
Entering your expression data:
This is the simplest way to participate, and a very important one. You can already request the loader from us as a beta tester. You will see that entering data is rather simple, and we welcome your views on how to make the process even simpler. There are several types of data you can enter:
1) published expression data on your favourite gene(s). These are usually very high quality data and most sought after.
2) expression data on genes you are not very interested in and do not want to take time to publish. These data are usually of lesser quality but are still very valuable to the community as they can guide other people's steps. You will see that entering the data in Aniseed is much simpler than publishing them, and your name (and e-mail) will remain attached to the data.
3) large scale in situ screens. These data are usually of lesser quality, but they are invaluable again as a guide for others.
Communicating embryo models:
Making embryo models is a rate-limiting step. If you are interested in participating in this task, let us know. We will then give you all the information about formats, etc., so that your work is compatible with the system.
Expressing your wishes:
Once you have tried Aniseed, you will probably have comments and suggestions for new pages allowing new searches, for types of data that are not yet supported, etc. You are most welcome to communicate them to us and we will see what we can do, especially for repeated requests.
If you are interested in developing new tools, we are more than happy to help you do so. Just let us know so that we can organise this.
We hope you will have fun with Aniseed, and look forward to your participation,
All the best,
Olivier Tassy and Patrick Lemaire