Ensembl Genome Browser

Ensembl was first developed in 1999, prior to the publication of the first draft of the human genome. The aim of Ensembl at that time was to annotate the human genome with gene models and other available data, and to make all of this available through the web. Since then, Ensembl has continued to provide genome annotation for more and more chordate species, including human, mouse and zebrafish. The range of species annotated in Ensembl covers the broadest taxonomic space possible, including conservationally, agriculturally, medically and evolutionarily important species from all the major animal classes: mammals, birds, reptiles, fish and amphibians. Ensembl also presents gene annotation for a small number of non-vertebrate chordate species, such as the lamprey and sea squirt, as well as non-chordate species such as the fruit fly, C. elegans and Saccharomyces cerevisiae, which are important model organisms and are used in our comparative analyses.
While Ensembl is focused on chordate species, Ensembl Genomes is a sister project that extends Ensembl across the taxonomic space and is dedicated to non-chordates, namely bacteria, fungi, protists, non-vertebrate metazoans and plants. You can find the Ensembl Genomes project pages at www.ensemblgenomes.org.
For each species, Ensembl annotates a wide range of data onto the genome assembly. The first of these data types is the gene models. To predict gene models, a process called the gene build, sequences from the International Nucleotide Sequence Database Collaboration (INSDC) databases, NCBI RefSeq and UniProt are aligned to the genome. The transcripts built from these alignments are then clustered together based on overlapping coding sequences to form Ensembl genes. This is called the automated gene annotation pipeline. For human, mouse, rat, pig and zebrafish, Ensembl also incorporates manually curated transcripts from the HAVANA project. This merged set of genes from the automated and manual annotation is called the GENCODE gene set.
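
The clustering step described above, where transcripts are grouped into genes when their coding sequences overlap, can be illustrated with a small sketch. This is not the Ensembl pipeline itself: the `Transcript` class, the `cluster_into_genes` function and the coordinates are hypothetical. The sketch also assumes a single chromosome and treats each coding sequence as one start-to-end span rather than comparing individual coding exons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """A transcript reduced to the span of its coding sequence (hypothetical model)."""
    name: str
    strand: int      # +1 for the forward strand, -1 for the reverse strand
    cds_start: int   # 1-based, inclusive
    cds_end: int     # 1-based, inclusive


def cluster_into_genes(transcripts):
    """Group transcripts whose coding spans overlap on the same strand."""
    genes = []
    for strand in (1, -1):
        # Sorting by start lets one left-to-right sweep find every overlap.
        on_strand = sorted(
            (t for t in transcripts if t.strand == strand),
            key=lambda t: t.cds_start,
        )
        current, current_end = [], None
        for t in on_strand:
            if current and t.cds_start <= current_end:
                # Overlaps the cluster being built: same gene.
                current.append(t)
                current_end = max(current_end, t.cds_end)
            else:
                # No overlap: close the previous cluster and start a new one.
                if current:
                    genes.append(current)
                current, current_end = [t], t.cds_end
        if current:
            genes.append(current)
    return genes


# Illustrative coordinates only; they do not describe real transcripts.
transcripts = [
    Transcript("tx_a", 1, 100, 500),
    Transcript("tx_b", 1, 450, 900),
    Transcript("tx_c", 1, 2000, 2600),
    Transcript("tx_d", -1, 300, 700),
]
genes = cluster_into_genes(transcripts)
gene_names = [[t.name for t in gene] for gene in genes]
print(gene_names)
```

Here `tx_a` and `tx_b` end up in one gene because their coding spans overlap, `tx_c` forms its own gene, and `tx_d` is kept separate because it lies on the opposite strand.
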
Then, in addition to the gene annotation, a variety of other data types are annotated onto the genome where available, including variation data, comparative genomics analyses and the annotation of regulatory features. Let’s see how all of this information is arranged on the Ensembl browser web site: www.ensembl.org.
This is the Ensembl homepage. The blue bar at the top is present on every page of Ensembl and contains links to a number of different tools, including BLAST/BLAT, BioMart and the Variant Effect Predictor, as well as other features including help and documentation. There is also a search bar at the top of the homepage that allows you to search all of the species in Ensembl for genomic coordinates, gene names, probe sets, variant IDs, phenotypes or regulatory feature IDs.
Every two to three months the browser and underlying databases are updated, and each update is included in a single release version. The news in the top right of the homepage shows the current release version and announcements of the incorporation of new data, as well as updates and changes in the annotation. An archive link is present at the bottom of every page, which allows you to access older versions of Ensembl.
On the homepage you can also find information about all of the species represented in Ensembl. There are quick links to our most popular species (human, mouse and zebrafish) as well as links to view the full list of Ensembl species. Here you can also navigate to the Pre! sites, where you can browse new genomic assemblies that are not yet fully annotated, and to the Ensembl Genomes website.
In this particular example we will search for a human gene with the HGNC identifier BRCA2. The HGNC identifier is the assigned human gene name from the HUGO Gene Nomenclature Committee. We will use this as an example to navigate through the pages of Ensembl.
We could search directly from this main page by typing ‘BRCA2’ into the search bar; however, let’s click on Homo sapiens to move to the species-specific homepage for human. Click on the link for the human GRCh38 assembly. Here we find information about the annotated data for this particular species. You can find out about the gene annotation methodology, the comparative genomics analyses, variation data and regulation data. You can also find out about the genome assembly itself by clicking on the ‘More information and statistics’ link, or you can jump to our browser dedicated to the GRCh37 genome assembly.
Now we will search for a gene of interest by typing ‘BRCA2’ in the search bar at the top of the page and clicking ‘Go’. Click on this first link to go to the gene summary page for BRCA2. The Ensembl gene identifier for this gene is ENSG00000139618. This is the stable identifier, meaning that even if this gene is updated, the ID should remain the same from one Ensembl release to the next. Other stable IDs in Ensembl include ENST for transcripts, ENSP for peptides, ENSE for exons and ENSR for regulatory features.
The Ensembl views are separated into tabs. At the moment, we are in the gene tab. Links on the left show specific information for the BRCA2 gene, for example: genomic sequence, sequence alignments, the BRCA2 gene tree, homology relationships, gene ontologies, variation data and BRCA2 gene expression. From this gene summary page, we can see that seven transcripts have been annotated. To see more information about these transcripts, we can click on ‘Show transcript table’. Two of these transcripts are members of the CCDS set, a consensus set of coding sequences established as a collaborative effort between the Ensembl project, HAVANA, NCBI and the UCSC genome browser.
Further down this page, we can see the transcript models. The contig representing the genomic sequence is displayed as a blue bar. Transcripts above the contig are on the forward strand.
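
As a brief aside on the stable identifiers introduced above (ENSG, ENST, ENSP, ENSE and ENSR), the prefix alone tells you what kind of feature an ID refers to. The sketch below is a minimal illustration rather than an official Ensembl utility. The function name `classify_stable_id` is hypothetical, and the pattern assumes human-style IDs made of `ENS`, one type letter and 11 digits, as in ENSG00000139618. Identifiers for other species, and versioned IDs, are not handled.

```python
import re

# Type letters taken from the prefixes named in the tutorial.
FEATURE_TYPES = {
    "G": "gene",
    "T": "transcript",
    "P": "peptide",
    "E": "exon",
    "R": "regulatory feature",
}

# Assumption: "ENS" + one type letter + 11 digits, as in ENSG00000139618.
STABLE_ID = re.compile(r"^ENS([GTPER])(\d{11})$")


def classify_stable_id(identifier):
    """Return the feature type for a human-style Ensembl stable ID, or None."""
    match = STABLE_ID.match(identifier.strip().upper())
    if match is None:
        return None
    return FEATURE_TYPES[match.group(1)]


print(classify_stable_id("ENSG00000139618"))  # gene
print(classify_stable_id("BRCA2"))            # None
```

A gene name such as BRCA2 does not match the pattern, which is one simple way to tell a stable ID apart from an HGNC name when handling search terms.
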
Transcripts below the contig are on the reverse strand. Boxes represent exons, and the lines connecting those boxes are the introns. Boxes are filled in if they contain coding sequence. Unfilled boxes represent untranslated regions. The two protein-coding transcripts of the BRCA2 gene show coding sequence and are on the forward strand of the genome, above the blue bar. The colours of the transcripts denote the transcript biotype. One is coloured gold, which denotes that this transcript was annotated by both the Ensembl automated annotation pipeline and the manual annotation of the HAVANA project. The other protein-coding transcript, coloured red, was annotated by the Ensembl automated annotation pipeline. The blue transcripts are non-coding processed transcripts.
Let’s choose one of these transcripts to explore further. We could either click on one of the transcript IDs from the table above, or click on the transcript diagram and follow the link to the ENST identifier. I’ll choose the golden transcript. We are now in the transcript tab. On the left there are links to the supporting evidence, where you can see the sequences on which this transcript was based, as well as exon, cDNA and protein sequence displays. Other links include ‘General identifiers’, where you can see matches to the Ensembl gene or protein sequence in external databases such as UniProt. You can even find different protein domains mapped to the amino acid sequence in the ‘Protein summary’ view, and oligo probe mappings in the ‘Oligo probes’ view.
Let’s now look at a larger region of the genome surrounding the BRCA2 gene. Click on the location tab in the blue bar at the top to go to the ‘Region in detail’ page. At the top of this page is a chromosomal overview, and a red box depicts the region of the chromosome that the subsequent views on this page focus upon. The next image is a 1 Mb overview where the BRCA2 gene is highlighted in a red box. Neighbouring genes are shown along the chromosome.
Contigs are shaded in light or dark blue to show their position along the genomic sequence. You can click on a gene, or click and drag your mouse to form a small box, to re-centre the display. Scrolling down, we are able to see the third, most detailed depiction of the BRCA2 genomic location. The data is displayed in individual tracks, which can be formatted, moved, added or removed. By default we are viewing a small number of tracks, including the ‘Genes’ track and the ‘Contig’ track, representing the genome assembly.
You can add more tracks to this view by clicking on the ‘Configure this page’ option. Active tracks are shown here. You can find additional data tracks to add by browsing the sub-menus on the left-hand side or by using the search option in the top left-hand corner. Let’s choose to see variants from the dbSNP database. Click ‘Variation’ in the subheadings. Turn on the ‘dbSNP variants’ track by clicking the box and selecting the ‘Normal’ format. Let’s also add UniProt alignments. Search for ‘Uniprot’ in the search box at the top left-hand corner. You can click the information icon to learn more about this source. Turn on this track in the ‘Normal’ format. Save and close the menu. The page should reload.
You will see the alignment of the UniProt proteins to this region of the genome. Variants are shown as vertical lines along the genome, although at this scale the individual lines representing variants merge to form a single block. You can find out more information about individual variants by clicking on them. A pop-up window will show you further information about the class of variant, the observed alleles and the global minor allele frequency (if available), as well as a link to the variant tab, a page that contains all the available information for this variant. Finally, you can also find out more information about regulatory features present in this region of the genome by clicking on the individual features in the ‘Regulatory build’ track.
The pop-up window will contain information about the type of regulatory feature, plus a link to the regulatory tab, a page that contains all the available information for this regulatory feature.
Help is available on most Ensembl pages. You can access page-specific help by clicking on the help icon. In the help window we find a link to the documentation. There are also links to FAQs, the video tutorials and the glossary.
Thank you for watching this brief introduction to Ensembl. If you have any questions related to Ensembl, please email our helpdesk [email protected]
