# Genome3D Releases

## Overview

The Genome3D release process has four steps:

 1. Download a set of protein sequences for the model organisms (from UniProtKB)
 2. Contributing resources add their own annotations to these sequences (via the [`'head'` Genome3D API](https://head.genome3d.eu/api))
 3. At release time, the `'head'` database is saved as `'latest'` (a frozen copy that will not change)
 4. Create a new `'head'` database and return to step 1

The first step is to identify and download the dataset of target Genome3D protein sequences from UniProtKB.
For each model organism, the taxon ID is used to search the UniProtKB API (with the 'representative proteome'
keyword), and the matching sequences are downloaded.
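
As a rough sketch of such a query, the helper below builds a FASTA download URL for a single taxon ID. It makes several assumptions that are not confirmed by this document: it targets the current UniProt REST endpoint at `rest.uniprot.org`, which may differ from the interface used when these releases were built, and it uses the keyword `KW-1185` ("Reference proteome") as a stand-in for the 'representative proteome' keyword. The function name `uniprot_fasta_url` is hypothetical.

```sh
# Hypothetical helper: print a UniProtKB download URL for one taxon ID.
# Assumes the current UniProt REST API and the "Reference proteome"
# keyword (KW-1185); the original release scripts may have used
# a different query.
uniprot_fasta_url() {
  taxon_id="$1"
  printf 'https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=taxonomy_id:%s+AND+keyword:KW-1185\n' "$taxon_id"
}

# Example (requires network access): fetch baker's yeast, taxon 559292.
# curl -sSf "$(uniprot_fasta_url 559292)" -o bakersyeast.fa
```

Keeping the URL construction separate from the download makes it easy to inspect the query before running it against the live service.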

## Releases


| Version | UniProtKB Date | Genome3D Release Date | Genomes               | Sequences |
|---------|----------------|-----------------------|-----------------------|-----------| 
| v1.0    | Aug 2013       | May 2014              | 10 (+2 Pfam datasets) | 197,848   | 
| v2.1    | Oct 2018       | Sep 2019              | 14 (+2 Pfam datasets) | 445,635   | 


## Downloading data

The target sequences for each release of Genome3D are distributed as compressed tar files. After downloading
a release file, extract it with:


```sh
$ tar zxvf Genome3D.v2_1.tgz
```
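
Before extracting, it can be useful to confirm that an archive holds the expected files. The sketch below lists the archive contents and checks for `all.data.tab`, `all.ids`, `all.fa` and `taxon.tsv`. The function name `check_release_archive` is hypothetical, and the check assumes nothing about whether the files sit at the top level or inside a directory.

```sh
# Hypothetical helper: succeed only if the archive contains all four
# release files, either at the top level or inside any directory.
check_release_archive() {
  archive="$1"
  # List the archive members without extracting them.
  listing=$(tar -tzf "$archive") || return 1
  for name in all.data.tab all.ids all.fa taxon.tsv; do
    # Strip any leading directory, then look for an exact file name match.
    if ! printf '%s\n' "$listing" | sed 's|.*/||' | grep -Fxq "$name"; then
      echo "missing: $name" >&2
      return 1
    fi
  done
}

# Example:
# check_release_archive Genome3D.v2_1.tgz && tar zxvf Genome3D.v2_1.tgz
```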

The annotations can be accessed via the Genome3D API, which is documented here:

    https://www.genome3d.eu/api


Annotations are sent to the server via the same API. An example CLI client can be found here:

    https://github.com/UCLOrengoGroup/genome3d-openapi-client  


The following shows the first few lines of each file contained in a Genome3D release:


```sh
$ head -n 5 all.data.tab all.ids all.fa taxon.tsv 
==> all.data.tab <==
Entry   Entry name      Status  Organism        Length  Version (entry) Date of last sequence modification      Sequence        yourlist:M20181015F725F458AC8690F874DD868E4ED79B88007FB1H       isomap:M20181015F725F458AC8690F874DD868E4ED79B88007FB1H
P09032  EI2BG_YEAST     reviewed        Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)   578     166     1997-11-01      8d98996060f932c50e66b52f9147565a
P42835  EGT2_YEAST      reviewed        Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)   1041    132     2011-09-21      bd6ddd8c8d2a352054affb11d8e3f7b9
P00925  ENO2_YEAST      reviewed        Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)   437     192     2007-01-23      d0185180dc0fc39e6059279b4b65d319
Q08651  ENV9_YEAST      reviewed        Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)   330     138     1996-11-01      0bf0e0978966e59dd7f87a40b45d3b23

==> all.ids <==
P09032
P42835
P00925
Q08651
P36156

==> all.fa <==
>sp|P09032|EI2BG_YEAST Translation initiation factor eIF-2B subunit gamma OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=GCD1 PE=1 SV=3
MSIQAFVFCGKGSNLAPFTQPDFPFQTQNKDSTAATSGDKLNELVNSALDSTVINEFMQH
STRLPKALLPIGNRPMIEYVLDWCDQADFKEISVVAPVDEIELIESGLTSFLSLRKQQFE
LIYKALSNSNHSHHLQDPKKINFIPSKANSTGESLQKELLPRINGDFVILPCDFVTDIPP
QVLVDQFRNRDDNNLAMTIYYKNSLDSSIDKKQQQKQKQQQFFTVYSENEDSERQPILLD

==> taxon.tsv <==
559292  bakersyeast.fa  6049
3702    arabidopsis.fa  39260
9606    human.fa        20395
7227    fly.fa  21899
36329   plasmodium.fa   5447
```
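
The files above can be cross-checked against each other with a few standard tools. The sketch below is illustrative only: it assumes that the third column of `taxon.tsv` is the number of sequences for that genome, that `all.ids` holds one accession per line, and that every record in `all.fa` starts with a `>` header line. The function names are hypothetical.

```sh
# Sum the third column of taxon.tsv (assumed to be sequences per genome).
# awk's default whitespace splitting handles tab-separated columns.
taxon_total() {
  awk '{ total += $3 } END { print total + 0 }' "$1"
}

# Count FASTA records by counting header lines that start with '>'.
fasta_count() {
  grep -c '^>' "$1"
}

# Count non-empty lines in the accession list.
id_count() {
  grep -c . "$1"
}

# Print the three counts for the release directory given as $1
# (defaults to the current directory).
release_summary() {
  dir="${1:-.}"
  printf 'taxon.tsv total: %s\n' "$(taxon_total "$dir/taxon.tsv")"
  printf 'all.fa records:  %s\n' "$(fasta_count "$dir/all.fa")"
  printf 'all.ids entries: %s\n' "$(id_count "$dir/all.ids")"
}

# Example, run from the directory containing the extracted files:
# release_summary .
```

If the assumptions hold, the three counts should agree; a mismatch suggests an incomplete download or a different file layout than assumed here.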