GenoEx-GDE User’s manual v.1.1
part 2 – Database Manual
GDE stands for Genomic Data Exchange and GenoEx-GDE is the database that allows users to exchange whole SNP sets, where users define how and with whom the data is to be shared.
The whole system consists of the database itself, to which data files can be uploaded and from which they can be downloaded, and the support program gxprep.py, used for preparing the input files.
The instructions for preparing the input files from raw laboratory files and for assigning sharing permissions can be found in the first part of the manual: GDE-gxprep.py manual.
This part of the manual describes step by step how to:
- upload the data to the GenoEx-GDE database;
- define the data selection criteria and download the data from the GenoEx-GDE database.
1. Upload file formats
GenoEx-GDE allows the upload of two types of files:
- data file (format 706)
- sharing permissions file (format 711)
Detailed file formats are described in the manual for the support program gxprep.py (GDE-gxprep.py manual). Only examples of these files are recalled here.
2. Organizations, Users and Logins
Every organization that signs the contract to become a GenoEx-GDE Service User is, from the service’s point of view, considered a single User. It is nevertheless possible to create several separate personal logins for different persons within the organization. As one Service User, they share the History of uploads and downloads as well as all rights and permissions assigned to the User. New logins can be added to the Service User’s account at any time. To do so, the user should send a request to firstname.lastname@example.org.
At the login page, every person should log in with their personal credentials. If a password is lost, the Reset Password function is available under the login tab.
3. GDE data upload
As part of the GenoEx services, GDE shares its entry point with PSE. Please make sure that you choose the right tab for the upload in the drop-down menu.
GenoEx-GDE allows 706 and 711 files to be uploaded together or separately. Each alternative results in one of the following scenarios:
- Uploading both a 706 and a 711 file containing the same animals adds both the genotype data and the sharing permissions to the database.
- Uploading only a 706 file adds the genotype data to the database with no sharing permissions attached, i.e. these data are accessible only to the owner organization until the relevant sharing information is provided.
- Uploading only a 711 file updates the sharing permissions for all the genotypes listed in it, but only if these genotypes, identified by the combination of animal ID and UUID, are already present in the database. Note: newly uploaded sharing permissions overwrite the existing ones. Uploading a new 711 file with an empty sharing list therefore removes all existing sharing permissions for the given genotypes. To update an existing sharing list, the User should re-upload the relevant 711 file with the desired changes.
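The overwrite behaviour of a 711-only upload can be illustrated with a minimal sketch. It is not the GenoEx-GDE implementation: the in-memory dictionaries, the function name apply_711 and the organisation codes are hypothetical, and the manual does not say how entries for genotypes absent from the database are reported, so the sketch simply collects them.

```python
from typing import Dict, List, Tuple

# Hypothetical key: a genotype is identified by (animal ID, UUID).
GenotypeKey = Tuple[str, str]


def apply_711(
    stored: Dict[GenotypeKey, List[str]],
    uploaded: Dict[GenotypeKey, List[str]],
) -> List[GenotypeKey]:
    """Overwrite sharing lists of genotypes already stored; return the keys not found."""
    not_found = []
    for key, sharing_list in uploaded.items():
        if key in stored:
            # The new list replaces the old one entirely; nothing is merged.
            stored[key] = list(sharing_list)
        else:
            # Genotype is not in the database, so there is nothing to update.
            not_found.append(key)
    return not_found


# Example: an empty sharing list removes all existing permissions.
stored = {("ANIMAL1", "uuid-1"): ["ORG_A", "ORG_B"]}
not_found = apply_711(
    stored,
    {("ANIMAL1", "uuid-1"): [], ("ANIMAL2", "uuid-2"): ["ORG_A"]},
)
```

After the call, the sharing list of ANIMAL1 is empty and ANIMAL2 is returned as not found, because its genotype was never uploaded.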
In order to upload the files to the GenoEx-GDE database, the 706 and 711 files must each be compressed into its own zip file. Zip is the only format accepted by the GenoEx-GDE database for the upload.
Zip files can be created using the support program gxprep.py as described in the GDE-gxprep.py manual.
If the User creates the zip files without using the support program gxprep.py, the file containing genotypes (in 706 format) must be named "genotypes.csv" inside its zip archive, and the file containing sharing permissions (in 711 format) must be named "711_file.csv" inside its zip archive.
To upload the zip files, please choose the correct file from your computer and press UPLOAD.
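For users who build the archives by hand, the naming rule above can be sketched with Python's standard zipfile module. The function name make_upload_zip and the source and archive file names in the usage note are hypothetical. Only the two member names, "genotypes.csv" and "711_file.csv", come from this manual. The sketch also assumes that the member should sit at the top level of the archive.

```python
import zipfile

# Member names required inside the archives, as stated in this manual.
GENOTYPES_MEMBER = "genotypes.csv"
PERMISSIONS_MEMBER = "711_file.csv"


def make_upload_zip(source_csv: str, zip_path: str, member_name: str) -> None:
    """Compress source_csv into zip_path, storing it under member_name."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname sets the name inside the archive, regardless of the source file name.
        archive.write(source_csv, arcname=member_name)
```

A call such as make_upload_zip("my_706_export.csv", "upload_706.zip", GENOTYPES_MEMBER) would produce one archive for the genotypes. A second call with PERMISSIONS_MEMBER would produce the separate archive for the sharing permissions.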
When upload starts, you will be redirected to the status page.
Note: this page does not update automatically, thus you must refresh it in order to see the progress.
The first status you will see is INITIATED, then PROCESSING and finally either FINISHED or FAILED, depending on whether the upload was successful or not.
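The status sequence can be summarised as a small sketch that keeps checking until a final state is reached. This manual describes only the web status page and no programmatic interface, so fetch_status is a purely hypothetical callable supplied by the reader. Only the four status names are taken from the text above.

```python
import time
from typing import Callable

# FINISHED and FAILED end the process; INITIATED and PROCESSING do not.
FINAL_STATES = {"FINISHED", "FAILED"}


def wait_for_final_status(
    fetch_status: Callable[[], str],  # hypothetical: returns the current status text
    interval_seconds: float = 30.0,
    max_checks: int = 20,
) -> str:
    """Call fetch_status repeatedly until it reports FINISHED or FAILED."""
    for check in range(max_checks):
        status = fetch_status()
        if status in FINAL_STATES:
            return status
        if check < max_checks - 1:
            # Wait before checking again, like refreshing the page by hand.
            time.sleep(interval_seconds)
    raise TimeoutError("no final status after %d checks" % max_checks)
```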
During the uploading process, the following checks are performed:
- File706 and File711 format consistency;
- SNP Array Code included among the ones listed in the "GenoEx-GDE system data - SNP Arrays" table;
- Laboratory included among the ones listed in the "GenoEx-GDE system data - Laboratories" table;
- Full duplicate genotypes are rejected: if the upload contains data identical to what the same organisation has already stored in the database (exactly the same animal, same SNP density, same genotype), the whole file706 is rejected.
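The full-duplicate rule in the last check can be sketched as follows. This is an illustration under stated assumptions, not the database's actual logic: the GenotypeEntry record, its field names and both helper functions are hypothetical, and a genotype is modelled as a plain string.

```python
from typing import List, NamedTuple, Set


class GenotypeEntry(NamedTuple):
    """Hypothetical record holding the fields named in the duplicate rule."""
    organisation: str
    animal_id: str
    snp_density: str
    genotype: str


def full_duplicates(
    stored: Set[GenotypeEntry], incoming: List[GenotypeEntry]
) -> List[GenotypeEntry]:
    """Return incoming entries that match a stored entry on every field."""
    return [entry for entry in incoming if entry in stored]


def file_is_accepted(stored: Set[GenotypeEntry], incoming: List[GenotypeEntry]) -> bool:
    """A single full duplicate is enough to reject the whole file."""
    return not full_duplicates(stored, incoming)
```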
In accordance with ICAR Guidelines Section 4 - DNA Technology (Feb 2022), during the upload GenoEx-GDE performs basic genotype quality control checks on:
- Call Rate (>=90%);
- Genotype classes frequency (AA, AB, BB >= 20%).
Genotypes that do not fulfil these quality control criteria are still uploaded to the database and are reported in both the feedback email and the status web page. During the upload, the presence of the animal's pedigree information in the Interbull Centre IDEA database is also checked (UUIDs of genotyped animals without information in IDEA are reported in both the feedback email and the status web page).
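The two quality control thresholds can be illustrated with a short sketch. It rests on several assumptions that this manual does not confirm: calls are represented as the strings "AA", "AB" and "BB" with None for a missing call (the real 706 encoding is described in the GDE-gxprep.py manual), call rate is the share of non-missing calls, and the frequency criterion is read as each of the three classes making up at least 20% of the called SNPs. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Sequence

MIN_CALL_RATE = 0.90        # threshold quoted in this manual
MIN_CLASS_FREQUENCY = 0.20  # threshold quoted in this manual
CLASSES = ("AA", "AB", "BB")


def call_rate(calls: Sequence[Optional[str]]) -> float:
    """Share of SNPs with a non-missing call (None marks a missing call)."""
    if not calls:
        return 0.0
    return sum(1 for c in calls if c is not None) / len(calls)


def class_frequencies(calls: Sequence[Optional[str]]) -> Dict[str, float]:
    """Frequency of AA, AB and BB among the called SNPs."""
    counts = Counter(c for c in calls if c is not None)
    called = sum(counts.values())
    if called == 0:
        return {g: 0.0 for g in CLASSES}
    return {g: counts.get(g, 0) / called for g in CLASSES}


def passes_qc(calls: Sequence[Optional[str]]) -> bool:
    """True when both checks pass under the assumptions stated above."""
    frequencies = class_frequencies(calls)
    return call_rate(calls) >= MIN_CALL_RATE and all(
        f >= MIN_CLASS_FREQUENCY for f in frequencies.values()
    )
```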
Successful Submission Status Page
Failing Submission Status Page
As in the example above, a FAILED submission is always accompanied by an informative error message, which allows the User to find and correct the problem.
In any case, the User will receive a feedback e-mail when the data processing is finalised.
4. GDE data download
In order to download data from GenoEx-GDE, the User has to select the Download option from the GDE drop-down menu.
The genotypes to be downloaded (or the sharing permissions) can be chosen by BREED, COUNTRY OF ORIGIN, GENDER, SNP ARRAY (used for genotyping), submitting ORGANISATION and UPLOADING DATE (the latter two available only for genotypes extraction).
In the extraction process it is possible to choose whether to download ALL the genotypes available for the animals within the chosen criteria (option "Genotypes Extraction All") or only ONE genotype per animal within the chosen criteria, namely the one with the highest call rate (option "Genotypes Extraction Best Only").
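The difference between the two extraction options can be sketched as a selection over genotype records. The GenotypeRecord type and the best_only function are hypothetical names used for illustration only. The manual does not say how ties in call rate are resolved, so the sketch keeps the first record encountered.

```python
from typing import Dict, Iterable, List, NamedTuple


class GenotypeRecord(NamedTuple):
    """Hypothetical record: one genotype of one animal."""
    animal_id: str
    uuid: str
    call_rate: float


def best_only(records: Iterable[GenotypeRecord]) -> List[GenotypeRecord]:
    """Keep, for each animal, the genotype with the highest call rate."""
    best: Dict[str, GenotypeRecord] = {}
    for record in records:
        current = best.get(record.animal_id)
        # Strict comparison: on a tie the earlier record is kept (an assumption).
        if current is None or record.call_rate > current.call_rate:
            best[record.animal_id] = record
    return list(best.values())


records = [
    GenotypeRecord("ANIMAL1", "uuid-1", 0.95),
    GenotypeRecord("ANIMAL1", "uuid-2", 0.99),
    GenotypeRecord("ANIMAL2", "uuid-3", 0.92),
]
selected = best_only(records)
```

Here "Genotypes Extraction All" would correspond to all three records, while the "Best Only" selection keeps uuid-2 for ANIMAL1 and uuid-3 for ANIMAL2.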
In accordance with ICAR Guidelines Section 4 - DNA Technology (Feb 2022), GenoEx-GDE has basic quality control checks in place for SNP-based genotype data. By selecting the options Call Rate (>=90%) and/or Genotype classes frequency (AA, AB, BB >= 20%), the user can download only the genotypes that fulfil these quality control criteria.
GenoEx-GDE is also connected to the Interbull Centre IDEA database, which hosts the international pedigree of dairy and beef breeds evaluated by Interbull Centre. By selecting the option "Pedigree Information" it is therefore possible to extract only genotypes of animals with pedigree information available in the IDEA database.
Here as well, the status page should be reloaded to see the current status of the process. When the extraction is ready, a link to download the genotype file is provided.
The link to download the genotype file is also saved in the "Job history" table and can be reached by selecting the specific "Job ID". The downloaded genotypes file is formatted according to the 706 file format specifications, and the sharing permissions file according to the 711 file format specifications. This means that the corresponding SNP order file is needed to decode the 706 genotype string. In order to download both the genotypes and the sharing permissions, two different extraction requests should be created. The page has to be refreshed between the two requests.
Hints on genotype extraction for InterGenomics Organisations are described here (link).
5. GDE data statistics
By selecting the option "statistics" from the GDE drop-down menu, the user can access the GenoEx-GDE data overview, which shows the data the user has uploaded to the database as well as the data uploaded by other organisations that the user has access to.
6. User’s History
All the User’s actions related to genotype upload and download are saved and can be reviewed later. As mentioned in section 2, GenoEx considers an Organisation that has signed a User’s Contract with Interbull Centre to be a single User, which can nevertheless have several logins. The History is specific to the User, not to the login, so actions from all logins within one Organisation are listed together in the history table.
7. System Data
This page displays all the available SNP orders and the allowed laboratories, breeds and countries. If needed, all the available SNP orders can be downloaded from the SYSTEM DATA page.
If you have any further questions and/or suggestions for improving this manual, please contact us at email@example.com