PSE File Formats
Submissions to GenoEx-PSE must be uploaded as a ZIP archive containing two files: meta.csv (file 702) and snps.csv (file 704). Every SNP-specific record in snps.csv must have a corresponding record in meta.csv, and the country and breed information for a given individual must agree in both files. Fields in both files are comma separated, and no blank fields are allowed. Note that SNP names must be written in capital letters.
At each upload, the Service User must zip both files together and submit them at the same time. Data submitted in one file will not be processed until both files are available.
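The packaging step can be sketched as follows. This is a minimal illustration, not an official tool: the function name `build_submission_zip` is hypothetical, the sample records are assembled from the Example columns of the tables in this document, and placing both files at the top level of the archive is an assumption.

```python
import io
import zipfile


def build_submission_zip(meta_csv: str, snps_csv: str) -> bytes:
    """Return the bytes of a ZIP archive holding meta.csv and snps.csv."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Both files go into the same archive so they are submitted together.
        archive.writestr("meta.csv", meta_csv)
        archive.writestr("snps.csv", snps_csv)
    return buffer.getvalue()


# Illustrative one-animal, one-SNP submission built from the example values.
meta = ("702,INRA,FRA,BSW,AUS,M,0001234567,Weatherbys Ireland,"
        "R1234567890,20220425,Illumina,54001,99.98\n")
snps = "704-AB,BSW,AUS,M,0001234567,ARS-BFGL-BAC-19454,A,A\n"

payload = build_submission_zip(meta, snps)

# Re-open the archive to confirm that both files are present.
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    names = sorted(archive.namelist())
print(names)
```

Writing the returned bytes to a file with a `.zip` extension gives an archive that can then be uploaded by whatever means the Service User normally uses.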
Data extracted from the GenoEx-PSE database generally follow the same format as the uploaded data. The downloaded files do, however, contain one additional column with a unique Upload ID, which makes it possible to match the meta.csv and snps.csv records that belong to the same upload event when the data come from several uploads. The Upload ID therefore distinguishes multiple records of one individual that come from different sources and/or genotyping events. If you wish to upload more than one genotype for a given animal, you must submit each genotype in a separate set of files: the database does not accept duplicates, so that the various genotyping events remain distinguishable.
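As a rough sketch of how the Upload ID can be used, the snippet below groups downloaded records by upload event. It relies on the Upload ID being the last column of each downloaded file (column 14 in meta.csv, column 9 in snps.csv, per the tables in this document). The function name `group_by_upload`, the short IDs `upload-a` and `upload-b`, and the second sample ID, scan date and call rate are made up for illustration; real Upload IDs look like the example in the tables.

```python
import csv
import io
from collections import defaultdict


def group_by_upload(meta_text, snps_text):
    """Group downloaded rows by Upload ID (the last column of each file)."""
    uploads = defaultdict(lambda: {"meta": [], "snps": []})
    for kind, text in (("meta", meta_text), ("snps", snps_text)):
        # QUOTE_NONE plus escapechar handles commas escaped as '\,'.
        reader = csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONE,
                            escapechar="\\")
        for row in reader:
            if row:  # skip empty lines
                uploads[row[-1]][kind].append(row[:-1])
    return dict(uploads)


# One animal genotyped twice, delivered in two hypothetical uploads.
meta_download = (
    "702,INRA,FRA,BSW,AUS,M,0001234567,Weatherbys Ireland,R1234567890,"
    "20220425,Illumina,54001,99.98,upload-a\n"
    "702,INRA,FRA,BSW,AUS,M,0001234567,Weatherbys Ireland,R9999999999,"
    "20230110,Illumina,54001,99.50,upload-b\n"
)
snps_download = (
    "704-AB,BSW,AUS,M,0001234567,ARS-BFGL-BAC-19454,A,A,upload-a\n"
    "704-AB,BSW,AUS,M,0001234567,ARS-BFGL-BAC-19454,A,B,upload-b\n"
)

groups = group_by_upload(meta_download, snps_download)
for upload_id, files in sorted(groups.items()):
    print(upload_id, len(files["meta"]), len(files["snps"]))
```

Each group then holds one meta.csv record and the SNP records from the same genotyping event, so the two genotypes of the animal are not mixed.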
! Please note: Because the input files are comma delimited, avoid using commas within a data field. If a field requires a comma, put the escape character ‘\’ before it, e.g. ‘Laboratory\, Branch’.
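One way to produce and read such escaped fields is Python's standard `csv` module with quoting switched off and a backslash as the escape character. This is a sketch under the assumption that GenoEx-PSE expects unquoted fields; the helper names `write_record` and `read_record` are hypothetical, and the three-field record is shortened for illustration.

```python
import csv
import io


def write_record(fields):
    """Serialise one record, escaping embedded commas as '\\,'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONE, escapechar="\\",
                        lineterminator="\n")
    writer.writerow(fields)
    return buffer.getvalue()


def read_record(line):
    """Parse one record, turning '\\,' back into a literal comma."""
    reader = csv.reader(io.StringIO(line), quoting=csv.QUOTE_NONE,
                        escapechar="\\")
    return next(reader)


line = write_record(["702", "INRA", "Laboratory, Branch"])
print(line, end="")        # 702,INRA,Laboratory\, Branch
print(read_record(line))   # ['702', 'INRA', 'Laboratory, Branch']
```

Letting the `csv` module handle the escaping avoids manual string replacement, which is easy to apply twice or to forget on one field.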
file 702 - meta.csv
File 702, named meta.csv, must be a variable-length, comma-delimited file in .csv format that includes a single record for each animal for which SNP genotype details are reported in the snps.csv file. For example, if the snps.csv file (File 704-AB or 704-TOP) includes SNP genotype results for 100 animals, the meta.csv file will have 100 records, one per animal.
| Col | Name | Format | Description | Example |
| --- | --- | --- | --- | --- |
| 1 | Record type | Numeric | Record type | 702 |
| 2 | Service user | Alphanumeric | Name of uploading organisation | INRA |
| 3 | Source country/sending country | Alphanumeric | 3 letter country code² | FRA |
| 4 | Animal ID¹ - breed code | Alphanumeric | 3 letter breed code³ | BSW |
| 5 | Animal ID¹ - country code | Alphanumeric | 3 letter country code² | AUS |
| 6 | Animal ID¹ - sex code | Alpha | 1 letter sex code, M or F | M |
| 7 | Animal ID¹ - registration | Alphanumeric | Animal identification, maximum 18 characters | 0001234567 |
| 8 | Genotyping Laboratory | Alphanumeric | Genotyping laboratory⁴ | Weatherbys Ireland |
| 9 | Sample ID | Alphanumeric | Sample numbers used in the genotyping laboratory | R1234567890 |
| 10 | Scan Date | Numeric | Date when the laboratory concluded the analysis, format yyyymmdd | 20220425 |
| 11 | Platform | Alphanumeric | SNP platform⁵ | Illumina |
| 12 | SNP array | Numeric | SNP array, named by number of SNPs⁵ | 54001 |
| 13 | Call Rate | Numeric | Percent call rate of the animal's full genotype used to create the SNP record for GenoEx-PSE, two decimals | 99.98 |
| Downloads only | | | | |
| (14) | Upload ID | Alphanumeric | Unique ID that makes it possible to match multiple data files of the same individual by upload event | 1fdb3a18-4a0d-41ee-8530-dac3880d00cf |
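The column rules for meta.csv can be turned into a simple pre-upload check. The sketch below is not the validation GenoEx-PSE itself performs. The function name `check_meta_record` and the wording of the messages are hypothetical. The allowed laboratory, platform and array lists are not checked because they are not reproduced in this document.

```python
import re
from datetime import datetime


def check_meta_record(fields):
    """Return a list of problems found in one meta.csv upload record."""
    if len(fields) != 13:
        return [f"expected 13 fields, found {len(fields)}"]
    problems = []
    if any(value == "" for value in fields):
        problems.append("blank fields are not allowed")
    if fields[0] != "702":
        problems.append("column 1: record type must be 702")
    # Columns 3, 4 and 5 hold 3 letter codes (country, breed, country).
    for col in (3, 4, 5):
        if not re.fullmatch(r"[A-Za-z]{3}", fields[col - 1]):
            problems.append(f"column {col}: expected a 3 letter code")
    if fields[5] not in ("M", "F"):
        problems.append("column 6: sex code must be M or F")
    if not 1 <= len(fields[6]) <= 18:
        problems.append("column 7: registration must be 1 to 18 characters")
    try:
        if not re.fullmatch(r"\d{8}", fields[9]):
            raise ValueError("not eight digits")
        datetime.strptime(fields[9], "%Y%m%d")  # rejects impossible dates
    except ValueError:
        problems.append("column 10: scan date must be a valid yyyymmdd date")
    if not fields[11].isdigit():
        problems.append("column 12: SNP array must be numeric")
    if (not re.fullmatch(r"\d{1,3}\.\d{2}", fields[12])
            or float(fields[12]) > 100):
        problems.append("column 13: call rate must be a percentage "
                        "with two decimals")
    return problems


example = ["702", "INRA", "FRA", "BSW", "AUS", "M", "0001234567",
           "Weatherbys Ireland", "R1234567890", "20220425", "Illumina",
           "54001", "99.98"]
print(check_meta_record(example))  # [] for the example row of the table
```

Returning a list of messages rather than raising on the first error lets a Service User fix all problems in a record in one pass.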
file 704 - snps.csv
File 704, named snps.csv, contains the actual genotype data for the animals listed in the corresponding meta.csv file.
The Service User may choose to upload and/or download SNP genotype data in either the "AB" or the "TOP" allele designation. This choice determines the content of the first field, Record Type, in the following file format (File 704-AB or File 704-TOP, respectively). File 704-AB/704-TOP is exchanged as a variable-length, comma-delimited file in .csv format and includes a single record for each SNP reported for each animal.
For example, if the meta.csv file includes 100 animals and each SNP genotype includes the 200 SNPs recommended by ISAG for Parentage Verification, then snps.csv will include a maximum of 20,000 records (100 animals x 200 SNPs each). If a SNP was not "called" and the result is missing, that SNP for that animal should not be included in the GenoEx-PSE genotype exchange file. For this reason, even if the meta.csv file includes 100 animals, the snps.csv file may have fewer than 20,000 records.
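The record-count arithmetic above can be illustrated with a small sketch that builds snps.csv records and leaves out SNPs that were not called. Representing a missing call as `None`, the function name `build_snps_records`, the second animal and the SNP names starting with `EXAMPLE-` are all assumptions made for illustration.

```python
def build_snps_records(animals, record_type="704-AB"):
    """Build snps.csv rows, one per called SNP per animal.

    animals maps (breed, country, sex, registration) to a dict of
    SNP name -> (allele 1, allele 2), or None when the SNP was not called.
    """
    rows = []
    for animal_id, genotype in animals.items():
        for snp_name, call in genotype.items():
            if call is None:
                continue  # missing results are not included in the file
            allele_1, allele_2 = call
            rows.append([record_type, *animal_id, snp_name.upper(),
                         allele_1, allele_2])
    return rows


animals = {
    ("BSW", "AUS", "M", "0001234567"): {
        "ARS-BFGL-BAC-19454": ("A", "A"),
        "EXAMPLE-SNP-2": ("A", "B"),
        "EXAMPLE-SNP-3": None,  # not called, so no record is written
    },
    ("BSW", "AUS", "F", "0007654321"): {
        "ARS-BFGL-BAC-19454": ("B", "B"),
        "EXAMPLE-SNP-2": ("A", "A"),
        "EXAMPLE-SNP-3": ("A", "B"),
    },
}

rows = build_snps_records(animals)
print(len(rows))  # 5 records instead of the maximum of 2 x 3 = 6
```

The same reasoning scales to the 100 animals and 200 SNPs of the example: every missing call lowers the record count by one below 20,000.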
| Col | Name | Format | Description | Example |
| --- | --- | --- | --- | --- |
| 1 | Record type | Numeric | Record type | 704-AB or 704-TOP |
| 2 | Animal ID¹ - breed code | Alphanumeric | 3 letter breed code³ | BSW |
| 3 | Animal ID¹ - country code | Alphanumeric | 3 letter country code² | AUS |
| 4 | Animal ID¹ - sex code | Alpha | 1 letter sex code, M or F | M |
| 5 | Animal ID¹ - registration | Alphanumeric | Animal identification, maximum 18 characters | 0001234567 |
| 6 | SNP Name | Alphanumeric | SNP name in CAPITAL letters⁶ | ARS-BFGL-BAC-19454 |
| 7 | Allele 1 | Alpha | A/B for 704-AB, A/C/G/T for 704-TOP | A |
| 8 | Allele 2 | Alpha | A/B for 704-AB, A/C/G/T for 704-TOP | A |
| Downloads only | | | | |
| (9) | Upload ID | Alphanumeric | Unique ID that makes it possible to match multiple data files of the same individual by upload event | 1fdb3a18-4a0d-41ee-8530-dac3880d00cf |
¹ Columns 4, 5, 6, and 7 of meta.csv (columns 2, 3, 4, and 5 of snps.csv) together make up the animal identification. An Interbull ID is not required in PSE, but its use is highly recommended for animals that have one. For further information on the Interbull ID, see the ID guidelines.
² Country code according to ISO country codes.
³ Breed code according to ICAR breed codes.
⁴ Allowed values according to the laboratory list. To request inclusion of new laboratories, e-mail the Interbull Center at genoex@slu.se prior to upload.
⁵ Allowed values according to the platform/array list. To request inclusion of new platforms or arrays, e-mail the Interbull Center at genoex@slu.se prior to upload.
⁶ Full SNP list for parentage verification and parentage discovery.
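The snps.csv layout, and the requirement that every SNP record has a corresponding meta.csv record, can be checked together before upload. The sketch below makes no claim to match the checks GenoEx-PSE runs. The names `check_snps_record` and `animals_missing_from_meta` are hypothetical, and the SNP name is only checked for capital letters, not against the full SNP list.

```python
# Alleles allowed for each record type, per the snps.csv table.
ALLELES = {"704-AB": set("AB"), "704-TOP": set("ACGT")}


def check_snps_record(fields):
    """Return a list of problems found in one snps.csv upload record."""
    if len(fields) != 8:
        return [f"expected 8 fields, found {len(fields)}"]
    problems = []
    record_type = fields[0]
    if record_type not in ALLELES:
        problems.append("column 1: record type must be 704-AB or 704-TOP")
    else:
        for col in (7, 8):
            if fields[col - 1] not in ALLELES[record_type]:
                problems.append(f"column {col}: allele not valid "
                                f"for {record_type}")
    if fields[5] != fields[5].upper():
        problems.append("column 6: SNP name must be in capital letters")
    return problems


def animals_missing_from_meta(meta_rows, snps_rows):
    """Return animal IDs that occur in snps.csv but not in meta.csv."""
    meta_ids = {tuple(row[3:7]) for row in meta_rows}  # meta.csv cols 4-7
    snps_ids = {tuple(row[1:5]) for row in snps_rows}  # snps.csv cols 2-5
    return sorted(snps_ids - meta_ids)


meta_rows = [["702", "INRA", "FRA", "BSW", "AUS", "M", "0001234567",
              "Weatherbys Ireland", "R1234567890", "20220425", "Illumina",
              "54001", "99.98"]]
snps_rows = [
    ["704-AB", "BSW", "AUS", "M", "0001234567", "ARS-BFGL-BAC-19454",
     "A", "A"],
    # Breed code differs from meta.csv, so this animal has no match.
    ["704-AB", "HOL", "AUS", "M", "0001234567", "ARS-BFGL-BAC-19454",
     "A", "B"],
]

print([check_snps_record(row) for row in snps_rows])
print(animals_missing_from_meta(meta_rows, snps_rows))
```

Comparing the full four-part animal ID also covers the rule that country and breed must agree in both files, because a mismatch in either part produces an ID that is absent from meta.csv.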