PSE File Formats

Submissions to GenoEx-PSE must be uploaded as a ZIP archive containing two files: meta.csv (file 702) and snps.csv (file 704). Every SNP-specific record in snps.csv must have a corresponding record in meta.csv, and the country and breed information for a given individual must agree between the two files. The fields in both files should be comma-separated, and blank fields are not allowed. Note that SNP names must be written in capital letters.

At each upload, the Service User must zip both files together and submit them at the same time. Data submitted in one file will not be processed until both files are available.
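As an illustration, the packaging step could be scripted as below. This is a minimal sketch using Python's standard zipfile module; the helper name build_submission and the archive file name are hypothetical, and the upload to GenoEx-PSE itself is not shown.

```python
import zipfile
from pathlib import Path


def build_submission(meta_path, snps_path, archive_path):
    """Zip meta.csv and snps.csv into one archive (hypothetical helper)."""
    meta_path, snps_path = Path(meta_path), Path(snps_path)
    # Both files must be present, since they are only processed as a pair.
    for path in (meta_path, snps_path):
        if not path.is_file():
            raise FileNotFoundError(f"missing required file: {path}")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store the files under the two fixed names the format expects.
        archive.write(meta_path, arcname="meta.csv")
        archive.write(snps_path, arcname="snps.csv")
    return Path(archive_path)
```

Calling build_submission("meta.csv", "snps.csv", "submission.zip") would produce a single archive holding exactly the two files.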

Data extracted from the GenoEx-PSE database generally follow the same format as the uploaded data. The downloaded files do, however, contain one additional column with a unique Upload ID, which makes it possible to match the meta.csv and snps.csv records that belong together when data come from different upload events. The Upload ID therefore distinguishes multiple records of one individual that come from different sources and/or genotyping events. If you wish to upload more than one genotype for a given animal, you must do so in a separate set of files, because the database does not accept duplicates, so that the various genotyping events can be distinguished.
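The role of the Upload ID can be sketched as follows. The example assumes downloaded meta.csv records that have already been split into their 14 fields, with the animal ID in columns 4 to 7 and the Upload ID in column 14, as laid out in the file 702 description; the function name and the sample values are made up for illustration.

```python
from collections import defaultdict


def uploads_per_animal(meta_rows):
    """Map each animal (breed, country, sex, registration) to its Upload IDs.

    meta_rows: downloaded meta.csv records, each a list of 14 fields.
    """
    uploads = defaultdict(list)
    for row in meta_rows:
        animal = tuple(row[3:7])  # columns 4-7: the four parts of the animal ID
        upload_id = row[13]       # column 14: present in downloads only
        if upload_id not in uploads[animal]:
            uploads[animal].append(upload_id)
    return dict(uploads)
```

An animal that maps to more than one Upload ID has records from more than one source or genotyping event.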

Please note: since the input files are comma-delimited, avoid using commas within a data field. If a field requires a comma, add the escape character ‘\’ before the comma, e.g. ‘Laboratory\, Branch’.
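The escaping rule can be illustrated with a short sketch. The helper names are hypothetical, and the splitting logic is only one plausible way to read such records (a comma counts as a separator unless a backslash precedes it); how GenoEx-PSE parses the files internally is not described here.

```python
import re


def escape_field(value):
    """Prefix every comma inside a field with a backslash."""
    return value.replace(",", "\\,")


def join_record(fields):
    """Build one comma-delimited record from raw field values."""
    return ",".join(escape_field(field) for field in fields)


def split_record(line):
    """Split a record on commas that are not preceded by a backslash."""
    parts = re.split(r"(?<!\\),", line)
    # Remove the escape character again to recover the raw values.
    return [part.replace("\\,", ",") for part in parts]
```

With these helpers, the field value "Laboratory, Branch" is written as "Laboratory\, Branch" and is read back as a single field rather than two.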

file 702 - meta.csv

File 702, named meta.csv, must be a variable-length, comma-delimited file in .csv format containing a single record for each animal whose SNP genotype is reported in the snps.csv file. For example, if the snps.csv file (File 704-AB or 704-TOP) includes SNP genotype results for 100 animals, the meta.csv file will have 100 records, one per animal.

| Col | Name | Format | Description | Example |
|---|---|---|---|---|
| 1 | Record type | Numeric | Record type | 702 |
| 2 | Service user | Alphanumeric | Name of uploading organisation | INRA |
| 3 | Source country/sending country | Alphanumeric | 3 letter country code² | FRA |
| 4 | Animal ID¹ - breed code | Alphanumeric | 3 letter breed code³ | BSW |
| 5 | Animal ID¹ - country code | Alphanumeric | 3 letter country code² | AUS |
| 6 | Animal ID¹ - sex code | Alpha | 1 letter sex code, M or F | M |
| 7 | Animal ID¹ - registration | Alphanumeric | Animal identification, maximum 18 characters | 0001234567 |
| 8 | Genotyping Laboratory | Alphanumeric | Genotyping laboratory⁴ | Weatherbys Ireland |
| 9 | Sample ID | Alphanumeric | Sample number used in the genotyping laboratory | R1234567890 |
| 10 | Scan Date | Numeric | Date when the laboratory concluded the analysis, format yyyymmdd | 20220425 |
| 11 | Platform | Alphanumeric | SNP platform⁵ | Illumina |
| 12 | SNP array | Numeric | SNP array, named by number of SNPs⁵ | 54001 |
| 13 | Call Rate | Numeric | Percent call rate of the animal's full genotype used to create the SNP record for GenoEx-PSE, two decimals | 99.98 |
| (14) downloads only | Upload ID | Alphanumeric | Unique ID allowing multiple data files of the same individual to be matched by upload event | 1fdb3a18-4a0d-41ee-8530-dac3880d00cf |
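A pre-upload check of a single meta.csv record might look like the sketch below. It covers only the rules stated in the column descriptions (13 fields on upload, no blanks, record type 702, 3-letter codes, sex M or F, registration of at most 18 characters, yyyymmdd date, call rate with two decimals). The function names are hypothetical, and the checks GenoEx-PSE actually performs may differ.

```python
import re
from datetime import datetime


def is_valid_date(text):
    """True if text is a real calendar date written as yyyymmdd."""
    if len(text) != 8 or not text.isdigit():
        return False
    try:
        datetime.strptime(text, "%Y%m%d")
    except ValueError:
        return False
    return True


def check_meta_record(fields):
    """Return a list of problems found in one uploaded meta.csv record."""
    if len(fields) != 13:
        return [f"expected 13 fields, got {len(fields)}"]
    problems = []
    if any(field.strip() == "" for field in fields):
        problems.append("blank fields are not allowed")
    if fields[0] != "702":
        problems.append("record type must be 702")
    for col in (2, 3, 4):  # source country, breed code, animal country code
        if not re.fullmatch(r"[A-Za-z]{3}", fields[col]):
            problems.append(f"column {col + 1} must be a 3 letter code")
    if fields[5] not in ("M", "F"):
        problems.append("sex code must be M or F")
    if len(fields[6]) > 18:
        problems.append("registration exceeds 18 characters")
    if not is_valid_date(fields[9]):
        problems.append("scan date must be yyyymmdd")
    if not re.fullmatch(r"\d{1,3}\.\d{2}", fields[12]) or float(fields[12]) > 100:
        problems.append("call rate must be a percentage with two decimals")
    return problems
```

The example row from the table passes without problems, while a record with sex code "X" is reported as invalid.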

file 704 - snps.csv

File 704, named snps.csv, contains the actual genotype data for the animals listed in the corresponding meta.csv file.

The Service User may choose to upload and/or download SNP genotype data in either the "AB" or the "TOP" allele designation. The choice determines the content of the first field, Record Type, in the file format below (File 704-AB or File 704-TOP, respectively). File 704-AB/704-TOP is exchanged as a variable-length, comma-delimited file in .csv format and includes a single record for each SNP of each animal.

For example, if the meta.csv file includes 100 animals and each SNP genotype includes the 200 SNPs recommended by ISAG for parentage verification, then snps.csv will include a maximum of 20,000 records (100 animals x 200 SNPs each). If a SNP was not "called" and the result is missing, that SNP should not be included for that animal in the GenoEx-PSE genotype exchange file. For this reason, even if the meta.csv file includes 100 animals, snps.csv may have fewer than 20,000 records.
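The record count therefore has an upper bound of animals x SNPs, and every uncalled SNP lowers it by one. A minimal sketch, assuming genotypes are held in a dictionary in which None marks an uncalled SNP (the data layout, the function name, and the names SNP-B and SNP-C are invented for illustration):

```python
def snps_records(genotypes, record_type="704-AB"):
    """Yield snps.csv field lists, skipping SNPs without a call.

    genotypes: {(breed, country, sex, registration): {snp_name: (a1, a2) or None}}
    """
    for animal, calls in genotypes.items():
        for snp_name, alleles in calls.items():
            if alleles is None:
                continue  # missing results are left out of the file
            yield [record_type, *animal, snp_name.upper(), *alleles]


example = {
    ("BSW", "AUS", "M", "0001234567"): {
        "ARS-BFGL-BAC-19454": ("A", "A"),
        "SNP-B": ("A", "B"),
        "SNP-C": None,  # not called, so no record is written
    },
    ("BSW", "AUS", "F", "0007654321"): {
        "ARS-BFGL-BAC-19454": ("B", "B"),
        "SNP-B": ("A", "A"),
        "SNP-C": ("A", "B"),
    },
}
records = list(snps_records(example))
```

Here 2 animals with 3 SNPs each give at most 6 records, and the one uncalled SNP reduces the total to 5.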

| Col | Name | Format | Description | Example |
|---|---|---|---|---|
| 1 | Record type | Numeric | Record type | 704-AB or 704-TOP |
| 2 | Animal ID¹ - breed code | Alphanumeric | 3 letter breed code³ | BSW |
| 3 | Animal ID¹ - country code | Alphanumeric | 3 letter country code² | AUS |
| 4 | Animal ID¹ - sex code | Alpha | 1 letter sex code, M or F | M |
| 5 | Animal ID¹ - registration | Alphanumeric | Animal identification, maximum 18 characters | 0001234567 |
| 6 | SNP Name | Alphanumeric | SNP name in CAPITAL letters⁶ | ARS-BFGL-BAC-19454 |
| 7 | Allele 1 | Alpha | A/B for 704-AB, A/C/G/T for 704-TOP | A |
| 8 | Allele 2 | Alpha | A/B for 704-AB, A/C/G/T for 704-TOP | A |
| (9) downloads only | Upload ID | Alphanumeric | Unique ID allowing multiple data files of the same individual to be matched by upload event | 1fdb3a18-4a0d-41ee-8530-dac3880d00cf |
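The snps.csv rules, together with the requirement that every SNP record has a matching meta.csv record, can be checked along these lines. This is a sketch under the assumption that both files have already been split into field lists; the function names are hypothetical and the checks only mirror the descriptions given above.

```python
# Alleles permitted for each record type.
ALLOWED_ALLELES = {"704-AB": set("AB"), "704-TOP": set("ACGT")}


def check_snps_record(fields):
    """Return a list of problems found in one uploaded snps.csv record."""
    if len(fields) != 8:
        return [f"expected 8 fields, got {len(fields)}"]
    problems = []
    allowed = ALLOWED_ALLELES.get(fields[0])
    if allowed is None:
        problems.append("record type must be 704-AB or 704-TOP")
    elif fields[6] not in allowed or fields[7] not in allowed:
        problems.append(f"alleles not valid for {fields[0]}")
    if fields[5] != fields[5].upper():
        problems.append("SNP name must be in capital letters")
    return problems


def records_without_meta(meta_rows, snps_rows):
    """Return snps.csv records whose animal has no record in meta.csv."""
    known = {tuple(row[3:7]) for row in meta_rows}  # meta.csv columns 4-7
    # snps.csv columns 2-5 hold the same four animal ID parts.
    return [row for row in snps_rows if tuple(row[1:5]) not in known]
```

An empty result from records_without_meta indicates that breed, country, sex and registration agree between the two files for every SNP record.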

public/PSE_file_formats (last edited 2024-03-04 09:43:13 by KatarineHaugaard)