Differences between revisions 3 and 16 (spanning 13 versions)
Revision 3 as of 2019-03-22 12:35:14
Size: 7213
Comment:
Revision 16 as of 2024-02-21 16:01:17
Size: 6008
Comment:
Line 1: Line 1:
== PSE Submission File Formats v1.1 ==
==== Submissions must uploaded as a ZIP-archive containing two files meta.csv and snps.csv. ====
Every SNP specific record in snps.csv file must have a corresponding record in meta.csv file and the information regarding country and breed for given individual must be in agreement in both files.
== PSE Submission File Formats ==
==== Submissions must be uploaded as a ZIP-archive containing two files meta.csv (file702) and snps.csv (file704). ====
Every SNP-specific record in the snps.csv file must have a corresponding record in the meta.csv file, and the country and breed information for a given individual must agree in both files. The fields in both files should be comma separated, and no blank fields are allowed. Note that SNP names must be written in capital letters.
Line 5: Line 5:
The fields in both files should be comma separated and no blank fields are allowed. At each upload from a Service User, both files must be zipped together and submitted at the same time. Data submitted in one file will not be processed until both files are available.
Line 7: Line 7:
=== meta.csv ===
This file contains information related to animals and the genotyping. There must be one line in this file for each animal in the ''snps.csv''.
||<tablewidth="1068px" tableheight="21px"width="140px" style="border:none;padding:0in">'''Field Name''' ||<width="332px" style="border:none;padding:0in">'''Description''' ||<width="218px" style="border:none;padding:0in">'''Allowed Values''' ||
Data extracted from the GenoEx-PSE database generally follow the same format as the uploaded data. The downloaded files do, however, contain one additional column with a unique Upload ID, which makes it possible to match meta.csv and snps.csv files coming from the same upload event. The Upload ID therefore allows distinguishing multiple records of one individual that come from different sources and/or genotyping events.
Line 11: Line 9:

||<tablewidth="1069px" tableheight="458px"width="140px" style="border:none;padding:0in">Record Type ||<width="332px" style="border:none;padding:0in">Numeric ||<width="218px" style="border:none;padding:0in">702 ||
||<width="140px" style="border:none;padding:0in">Service User ||<width="332px" style="border:none;padding:0in">Alphanumeric ||<width="218px" style="border:none;padding:0in">Any organization abbreviation ||
||<width="140px" style="border:none;padding:0in">Source Country of Animals ||<width="332px" style="border:none;padding:0in">Numeric ||<width="218px" style="border:none;padding:0in">Any country code ||
||<width="140px" style="border:none;padding:0in">Animal ID - Breed Code ||<width="332px" style="border:none;padding:0in">Alphabetic ||<width="218px" style="border:none;padding:0in">Any breed code ||
||<width="140px" style="border:none;padding:0in">Animal ID - Nation Code ||<width="332px" style="border:none;padding:0in">Numeric ||<width="218px" style="border:none;padding:0in">Any country code ||
||<width="140px" style="border:none;padding:0in">Animal ID - Sex Code ||<width="332px" style="border:none;padding:0in">Alphabetic ||<width="218px" style="border:none;padding:0in">'''M''' or '''F''' ||
||<width="140px" style="border:none;padding:0in">Animal ID - Registration ||<width="332px" style="border:none;padding:0in">Alphanumeric ||<width="218px" style="border:none;padding:0in">Example: ''A1234567890'' ||
||<width="140px" style="border:none;padding:0in">Genotyping Laboratory Identification ||<width="332px" style="border:none;padding:0in">Alphanumeric ||<width="218px" style="border:none;padding:0in">Name and location or genotyping laboratory Example: ''Weatherbys Ireland'' ||
||<width="140px" style="border:none;padding:0in">Sample ID ||<width="332px" style="border:none;padding:0in">Alphanumeric ||<width="218px" style="border:none;padding:0in">Example: ''R1234567890'' ||
||<width="140px" style="border:none;padding:0in">Scan Date ||<width="332px" style="border:none;padding:0in">Numeric ||<width="218px" style="border:none;padding:0in">Any valid date in the format ''yyyymmdd'' ||
||<width="140px" style="border:none;padding:0in">Platform ||<width="332px" style="border:none;padding:0in">Alphanumeric ||<width="218px" style="border:none;padding:0in">Any SNP Platform code ||
||<width="140px" style="border:none;padding:0in">No. SNPs in Genotype ||<width="332px" style="border:none;padding:0in">Total number of SNPs in the animal's full genotype used to create the SNP record for GenoEx-PSE ||<width="218px" style="border:none;padding:0in">Example: ''55647'' ||
||<width="140px" style="border:none;padding:0in">Genotype Call Rate ||<width="332px" style="border:none;padding:0in">Percent call rate of the animal's full genotype used to create the SNP record for GenoEx-PSE ||<width="218px" style="border:none;padding:0in">Example: ''99.9'' ||
=== file 702 - meta.csv ===
File 702, named meta.csv, is required to be a variable-length, comma-delimited file in .csv format containing a single record for each animal whose SNP genotype details are reported in the snps.csv file. For example, if the snps.csv file (File 704-AB or 704-TOP) includes SNP genotype results for 100 animals, the meta.csv file will have 100 records, one per animal.
||<tablewidth="1000px">'''Col<<BR>>''' ||'''Name<<BR>>''' ||'''Format<<BR>>''' ||'''Description<<BR>>''' ||'''Example<<BR>>''' ||
||1 ||Record type ||Numeric ||Record type ||702 ||
||2 ||Service user ||Alphanumeric ||Name of uploading organisation ||INRA ||
||3 ||Source country/sending country ||Alphanumeric ||3 letter country code^2^ ||FRA ||
||4 ||Animal ID^1^ - breed code ||Alphanumeric ||3 letter breed code^3^ ||BSW ||
||5 ||Animal ID^1^ - country code ||Alphanumeric ||3 letter country code^2^ ||AUS ||
||6 ||Animal ID^1^ - sex code ||Alpha ||1 letter sex code, M or F ||M ||
||7 ||Animal ID^1^ - registration ||Alphanumeric ||Animal identification, maximum 18 characters ||0001234567 ||
||8 ||Genotyping Laboratory ||Alphanumeric ||Genotyping laboratory^4^ ||Weatherbys Ireland ||
||9 ||Sample ID ||Alphanumeric ||Sample numbers used in the genotyping laboratory ||R1234567890 ||
||10 ||Scan Date ||Numeric ||date when laboratory concluded the analysis, format yyyymmdd ||20220425 ||
||11 ||Platform ||Alphanumeric ||SNP platform^4^ ||Illumina ||
||12 ||SNP array ||Numeric ||SNP array, named by number of SNPs^4^ ||54001 ||
||13 ||Call Rate ||Numeric ||Percent call rate of the animal's full genotype used to create the SNP record for !GenoEx-PSE, two decimals ||99.98 ||
||||||||||<style="text-align:center">'''!downloads only!''' ||
||(14) ||upload ID ||Alphanumeric ||Unique ID used to match multiple data files of the same individual by upload event ||1fdb3a18-4a0d-41ee-8530-dac3880d00cf ||
Line 29: Line 31:
=== snps.csv ===
This file contains genotype data one or more animals. Four each animal in this file, identified by the four ''Animal ID'' fields, there must be a corresponding record in the ''meta.csv'' file.
||<tablewidth="1073px" tableheight="21px"width="165px" style="border:none;padding:0in">'''Field Name''' ||<width="253px" style="border:none;padding:0in">'''Description''' ||<width="260px" style="border:none;padding:0in">'''Allowed Values''' ||
=== file 704 - snps.csv ===
File 704, named snps.csv, contains the actual genotype data for the animals listed in the corresponding meta.csv file. The Service User may choose to upload and/or download SNP genotype data in either the "AB" or "TOP" allele designation, which determines the content of the first field, Record Type, in the following file format (i.e. File 704-AB versus File 704-TOP, respectively). This file is exchanged as a variable-length, comma-delimited file in .csv format and includes a single record for each SNP included for each animal. For example, if the meta.csv file includes 100 animals and each SNP genotype includes the 200 SNPs recommended by ISAG for Parentage Verification, then this second file will include a maximum of 20,000 records (100 animals x 200 SNPs each). If any SNP was not "called" and the result is missing, that SNP for that animal should not be included in the GenoEx-PSE genotype exchange file. For this reason, even if the meta.csv file includes 100 animals, this second file will not necessarily have a total of 20,000 records.
||<tablewidth="1000px">'''Col<<BR>>''' ||'''Name<<BR>>''' ||'''Format<<BR>>''' ||'''Description<<BR>>''' ||'''Example<<BR>>''' ||
||1 ||Record type ||Numeric ||Record type ||704-AB or 704-TOP ||
||2 ||Animal ID^1^ - breed code ||Alphanumeric ||3 letter breed code^3^ ||BSW ||
||3 ||Animal ID^1^ - country code ||Alphanumeric ||3 letter country code^2^ ||AUS ||
||4 ||Animal ID^1^ - sex code ||Alpha ||1 letter sex code, M or F ||M ||
||5 ||Animal ID^1^ - registration ||Alphanumeric ||Animal identification, maximum 18 characters ||0001234567 ||
||6 ||SNP Name ||Alphanumeric ||SNP Name in CAPITAL letters^5^ ||ARS-BFGL-BAC-19454 ||
||7 ||Allele 1 ||Alpha ||A/B for 704-AB, A/C/G/T for 704-TOP ||A ||
||8 ||Allele 2 ||Alpha ||A/B for 704-AB, A/C/G/T for 704-TOP ||A ||
||||||||||<style="text-align:center">'''!downloads only!''' ||
||(9) ||upload ID ||Alphanumeric ||Unique ID used to match multiple data files of the same individual by upload event ||1fdb3a18-4a0d-41ee-8530-dac3880d00cf ||
Line 34: Line 46:
||<tablewidth="1073px" tableheight="249px"width="165px" style="border:none;padding:0in">Record Type ||<width="253px" style="border:none;padding:0in">Numeric ||<width="260px" style="border:none;padding:0in">'''704-AB''' or '''704-TOP''' ||
||<width="165px" style="border:none;padding:0in">Animal ID - Breed Code ||<width="253px" style="border:none;padding:0in">Alphabetic ||<width="260px" style="border:none;padding:0in">Any breed code ||
||<width="165px" style="border:none;padding:0in">Animal ID - Nation Code ||<width="253px" style="border:none;padding:0in">Numeric ||<width="260px" style="border:none;padding:0in">Any country code ||
||<width="165px" style="border:none;padding:0in">Animal ID - Sex Code ||<width="253px" style="border:none;padding:0in">Alphabetic ||<width="260px" style="border:none;padding:0in">'''M''' or '''F''' ||
||<width="165px" style="border:none;padding:0in">Animal ID - Registration ||<width="253px" style="border:none;padding:0in">Alphanumeric ||<width="260px" style="border:none;padding:0in">Example: ''A1234567890'' ||
||<width="165px" style="border:none;padding:0in">SNP Name ||<width="253px" style="border:none;padding:0in">Alphanumeric ||<width="260px" style="border:none;padding:0in">Any SNP name in CAPITALS ||
||<width="165px" style="border:none;padding:0in">Allele 1 ||<width="253px" style="border:none;padding:0in">Alphabetic ||<width="260px" style="border:none;padding:0in">'''A''' or '''B''' for ''704-AB''; '''A''', '''C''' '''G''' or '''T'''for ''704-TOP'' ||
||<width="165px" style="border:none;padding:0in">Allele 2 ||<width="253px" style="border:none;padding:0in">Alphabetic ||<width="260px" style="border:none;padding:0in">'''A''' or '''B''' for ''704-AB''; '''A''', '''C''' '''G''' or '''T'''for ''704-TOP'' ||
Line 44: Line 48:
 . ^1^Columns 4, 5, 6 and 7 of meta.csv (columns 2 to 5 of snps.csv) together make up the animal identification. Interbull ID is not a requirement, but it is highly recommended for animals that have such an identification. For further information on Interbull ID, see here: https://interbull.org/ib/form_id_guidelines <<BR>> ^2^Country code according to [[https://www.iso.org/obp/ui/#search|ISO country codes]] <<BR>> ^3^Breed code according to [[http://www.interbull.org/ib/icarbreedcodes|ICAR breed codes]] <<BR>> ^4^Allowed values according to [[https://interbull.org/ib/pse_platform_list|laboratory/platform list]]. To request inclusion of a new laboratory, platform or array, e-mail Interbull Center at genoex@interbull.se prior to upload <<BR>> ^5^Full SNP list for [[https://interbull.org/ib/pse_parentage_verification_snps|parentage verification]]

== PSE Submission File Formats ==

==== Submissions must be uploaded as a ZIP-archive containing two files meta.csv (file702) and snps.csv (file704). ====

Every SNP-specific record in the snps.csv file must have a corresponding record in the meta.csv file, and the country and breed information for a given individual must agree in both files. The fields in both files should be comma separated, and no blank fields are allowed. Note that SNP names must be written in capital letters.

At each upload from a Service User, both files must be zipped together and submitted at the same time. Data submitted in one file will not be processed until both files are available.
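The packaging step can be sketched in Python with the standard `zipfile` module. This is only an illustration: the function name `build_submission` is hypothetical, and placing both files at the top level of the archive is an assumption, since the page does not describe the internal archive layout.

```python
import zipfile
from pathlib import Path

# The two files every submission must contain.
REQUIRED = ("meta.csv", "snps.csv")


def build_submission(src_dir, zip_path):
    """Zip meta.csv and snps.csv from src_dir into a single archive.

    Raises FileNotFoundError when either file is missing, so that an
    incomplete submission is never produced.
    """
    src = Path(src_dir)
    missing = [name for name in REQUIRED if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError("missing required file(s): " + ", ".join(missing))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in REQUIRED:
            # Assumption: files sit at the top level of the archive.
            zf.write(src / name, arcname=name)
    return zip_path
```

Refusing to build the archive when one file is absent mirrors the rule that data in one file are not processed until both files are available.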

Data extracted from the GenoEx-PSE database generally follow the same format as the uploaded data. The downloaded files do, however, contain one additional column with a unique Upload ID, which makes it possible to match meta.csv and snps.csv files coming from the same upload event. The Upload ID therefore allows distinguishing multiple records of one individual that come from different sources and/or genotyping events.
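As a rough sketch of how the Upload ID can be used on downloaded data, the Python function below groups SNP records by animal and upload event. The function name `group_snps_by_upload` is hypothetical, and the code assumes well-formed download rows in which the Upload ID is the last column (column 14 of meta.csv, column 9 of snps.csv).

```python
import csv
import io
from collections import defaultdict


def group_snps_by_upload(meta_text, snps_text):
    """Map (animal ID, upload ID) to a list of (SNP name, allele 1, allele 2).

    Both arguments are the text content of downloaded files.
    """
    known = set()
    for row in csv.reader(io.StringIO(meta_text)):
        if not row:
            continue
        animal = tuple(row[3:7])      # meta.csv columns 4-7: animal ID
        known.add((animal, row[13]))  # meta.csv column 14: upload ID

    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(snps_text)):
        if not row:
            continue
        key = (tuple(row[1:5]), row[8])  # snps.csv columns 2-5 and column 9
        if key in known:
            groups[key].append((row[5], row[6], row[7]))
    return dict(groups)
```

Keying on both the animal ID and the Upload ID keeps two genotyping events of the same animal apart instead of merging their SNP records.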

=== file 702 - meta.csv ===

File 702, named meta.csv, is required to be a variable-length, comma-delimited file in .csv format containing a single record for each animal whose SNP genotype details are reported in the snps.csv file. For example, if the snps.csv file (File 704-AB or 704-TOP) includes SNP genotype results for 100 animals, the meta.csv file will have 100 records, one per animal.

||'''Col''' ||'''Name''' ||'''Format''' ||'''Description''' ||'''Example''' ||
||1 ||Record type ||Numeric ||Record type ||702 ||
||2 ||Service user ||Alphanumeric ||Name of uploading organisation ||INRA ||
||3 ||Source country/sending country ||Alphanumeric ||3 letter country code^2^ ||FRA ||
||4 ||Animal ID^1^ - breed code ||Alphanumeric ||3 letter breed code^3^ ||BSW ||
||5 ||Animal ID^1^ - country code ||Alphanumeric ||3 letter country code^2^ ||AUS ||
||6 ||Animal ID^1^ - sex code ||Alpha ||1 letter sex code, M or F ||M ||
||7 ||Animal ID^1^ - registration ||Alphanumeric ||Animal identification, maximum 18 characters ||0001234567 ||
||8 ||Genotyping Laboratory ||Alphanumeric ||Genotyping laboratory^4^ ||Weatherbys Ireland ||
||9 ||Sample ID ||Alphanumeric ||Sample numbers used in the genotyping laboratory ||R1234567890 ||
||10 ||Scan Date ||Numeric ||Date when laboratory concluded the analysis, format yyyymmdd ||20220425 ||
||11 ||Platform ||Alphanumeric ||SNP platform^4^ ||Illumina ||
||12 ||SNP array ||Numeric ||SNP array, named by number of SNPs^4^ ||54001 ||
||13 ||Call Rate ||Numeric ||Percent call rate of the animal's full genotype used to create the SNP record for GenoEx-PSE, two decimals ||99.98 ||
||||||||||<style="text-align:center">'''!downloads only!''' ||
||(14) ||upload ID ||Alphanumeric ||Unique ID used to match multiple data files of the same individual by upload event ||1fdb3a18-4a0d-41ee-8530-dac3880d00cf ||
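A pre-upload check of a single meta.csv record against the column rules above might look like the following Python sketch. The function name `check_meta_row` is hypothetical, and requiring upper-case three-letter codes is an assumption based on the examples; the code does not look values up in the ISO, ICAR or laboratory/platform lists.

```python
import re
from datetime import datetime


def _valid_date(text):
    """True when text is exactly eight digits forming a real yyyymmdd date."""
    if not re.fullmatch(r"\d{8}", text):
        return False
    try:
        datetime.strptime(text, "%Y%m%d")
    except ValueError:
        return False
    return True


def check_meta_row(fields):
    """Return a list of problems found in one upload record (13 fields)."""
    if len(fields) != 13:
        return [f"expected 13 fields, got {len(fields)}"]
    problems = []
    if any(value.strip() == "" for value in fields):
        problems.append("blank field")
    if fields[0] != "702":
        problems.append("record type must be 702")
    # Assumption: codes are upper case, as in the examples.
    for idx, label in ((2, "source country"), (3, "breed code"), (4, "animal country")):
        if not re.fullmatch(r"[A-Z]{3}", fields[idx]):
            problems.append(f"{label} must be a 3 letter code")
    if fields[5] not in ("M", "F"):
        problems.append("sex code must be M or F")
    if len(fields[6]) > 18:
        problems.append("registration longer than 18 characters")
    if not _valid_date(fields[9]):
        problems.append("scan date must be a valid yyyymmdd date")
    if not fields[11].isdigit():
        problems.append("SNP array must be numeric")
    if not re.fullmatch(r"\d+\.\d{2}", fields[12]):
        problems.append("call rate must have two decimals")
    return problems
```

Returning a list of messages rather than raising on the first error lets a Service User see every problem in a record at once.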

=== file 704 - snps.csv ===

File 704, named snps.csv, contains the actual genotype data for the animals listed in the corresponding meta.csv file. The Service User may choose to upload and/or download SNP genotype data in either the "AB" or "TOP" allele designation, which determines the content of the first field, Record Type, in the following file format (i.e. File 704-AB versus File 704-TOP, respectively). This file is exchanged as a variable-length, comma-delimited file in .csv format and includes a single record for each SNP included for each animal. For example, if the meta.csv file includes 100 animals and each SNP genotype includes the 200 SNPs recommended by ISAG for Parentage Verification, then this second file will include a maximum of 20,000 records (100 animals x 200 SNPs each). If any SNP was not "called" and the result is missing, that SNP for that animal should not be included in the GenoEx-PSE genotype exchange file. For this reason, even if the meta.csv file includes 100 animals, this second file will not necessarily have a total of 20,000 records.
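The relationship between the two files can be checked before upload with a short Python sketch such as the one below. The function name `cross_check` is hypothetical; rows are assumed to be already parsed into lists of fields in the upload layout, with the animal ID in meta.csv columns 4 to 7 and snps.csv columns 2 to 5.

```python
from collections import Counter


def cross_check(meta_rows, snps_rows):
    """Compare the animals in meta.csv with those in snps.csv.

    Returns the animals that appear in only one of the files and the
    number of SNP records per animal. Uncalled SNPs are omitted from
    snps.csv, so a count below the panel size is not an error.
    """
    meta_ids = {tuple(row[3:7]) for row in meta_rows}
    snp_counts = Counter(tuple(row[1:5]) for row in snps_rows)
    return {
        "snps_without_meta": sorted(set(snp_counts) - meta_ids),
        "meta_without_snps": sorted(meta_ids - set(snp_counts)),
        "records_per_animal": dict(snp_counts),
    }
```

With 100 animals and a 200 SNP panel, the values in `records_per_animal` would sum to at most 20,000.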

||'''Col''' ||'''Name''' ||'''Format''' ||'''Description''' ||'''Example''' ||
||1 ||Record type ||Numeric ||Record type ||704-AB or 704-TOP ||
||2 ||Animal ID^1^ - breed code ||Alphanumeric ||3 letter breed code^3^ ||BSW ||
||3 ||Animal ID^1^ - country code ||Alphanumeric ||3 letter country code^2^ ||AUS ||
||4 ||Animal ID^1^ - sex code ||Alpha ||1 letter sex code, M or F ||M ||
||5 ||Animal ID^1^ - registration ||Alphanumeric ||Animal identification, maximum 18 characters ||0001234567 ||
||6 ||SNP Name ||Alphanumeric ||SNP Name in CAPITAL letters^5^ ||ARS-BFGL-BAC-19454 ||
||7 ||Allele 1 ||Alpha ||A/B for 704-AB, A/C/G/T for 704-TOP ||A ||
||8 ||Allele 2 ||Alpha ||A/B for 704-AB, A/C/G/T for 704-TOP ||A ||
||||||||||<style="text-align:center">'''!downloads only!''' ||
||(9) ||upload ID ||Alphanumeric ||Unique ID used to match multiple data files of the same individual by upload event ||1fdb3a18-4a0d-41ee-8530-dac3880d00cf ||
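The allele rules depend on the record type, which a small Python sketch can make explicit. The function name `check_snp_row` is hypothetical, and the check covers only the rules stated in the table above for an upload record with 8 fields; it does not compare SNP names with the parentage verification SNP list.

```python
# Allowed allele letters for each record type.
ALLELES = {"704-AB": set("AB"), "704-TOP": set("ACGT")}


def check_snp_row(fields):
    """Return a list of problems found in one snps.csv upload record."""
    if len(fields) != 8:
        return [f"expected 8 fields, got {len(fields)}"]
    problems = []
    record_type = fields[0]
    allowed = ALLELES.get(record_type)
    if allowed is None:
        problems.append("record type must be 704-AB or 704-TOP")
    else:
        for pos in (6, 7):
            if fields[pos] not in allowed:
                problems.append(f"allele {pos - 5} not valid for {record_type}")
    if fields[3] not in ("M", "F"):
        problems.append("sex code must be M or F")
    if fields[5] != fields[5].upper():
        problems.append("SNP name must be in capital letters")
    return problems
```

For example, a record with alleles G and T passes as 704-TOP but is rejected as 704-AB.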

public/PSE_file_formats (last edited 2024-09-12 15:20:02 by KatarineHaugaard)