PSE File Formats
Submissions to GenoEx-PSE must be uploaded as a ZIP archive containing two files: meta.csv (File 702) and snps.csv (File 704). Every SNP-specific record in the snps.csv file must have a corresponding record in the meta.csv file, and the country and breed information for a given individual must agree between the two files. Fields in both files must be comma-separated, and blank fields are not allowed. Note that SNP names must be written in capital letters.
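The two field-level rules above (no blank fields, SNP names in capital letters) can be checked before the files are zipped. The sketch below is a minimal Python example built on several assumptions. Each record has already been split into a list of strings. The files have no header row. The SNP name is column 6 of snps.csv, as in the snps.csv column table on this page. The helper name `field_problems` is hypothetical and is not part of any GenoEx-PSE tooling.

```python
from typing import List


def field_problems(row: List[str], is_snps_row: bool) -> List[str]:
    """Return basic formatting problems found in one parsed record (hypothetical helper).

    Checks only two rules: no field may be blank, and in snps.csv
    records the SNP name (column 6) must be written in capital letters.
    """
    problems = []
    for col, value in enumerate(row, start=1):
        if value.strip() == "":
            problems.append(f"column {col} is blank")
    if is_snps_row and len(row) >= 6 and row[5] != row[5].upper():
        problems.append(f"SNP name {row[5]!r} is not in capital letters")
    return problems


good = ["704-AB", "BSW", "AUS", "M", "0001234567", "ARS-BFGL-BAC-19454", "A", "A"]
bad = ["704-AB", "BSW", "", "M", "0001234567", "ars-bfgl-bac-19454", "A", "A"]
print(field_problems(good, is_snps_row=True))  # []
print(field_problems(bad, is_snps_row=True))
```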
At each upload from a Service User, both files must be zipped together and submitted at the same time. Data submitted in one file will not be processed until both files are available and the animal identification information matches between the two files.
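As a sketch of the packaging step, Python's standard `zipfile` module can place both files in one archive. The example assumes that meta.csv and snps.csv should sit at the root of the archive. This page does not specify a name for the archive itself, so any name used when saving it (for example `upload.zip`) is hypothetical, as is the helper `build_submission`.

```python
import io
import zipfile


def build_submission(meta_csv: str, snps_csv: str) -> bytes:
    """Pack meta.csv and snps.csv into a single ZIP archive, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Both files go into the same archive so they are uploaded together.
        archive.writestr("meta.csv", meta_csv)
        archive.writestr("snps.csv", snps_csv)
    return buffer.getvalue()


meta = "702,INRA,FRA,BSW,AUS,M,0001234567,Weatherbys Ireland,R1234567890,20220425,Illumina,54001,99.98\n"
snps = "704-AB,BSW,AUS,M,0001234567,ARS-BFGL-BAC-19454,A,A\n"
data = build_submission(meta, snps)
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(archive.namelist())  # ['meta.csv', 'snps.csv']
```

To write the archive to disk, the returned bytes can be saved with `open("upload.zip", "wb").write(data)`.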
Data extracted from the GenoEx-PSE database generally follow the same format as the data uploaded. The downloaded files do, however, contain one additional column with a unique Upload ID, which allows meta.csv and snps.csv records to be matched even when they come from different upload events. The Upload ID therefore distinguishes multiple records of one individual that come from different sources and/or genotyping events. If you wish to upload more than one genotype for a given animal, you must do so in a separate set of files: the database does not accept duplicates, so that different genotyping events can be told apart.
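As an illustration of how the Upload ID keeps genotyping events apart, the sketch below pairs downloaded meta.csv records with their snps.csv records. The key is the Upload ID plus the four animal-ID fields. Column positions are taken from the tables on this page: animal ID in columns 4-7 and Upload ID in column 14 of meta.csv, and animal ID in columns 2-5 and Upload ID in column 9 of snps.csv. The example also assumes that records are already parsed into lists with no header row. All function names are hypothetical, and the short upload IDs in the demo stand in for the real UUID-style values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (upload ID, breed code, country code, sex code, registration)
Key = Tuple[str, str, str, str, str]


def meta_key(row: List[str]) -> Key:
    # Downloaded meta.csv: animal ID in columns 4-7, upload ID in column 14.
    return (row[13], row[3], row[4], row[5], row[6])


def snps_key(row: List[str]) -> Key:
    # Downloaded snps.csv: animal ID in columns 2-5, upload ID in column 9.
    return (row[8], row[1], row[2], row[3], row[4])


def pair_by_upload(
    meta_rows: List[List[str]], snps_rows: List[List[str]]
) -> Dict[Key, Tuple[List[str], List[List[str]]]]:
    """Attach each downloaded meta.csv record to its snps.csv records.

    Because the upload ID is part of the key, two genotypes of the same
    animal from different uploads are kept apart instead of being merged.
    """
    snps_by_key: Dict[Key, List[List[str]]] = defaultdict(list)
    for row in snps_rows:
        snps_by_key[snps_key(row)].append(row)
    return {meta_key(row): (row, snps_by_key.get(meta_key(row), [])) for row in meta_rows}


animal = ["BSW", "AUS", "M", "0001234567"]
meta_rows = [
    ["702", "INRA", "FRA", *animal, "Lab", "R1", "20220425", "Illumina", "54001", "99.98", "upload-1"],
    ["702", "INRA", "FRA", *animal, "Lab", "R2", "20230110", "Illumina", "54001", "99.90", "upload-2"],
]
snps_rows = [
    ["704-AB", *animal, "SNP1", "A", "B", "upload-1"],
    ["704-AB", *animal, "SNP2", "A", "A", "upload-1"],
    ["704-AB", *animal, "SNP1", "A", "B", "upload-2"],
]
pairs = pair_by_upload(meta_rows, snps_rows)
for key, (meta_row, genotypes) in pairs.items():
    print(key[0], len(genotypes))  # upload-1 2, then upload-2 1
```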
Please note: since the input files are comma-delimited, avoid using commas within a data field. If a field requires a comma, add the escape character ‘\’ before it, e.g. ‘Laboratory\, Branch’.
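One way to apply this escaping rule is Python's standard `csv` module in `QUOTE_NONE` mode with `escapechar="\\"`. In that mode a comma inside a field is written as `\,`, and `\,` is read back as a literal comma. This is a sketch, not an official GenoEx-PSE reader or writer, and the helper names are hypothetical. Note that in this mode Python's writer also backslash-escapes any double quote or backslash characters. This page does not say whether GenoEx-PSE accepts those, so avoiding such characters is the safer choice.

```python
import csv
import io
from typing import List


def write_records(rows: List[List[str]]) -> str:
    """Join fields with commas, writing any comma inside a field as '\\,'."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_NONE, escapechar="\\", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


def read_records(text: str) -> List[List[str]]:
    """Split comma-delimited records, treating '\\,' as a literal comma."""
    reader = csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONE, escapechar="\\")
    return list(reader)


line = write_records([["8", "Laboratory, Branch", "R1234567890"]])
print(line, end="")        # 8,Laboratory\, Branch,R1234567890
print(read_records(line))  # [['8', 'Laboratory, Branch', 'R1234567890']]
```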
File 702 - meta.csv
File 702, named meta.csv, must be a variable-length, comma-delimited file in .csv format containing a single record for each animal whose SNP genotype is reported in the snps.csv file. For example, if the snps.csv file (File 704-AB or 704-TOP) includes SNP genotype results for 100 animals, the meta.csv file will have 100 records, one per animal.
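The one-record-per-animal rule and the requirement that both files identify the same animals can be checked together. In the minimal sketch below, the animal ID is taken as columns 4-7 of meta.csv and columns 2-5 of snps.csv (breed, country, sex, registration), following the tables on this page. Because breed and country are part of that ID, a breed or country disagreement between the files shows up as an animal missing from one of them. The sketch assumes records are already parsed into lists with no header row. `cross_check` is a hypothetical name.

```python
from collections import Counter
from typing import List


def cross_check(meta_rows: List[List[str]], snps_rows: List[List[str]]) -> List[str]:
    """Compare the animals listed in the meta.csv and snps.csv of one upload."""
    # Animal ID: columns 4-7 in meta.csv, columns 2-5 in snps.csv.
    meta_counts = Counter(tuple(row[3:7]) for row in meta_rows)
    meta_ids = set(meta_counts)
    snps_ids = {tuple(row[1:5]) for row in snps_rows}
    problems = []
    for animal, count in meta_counts.items():
        if count > 1:
            problems.append(f"{animal} has {count} records in meta.csv")
    for animal in sorted(snps_ids - meta_ids):
        problems.append(f"{animal} has genotypes in snps.csv but no meta.csv record")
    for animal in sorted(meta_ids - snps_ids):
        problems.append(f"{animal} has a meta.csv record but no genotypes in snps.csv")
    return problems


meta_rows = [["702", "INRA", "FRA", "BSW", "AUS", "M", "0001234567", "Lab", "R1",
              "20220425", "Illumina", "54001", "99.98"]]
snps_rows = [
    ["704-AB", "BSW", "AUS", "M", "0001234567", "SNP1", "A", "B"],
    ["704-AB", "BSW", "FRA", "M", "0001234567", "SNP1", "A", "A"],  # country disagrees
]
print(cross_check(meta_rows, snps_rows))
```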
Col | Name | Format | Description | Example
1 | Record type | Numeric | Record type | 702
2 | Service user | Alphanumeric | Name of uploading organisation | INRA
3 | Source country/sending country | Alphanumeric | 3-letter country code² | FRA
4 | Animal ID¹ - breed code | Alphanumeric | 3-letter breed code³ | BSW
5 | Animal ID¹ - country code | Alphanumeric | 3-letter country code² | AUS
6 | Animal ID¹ - sex code | Alpha | 1-letter sex code, M or F | M
7 | Animal ID¹ - registration | Alphanumeric | Animal identification, maximum 18 characters | 0001234567
8 | Genotyping Laboratory | Alphanumeric | Genotyping laboratory⁴ | Weatherbys Ireland
9 | Sample ID | Alphanumeric | Sample number used by the genotyping laboratory | R1234567890
10 | Scan Date | Numeric | Date when the laboratory concluded the analysis, format yyyymmdd | 20220425
11 | Platform | Alphanumeric | SNP platform⁴ | Illumina
12 | SNP array | Numeric | SNP array, named by number of SNPs⁴ | 54001
13 | Call Rate | Numeric | Percent call rate of the animal's full genotype used to create the SNP record for GenoEx-PSE, two decimals | 99.98
(downloads only)
(14) | Upload ID | Alphanumeric | Unique ID allowing multiple data files of the same individual to be matched by upload event | 1fdb3a18-4a0d-41ee-8530-dac3880d00cf
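The column table above can be turned into a per-record check for uploaded meta.csv files, which have 13 columns because the Upload ID appears in downloads only. The sketch below covers only the format rules readable from the table. Laboratory, platform and array values still have to be checked against Interbull's published list. It assumes that country and breed codes are upper-case, as in the examples, and that the call rate is written with two decimals and at most 100. The function names are hypothetical.

```python
import re
from datetime import datetime
from typing import List


def is_yyyymmdd(value: str) -> bool:
    """True if value is exactly eight digits forming a real calendar date."""
    if not re.fullmatch(r"\d{8}", value):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


def check_meta_record(row: List[str]) -> List[str]:
    """Check one uploaded meta.csv record (13 columns) against the column table."""
    if len(row) != 13:
        return [f"expected 13 columns, found {len(row)}"]
    problems = []
    if row[0] != "702":
        problems.append("column 1 (record type) must be 702")
    for col in (3, 4, 5):  # source country, breed code, animal country code
        if not re.fullmatch(r"[A-Z]{3}", row[col - 1]):
            problems.append(f"column {col} must be a 3-letter upper-case code")
    if row[5] not in ("M", "F"):
        problems.append("column 6 (sex code) must be M or F")
    if not 1 <= len(row[6]) <= 18:
        problems.append("column 7 (registration) must be 1 to 18 characters")
    if not is_yyyymmdd(row[9]):
        problems.append("column 10 (scan date) must be a valid yyyymmdd date")
    if not row[11].isdigit():
        problems.append("column 12 (SNP array) must be a number of SNPs")
    if not re.fullmatch(r"\d{1,3}\.\d{2}", row[12]) or float(row[12]) > 100:
        problems.append("column 13 (call rate) must be a percentage with two decimals")
    return problems


example = ["702", "INRA", "FRA", "BSW", "AUS", "M", "0001234567", "Weatherbys Ireland",
           "R1234567890", "20220425", "Illumina", "54001", "99.98"]
print(check_meta_record(example))  # []
```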
File 704 - snps.csv
File 704, named snps.csv, contains the actual genotype data for the animals listed in the corresponding meta.csv file. The Service User may choose to upload and/or download SNP genotype data using either the "AB" or the "TOP" allele designation. This choice determines the content of the first field, Record Type (File 704-AB or File 704-TOP, respectively).

This file is exchanged as a variable-length, comma-delimited file in .csv format and includes a single record for each SNP of each animal. For example, if the meta.csv file includes 100 animals and each genotype includes the 200 SNPs recommended by ISAG for parentage verification, this second file will include at most 20,000 records (100 animals x 200 SNPs each). If a SNP was not "called" and the result is missing, that SNP must not be included for that animal in the GenoEx-PSE genotype exchange file. For this reason, even if the meta.csv file includes 100 animals, the snps.csv file will not necessarily have a total of 20,000 records.
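The record-count arithmetic follows from one record per called SNP per animal. The sketch below builds snps.csv records from in-memory genotype calls and simply skips uncalled SNPs. As a result, the count can fall below the animals x SNPs maximum. The input structure (a dictionary keyed by animal ID, with `None` marking an uncalled SNP) and the function name are hypothetical choices made for this example. SNP names such as `SNP1` are placeholders.

```python
from typing import Dict, List, Optional, Tuple

AnimalId = Tuple[str, str, str, str]  # (breed code, country code, sex code, registration)
Call = Optional[Tuple[str, str]]      # (allele 1, allele 2), or None when not called


def snps_records(record_type: str, genotypes: Dict[AnimalId, Dict[str, Call]]) -> List[List[str]]:
    """Build snps.csv records: one per called SNP per animal.

    Uncalled SNPs (None) produce no record at all, so the number of
    records can be lower than animals x SNPs.
    """
    rows = []
    for animal, calls in genotypes.items():
        for snp_name, call in calls.items():
            if call is None:
                continue  # missing result: leave this SNP out for this animal
            rows.append([record_type, *animal, snp_name, call[0], call[1]])
    return rows


genotypes = {
    ("BSW", "AUS", "M", "0001234567"): {"SNP1": ("A", "B"), "SNP2": None, "SNP3": ("A", "A")},
    ("BSW", "AUS", "F", "0007654321"): {"SNP1": ("B", "B"), "SNP2": ("A", "B"), "SNP3": ("A", "A")},
}
rows = snps_records("704-AB", genotypes)
print(len(rows))  # 5 (at most 2 animals x 3 SNPs = 6)
```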
Col | Name | Format | Description | Example
1 | Record type | Alphanumeric | Record type | 704-AB or 704-TOP
2 | Animal ID¹ - breed code | Alphanumeric | 3-letter breed code³ | BSW
3 | Animal ID¹ - country code | Alphanumeric | 3-letter country code² | AUS
4 | Animal ID¹ - sex code | Alpha | 1-letter sex code, M or F | M
5 | Animal ID¹ - registration | Alphanumeric | Animal identification, maximum 18 characters | 0001234567
6 | SNP Name | Alphanumeric | SNP name in CAPITAL letters⁵ | ARS-BFGL-BAC-19454
7 | Allele 1 | Alpha | A/B for 704-AB, A/C/G/T for 704-TOP | A
8 | Allele 2 | Alpha | A/B for 704-AB, A/C/G/T for 704-TOP | A
(downloads only)
(9) | Upload ID | Alphanumeric | Unique ID allowing multiple data files of the same individual to be matched by upload event | 1fdb3a18-4a0d-41ee-8530-dac3880d00cf
¹ In meta.csv, columns 4, 5, 6 and 7 (columns 2 to 5 in snps.csv) together make up the animal identification. Interbull ID is not required, but its use is highly recommended for animals that have one. For further information on Interbull ID, see https://interbull.org/ib/form_id_guidelines
² Country code according to ISO country codes (https://www.iso.org/obp/ui/#search)
³ Breed code according to ICAR breed codes (http://www.interbull.org/ib/icarbreedcodes)
⁴ Allowed values according to the laboratory/platform list (https://interbull.org/ib/pse_platform_list). To request the inclusion of a new laboratory, platform or array, e-mail the Interbull Center at genoex@slu.se before uploading.
⁵ Full SNP list for parentage verification (https://interbull.org/ib/pse_parentage_verification_snps)
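The snps.csv column table can be turned into a per-record check for uploaded files, which have 8 columns because the Upload ID appears in downloads only. The allowed allele codes depend on the record type: A/B for 704-AB and A/C/G/T for 704-TOP. This sketch assumes that breed and country codes are upper-case, as in the examples. It does not check SNP names against the published parentage-verification list. `check_snps_record` is a hypothetical name.

```python
import re
from typing import List

# Allowed allele codes per record type, from the snps.csv column table.
ALLELES = {"704-AB": {"A", "B"}, "704-TOP": {"A", "C", "G", "T"}}


def check_snps_record(row: List[str]) -> List[str]:
    """Check one uploaded snps.csv record (8 columns) against the column table."""
    if len(row) != 8:
        return [f"expected 8 columns, found {len(row)}"]
    record_type, breed, country, sex, registration, snp_name, allele1, allele2 = row
    allowed = ALLELES.get(record_type)
    if allowed is None:
        return ["column 1 (record type) must be 704-AB or 704-TOP"]
    problems = []
    for col, code in ((2, breed), (3, country)):
        if not re.fullmatch(r"[A-Z]{3}", code):
            problems.append(f"column {col} must be a 3-letter upper-case code")
    if sex not in ("M", "F"):
        problems.append("column 4 (sex code) must be M or F")
    if not 1 <= len(registration) <= 18:
        problems.append("column 5 (registration) must be 1 to 18 characters")
    if snp_name != snp_name.upper():
        problems.append("column 6 (SNP name) must be in capital letters")
    for col, allele in ((7, allele1), (8, allele2)):
        if allele not in allowed:
            problems.append(f"column {col}: allele {allele!r} is not valid for {record_type}")
    return problems


top = ["704-TOP", "BSW", "AUS", "M", "0001234567", "ARS-BFGL-BAC-19454", "A", "G"]
ab = ["704-AB", "BSW", "AUS", "M", "0001234567", "ARS-BFGL-BAC-19454", "A", "G"]
print(check_snps_record(top))  # []
print(check_snps_record(ab))   # ["column 8: allele 'G' is not valid for 704-AB"]
```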