part 3 - API Manual (including use of gxapi.py)

See also the ''!GenoEx-GDE'' GDE_user_manual.

The ''gxapi.py'' support program, maintained and distributed by the Interbull Centre, allows easy access to the API for upload, extraction and download of 706 and 711 files associated with the !GenoEx-GDE database. This manual describes each of the calls of the API along with the usage of the gxapi program. These descriptions are organized into three sections to focus on the main aspects of the API.

 * prepare the files from raw laboratory outputs with the gxprep program
 * assign sharing permissions to genotypes, individuals, breeds etc.
 * change sharing permissions at any point in time

== 1. GenoEx-GDE upload file formats ==
!GenoEx-GDE allows upload of two types of files:

 * data file ('''''format 706''''' – section 1.1)
 * sharing permissions file ('''''format 711''''' – section 1.2)

'''Note: '''Both input files are prepared from the laboratory output files by the gxprep.py program (section 2). The program also assigns a unique UUID to each genotype, which makes it possible to distinguish between several records of one animal. Both files are semicolon-delimited.

=== 1.1. File 706 ===
This file contains the actual genomic data, as well as information about the animal, the genotyping laboratory and the chip used for genotyping. <<BR>>Because typical genomic data contain a lot of information, the data in this file are coded down to a single digit per SNP and a single record per animal. Correct coding requires the SNPs to be written in a fixed order within the data stream, according to the SNP order list, in which each SNP is recognized by name and given a position in the data stream. This coding allows easy exchange of very large files, but its dependency on the order unfortunately also makes it prone to errors.
Therefore, to ensure the highest data quality in the !GenoEx-GDE database, we provide a program called gxprep that takes raw laboratory files with your data as input, fetches correct SNP order list from our servers and produces correctly ordered 706 file ready to be uploaded to the database. See section 2. ==== 706 file format ==== ||<tablewidth="570px"width="259px" height="25px" style="border-top:1.00pt solid #000000;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0.49mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">'''Field Description ''' ||<width="131px" style="border-top:1.00pt solid #000000;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0.49mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">'''Format ''' ||<width="167px" style="border-top:1.00pt solid #000000;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0.49mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">'''Example ''' || ||<tablewidth="570px"width="259px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">Record type ^1^ ||<width="131px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">integer 3 ||<width="167px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">706 || ||<tablewidth="570px"width="259px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">Breed of animal ^4^ 
||<width="131px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">character 3 ||<width="167px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">BSW || ||<tablewidth="570px"width="259px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">Country of first registration of animal ^2^ ||<width="131px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">character 3 ||<width="167px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">AUS || ||<tablewidth="570px"width="259px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">Sex ||<width="131px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">character 1 ||<width="167px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">M || ||<tablewidth="570px"width="259px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">ID number of 
animal ^5^ ||<width="131px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">alphanumeric 12 ||<width="167px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">000000A12345 || ||<tablewidth="570px"width="259px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">Organization sending this information ||<width="131px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">character ||<width="167px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">ANAFI || ||<tablewidth="570px"width="259px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">UUID ^6^ ||<width="131px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">alphanumeric 36 ||<width="167px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">assigned automatically by gxprep.py program || ||<tablewidth="570px"width="259px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid 
#000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">Genotyping laboratory ^7^ ||<width="131px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">character ||<width="167px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">Weatherbys Ireland || ||<tablewidth="570px"width="259px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">Sample ID ||<width="131px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">alphanumeric ||<width="167px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">S1234WI2001 || ||<tablewidth="570px"width="259px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">Additional ||<width="131px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">for future reference ||<width="167px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm"> || ||<tablewidth="570px"width="259px" style="border-top:none;border-bottom:1.00pt solid 
#000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">Array identifier ^8^ ||<width="131px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">alphanumeric ||<width="167px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">54609_a || ||<tablewidth="570px"width="259px" style="border-top:none;border-bottom:none;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0mm;padding-left:0mm;padding-right:0.49mm">AB – Genotype for SNP Index 1 ^10^ ||<width="131px" style="border-top:none;border-bottom:none;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0mm;padding-left:0mm;padding-right:0.49mm">integer 1 ||<width="167px" style="border-top:none;border-bottom:none;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0mm;padding-left:0mm;padding-right:0.49mm">0 || ||<width="259px" style="border-top:none;border-bottom:none;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0mm;padding-left:0mm;padding-right:0.49mm">AB – Genotype for SNP Index 2 ^10^ ||<width="131px" style="border-top:none;border-bottom:none;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0mm;padding-left:0mm;padding-right:0.49mm">integer 1 ||<width="167px" style="border-top:none;border-bottom:none;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0mm;padding-left:0mm;padding-right:0.49mm">1 || ||<width="259px" style="border-top:none;border-bottom:none;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0mm;padding-left:0mm;padding-right:0.49mm">AB – 
Genotype for SNP Index … ^10^ ||<width="131px" style="border-top:none;border-bottom:none;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0mm;padding-left:0mm;padding-right:0.49mm">integer 1 ||<width="167px" style="border-top:none;border-bottom:none;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0mm;padding-left:0mm;padding-right:0.49mm">2 || ||<width="259px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">AB – Genotype for SNP Index n ^9,10^ ||<width="131px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">integer 1 ||<width="167px" style="border-top:none;border-bottom:1.00pt solid #000000;border-left:none;border-right:1.00pt solid #000000;padding-top:0mm;padding-bottom:0.49mm;padding-left:0mm;padding-right:0.49mm">5 || 1. Record type is always 706 for this File Format 1. ISO 3166-1 alpha-3 codes (3 characters, capital letters) 1. Breed of evaluation (3 characters, capital letters, BSW, GUE, HOL, JER, RDC, SIM) 1. Breed of animal (3 characters, capital letters) 1. Alpha-numerical, Interbull standard, always 12 characters long 1. UUID, one for every uploaded genotype sequence. Additional information about generation of UUID can be found at [[https://wiki.interbull.org/public/uuid?action=print&rev=8|here]] 1. Genotyping laboratory, among the ones listed in the "Laboratories" table available in the !GenoEx-GDE System Data page https://genoex.org/display. If the laboratory is not listed in the table, a request should be sent to !GenoEx@slu.se 1. Array identifier, one of the listed in the "SNP Arrays" table ("Code" column) available in the !GenoEx-GDE System Data page https://genoex.org/display. 1. 
n is equal to the number of SNPs reported in the stem of the Array identifier 1. coded SNP values written as a continuous string. <<BR>>Acceptable values depend on the Illumina coded allele values, according to the following: {{{ BB→0 AB→1 AA→2 ‘unknown’→5 }}} ==== 706 example ==== {{{ 706;BSW;ITA;M;000000A12345;ANARB;09c98b1e-6af8-4254-9768-58d7cd1ddafd;Weatherbys Ireland;S1234WI2001;;54609_a;021010… }}} === 1.2. File 711 === . The function of 711 file is to set the permissions regarding who can have access to the data. Since the default sharing state for all the data uploaded to !GenoEx-GDE database is ‘not shareable’, the User has to upload 711 file(s) to change it. ==== 711 file format ==== ||<tablewidth="643px"width="195px" style="border:1px solid #000000;padding:0.49mm 2.03mm">'''Field Name''' ||<width="256px" style="border:1px solid #000000;padding:0.49mm 2.03mm">Format ||<width="178px" style="border:1px solid #000000;padding:0.49mm 2.03mm">Example || ||<width="195px" style="border:1px solid #000000;padding:0.49mm 2.03mm">Record Type ^1^ ||<width="256px" style="border:1px solid #000000;padding:0.49mm 2.03mm">alphanumeric 3 ||<width="178px" style="border:1px solid #000000;padding:0.49mm 2.03mm">711 || ||<width="195px" style="border:1px solid #000000;padding:0.49mm 2.03mm">Animal ID ^2^ - Breed Code ^3^ ||<width="256px" style="border:1px solid #000000;padding:0.49mm 2.03mm">character 3 ||<width="178px" style="border:1px solid #000000;padding:0.49mm 2.03mm">BSW || ||<width="195px" style="border:1px solid #000000;padding:0.49mm 2.03mm">Animal ID - Nation Code ^4^ ||<width="256px" style="border:1px solid #000000;padding:0.49mm 2.03mm">character 3 (with the exception of 840) ||<width="178px" style="border:1px solid #000000;padding:0.49mm 2.03mm">AUS || ||<width="195px" style="border:1px solid #000000;padding:0.49mm 2.03mm">Animal ID - Sex Code ||<width="256px" style="border:1px solid #000000;padding:0.49mm 2.03mm">character 1 ||<width="178px" style="border:1px 
solid #000000;padding:0.49mm 2.03mm">M || ||<width="195px" style="border:1px solid #000000;padding:0.49mm 2.03mm">Animal ID - Registration ||<width="256px" style="border:1px solid #000000;padding:0.49mm 2.03mm">alphanumeric 12 ||<width="178px" style="border:1px solid #000000;padding:0.49mm 2.03mm">000000A12345 || ||<width="195px" style="border:1px solid #000000;padding:0.49mm 2.03mm">UUID ^5^ ||<width="256px" style="border:1px solid #000000;padding:0.49mm 2.03mm">alphanumeric 36 ||<width="178px" style="border:1px solid #000000;padding:0.49mm 2.03mm">assigned automatically by gxprep.py program || ||<width="195px" style="border:1px solid #000000;padding:0.49mm 2.03mm">Shareable with organization(s) ^7^ ||<width="256px" style="border:1px solid #000000;padding:0.49mm 2.03mm">character, repeatable ||<width="178px" style="border:1px solid #000000;padding:0.49mm 2.03mm">BFRO,IBC || 1. Record type is always 711 for this File Format 1. Please see Interbull Bulletin 28. Each file can only contain any given animal in one row. 1. Breed of animal (3 characters, capital letters) 1. ISO 3166-1 alpha-3 codes (3 characters) 1. UUID, used as reference to every uploaded genotype sequence in the 706 file. Additional information about generation of UUID can be found [[https://wiki.interbull.org/public/uuid?action=print&rev=8|here]] 1. Comma-separated list of zero or more organizations that should be allowed to download the associated genotype ==== 711 example ==== {{{ 711;BSW;ITA;M;000000A12345;09c98b1e-6af8-4254-9768-58d7cd1ddafd;BFRO,IBC }}} == 2. Upload preparation program – gxprep.py == '''''gxprep.py''''' support program, maintained and distributed by Interbull Centre, prepares a set of files: 706 and 711, ready to be uploaded to !GenoEx-GDE database. Always check you are using the latest version of gxprep.py program downloading it from https://genoex.org/. === 2.1. 
gxprep functions ===
The program has four main commands: parse, sharing, show and zip.

parse should always be run first, because the other commands operate on the output files it produces. sharing and show can be run several times, allowing gradual fine adjustment of the sharing permissions. zip is run at the end of the preparation process, when both files are ready for upload.

==== parse ====
reads the input files (see section 2.2) and produces the 706 file and an initial 711 file. The 711 file lists all the animal IDs and corresponding UUIDs, but only the default sharing permissions (if set) are assigned.

Most standard laboratory output files can be used as input unmodified (see 2.2), but the User has to provide additional information when running this command. Note: all these values can also be provided as defaults in an initialization file (see 2.3).

 * '''-h [HELP] '''
 * '''-a [ARRAY] '''– alphanumerical value that defines the chip that was used for genotyping, according to the Array List (see also section 3). '''Note: '''If the chip used for genotyping is not listed, the User is advised to contact Interbull Centre and provide the full list of SNPs and the name of the chip, which will then be added to the Array List. Specifying an array other than the one assigned to the chip actually used is possible, but raises multiple warnings. This is acceptable when the actual chip, and therefore the laboratory output, contains additional custom SNPs that the User does not wish to upload.
 * '''-d [DELIMITER] '''– character that defines the column delimiter in the body section of the input file (laboratory output file). The default delimiter is tab, so this option is only needed if another delimiter is used.
 * -l [LAB] – standardized name of the laboratory where genotyping was performed, according to the Laboratory List (see also section 3).
If the laboratory is not listed in the "Laboratories" table on the !GenoEx-GDE System Data page (https://genoex.org/display), the User should send a request to GenoEx@slu.se before preparing the data.

 * '''-s [SAMPLEMAP] '''– file mapping each laboratory sample ID to the corresponding animal ID. '''Note:''' all animal IDs should be provided as International Interbull IDs (see interbull.org/ib/interbull_guidelines for more information).
 * '''-i [INPUTSPEC] '''– since the contents of laboratory output files differ between organizations, the User has to specify which columns in the file contain the required information. Column numbers start at 1, and the order is: SNP name, sample ID, allele 1, allele 2. The value is given as four space-separated numbers in single or double quotation marks, e.g. ‘1 2 3 4’.
 * '''-C [CACHEDIR]''' – specifies the directory where SNP order files downloaded from the Interbull servers are stored.

'''Note:''' The names of both output files from this command share a common stem with the input file, followed by the file type number (e.g. for input file I52690.txt the stem is I52690, and the files produced by parse are I52690-706.csv and I52690-711.csv). The other commands in this program use this stem.

===== parse example =====
{{{
python gxprep.py parse -a 7931_a -l CIGENE -s SampleMap.txt -C ~/gde/gxprep/cachedir -i "1 2 3 4" Iexample.txt
}}}
This command first retrieves the SNP order file from the Interbull server and saves it to the ~/gde/gxprep/cachedir folder. Then it parses the Iexample.txt file, retrieving animal IDs from the !SampleMap.txt file (see section 2.2 for a description of the input files).
In Iexample.txt, it looks for the SNP name in column 1, the sample ID in column 2, allele 1 in column 3 and allele 2 in column 4. Allele 1 and allele 2 are then coded to one digit according to the mapping BB→0, AB→1, AA→2, ‘unknown’→5, and placed in the genotype string according to the SNP order. All samples listed in this file get CIGENE as laboratory. Each newly created record is also assigned a UUID, and the same UUID, along with the corresponding animal ID, is listed in the 711 file. If you have set up any defaults for sharing (see section 2.3), they are also used in the newly created 711 file; otherwise the last column in this file remains empty. <<BR>>The files created in this example will be named Iexample-706.csv and Iexample-711.csv

==== sharing ====
is used to add or remove organizations from the list of organizations allowed to download a given genotype. This command only operates on the 711 file and ignores the 706 file, if present.

'''Note:''' A 711 file newly created by the parse command normally has the last column (‘Shareable with organizations’) empty, unless defaults specify otherwise (see 2.3). In most cases it is therefore necessary to run sharing to create the list of organizations each record can be shared with. This can be done by assigning the same permissions to all the data in the file, by adding them according to a pattern defined by breed, sex or country of origin, or by providing a list of specific animals whose sharing permissions should be changed.

This command takes the following arguments:

 * '''-a [ORGANIZATION]''' - adds organization(s) to the list of organizations the data should be shared with. If no further arguments are provided, this sharing is assigned to all the records in the given 711 file.
 * '''-r [ORGANIZATION] '''- removes the listed organization(s) from the sharing list
 * '''-b [BREED]''' - assigns sharing permissions by breed(s)
 * '''-g {M,m,F,f}''' - assigns sharing permissions by sex.
If this argument is not given, the sharing permissions are assigned to animals of both sexes, provided they fulfill the other conditions.

 * '''-f [COUNTRY] '''- assigns sharing permissions by country(s) of registration (part of the animal ID)
 * '''-i {filename}''' - lets the User provide a list of specific animals (animal IDs) that should be affected by the sharing permissions change.
 * '''INPUT FILE '''– the stem (the common part of the names of the 706 and 711 files)

===== sharing examples =====
{{{
python gxprep.py sharing -a ANARB -a BFRO -b BSW -g M -f ITA Iexample
}}}
This command adds sharing permissions for ANARB and BFRO to all the BSW males originating from Italy in the Iexample-711.csv file.

{{{
python gxprep.py sharing -r ANARB -a IBC -i aidlist.txt Iexample
}}}
This command adds sharing permissions for IBC and removes them for ANARB for all the genotypes in the Iexample-711.csv file whose animal IDs are listed in the file aidlist.txt.

==== show ====
gives an overview of the sharing patterns in a given file, using the stem of the input files as its only argument.

===== show example =====
{{{
python gxprep.py show Iexample
}}}
This command shows a summary of the sharing settings, giving output like this:

{{{
Content of sharing intermediate file Iexample-711.csv:
11 genotypes (all female) shared with IBC
9 genotypes (all male) shared with BFRO,IBC
}}}
==== zip ====
prepares the zip files to be uploaded to the !GenoEx-GDE database. It uses the stem of the input files as its only argument.

===== zip example =====
{{{
python gxprep.py zip Iexample
}}}
After running this command, the 706 and 711 files are zipped into two separate files, ready for upload to the !GenoEx-GDE database.

=== 2.2. gxprep input files ===
'''''gxprep '''''is designed to accept most typical laboratory output files and convert them to a 706 data file, also adding the initial version of the 711 file for setting data sharing permissions.
Since both the 706 and 711 files are based on animal IDs, whereas laboratory output files use sample IDs, the User also needs to provide a reference file mapping each sample ID to the corresponding animal ID. The laboratory output file and the ID reference file are expected to follow the formats described below:

==== laboratory file ====
This file contains the actual data as received from the laboratory, with Sample ID as the key. In the examples above this file is named Iexample.txt

'''[Header] '''<<BR>> optional, general information regarding the analysis, chip and number of samples<<BR>> '''[Data] '''<<BR>>please make sure that this marker is present, whether or not the header is included.<<BR>>''' ! [Data] marks the place where reading of the information starts. '''

||<tablewidth="972px" tableheight="160px" tablestyle="text-align:left">'''Field Name ''' ||'''Description ''' ||'''Allowed Values ''' ||
||SNP name ||Alphanumeric ||SNP name in CAPITALS e.g. ARS-BFGL-NGS-64740 ||
||Sample ID ||Alphanumeric ||Laboratory sample ID; must match a Sample ID in the ID reference file ||
||All1 ||Alphabetic ||1 character code A or B according to Illumina AB coding ||
||All2 ||Alphabetic ||1 character code A or B according to Illumina AB coding ||

'''Note: '''the above columns are required for !GenoEx. The laboratory file can, however, contain additional columns, or the columns in a different order. As described in section 2.1 under parse, the User can specify which columns contain the required information (option -i).

==== ID reference file ====
This file contains the key to identify which Sample ID belongs to which animal.
In the examples above this file is named !SampleMap.txt <<BR>>'''Note: '''The only delimiter allowed in this reference file is TAB

||<tablewidth="647px" tableheight="104px">'''Field Name''' ||'''Description''' ||'''Allowed Values''' ||
||Sample ID ||Alphanumeric ||Laboratory sample ID, has to correspond to Sample ID in genotyping file ||
||Animal ID ||Alphanumeric ||International Interbull ID* ||

* International Interbull Animal ID consists of 19 characters as follows: <<BR>> 3 characters - breed code (capitals, according to ICAR breed coding), <<BR>>3 characters – country code (capitals, according to Interbull country coding), <<BR>>1 character – sex code (capital M or F), <<BR>>12 characters – registration ID (alphanumerical).

=== 2.3. gxprep default settings ===
If the User always uses the same array, the same laboratory and the same columns in the laboratory file, or always shares all the data with the same list of organizations, these values can be preset as defaults. This is done by editing the '''''gxprep.ini''''' file. Depending on your local settings, this file is read from the current directory and/or the user’s home directory; a .gxpreprc file in the user’s home directory is also recognized. <<BR>> The recognized configuration options are the following:

 . arrayspec - default for -a/--array switch of parse sub command <<BR>>labspec - default for -l/--lab switch of parse sub command <<BR>>samplemap - default for -s/--sample-map switch of parse sub command <<BR>>inputspec - default for -i/--input-spec switch of parse sub command <<BR>>delimchar - default for -d/--delimeter switch of parse sub command <<BR>>cachedir - default for -C/--cachedir switch of parse sub command <<BR>>sharing - default sharing, i.e. which organizations to share data with
gxprep.ini example

{{{
[gxprep]
arrayspec=44887_a
labspec=The Roslin Institute
samplemap=~/referencefile.txt
inputspec=1 2 5 6
delimchar=,
cachedir=~/gxprep
sharing=ANARB BFRO
}}}
In the example above,__ if not specified otherwise on the command line__:

 * the 706 file is created according to the SNP order 44887_a
 * the assigned laboratory is ‘The Roslin Institute’
 * the ID reference file is looked up under the name referencefile.txt
 * the information is read from the input file as follows:
  * SNP name - column 1
  * sample ID - column 2
  * allele 1 and allele 2 - columns 5 and 6, respectively
 * the delimiter is expected to be a comma
 * the SNP order file is downloaded and stored in ~/gxprep
 * all genotypes processed with these settings have sharing permitted for ANARB and BFRO

== 3. gxprep "Tips and Tricks" ==
All currently allowed values can be viewed via the !GenoEx home page, but they can also be listed directly in the terminal with gxprep.py <<BR>>Below is a list of commands that retrieve specific lists.

list of '''supported arrays '''

{{{
python gxprep.py parse -a xxxx xxxx
}}}
list of '''supported labs '''

{{{
python gxprep.py parse -l xxxx xxxx
}}}
list of supported''' organizations '''

{{{
python gxprep.py sharing -a xxxx xxxx
}}}
list of supported '''country codes '''

{{{
python gxprep.py sharing -f xxxx xxxx
}}}
list of supported''' breed codes '''

{{{
python gxprep.py sharing -b xxxx xxxx
}}}
download '''specific SNP order file''' - Windows

{{{
python gxprep.py parse -C . -a 38275_a xxxx>g
}}}
download '''specific SNP order file''' - Linux

{{{
python gxprep.py parse -C . -a 38275_a xxxx>g dev/null
}}}
'''Note: '''The trailing arguments ‘xxxx xxxx’ can be replaced by any other nonsense words at least 2 characters long.
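The single-digit coding and the semicolon-delimited 706 layout described in section 1.1 can be illustrated with a short Python sketch. This is not the gxprep.py code. The function and field names are hypothetical, and two behaviours are assumptions made for the illustration: a B/A call is treated like A/B, and a SNP missing from the sample is written as ‘unknown’ (5).

```python
# Illumina AB calls mapped to the 706 digits from section 1.1.
# Assumption: allele order does not matter, so "BA" is coded like "AB".
AB_CODES = {"BB": "0", "AB": "1", "BA": "1", "AA": "2"}
UNKNOWN = "5"

# Hypothetical field names, in the column order of the 706 format table.
FIELDS_706 = (
    "record_type", "breed", "country", "sex", "animal_id", "organization",
    "uuid", "laboratory", "sample_id", "additional", "array", "genotypes",
)


def encode_call(allele1, allele2):
    """Return the one-digit code for a pair of A/B alleles."""
    return AB_CODES.get(allele1 + allele2, UNKNOWN)


def genotype_string(calls, snp_order):
    """Build the continuous genotype string for one sample.

    calls maps SNP name -> (allele1, allele2); snp_order is the list of SNP
    names in the order required by the array. SNPs missing from calls are
    written as 'unknown' (an assumption of this sketch).
    """
    return "".join(encode_call(*calls.get(name, ("-", "-"))) for name in snp_order)


def parse_706_record(line):
    """Split one 706 record into a dict keyed by FIELDS_706."""
    values = line.rstrip("\r\n").split(";")
    if len(values) != len(FIELDS_706) or values[0] != "706":
        raise ValueError("not a 706 record")
    return dict(zip(FIELDS_706, values))
```

For example, `genotype_string({"S1": ("A", "A"), "S2": ("B", "B")}, ["S2", "S3", "S1"])` gives the digits for S2, the missing S3 and S1 in that order, which shows why the SNP order list has to match the array exactly.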
'''part 3 - API Manual (including the use of the gxapi.py program).'''

See also part 1 in [[https://interbull.org/ib/gde_user_manual|GDE_user_manual]] and part 2 in [[https://interbull.org/ib/gde_gxprep_manual|GDE_gxprep_manual]]. This part assumes that those parts have already been read and understood.

The ''gxapi.py'' support program, maintained and distributed by the Interbull Centre, allows easy access to the API for upload and download of 706 and 711 files associated with the !GenoEx-GDE database, and is provided as an easy way to get started with the API. For those who can read the Python code it is written in, it is also an additional source of detailed documentation of the API.

This manual describes each of the calls of the API along with the usage of the gxapi program. The descriptions are organized into four sections. The first section provides an overview of, and some general information about, the API, and the last section focuses on the gxapi.py program. The remaining sections each focus on a different use of the API.

== Section 1, overview and general information ==
The API is an alternative way to access the functionality of the https://genoex.org/ web browser interface and is provided via POST calls on the same site. All operations share a basic structure: each call requires arguments in JSON format, split into '''parameters''' and '''auth''', both of which are key/value mappings.<<BR>> The '''auth''' part always contains the keys '''username''' and '''pw''', whose values should be your registered email address and the associated password.<<BR>> The '''parameters''' part contains different keys depending on the call, but always at least the '''version'''.<<BR>> An example call via the ''curl'' program looks like this (in one long command):
{{{ curl --data 'data={ "parameters": { "version": "220805" }, "auth": { "username": "username@company.com", "pw": "test" } }' https://genoex.org/genoex/api/gde_get_parameters }}} Where the '''`username@company.com`''' and '''`test`''' strings would need to be substituted with your registered email and associated password. Note that passwords containing special characters may be problematic and need to be encoded according to JSON rules for this to work. This represents the common basic structure of every call to the API although many calls would need additional parameters.<<BR>> This is a suitable command to use for verifying that you have access and that communication is set up correctly, so please use this (or something similar) as first command when trying out this functionality. The above example uses the linux command line syntax to embed strings inside a string. For windows, this would probably need to be written something like: . {{{ curl --data "data={ \"parameters\": { \"version\": \"220805\" }, \"auth\": { \"username\": \"username@company.com\", \"pw\": \"test\" } }" https://genoex.org/genoex/api/gde_get_parameters }}} The remaining examples will stick to the linux syntax as it is easier to read, so if you need to use this altered syntax for this example, then apply the corresponding modifications to each following example.<<BR>> All calls may be using the HTTP construct multipart/form-data (i.e. the `-F` flag instead of `--data` in ''curl''), but only the ''gde_submission'' call require that. This is perhaps a suitable time to emphasize that the use of ''curl'' in this documentation is primarily meant to provide examples of the use of the API and to show the details of the syntax of the arguments. It is definitely not meant as a suggestion that this is a good way to implement a scripted workflow, in fact it is strongly recommended against that. The gxapi.py program is meant for such uses. 
The data returned from each call is a JSON encoded data structure containing, at the minimum, keys '''"status"''' and '''"status_message"'''. If status has value `true` then an additional key named '''return_values''' is provided (with some exceptions). The details below are up-to-date with the '''''220805''''' version of the API (and gxapi.py). The API is in central parts asynchronous, i.e. an operation is first initiated and then the user would need to periodically poll for the status of that operation until it terminates either successfully or with a failure. This mode of operation is needed to avoid the timeouts inherent in normal implementations of the HTTP protocol for long running operations. The return values of most calls is a JSON data structure looking, at the top level, like: . {{{ {"return_values": { ... }, "status": true, "status_message": "some message string"} }}} Where the "..." would be a set of key/value pairs which will vary between different calls.<<BR>> Whenever the value of '''"status"''' is `false`, then the value of '''"status_message"''' will report the error message. Furthermore, if the value of '''"status"''' is `true`, then the value of '''"return_values"''' should still be investigated for possible error messages before retrieving the real return values (keys '''"error"''' or '''"error_list"''' to be explicit). The value of '''"status_message"''' should be ignored when the value of '''"status"''' is `true`, but it may in some cases contain a comment. The following two sections focus on the primary functionalities provided: upload and download of 706/711 files. == Section 2, upload of 706/711 files == This is a two steps operation: a submit call (once) and then intermittently (once per minute or so) polling status of that submission until a terminating state is reached. '''Step 1''': An example of a submit call via the ''curl'' program looks like (note that this call require multipart/form-data): . 
{{{ curl -s -F 'data={ "parameters": { "version": "220805" }, "auth": { "username": "username@company.com", "pw": "test" } }' \ -F "access_file=@system-data/711-access.zip;type=application/octet-stream" \ -F "dataset_file=@system-data/706-genotypes.zip;type=application/octet-stream" https://genoex.org/genoex/api/gde_submission }}} As in all these examples, the '''`username@company.com`''' and '''`test`''' strings would need to be substituted to your registered email address and associated password before running this.<<BR>> In addition, the paths and filenames specified (i.e. the parts between `@` and `;` inside the JSON strings) need to be adapted to your own situation.<<BR>> Note that the use of a single backslash at the end of the lines is just a way to visualize that the single command continues on the next line. This example shows how to upload a 706 file and the associated 711 file in one go, but if only one of these file types are to be uploaded then simply remove the '''-F''' switch, and following associated JSON string, related to the file you ''are not'' going to upload. The above submission call will return a JSON data structure containing, if successful, the '''job_id''' assigned to this submission: . {{{ {"return_values": {"access_file": "711-access.zip", "access_status_message": "Access file received", "dataset_file": "706-genotypes.zip", "dataset_status_message": "Dataset file received", "job_id": "9be6c0bf-de9f-4951-b9e1-27217ec1e0c4"}, "status": true, "status_message": "job started"} }}} Please remember that in all calls, if the key '''status''' has a `false` value, then the error message is found in '''status_message'''. Even if '''status''' is `true`, there may still be errors described inside the '''return_values''' data structure, but in the specific case of `gde_submission` there are currently no such cases. '''Step 2''': <<Anchor(polling)>>Polling for status, is accomplished via a call like: . 
{{{ curl -s --data 'data={ "parameters": { "version": "220805", "job_id": "9be6c0bf-de9f-4951-b9e1-27217ec1e0c4" }, \ "auth": { "username": "username@company.com", "pw": "test" } }' https://genoex.org/genoex/api/gde_job_status }}} The '''9be6c0bf-de9f-4951-b9e1-27217ec1e0c4''' string needs to be replaced with the value of the '''job_id''' key provided in the return data structure of the submit call above. This last call is then intermittently repeated, with no change, until a '''job_status''' of ''"FINISHED"'' or ''"FAILED"'' is reached and returned in a JSON data structure: . {{{ {"return_values": {"error_list": [], "job_id": "9be6c0bf-de9f-4951-b9e1-27217ec1e0c4", "job_status": "FINISHED", "status_time": "2022-08-05 13:35:02"}, "status": true, "status_message": "GDE upload"} }}} Errors may be returned under the key '''error_list''' inside the '''return_values''' value (but there is no '''error''' key there in this call). Note that in the ''gde_job_status'' call, the '''return_values''' structure may in the ''"FINISHED"'' case include an additional key '''test_results'''. The value of that key is, if present, a (potentially rather long, multiple line) string that should be made known to the user. This additional key will only be present if the job_id is referring to an upload operation. == Section 3, download of 706/711 files == Download operations are a bit different from upload as 711 files are downloaded in synchronous mode but 706 files are downloaded in asynchronous mode, similar to upload. In addition, there is an optional preliminary step to retrieve all the available values to choose from when selecting the parameter values to provide in the download operation<<BR>> (you may want to redirect the output to a file {see '''params.json''' in the command line of the example} to have the results handy - and refresh this file from time to time by repeating this operation): . 
{{{ curl -s --data 'data={ "parameters": { "version": "220805" }, "auth": { "username": "username@company.com", "pw": "test" } }' \ -o params.json https://genoex.org/genoex/api/gde_get_parameters }}} This is a synchronous operation and hence a single step is sufficient. The '''return_values''' data structure in the reply will include keys: '''breeds''', '''countries''', '''orgs''', '''gender''', '''extraction_type''' and '''arrays''', but no error message. The value of each key is a list of strings to choose from when specifying the corresponding parameter in calls below. This data roughly corresponds to the data shown in the download dialog of the web browser interface. === Download 706 files === This is a three steps operation: an extraction call (once) followed by intermittently (every 30 seconds or so) polling status of that extraction until a terminating state is reached and finally, if status of extraction is "FINISHED", downloading the resulting assembled zip file. '''Step 1''': The extraction call is where the specification for what data to download is provided.<<BR>> The allowed values for different parts of the specification are: * "breeds": list of breed codes for the data to download, empty list means all available breeds * "countries": list of country codes for the data to download, empty list means all available countries * "gender": either "BOTH", "F" or "M" * "extraction_type": either "b" (best fit - i.e. highest call_rate) or "a" (all genotypes) * "arrays": list of array aliases for the data to download, empty list means all available arrays * "orgs": list of organization codes for the data to download, empty list means all organizations except IBC * "date_start": genotypes uploaded since this day are eligible * "date_end": genotypes uploaded until this day are eligible * "quality_criteria": comma separated string with one or more of: "frequency", "pedigree" and/or "call_rate", empty string means all, null means ignored Example call: . 
{{{ curl -s --data 'data={ "parameters": { "version": "220805", "breeds": ["BSW"], "countries": [], "gender": "BOTH", "extraction_type": "b", "arrays": [], \ "orgs": [], "date_start": "2020-12-01", "date_end": "2022-08-03", "quality_criteria": null }, \ "auth": { "username": "username@company.com", "pw": "test" } }' \ https://genoex.org/genoex/api/gde_extraction }}} Note that in this example, the values of keys "countries" and "arrays" are specified as empty lists. This means "all values included". The value for key "orgs" is also an empty list, but that means all orgs except IBC.<<BR>> The value of "quality_criteria" is null, also meaning "anything goes" ignoring the results of the quality checks, i.e. all genotypes are considered for extraction. This extraction call will return a JSON data structure containing, if successful, the '''job_id''' assigned to this submission: . {{{ {"return_values": {"job_id": "16ebfbd8-6f22-4fb0-b9c9-62580d3f65fe"}, "status": true, "status_message": "job started"} }}} If the extraction call fails, it may return an error message as a string value in the '''error''' key inside the '''return_values''' value. '''Step 2''': Intermittently poll for status, which is performed identical to how it is done for the upload except that the job_id is extracted from the reply of the extraction call.<<BR>> See [[#polling|section 2 step 2]] "polling for status", for how this step is accomplished. '''Step 3''': If the extraction was successful (i.e. polling ended with status ''"FINISHED"''), this step is simply a call to download the zip file associated with the prepared extraction. Example: . 
{{{ curl -s --data 'data={ "parameters": { "version": "220805", "job_id": "16ebfbd8-6f22-4fb0-b9c9-62580d3f65fe"}, \ "auth": { "username": "username@company.com", "pw": "test" } }' \ -o downloaded-706.zip https://genoex.org/genoex/api/gde_download }}} Errors are a bit tricky to handle as they are returned as a JSON structure instead of the expected zip file, so would in the above ''curl'' call end up inside the ''downloaded-706.zip'' file. See the gxapi.py program for how this may be accomplished. === Download 711 files === This is a single step operation which is fully specified in a single ''curl'' call: . {{{ curl -s --data 'data={ "parameters": { "version": "220805", "breeds": ["BSW"], "countries": [], "gender": "BOTH", "arrays": [] }, \ "auth": { "username": "username@company.com", "pw": "test" } }' \ -o downloaded-711.zip https://genoex.org/genoex/api/gde_download_711 }}} The parameters "breeds", "countries", "gender" and "arrays" are used precisely as for the download of file706 reported above. The same challenge in handling errors as ''gde_download'' above apply also to ''gde_download_711'' calls. == Section 4, using the gxapi.py program == The gxapi.py program is fetched from the web browser interface on the `GDE -> UPLOAD` page.<<BR>> Note that the gxapi.py program requires a fairly recent version of python (3.7 or newer) with the requests module installed. In the examples below, the gxapi.py program is assumed to be located in the current directory, but that is not a requirement.<<BR>> If it is located in another directory, just precede `gxapi.py` in the examples with the path to where it is stored, i.e. use something like `path-to-installdir/gxapi.py` instead of `gxapi.py` in the examples. An alternative (Linux only?) way to install it and execute it is to put the gxapi.py file in one of the directories in your execution path, see environment variable `PATH`, and enable the execution flag on it. 
In that case, the leading "python " can be removed from the examples below. To get a quick overview of how to execute it, run it with the `-h` switch: . {{{ python gxapi.py -h }}} === Upload of 706/711 files === To get a quick overview of how to execute `upload`, run it with the `-h` switch: . {{{ python gxapi.py upload -h }}} To upload a pair of 706/711 files in one go, simply run it like this: . {{{ python gxapi.py upload username@company.com test path-to-files/706-file.zip path-to-files/711-file.zip }}} To upload only one file, either a 706 or a 711 file, just omit the argument referencing the other file in the example above. === Download of 706/711 files === The optional preliminary step, calling `gde_get_parameters`, is performed via: . {{{ python gxapi.py fetch username@company.com test >params.json }}} (complete with redirecting the stdout to a file, `params.json`, to save output for later). To get an overview of how to execute `download`, run it with the `-h` switch: . {{{ python gxapi.py download -h }}} Here, the switches are divided into groups "optional arguments" (used for both 706 and 711 files), "genotypes" (used for 706 files only) and "access" (used for 711 files only): optional arguments: * -b BREEDS, --breeds BREEDS<<BR>>which breeds to extract data for (comma delimited, omit for all) * -c COUNTRIES, --countries COUNTRIES<<BR>>which countries to extract data for (comma delimited, omit for all) * -M, --male<<BR>>extract male data only * -F, --female<<BR>>extract female data only * -a ARRAYS, --arrays ARRAYS<<BR>>which arrays to download data for (comma delimited, omit for all) genotypes: * --all<<BR>>download all matched data (default - best matched only) * -o ORGS, --orgs ORGS<<BR>>which organizations to extract data from (comma delimited, omit for all) * -S START, --start START<<BR>>which start date to extract data for (omit for all) * -E END, --end END<<BR>>which end date to extract data for (omit for today) * -q QUALITY, --quality 
QUALITY<<BR>>which successful tests to extract data for (omit for all) access: * -A<<BR>>download access data instead of genotypes Note that the switch `-A` is used to select if downloading a 711 file (if present) or a 706 file (if omitted).<<BR>> At the end of the help output, a couple of small explicit examples are shown, but here follows a couple more. To, for example, download a 706 file containing the best genotypes of BSW bulls regardless of country, array, organization, date-of-upload or quality status execute: . {{{ python gxapi.py download -b BSW -M -q "" username@company.com test path-to-files/downloaded-706-file.zip }}} Adding switch `--all` would remove limitation to download only the best genotypes of each animal. To do the same download but only genotypes that pass all quality checks, omit `-q ""` from the above command. To further limit the data downloaded, add switches for breeds, countries, arrays, organizations and/or dates and either replace the empty string after `-q` with a suitable specification (e.g. `pedigree,call_rate`) or omit `-q` and associated string completely (which is the same as specifying `-q frequency,pedigree,call_rate`). Example data volume limiting switch: . {{{ -a "GeneSeek Dairy Ultra LD v2 7049,Illumina Bovine3k BeadChip 2900" }}} Another example, downloading a 711 file for all animals available: . {{{ python gxapi.py download -A username@company.com test path-to-files/downloaded-711-file.zip }}} |
GenoEx-GDE User’s manual v.1.2
part 3 - API Manual (including the use of the gxapi.py program).
See also part 1 in GDE_user_manual and part 2 in GDE_gxprep_manual. This part assumes that those previous parts have already been read and understood.
The gxapi.py support program, maintained and distributed by the Interbull Centre, allows easy access to the API for upload and download of 706 and 711 files associated with the GenoEx-GDE database and is provided as an easy way to get started with using the API. For those who can read the Python code it is written in, it also provides an additional source of detailed documentation of the API.
This manual describes each of the calls of the API along with the usage of the gxapi.py program. These descriptions are organized into four sections. The first section provides an overview of, and some general information about, the API, and the last section focuses on the gxapi.py program. The remaining sections focus on the different uses of the API.
Section 1, overview and general information
The API is an alternative way to access the functionality offered by the https://genoex.org/ web browser interface and is provided via POST calls on the same site. The operations have a basic structure where each call requires arguments in JSON format split into parameters and auth, both of which are key/value mappings.
The auth part always contains the keys username and pw, where the respective values should be your registered email address and associated password.
The parameters part contains different keys depending on the call, but always, at least, the version.
An example call via the curl program looks like (in one long command):
curl --data 'data={ "parameters": { "version": "220805" }, "auth": { "username": "username@company.com", "pw": "test" } }' https://genoex.org/genoex/api/gde_get_parameters
Here, the username@company.com and test strings need to be replaced with your registered email address and associated password. Note that passwords containing special characters may be problematic and need to be encoded according to JSON rules for this to work. This represents the common basic structure of every call to the API, although many calls need additional parameters.
This is a suitable command for verifying that you have access and that communication is set up correctly, so please use this (or something similar) as the first command when trying out this functionality.
The above example uses the Linux command line syntax to embed strings inside a string. For Windows, this would probably need to be written something like:
curl --data "data={ \"parameters\": { \"version\": \"220805\" }, \"auth\": { \"username\": \"username@company.com\", \"pw\": \"test\" } }" https://genoex.org/genoex/api/gde_get_parameters
The remaining examples stick to the Linux syntax as it is easier to read, so if you need the altered syntax for this example, apply the corresponding modifications to each following example.
All calls may use the HTTP construct multipart/form-data (i.e. the -F flag instead of --data in curl), but only the gde_submission call requires it.
This is perhaps a suitable time to emphasize that the use of curl in this documentation is primarily meant to provide examples of the use of the API and to show the details of the syntax of the arguments. It is definitely not meant as a suggestion that this is a good way to implement a scripted workflow; in fact, that is strongly discouraged. The gxapi.py program is meant for such uses.
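For readers who prefer to see the call structure outside of curl, here is a minimal Python sketch of the same request. It is not taken from gxapi.py: the helper names build_payload and call_api are hypothetical, and it assumes that the server accepts a URL-encoded form field named data, equivalent to what curl --data sends. Building the JSON with json.dumps also takes care of escaping special characters in passwords.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://genoex.org/genoex/api/"  # base URL as used in the curl examples
API_VERSION = "220805"


def build_payload(username, pw, **parameters):
    """Return the JSON string for the 'data' form field (hypothetical helper)."""
    params = {"version": API_VERSION}
    params.update(parameters)
    # json.dumps escapes quotes, backslashes etc. in the password correctly.
    return json.dumps({"parameters": params,
                       "auth": {"username": username, "pw": pw}})


def call_api(call_name, username, pw, **parameters):
    """POST one call and return the decoded reply (assumes the reply is JSON)."""
    form = urllib.parse.urlencode({"data": build_payload(username, pw, **parameters)})
    request = urllib.request.Request(API_BASE + call_name, data=form.encode("utf-8"))
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With these helpers, call_api("gde_get_parameters", "username@company.com", "test") would correspond to the curl command shown earlier, after substituting your own credentials.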
The data returned from each call is a JSON encoded data structure containing, at the minimum, keys "status" and "status_message". If status has value true then an additional key named return_values is provided (with some exceptions).
The details below are up-to-date with the 220805 version of the API (and gxapi.py).
The API is in central parts asynchronous, i.e. an operation is first initiated and then the user needs to periodically poll for the status of that operation until it terminates, either successfully or with a failure. This mode of operation is needed to avoid the timeouts inherent in normal implementations of the HTTP protocol for long-running operations.
The return value of most calls is a JSON data structure looking, at the top level, like:
{"return_values": { ... }, "status": true, "status_message": "some message string"}
Where the "..." would be a set of key/value pairs which will vary between different calls.
Whenever the value of "status" is false, the value of "status_message" reports the error message. Furthermore, even if the value of "status" is true, the value of "return_values" should still be checked for error messages (under the keys "error" or "error_list") before retrieving the real return values. The value of "status_message" should be ignored when the value of "status" is true, although it may in some cases contain a comment.
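The checking order described above can be sketched in Python as follows. This is an illustrative sketch only: ApiError and extract_return_values are hypothetical names, and it assumes that an empty "error" string or an empty "error_list" means that no error occurred.

```python
import json


class ApiError(Exception):
    """Raised when a reply reports a failure (hypothetical helper class)."""


def extract_return_values(reply_text):
    """Decode a reply and return its return_values, raising ApiError on errors."""
    reply = json.loads(reply_text)
    if not reply.get("status"):
        # With status false, the error message is found in status_message.
        raise ApiError(reply.get("status_message", "unknown error"))
    values = reply.get("return_values") or {}
    # Even with status true, errors may be reported inside return_values.
    if values.get("error"):
        raise ApiError(str(values["error"]))
    if values.get("error_list"):
        raise ApiError("; ".join(str(item) for item in values["error_list"]))
    return values
```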
The following two sections focus on the primary functionalities provided: upload and download of 706/711 files.
Section 2, upload of 706/711 files
This is a two-step operation: a submit call (once), followed by intermittent polling (once per minute or so) of the status of that submission until a terminating state is reached.
Step 1: An example of a submit call via the curl program looks like (note that this call requires multipart/form-data):
curl -s -F 'data={ "parameters": { "version": "220805" }, "auth": { "username": "username@company.com", "pw": "test" } }' \
    -F "access_file=@system-data/711-access.zip;type=application/octet-stream" \
    -F "dataset_file=@system-data/706-genotypes.zip;type=application/octet-stream" https://genoex.org/genoex/api/gde_submission
As in all these examples, the username@company.com and test strings need to be replaced with your registered email address and associated password before running this.
In addition, the paths and filenames specified (i.e. the parts between @ and ; inside the -F arguments) need to be adapted to your own situation.
Note that the use of a single backslash at the end of the lines is just a way to visualize that the single command continues on the next line.
This example shows how to upload a 706 file and the associated 711 file in one go. If only one of these file types is to be uploaded, simply remove the -F switch, and its associated argument, related to the file you are not going to upload.
The above submission call will return a JSON data structure containing, if successful, the job_id assigned to this submission:
{"return_values": {"access_file": "711-access.zip", "access_status_message": "Access file received", "dataset_file": "706-genotypes.zip", "dataset_status_message": "Dataset file received", "job_id": "9be6c0bf-de9f-4951-b9e1-27217ec1e0c4"}, "status": true, "status_message": "job started"}
Please remember that in all calls, if the key status has a false value, then the error message is found in status_message. Even if status is true, there may still be errors described inside the return_values data structure, but in the specific case of gde_submission there are currently no such cases.
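As an illustration of the multipart submission outside of curl, here is a minimal Python sketch based on the requests module (which gxapi.py also requires). It is not the gxapi.py implementation: the function names submission_files and submit are hypothetical, and the form field names data, dataset_file and access_file are simply copied from the curl example above.

```python
import json
import os

SUBMISSION_URL = "https://genoex.org/genoex/api/gde_submission"


def submission_files(dataset_path=None, access_path=None):
    """Map form field names to (filename, path, MIME type) for the given files."""
    parts = {}
    if dataset_path:
        parts["dataset_file"] = (os.path.basename(dataset_path), dataset_path,
                                 "application/octet-stream")
    if access_path:
        parts["access_file"] = (os.path.basename(access_path), access_path,
                                "application/octet-stream")
    if not parts:
        raise ValueError("at least one of the 706 and 711 files is needed")
    return parts


def submit(username, pw, dataset_path=None, access_path=None):
    """Upload a 706 and/or 711 file and return the decoded JSON reply."""
    import requests  # third-party module, imported here so the helper above works without it

    data = json.dumps({"parameters": {"version": "220805"},
                       "auth": {"username": username, "pw": pw}})
    handles = {}
    try:
        for field, (name, path, mime) in submission_files(dataset_path, access_path).items():
            handles[field] = (name, open(path, "rb"), mime)
        # Passing files= makes requests send multipart/form-data, like curl -F.
        return requests.post(SUBMISSION_URL, data={"data": data}, files=handles).json()
    finally:
        for _, handle, _ in handles.values():
            handle.close()
```

Leaving out one of the two path arguments corresponds to removing the matching -F switch from the curl command.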
Step 2: Polling for status is accomplished via a call like:
curl -s --data 'data={ "parameters": { "version": "220805", "job_id": "9be6c0bf-de9f-4951-b9e1-27217ec1e0c4" }, "auth": { "username": "username@company.com", "pw": "test" } }' \
    https://genoex.org/genoex/api/gde_job_status
The 9be6c0bf-de9f-4951-b9e1-27217ec1e0c4 string needs to be replaced with the value of the job_id key provided in the return data structure of the submit call above.
This last call is then intermittently repeated, with no change, until a job_status of "FINISHED" or "FAILED" is reached and returned in a JSON data structure:
{"return_values": {"error_list": [], "job_id": "9be6c0bf-de9f-4951-b9e1-27217ec1e0c4", "job_status": "FINISHED", "status_time": "2022-08-05 13:35:02"}, "status": true, "status_message": "GDE upload"}
Errors may be returned under the key error_list inside the return_values value (but there is no error key there in this call).
Note that in the gde_job_status call, the return_values structure may in the "FINISHED" case include an additional key test_results. The value of that key is, if present, a (potentially rather long, multi-line) string that should be made known to the user. This additional key will only be present if the job_id refers to an upload operation.
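The polling step can be sketched in Python as a small loop. This is only a sketch under stated assumptions: poll_job is a hypothetical name, fetch_status stands for whatever code of your own performs the gde_job_status call and returns the decoded reply, and the max_polls limit is an addition of this sketch, not something the API prescribes.

```python
import time


def poll_job(fetch_status, job_id, interval=60.0, max_polls=120, sleep=time.sleep):
    """Poll until job_status is FINISHED or FAILED and return the return_values.

    fetch_status is any callable that takes a job_id and returns the decoded
    reply of a gde_job_status call (hypothetical; supply your own HTTP code).
    """
    for _ in range(max_polls):
        reply = fetch_status(job_id)
        if not reply.get("status"):
            raise RuntimeError(reply.get("status_message", "unknown error"))
        values = reply.get("return_values", {})
        if values.get("job_status") in ("FINISHED", "FAILED"):
            # error_list and, for uploads, test_results are left to the caller.
            return values
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not terminate after {max_polls} polls")
```

The caller should still inspect error_list in the returned structure and show test_results to the user when that key is present.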
Section 3, download of 706/711 files
Download operations are a bit different from upload as 711 files are downloaded in synchronous mode but 706 files are downloaded in asynchronous mode, similar to upload.
In addition, there is an optional preliminary step to retrieve all the available values to choose from when selecting the parameter values for the download operation.
You may want to redirect the output to a file (see params.json in the command line of the example) to have the results handy, and refresh this file from time to time by repeating this operation:
curl -s --data 'data={ "parameters": { "version": "220805" }, "auth": { "username": "username@company.com", "pw": "test" } }' \
    -o params.json https://genoex.org/genoex/api/gde_get_parameters
This is a synchronous operation and hence a single step is sufficient.
The return_values data structure in the reply will include keys: breeds, countries, orgs, gender, extraction_type and arrays, but no error message. The value of each key is a list of strings to choose from when specifying the corresponding parameter in calls below. This data roughly corresponds to the data shown in the download dialog of the web browser interface.
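A saved params.json file can be used to check requested values before starting a download. The following Python sketch shows one way to do that. The function names load_choices and unknown_values are hypothetical, and it assumes that the file holds the unmodified reply of gde_get_parameters.

```python
import json

EXPECTED_KEYS = ("breeds", "countries", "orgs", "gender", "extraction_type", "arrays")


def load_choices(params_text):
    """Return the lists of allowed values from a saved gde_get_parameters reply."""
    reply = json.loads(params_text)
    if not reply.get("status"):
        raise ValueError(reply.get("status_message", "unknown error"))
    values = reply["return_values"]
    return {key: values.get(key, []) for key in EXPECTED_KEYS}


def unknown_values(choices, key, requested):
    """Return the requested values that are not among the allowed ones for key."""
    allowed = set(choices[key])
    return [value for value in requested if value not in allowed]
```

For example, unknown_values(choices, "breeds", ["BSW"]) returns an empty list when BSW is among the breeds reported by the server.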
Download 706 files
This is a three-step operation: an extraction call (once), followed by intermittent polling (every 30 seconds or so) of the status of that extraction until a terminating state is reached and finally, if the status of the extraction is "FINISHED", downloading the resulting assembled zip file.
Step 1: The extraction call is where the specification for what data to download is provided.
The allowed values for different parts of the specification are:
- "breeds": list of breed codes for the data to download, empty list means all available breeds
- "countries": list of country codes for the data to download, empty list means all available countries
- "gender": either "BOTH", "F" or "M"
- "extraction_type": either "b" (best fit - i.e. highest call_rate) or "a" (all genotypes)
- "arrays": list of array aliases for the data to download, empty list means all available arrays
- "orgs": list of organization codes for the data to download, empty list means all organizations except IBC
- "date_start": genotypes uploaded since this day are eligible
- "date_end": genotypes uploaded until this day are eligible
- "quality_criteria": comma separated string with one or more of: "frequency", "pedigree" and/or "call_rate", empty string means all, null means ignored
Example call:
curl -s --data 'data={ "parameters": { "version": "220805", "breeds": ["BSW"], "countries": [], "gender": "BOTH", "extraction_type": "b", "arrays": [], "orgs": [], "date_start": "2020-12-01", "date_end": "2022-08-03", "quality_criteria": null }, "auth": { "username": "username@company.com", "pw": "test" } }' \
    https://genoex.org/genoex/api/gde_extraction
Note that in this example, the values of keys "countries" and "arrays" are specified as empty lists. This means "all values included". The value for key "orgs" is also an empty list, but that means all orgs except IBC.
The value of "quality_criteria" is null, also meaning "anything goes" ignoring the results of the quality checks, i.e. all genotypes are considered for extraction.
This extraction call will return a JSON data structure containing, if successful, the job_id assigned to this extraction:
{"return_values": {"job_id": "16ebfbd8-6f22-4fb0-b9c9-62580d3f65fe"}, "status": true, "status_message": "job started"}
If the extraction call fails, it may return an error message as a string value in the error key inside the return_values value.
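The parameter rules listed in step 1 can be collected in a small Python helper that builds the parameters mapping for gde_extraction. This is a sketch, not part of gxapi.py: build_extraction_parameters is a hypothetical name, and the YYYY-MM-DD date check is an assumption based on the dates in the example call.

```python
from datetime import datetime

QUALITY_NAMES = ("frequency", "pedigree", "call_rate")


def build_extraction_parameters(date_start, date_end, breeds=(), countries=(),
                                gender="BOTH", extraction_type="b", arrays=(),
                                orgs=(), quality_criteria=None):
    """Return the 'parameters' mapping for a gde_extraction call."""
    if gender not in ("BOTH", "F", "M"):
        raise ValueError("gender must be BOTH, F or M")
    if extraction_type not in ("b", "a"):
        raise ValueError("extraction_type must be b (best fit) or a (all)")
    for value in (date_start, date_end):
        datetime.strptime(value, "%Y-%m-%d")  # assumed date format, raises ValueError
    if quality_criteria:  # None means ignored, empty string means all
        for name in quality_criteria.split(","):
            if name.strip() not in QUALITY_NAMES:
                raise ValueError("unknown quality criterion: " + name)
    return {
        "version": "220805",
        "breeds": list(breeds),        # empty list: all available breeds
        "countries": list(countries),  # empty list: all available countries
        "gender": gender,
        "extraction_type": extraction_type,
        "arrays": list(arrays),        # empty list: all available arrays
        "orgs": list(orgs),            # empty list: all organizations except IBC
        "date_start": date_start,
        "date_end": date_end,
        "quality_criteria": quality_criteria,
    }
```

Calling build_extraction_parameters("2020-12-01", "2022-08-03", breeds=["BSW"]) produces the same parameters as in the example call of step 1.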
Step 2: Intermittently poll for status, which is performed identically to how it is done for the upload, except that the job_id is taken from the reply of the extraction call.
See section 2 step 2 "polling for status", for how this step is accomplished.
Step 3: If the extraction was successful (i.e. polling ended with status "FINISHED"), this step is simply a call to download the zip file associated with the prepared extraction. Example:
curl -s --data 'data={ "parameters": { "version": "220805", "job_id": "16ebfbd8-6f22-4fb0-b9c9-62580d3f65fe" }, "auth": { "username": "username@company.com", "pw": "test" } }' \
    -o downloaded-706.zip https://genoex.org/genoex/api/gde_download
Errors are a bit tricky to handle, as they are returned as a JSON structure instead of the expected zip file and would, in the above curl call, end up inside the downloaded-706.zip file. See the gxapi.py program for how this may be handled.
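One possible way to tell a real zip file from a JSON error reply is to inspect the downloaded bytes before saving them. The following Python sketch illustrates the idea. It is not how gxapi.py is known to do it: download_error is a hypothetical name, and the sketch assumes that an error reply follows the general reply structure described in section 1.

```python
import io
import json
import zipfile


def download_error(body):
    """Return None if body is a zip archive, otherwise the error text found in it."""
    if zipfile.is_zipfile(io.BytesIO(body)):
        return None
    try:
        reply = json.loads(body.decode("utf-8"))
    except ValueError:  # covers both decoding errors and invalid JSON
        return "reply is neither a zip archive nor JSON"
    if not isinstance(reply, dict):
        return "unexpected JSON reply"
    if not reply.get("status"):
        return str(reply.get("status_message") or "unknown error")
    values = reply.get("return_values") or {}
    return str(values.get("error") or "unknown error")
```

Only when download_error returns None should the bytes be written to the target zip file.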
Download 711 files
This is a single step operation which is fully specified in a single curl call:
curl -s --data 'data={ "parameters": { "version": "220805", "breeds": ["BSW"], "countries": [], "gender": "BOTH", "arrays": [] }, "auth": { "username": "username@company.com", "pw": "test" } }' \
    -o downloaded-711.zip https://genoex.org/genoex/api/gde_download_711
The parameters "breeds", "countries", "gender" and "arrays" are used precisely as for the download of 706 files described above.
The same challenge in handling errors as for gde_download above also applies to gde_download_711 calls.
Section 4, using the gxapi.py program
The gxapi.py program is fetched from the web browser interface on the GDE -> UPLOAD page.
Note that the gxapi.py program requires a fairly recent version of Python (3.7 or newer) with the requests module installed.
In the examples below, the gxapi.py program is assumed to be located in the current directory, but that is not a requirement.
If it is located in another directory, just precede gxapi.py in the examples with the path to where it is stored, i.e. use something like path-to-installdir/gxapi.py instead of gxapi.py in the examples.
An alternative (Linux only?) way to install it and execute it is to put the gxapi.py file in one of the directories in your execution path, see environment variable PATH, and enable the execution flag on it. In that case, the leading "python " can be removed from the examples below.
To get a quick overview of how to execute it, run it with the -h switch:
python gxapi.py -h
Upload of 706/711 files
To get a quick overview of how to execute upload, run it with the -h switch:
python gxapi.py upload -h
To upload a pair of 706/711 files in one go, simply run it like this:
python gxapi.py upload username@company.com test path-to-files/706-file.zip path-to-files/711-file.zip
To upload only one file, either a 706 or a 711 file, just omit the argument referencing the other file in the example above.
Download of 706/711 files
The optional preliminary step, calling gde_get_parameters, is performed via:
python gxapi.py fetch username@company.com test >params.json
(complete with redirecting the stdout to a file, params.json, to save output for later).
To get an overview of how to execute download, run it with the -h switch:
python gxapi.py download -h
Here, the switches are divided into groups "optional arguments" (used for both 706 and 711 files), "genotypes" (used for 706 files only) and "access" (used for 711 files only):
optional arguments:
  -b BREEDS, --breeds BREEDS
        which breeds to extract data for (comma delimited, omit for all)
  -c COUNTRIES, --countries COUNTRIES
        which countries to extract data for (comma delimited, omit for all)
  -M, --male
        extract male data only
  -F, --female
        extract female data only
  -a ARRAYS, --arrays ARRAYS
        which arrays to download data for (comma delimited, omit for all)
genotypes:
  --all
        download all matched data (default - best matched only)
  -o ORGS, --orgs ORGS
        which organizations to extract data from (comma delimited, omit for all)
  -S START, --start START
        which start date to extract data for (omit for all)
  -E END, --end END
        which end date to extract data for (omit for today)
  -q QUALITY, --quality QUALITY
        which successful tests to extract data for (omit for all)
access:
  -A
        download access data instead of genotypes
Note that the switch -A selects which type of file is downloaded: a 711 file if -A is present, a 706 file if it is omitted.
A few short examples are shown at the end of the help output; a couple more follow here.
For example, to download a 706 file containing the best genotypes of BSW bulls regardless of country, array, organization, date of upload or quality status, execute:
python gxapi.py download -b BSW -M -q "" username@company.com test path-to-files/downloaded-706-file.zip
Adding the switch --all removes the restriction to only the best genotype of each animal. To perform the same download but only for genotypes that pass all quality checks, omit -q "" from the above command. To limit the downloaded data further, add switches for breeds, countries, arrays, organizations and/or dates. The quality filter can be changed in two ways: either replace the empty string after -q with a suitable specification (e.g. pedigree,call_rate), or omit -q and its string completely, which is the same as specifying -q frequency,pedigree,call_rate.
An example of a switch that limits the data volume:
-a "GeneSeek Dairy Ultra LD v2 7049,Illumina Bovine3k BeadChip 2900"
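The way the switches combine can be sketched as a small Python helper that builds the download command line. The helper build_download_command and its keyword names are hypothetical. Only switches listed in the help output above are used, and the date switches -S and -E are left out because their format is not shown here.

```python
import sys


def build_download_command(username, pw, out_file, breeds=None, countries=None,
                           arrays=None, male=False, female=False, orgs=None,
                           all_matches=False, quality=None, access=False,
                           gxapi="gxapi.py"):
    """Build the argument list for 'gxapi.py download'."""
    cmd = [sys.executable, gxapi, "download"]
    # Switches shared by 706 and 711 downloads; lists become comma-delimited.
    for switch, values in (("-b", breeds), ("-c", countries), ("-a", arrays)):
        if values:
            cmd += [switch, ",".join(values)]
    if male:
        cmd.append("-M")
    if female:
        cmd.append("-F")
    if access:
        cmd.append("-A")  # 711 file; the genotype-only switches are skipped
    else:
        if all_matches:
            cmd.append("--all")
        if orgs:
            cmd += ["-o", ",".join(orgs)]
        # quality=None omits -q (all checks); quality=[] yields -q "".
        if quality is not None:
            cmd += ["-q", ",".join(quality)]
    return cmd + [username, pw, out_file]


cmd = build_download_command("username@company.com", "test",
                             "path-to-files/downloaded-706-file.zip",
                             breeds=["BSW"], male=True, quality=[])
print(cmd)
```

The call shown reproduces the BSW bulls example above. Passing quality=["pedigree", "call_rate"] instead would request only those two checks.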
Another example, downloading a 711 file for all animals available:
python gxapi.py download -A username@company.com test path-to-files/downloaded-711-file.zip
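After a download, it may be worth confirming that the saved file really is a zip archive before processing it. The following minimal sketch assumes that a successful download yields a valid zip archive (the file names in the examples end in .zip) and that an error response does not. The function name list_zip_members is hypothetical, and the demo uses in-memory data instead of a real download.

```python
import io
import zipfile


def list_zip_members(source):
    """Return the member names if source is a zip archive, otherwise None.

    source may be a file path or a seekable binary file object.
    """
    if not zipfile.is_zipfile(source):
        return None
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


# Demo with in-memory data standing in for downloaded files.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("example.txt", "placeholder")
print(list_zip_members(demo))
print(list_zip_members(io.BytesIO(b"not a zip archive")))
```

With a real download, pass the output path, e.g. list_zip_members("path-to-files/downloaded-711-file.zip"). A result of None suggests that the file should be inspected as text for an error message.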