 
 
GenoEx-GDE User’s manual v.1.2
part 3 - API Manual (including the use of the gxapi.py program).
See also part 1 in GDE_user_manual and part 2 in GDE_gxprep_manual. This part assumes that those previous parts have already been read and understood.
The gxapi.py support program, maintained and distributed by the Interbull Centre, allows easy access to the API for upload and download of 706 and 711 files associated with the GenoEx-GDE database, and is provided as an easy way to get started with using the API. For those who can read Python, its source code is also an additional source of detailed documentation of the API.
This manual describes each of the API calls along with the usage of the gxapi program. The descriptions are organized into four sections: the first provides an overview of, and general information about, the API; sections 2 and 3 cover upload and download of 706/711 files respectively; and the last section focuses on the gxapi.py program.
Section 1, overview and general information
The API is provided as an alternative way to access the functionality of the https://genoex.org/ web browser interface, and is accessed via POST calls on the same site. Each call requires its arguments in JSON format, split into parameters and auth, both of which are key/value mappings.
The auth part always contains the keys username and pw, whose values should be your registered email address and associated password.
The parameters part contains different keys depending on the call, but always includes at least the version.
An example call via the curl program looks like this (one long command):
- curl --data 'data={ "parameters": { "version": "220805" }, "auth": { "username": "username@company.com", "pw": "test" } }' https://genoex.org/genoex/api/gde_get_parameters
Here, the username@company.com and test strings need to be replaced with your registered email address and associated password. This shows the basic structure common to every call to the API, although most calls need additional parameters.
This is a suitable command for verifying that you have access and that communication is set up correctly, so please use it (or something similar) as the first command when trying out this functionality.
The above example uses the Linux command line syntax for embedding strings inside a string. On Windows, it would probably need to be written something like:
- curl --data "data={ \"parameters\": { \"version\": \"220805\" }, \"auth\": { \"username\": \"username@company.com\", \"pw\": \"test\" } }" https://genoex.org/genoex/api/gde_get_parameters
The remaining examples stick to the Linux syntax as it is easier to read; if you need the Windows syntax, apply the corresponding modifications to each of the following examples.
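To show that the payload is simply a JSON document sent in a form field named data, here is a minimal Python sketch of the same gde_get_parameters call using only the standard library. It is an illustration under assumptions, not a replacement for gxapi.py: it assumes the server accepts the data field properly URL-encoded (curl --data sends the same content type, application/x-www-form-urlencoded), and the helper names build_form and call_api are hypothetical. Building the JSON with json.dumps also avoids the Linux/Windows quoting differences described above.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://genoex.org/genoex/api/"  # base URL used in the curl examples
API_VERSION = "220805"  # API version described in this manual


def build_form(username, pw, **parameters):
    """Return the URL-encoded form body: one field, "data", holding the JSON document."""
    document = {
        "parameters": {"version": API_VERSION, **parameters},
        "auth": {"username": username, "pw": pw},
    }
    return urllib.parse.urlencode({"data": json.dumps(document)}).encode("ascii")


def call_api(endpoint, username, pw, **parameters):
    """POST one API call and return the decoded JSON reply (hypothetical helper)."""
    request = urllib.request.Request(
        API_BASE + endpoint,
        data=build_form(username, pw, **parameters),  # supplying data makes this a POST
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

For example, call_api("gde_get_parameters", "username@company.com", "test") corresponds to the first curl command above; extra keys such as job_id can be passed as keyword arguments.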
This is perhaps a suitable time to emphasize that curl is used in this documentation purely to provide examples of the use of the API and to show the details of the syntax of the arguments. It is definitely not meant as a suggestion that this is a good way to implement a scripted workflow; in fact, that is strongly discouraged. The gxapi.py program is meant for such uses.
The data returned from each call is a JSON encoded data structure containing, at the minimum, the keys "status" and "status_message". If status is true, an additional key named return_values is provided (with some exceptions).
The details below are up-to-date with the 220805 version of the API (and gxapi.py).
Central parts of the API are asynchronous, i.e. an operation is first initiated, and the user then needs to periodically poll for the status of that operation until it terminates, either successfully or with a failure. This mode of operation is needed to avoid the timeouts inherent in normal implementations of the HTTP protocol for long-running operations.
The return value of most calls is a JSON data structure that looks, at the top level, like:
- {"return_values": { ... }, "status": true, "status_message": "some message string"}
Here, "..." stands for a set of key/value pairs that varies between calls.
Whenever the value of "status" is false, the value of "status_message" reports the error message. If the value of "status" is true, the value of "return_values" should still be checked for possible error messages (specifically, the keys "error" and/or "error_list") before retrieving the real return values. The value of "status_message" should be ignored when "status" is true, although it may in some cases contain a comment.
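The error-handling rules above can be collected into one helper. This is a minimal Python sketch; the function name checked_return_values and the GenoExError exception are hypothetical, and the only error keys checked are the two named in this section, "error" and "error_list".

```python
class GenoExError(Exception):
    """Raised when an API reply reports an error (hypothetical exception class)."""


def checked_return_values(reply):
    """Apply the error rules of this section and return reply["return_values"]."""
    # Rule 1: status false -> the error message is in status_message.
    if not reply.get("status"):
        raise GenoExError(reply.get("status_message") or "unknown error")
    # Some calls omit return_values, so fall back to an empty dict.
    values = reply.get("return_values") or {}
    # Rule 2: even with status true, "error" / "error_list" may report problems.
    errors = []
    if values.get("error"):
        errors.append(str(values["error"]))
    errors.extend(str(item) for item in values.get("error_list") or [])
    if errors:
        raise GenoExError("; ".join(errors))
    # status_message is ignored on success (it may hold only a comment).
    return values
```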
The following two sections focus on the primary functionalities provided: upload and download of 706/711 files.
Section 2, upload of 706/711 files
This is a two-step operation: a submit call (once), and then intermittently (once per minute or so) polling the status of that submission until a terminating state is reached.
Step 1: An example of a submit call via the curl program looks like:
- curl -s -F 'data={ "parameters": { "version": "220805" }, "auth": { "username": "username@company.com", "pw": "test" } }' \
    -F "access_file=@system-data/711-access.zip;type=application/octet-stream" \
    -F "dataset_file=@system-data/706-genotypes.zip;type=application/octet-stream" \
    https://genoex.org/genoex/api/gde_submission
As in all these examples, the username@company.com and test strings need to be replaced with your registered email address and associated password before running this.
In addition, the paths and filenames specified (i.e. the parts between @ and ; inside the quoted strings) need to be adapted to your own situation.
Note that a single backslash at the end of a line only indicates that the command continues on the next line. Unlike the other calls, this submission uses curl's -F (--form) switch, because the files are sent as multipart form data: the @filename and ;type= syntax is only interpreted by -F, and curl does not allow -F and --data in the same command.
This example shows how to upload a 706 file and the associated 711 file in one go. If only one of these file types is to be uploaded, simply remove the -F switch, and the quoted string following it, for the file you are not going to upload.
The above submission call will return a JSON data structure containing, if successful, the job_id assigned to this submission:
- {"return_values": {"access_file": "711-access.zip", "access_status_message": "Access file received", "dataset_file": "706-genotypes.zip", "dataset_status_message": "Dataset file received", "job_id": "9be6c0bf-de9f-4951-b9e1-27217ec1e0c4"}, "status": true, "status_message": "job started"}
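For scripted uploads gxapi.py is the recommended tool. Purely to illustrate what the submission request carries, the following Python sketch builds a multipart/form-data body (the encoding curl uses for file uploads with its -F switch) with the same fields as the submission example: data with the JSON document, plus access_file and/or dataset_file. The function names are hypothetical, and the sketch assumes the server expects exactly those field names.

```python
import json
import urllib.request
import uuid
from pathlib import Path

SUBMISSION_URL = "https://genoex.org/genoex/api/gde_submission"


def build_submission(username, pw, access_file=None, dataset_file=None, version="220805"):
    """Return (body, content_type) for a multipart/form-data submission.

    Either file may be left out, as described above, but not both.
    Each file is read fully into memory, which keeps the sketch short.
    """
    if access_file is None and dataset_file is None:
        raise ValueError("give access_file (711) and/or dataset_file (706)")
    boundary = uuid.uuid4().hex  # random boundary, assumed not to occur in the files
    document = {"parameters": {"version": version}, "auth": {"username": username, "pw": pw}}
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="data"\r\n\r\n'.encode()
        + json.dumps(document).encode()
        + b"\r\n"
    ]
    for field, path in (("access_file", access_file), ("dataset_file", dataset_file)):
        if path is None:
            continue
        path = Path(path)
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{path.name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode() + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def submit(username, pw, access_file=None, dataset_file=None):
    """POST the submission and return the decoded JSON reply (hypothetical helper)."""
    body, content_type = build_submission(username, pw, access_file, dataset_file)
    request = urllib.request.Request(SUBMISSION_URL, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.load(response)
```

On success, reply["return_values"]["job_id"] holds the identifier used for polling in step 2.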
Please remember that in all calls, if the key status has a false value, then the error message is found in status_message. Even if status is true, there may still be errors described inside the return_values data structure.
Step 2: Polling for status is accomplished via a call like:
- curl -s \
    --data 'data={ "parameters": { "version": "220805", "job_id": "9be6c0bf-de9f-4951-b9e1-27217ec1e0c4" }, "auth": { "username": "username@company.com", "pw": "test" } }' \
    https://genoex.org/genoex/api/gde_job_status
The 9be6c0bf-de9f-4951-b9e1-27217ec1e0c4 string needs to be replaced with the value of the job_id key provided in the return data structure of the submit call above.
This last call is then intermittently repeated, with no change, until either a job_status of "FINISHED" or "FAILED" is reached and returned in a JSON data structure:
- {"return_values": {"error": "", "error_list": [], "job_id": "9be6c0bf-de9f-4951-b9e1-27217ec1e0c4", "job_status": "FINISHED", "status_time": "2022-08-05 13:35:02"}, "status": true, "status_message": "GDE upload"}
Note that in the gde_job_status case, the return_values structure may include an additional key test_results. The value of that key is, if present, a (potentially rather long) string that should be made known to the user.
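The polling step can be sketched as a loop that stops only at the two terminal states named here, "FINISHED" and "FAILED", since intermediate state names are not documented in this manual. In this minimal Python sketch, get_status is a hypothetical caller-supplied function that performs one gde_job_status call with the job_id and returns the decoded JSON reply; sleep is injectable so the loop can be tested without waiting.

```python
import time

TERMINAL_STATES = {"FINISHED", "FAILED"}


def wait_for_job(get_status, interval=60.0, max_polls=None, sleep=time.sleep):
    """Call get_status() until job_status is FINISHED or FAILED; return return_values.

    get_status: caller-supplied function doing one gde_job_status call (hypothetical).
    interval:   seconds between polls (about 60 for uploads, 30 for extractions).
    max_polls:  optional safety limit; None means poll until a terminal state.
    """
    polls = 0
    while True:
        reply = get_status()
        if not reply.get("status"):
            raise RuntimeError(reply.get("status_message") or "status call failed")
        values = reply.get("return_values") or {}
        if values.get("job_status") in TERMINAL_STATES:
            return values
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"job still {values.get('job_status')!r} after {polls} polls")
        sleep(interval)
```

After the loop returns, check values["job_status"] and, when present, show values["test_results"] to the user as noted above. The same loop can serve the 706 extraction in section 3, with interval=30.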
Section 3, download of 706/711 files
Download operations are a bit different from upload as 711 files are downloaded in synchronous mode and 706 files are downloaded in asynchronous mode, similar to upload.
In addition, there is an optional preliminary step that lists all the available values to choose from when selecting the parameter values for the download operation
(you may want to redirect the output to a file, params.json in the example command line, to have the results handy, and refresh this file from time to time by repeating this operation):
- curl -s --data 'data={ "parameters": { "version": "220805" }, "auth": { "username": "username@company.com", "pw": "test" } }' \
    -o params.json https://genoex.org/genoex/api/gde_get_parameters
This is a synchronous operation and hence a single step is sufficient.
The return_values data structure in the reply will include keys: breeds, countries, orgs, gender and arrays. The value of each key is a list of strings to choose from when specifying the corresponding parameter in calls below. This data roughly corresponds to the data shown in the download dialog of the web browser interface.
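Since the saved params.json holds the lists of valid values, a selection can be checked against it before requesting a download. A minimal Python sketch; the function names are hypothetical, and it assumes params.json contains the complete JSON reply written by the curl -o command above.

```python
import json


def load_parameters(path="params.json"):
    """Read a saved gde_get_parameters reply and return its return_values."""
    with open(path, encoding="utf-8") as handle:
        reply = json.load(handle)
    if not reply.get("status"):
        raise RuntimeError(reply.get("status_message") or "gde_get_parameters failed")
    return reply["return_values"]


def unknown_values(return_values, key, requested):
    """List the requested values that are not among those offered under key."""
    available = set(return_values.get(key, []))
    return [value for value in requested if value not in available]
```

For example, unknown_values(load_parameters(), "breeds", ["BSW"]) returns an empty list when BSW is among the offered breed codes. An empty list in the download parameters means "all", so there is nothing to check in that case.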
Download 706 files
This is a three-step operation: a single extraction call, followed by polling the status of that extraction intermittently (every 30 seconds or so) until a terminal state is reached, and finally, if the extraction status is "FINISHED", downloading the resulting zip file.
Step 1: The extraction call provides the specification of which data to download. The allowed values for the different parts of the specification are:
- "breeds": list of breed codes for the data to download; an empty list means all available breeds
- "countries": list of country codes for the data to download; an empty list means all available countries
- "gender": either "BOTH", "F" or "M"
- "extraction_type": either "b" (best fit, i.e. highest call_rate) or "a" (all genotypes)
- "arrays": list of array aliases for the data to download; an empty list means all available arrays
- "orgs": list of organization codes for the data to download; an empty list means all available organizations
- "date_start": genotypes uploaded since this date are eligible (format YYYY-MM-DD, as in the example below)
- "date_end": genotypes uploaded until this date are eligible (same format)
- "quality_criteria": comma-separated string with one or more of "frequency", "pedigree" and "call_rate"; an empty string means all of them, null means the quality checks are ignored
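The specification above can be assembled and sanity-checked in Python before it is sent. This is a sketch under assumptions: extraction_parameters is a hypothetical helper, the YYYY-MM-DD date format is inferred from the example call, and the checks only cover the values listed above, so the server remains the authority on what it accepts. Python's None is serialized by json.dumps as null, which is how the quality checks are ignored.

```python
import datetime

GENDERS = {"BOTH", "F", "M"}
EXTRACTION_TYPES = {"b", "a"}  # b = best fit (highest call_rate), a = all
QUALITY_TESTS = {"frequency", "pedigree", "call_rate"}


def extraction_parameters(date_start, date_end, breeds=(), countries=(),
                          gender="BOTH", extraction_type="b", arrays=(),
                          orgs=(), quality_criteria=None):
    """Build the "parameters" object for gde_extraction (sketch).

    Empty lists mean "all available"; quality_criteria=None becomes JSON
    null (quality checks ignored) and "" means all quality criteria.
    """
    if gender not in GENDERS:
        raise ValueError(f"gender must be one of {sorted(GENDERS)}")
    if extraction_type not in EXTRACTION_TYPES:
        raise ValueError('extraction_type must be "b" or "a"')
    if quality_criteria:
        names = {name.strip() for name in quality_criteria.split(",")}
        unknown = names - QUALITY_TESTS
        if unknown:
            raise ValueError(f"unknown quality criteria: {sorted(unknown)}")
    for day in (date_start, date_end):
        datetime.date.fromisoformat(day)  # raises ValueError if not YYYY-MM-DD
    return {
        "version": "220805",
        "breeds": list(breeds), "countries": list(countries),
        "gender": gender, "extraction_type": extraction_type,
        "arrays": list(arrays), "orgs": list(orgs),
        "date_start": date_start, "date_end": date_end,
        "quality_criteria": quality_criteria,
    }
```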
Example call:
- curl -s --data 'data={ "parameters": { "version": "220805", "breeds": ["BSW"], "countries": [], "gender": "BOTH", "extraction_type": "b", "arrays": [], "orgs": [], "date_start": "2020-12-01", "date_end": "2022-08-03", "quality_criteria": null }, "auth": { "username": "username@company.com", "pw": "test" } }' https://genoex.org/genoex/api/gde_extraction
Note that in this example, the values of the keys "countries", "arrays" and "orgs" are empty lists, meaning that all values are included. The value of "quality_criteria" is null, which also means "anything goes": the results of the quality checks are ignored, i.e. all genotypes are considered for extraction.
This extraction call will return a JSON data structure containing, if successful, the job_id assigned to this submission:
- {"return_values": {"job_id": "16ebfbd8-6f22-4fb0-b9c9-62580d3f65fe"}, "status": true, "status_message": "job started"}
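Reading the job_id out of that reply can be sketched as follows. job_id_from_reply is hypothetical, and treating a false "status" or a missing job_id as an error is an assumption rather than documented behavior:

```python
def job_id_from_reply(reply):
    """Return the job_id from a parsed gde_extraction reply (sketch).

    Expected shape, as in the example above:
    {"return_values": {"job_id": "..."}, "status": true, ...}
    """
    # Assumption: a false "status" means the extraction was not started.
    if not reply.get("status"):
        raise RuntimeError("extraction not started: " + reply.get("status_message", ""))
    job_id = reply.get("return_values", {}).get("job_id")
    if not job_id:
        raise RuntimeError("reply contains no job_id")
    return job_id
```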
Step 2: Poll intermittently for the status, exactly as for the upload, except that the job_id is taken from the reply of the extraction call. See section 2, step 2 "polling for status", for an example.
Step 3: If the extraction was successful (i.e. polling ended with status "FINISHED"), a single call downloads the zip file associated with the prepared extraction. Example:
- curl -s --data 'data={ "parameters": { "version": "220805", "job_id": "16ebfbd8-6f22-4fb0-b9c9-62580d3f65fe"}, "auth": { "username": "username@company.com", "pw": "test" } }' -o downloaded-706.zip https://genoex.org/genoex/api/gde_download
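Because curl's -o option writes the reply body straight to downloaded-706.zip, a script may want to confirm that the body really is a zip archive before saving it. A sketch, assuming the reply body is available as bytes (for example requests' Response.content) and that a non-zip body is an error message from the server; zip_members is a hypothetical helper:

```python
import io
import zipfile


def zip_members(content):
    """Return the member names of a downloaded zip body (sketch).

    content: the raw reply body as bytes. A body that is not a zip
    archive is assumed to be an error reply, so its beginning is
    included in the raised ValueError.
    """
    buffer = io.BytesIO(content)
    if not zipfile.is_zipfile(buffer):
        preview = content[:200].decode("utf-8", "replace")
        raise ValueError(f"reply is not a zip file: {preview}")
    with zipfile.ZipFile(buffer) as archive:
        return archive.namelist()


# Usage sketch: check first, then write the file.
# reply = requests.post("https://genoex.org/genoex/api/gde_download", data=body)
# zip_members(reply.content)
# with open("downloaded-706.zip", "wb") as out:
#     out.write(reply.content)
```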
Download 711 files
This is a single-step operation, fully specified in one curl call:
- curl -s --data 'data={ "parameters": { "version": "220805", "breeds": ["BSW"], "countries": [], "gender": "BOTH", "arrays": [] }, "auth": { "username": "username@company.com", "pw": "test" } }' -o downloaded-711.zip https://genoex.org/genoex/api/gde_download_711
The parameters "breeds", "countries", "gender" and "arrays" are used exactly as for the download of 706 files described above.
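Since the 711 call reuses four of the 706 parameters, its parameters object can be derived from an extraction specification by keeping only those keys plus version. A sketch; parameters_711 is a hypothetical helper:

```python
SHARED_KEYS = ("version", "breeds", "countries", "gender", "arrays")


def parameters_711(parameters_706):
    """Keep only the keys gde_download_711 uses (sketch).

    breeds, countries, gender and arrays mean the same as for the 706
    extraction; extraction-only keys (extraction_type, orgs, dates,
    quality_criteria) are dropped.
    """
    return {key: parameters_706[key] for key in SHARED_KEYS if key in parameters_706}
```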
Section 4, using the gxapi.py program
The gxapi.py program can be downloaded from the web browser interface, on the GDE -> UPLOAD page. Note that it requires a fairly recent version of Python (3.7 or newer) with the requests module installed.
In the examples below, gxapi.py is assumed to be located in the current directory, but that is not a requirement. If it is located in another directory, precede gxapi.py in the examples with the path to where it is stored, e.g. use path-to-installdir/gxapi.py instead of gxapi.py.
An alternative (Linux only) way to install and execute it is to put the gxapi.py file in one of the directories listed in your execution path, PATH, and set its execute permission (e.g. with chmod +x). In that case, the leading "python " can be dropped from the examples.
To get a quick overview of how to execute it, run it with the -h switch:
- python gxapi.py -h 
Upload of 706/711 files
To get a quick overview of how to run an upload, use the -h switch after upload:
- python gxapi.py upload -h 
To upload a pair of 706/711 files in one go, simply run it like this:
- python gxapi.py upload username@company.com test path-to-files/706-file.zip path-to-files/711-file.zip 
To upload only one file, either a 706 or a 711 file, just omit the argument for the other file in the example above.
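When uploads are scripted, the documented command line can be built as a list and run with subprocess, which avoids shell quoting issues. A minimal sketch; upload_command is a hypothetical helper, and sys.executable stands in for python:

```python
import sys


def upload_command(username, pw, file_706=None, file_711=None, script="gxapi.py"):
    """Build the gxapi.py upload command line as a list (sketch).

    Mirrors "python gxapi.py upload USER PW 706-file 711-file"; a file
    that is not given is simply left out.
    """
    files = [path for path in (file_706, file_711) if path]
    if not files:
        raise ValueError("give at least one of file_706 or file_711")
    return [sys.executable, script, "upload", username, pw, *files]


# Usage sketch (performs a real upload, so it is commented out):
# import subprocess
# subprocess.run(upload_command("username@company.com", "test",
#                               "path-to-files/706-file.zip",
#                               "path-to-files/711-file.zip"), check=True)
```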
Download of 706/711 files
The optional preliminary step, calling gde_get_parameters, is performed via:
- python gxapi.py fetch username@company.com test >params.log 
(here, stdout is redirected to a file, params.log, to save the output for later).
To get an overview of how to run a download, use the -h switch after download:
- python gxapi.py download -h 
Here, the switches are divided into the groups "optional arguments" (used for both 706 and 711 files), "genotypes" (used for 706 files only) and "access" (used for 711 files only):
optional arguments:
- -b BREEDS, --breeds BREEDS: which breeds to extract data for (comma delimited, omit for all)
- -c COUNTRIES, --countries COUNTRIES: which countries to extract data for (comma delimited, omit for all)
- -M, --male: extract male data only
- -F, --female: extract female data only
- -a ARRAYS, --arrays ARRAYS: which arrays to download data for (comma delimited, omit for all)
genotypes:
- --all: download all matched data (default: best matched only)
- -o ORGS, --orgs ORGS: which organizations to extract data from (comma delimited, omit for all)
- -S START, --start START: which start date to extract data for (omit for all)
- -E END, --end END: which end date to extract data for (omit for today)
- -q QUALITY, --quality QUALITY: which successful tests to extract data for (omit for all)
access:
- -A: download access data instead of genotypes
Note that the -A switch selects whether a 711 file (if present) or a 706 file (if omitted) is downloaded. The end of the help output shows a couple of small explicit examples; a few more follow here.
For example, to download a 706 file containing the best genotypes of BSW bulls regardless of country, array, organization, date of upload or quality status, execute:
- python gxapi.py download -b BSW -M -q "" username@company.com test path-to-files/downloaded-706-file.zip 
Adding the --all switch removes the restriction to download only the best genotypes of each animal. To do the same download but include only genotypes that pass all quality checks, omit -q "" from the above command. To limit the downloaded data further, add switches for breeds, countries, arrays, organizations and/or dates, and either replace the empty string after -q with a suitable specification (e.g. pedigree,call_rate) or omit -q and its string completely (which is the same as specifying -q frequency,pedigree,call_rate).
Example of a switch that limits the data volume:
- -a "GeneSeek Dairy Ultra LD v2 7049,Illumina Bovine3k BeadChip 2900" 
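A hypothetical helper for building download command lines, covering only the switches used in the examples here, makes the difference between passing -q "" and omitting -q explicit. It is a sketch, not part of gxapi.py. Passing the resulting list to subprocess.run avoids the shell, so array names containing spaces need no extra quotes. Genotype-only switches are dropped when access=True, on the assumption that they do not apply to 711 downloads.

```python
import sys


def download_command(username, pw, out_path, breeds=(), male=False,
                     female=False, arrays=(), all_matches=False,
                     quality=None, access=False, script="gxapi.py"):
    """Build a gxapi.py download command line (sketch, subset of switches).

    quality=None omits -q (same as -q frequency,pedigree,call_rate);
    quality="" passes -q "" so the quality status is ignored.
    """
    cmd = [sys.executable, script, "download"]
    if breeds:
        cmd += ["-b", ",".join(breeds)]
    if male:
        cmd.append("-M")
    if female:
        cmd.append("-F")
    if arrays:
        cmd += ["-a", ",".join(arrays)]
    if access:
        cmd.append("-A")  # 711 file: genotype-only switches are not added
    else:
        if all_matches:
            cmd.append("--all")
        if quality is not None:
            cmd += ["-q", quality]
    return cmd + [username, pw, out_path]
```

For instance, download_command("username@company.com", "test", "path-to-files/downloaded-706-file.zip", breeds=["BSW"], male=True, quality="") reproduces the BSW bull example above, with sys.executable in place of python.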
Another example, downloading a 711 file for all animals available:
- python gxapi.py download -A username@company.com test path-to-files/downloaded-711-file.zip 
