GEBVtest Software
The GEBV test is a validation procedure described in the Interbull Code of Practice, Appendix X.
Contents
This software consists of two programs (gtconvert.py and gebvtest.py) and a utility module used by those programs (ibutils.py). The first program processes the sets of Cf/Df/Cr/Gr files for all trait groups for a single breed and population of evaluation and creates a single set of files in a trait-independent format. The second program performs the GEBV validation tests for all traits for one breed and population and then creates a zip file with the input and output files, ready for submission to the Interbull Centre.
Note: In the future, organizations may prefer to prepare the data for the gebvtest.py program directly, bypassing the creation of the legacy file formats and the gtconvert.py program.
Installation and testing
The programs have been tested under Python 2.6, 2.7 and 3.2. As a minimum, you will need these extra Python modules installed on your system: NumPy and, for Python 2.6 only, argparse.
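A quick way to confirm the module requirements above is to try importing them. The following is a minimal sketch, not part of the distributed software; the function names missing_modules and required_modules are hypothetical helpers introduced here for illustration.

```python
import sys


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


def required_modules(version_info=sys.version_info):
    """NumPy is always needed; argparse must be installed only on Python 2.6."""
    required = ["numpy"]
    if tuple(version_info[:2]) == (2, 6):
        required.append("argparse")
    return required


if __name__ == "__main__":
    missing = missing_modules(required_modules())
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules can be imported.")
```

If the script reports a missing module, install it before running gtconvert.py or gebvtest.py.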
Download the attached gebvtest20130205c.zip file.
Create a working directory and unzip the zip file in that directory. Two subdirectories will be created: programs and sample_data. Typing
python gebvtest.py --help
from a command line prompt, from within the programs directory, should print a brief help message if the installation has been successful.
Some sample data for breed HOL and population ABC are available in the sample_data directory. The two programs can be run from the programs directory as follows:
python gtconvert.py -v hol abc ../sample_data
python gebvtest.py -v -m hol abc ../sample_data
In this example data, parameters and output are all in the sample_data directory. Files can be read from other locations and output written to other locations as well. Please see the following sections for further information.
The outputs should match those in the source zip file.
Program gtconvert.py - User Manual
Information about the program
The program gtconvert.py converts the legacy file formats (fileCxxxf, fileCxxxr, fileDxxxf and fileGxxxr, for xxx in 010, 015-020, 115) into the new trait-independent vertical file formats that will be used for submitting EBVs to the IDEA DB in the near future. The program will find all the file{A}xxx{b} files in a specified DATADIR and convert them all, creating four files (file300Cf, file300Df, file300Cr and file300Gr) with separate bull proof records for all traits found in all the xxx files matching the specified breed of evaluation (BRD) and population/country code (POP). The program also converts the legacy parameter file into a trait info file specifically designed for the gebvtest program and creates a file of birth dates extracted from fileC010f.
All of the input files may contain data for more than one breed or population. The input files may have a SUFFIX, like ".usa" for example, but in this case all the files must have the same suffix.
Input files
fileCxxxf - national official genetic evaluations sent by the NGEC as input for the most recent Interbull MACE evaluation (formats: 010, 115, 015, 016, 017, 018, 019, 020)
fileDxxxf - daughter deviation file, including either DD or D_PGM for the same animals included in fileCxxxf (same formats as for fileCxxxf)
fileCxxxr - reduced conventional genetic evaluation file, obtained from conventional genetic evaluations using truncated data (same formats as for fileCxxxf)
fileGxxxr - reduced genomic evaluation file, obtained from genomic evaluations using truncated data (same formats as for fileCxxxf)
parameter file - parameters used in most recent Interbull MACE evaluation - one file may contain all trait groups (Format: parameter file)
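Before running gtconvert.py it can help to see which legacy files are actually present in DATADIR. The sketch below builds the file names from the naming pattern described above (fileCxxxf, fileDxxxf, fileCxxxr, fileGxxxr, with an optional common suffix) and reports the ones that exist. The helper names are hypothetical, and the sketch does not reproduce how gtconvert.py itself searches for files.

```python
import os

# Format codes listed for the legacy files: 010, 015-020 and 115.
FORMAT_CODES = ["010", "015", "016", "017", "018", "019", "020", "115"]

# (letter, trailing character) pairs: Cf, Df, Cr, Gr.
FILE_KINDS = [("C", "f"), ("D", "f"), ("C", "r"), ("G", "r")]


def legacy_file_names(suffix=""):
    """Return all candidate legacy file names, e.g. fileC010f.usa."""
    return ["file%s%s%s%s" % (letter, code, tail, suffix)
            for code in FORMAT_CODES
            for letter, tail in FILE_KINDS]


def find_legacy_files(datadir, suffix=""):
    """Return the candidate names that exist as files in `datadir`."""
    return [name for name in legacy_file_names(suffix)
            if os.path.isfile(os.path.join(datadir, name))]
```

For example, find_legacy_files("../sample_data") lists the legacy files found in the sample data directory; a trait group with fewer than four files is worth a second look.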
Running the Program
Usage notes
The program should be run from within the programs directory. Typing
python gtconvert.py --help
will give a summary of the program usage:
usage: gtconvert.py [-h] [-v] [-s SUFFIX] [-p PARFILE] [-d {DD,GM}] [-y YEAR]
                    [-x {Y,N}] [-o OUTDIR]
                    brd pop datadir

positional arguments:
  brd                   evaluation breed code (BSW/GUE/JER/HOL/RDC/SIM)
  pop                   population code (same as country code except for
                        CHR/DEA/DFS/FRR/FRM)
  datadir               absolute or relative path to data files

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity
  -s SUFFIX, --suffix SUFFIX
                        suffix to add to all input file names, eg. ".usa" if
                        file names are like fileC010f.usa (default=none)
  -p PARFILE, --parfile PARFILE
                        path+name of input "parameter" file
                        (default=DATADIR/parameterSUFFIX)
  -e ENCODING, --encoding ENCODING
                        input file encoding (default=utf-8; try also
                        iso-8859-1 or other values listed at
                        http://docs.python.org/2/library/codecs.html#standard-encodings)
  -d {DD,GM}, --depvar {DD,GM}
                        type of daughter performance on Df file (default=GM)
  -y YEAR, --year YEAR  minimum birth year for test bulls (default is year of
                        EVALDATE on parameter file less 8 years)
  -x {Y,N}, --type2x {Y,N}
                        inclusion of type 21+22 bulls in test group
                        (default=N)
  -o OUTDIR, --outdir OUTDIR
                        directory for output files (default=DATADIR)
Note that the input parameter file may be in a different directory than the other files or have a different name or suffix, in which case the -p option must be specified.
The program adds defaults for several options to the trait info file it creates. This file may need to be edited manually or programmatically if different options are needed for some traits compared to other traits.
You may also choose to put the output files from this program into a different directory than the input files. In this case, the specified OUTDIR from this program should be used as the DATADIR for the gebvtest.py program.
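The relationship between OUTDIR and DATADIR described above can be sketched as a small command builder: whatever is passed to gtconvert.py as -o becomes the datadir argument of gebvtest.py. This is an illustrative sketch only; the function names are hypothetical, and only options shown in the --help output (-v, -s, -p, -e, -o) are used.

```python
import sys


def gtconvert_command(brd, pop, datadir, outdir=None, suffix=None,
                      parfile=None, encoding=None, python=sys.executable):
    """Build the argument list for a gtconvert.py run."""
    cmd = [python, "gtconvert.py", "-v"]
    if suffix:
        cmd += ["-s", suffix]
    if parfile:
        cmd += ["-p", parfile]
    if encoding:
        cmd += ["-e", encoding]
    if outdir:
        cmd += ["-o", outdir]
    return cmd + [brd, pop, datadir]


def gebvtest_command(brd, pop, datadir, python=sys.executable):
    """Build the argument list for a gebvtest.py run."""
    return [python, "gebvtest.py", "-v", brd, pop, datadir]


def pipeline_commands(brd, pop, datadir, outdir=None, **options):
    """Return both commands; gebvtest.py reads from OUTDIR when one is given."""
    python = options.pop("python", sys.executable)
    gebv_datadir = outdir if outdir else datadir
    return [
        gtconvert_command(brd, pop, datadir, outdir=outdir,
                          python=python, **options),
        gebvtest_command(brd, pop, gebv_datadir, python=python),
    ]
```

Each returned list can be passed to subprocess.call from within the programs directory, or joined with spaces to inspect the command line before running it.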
Warning:
If the gtconvert.py program crashes with a UnicodeEncodeError, the input files likely contain binary character codes, most often in bull names, that do not fit the standard utf-8 encoding scheme. You can try specifying the option '-e iso-8859-1' or some other encoding listed at http://docs.python.org/2/library/codecs.html#standard-encodings. If that fails, you could try setting the name field to blank in all input files, since the bull name field on the 010 files is no longer used at the Interbull Centre. Also, make sure there is no binary data in any other field, caused for example by uninitialized variables in a Fortran program.
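To locate the offending records before changing the encoding option, the input files can be scanned for lines that do not decode cleanly. The following is a minimal sketch under the assumption that the files are line-oriented text; the function name undecodable_lines is a hypothetical helper, not part of gtconvert.py.

```python
def undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, raw_bytes) for each line that fails to decode."""
    bad = []
    with open(path, "rb") as handle:
        # Read raw bytes so that decoding errors can be caught line by line.
        for number, raw in enumerate(handle, 1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError:
                bad.append((number, raw))
    return bad
```

A line reported for utf-8 but not for iso-8859-1 usually points to accented characters in a name field. Note that iso-8859-1 accepts every byte value, so a clean result for that encoding does not rule out binary data in other fields.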
Example of command line
python3.2 gtconvert.py hol abc /rawdata/abc/gebvtest1209/HOL/ -p /abc/parameter.abc -e 'iso-8859-1' -s .abc -o ../data/1302/ABCHOL
In this example
- Python version 3.2 is used
- breed of evaluation is HOL
- population being evaluated is ABC
- data are read from /rawdata/abc/gebvtest1209/HOL/
- the parameter file is read from /abc/parameter.abc
- 'iso-8859-1' is specified as the input file encoding instead of the default 'utf-8'
- the suffix .abc is added to the input files
- the outputs are written to ../data/1302/ABCHOL
Output files
The following files are written to the DATADIR or to OUTDIR, if specified. All files have a _POPBRD suffix, so that multiple sets of output files for different breeds or populations can co-exist in the same output directory, if desired.
traits - GEBV test options file (Format: traits)
file300Cf - national official genetic evaluations written in trait-independent format (Format: File300)
file300Df - daughter deviation file written in trait-independent format
file300Cr - reduced conventional genetic evaluation file written in trait-independent format
file300Gr - reduced genomic evaluation file written in trait-independent format
file736 - file with birth dates (Format: File736)
The execution log is written to stdout (i.e. the screen), so you should redirect the output to a file if you would like to save it. An example gtconvert.log is available here.
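On most command shells the log can be saved by appending "> gtconvert.log" to the command line. When the program is launched from another Python script, the same effect can be obtained with the subprocess module. The sketch below is illustrative; run_and_log is a hypothetical helper, and sending stderr to the same file is a choice made here so that error messages end up next to the log.

```python
import subprocess


def run_and_log(command, log_path):
    """Run `command`, write its stdout and stderr to `log_path`, return the exit code."""
    with open(log_path, "wb") as log:
        return subprocess.call(command, stdout=log, stderr=subprocess.STDOUT)
```

For example, run_and_log(["python", "gtconvert.py", "-v", "hol", "abc", "../sample_data"], "gtconvert.log") runs the sample conversion from the programs directory and keeps the log on disk.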
Program gebvtest.py - User Manual
Information about the program
The program gebvtest.py performs the GEBV validation tests for one breed-population combination, for all traits. At the end of the program a zip file is created with the input files and the result file, ready for submission to the Interbull Centre (ITBC). The ITBC will perform some additional data checks and re-run the program to check the results. The result file uses the new file735 format, which is a modification and extension of the previous file731 format.
Input files:
traits - GEBV test options file (Format: traits)
file300Cf - national official genetic evaluations written in trait-independent format (Format: File300)
file300Df - daughter deviation file written in trait-independent format
file300Cr - reduced conventional genetic evaluation file written in trait-independent format
file300Gr - reduced genomic evaluation file written in trait-independent format
file736 - file with birth dates (Format: File736)
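A simple pre-flight check is to confirm that all six input files are present in DATADIR before launching gebvtest.py. The sketch below assumes that the _POPBRD suffix is appended directly to each base name and written in upper case (for example traits_ABCHOL); that naming detail is an assumption made for illustration and should be checked against the files gtconvert.py actually produces. The helper names are hypothetical.

```python
import os

# Base names of the gebvtest.py input files listed above.
INPUT_BASENAMES = ["traits", "file300Cf", "file300Df",
                   "file300Cr", "file300Gr", "file736"]


def expected_inputs(pop, brd):
    """Return the expected input file names (assumed upper-case _POPBRD suffix)."""
    tag = "_%s%s" % (pop.upper(), brd.upper())
    return [base + tag for base in INPUT_BASENAMES]


def missing_inputs(datadir, pop, brd):
    """Return the expected input files that are not present in `datadir`."""
    return [name for name in expected_inputs(pop, brd)
            if not os.path.isfile(os.path.join(datadir, name))]
```

An empty list from missing_inputs("../sample_data", "abc", "hol") would indicate that the full set of inputs is in place under the assumed naming.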
Running the program
The program should be run from within the programs directory. Typing
python gebvtest.py --help
will give a summary of the program usage:
usage: gebvtest.py [-h] [-v] [-m] [-M MERGEDIR] [-Z] [-C] brd pop datadir

positional arguments:
  brd                   evaluation breed code (BSW/GUE/JER/HOL/RDC/SIM)
  pop                   population code (same as country code except for
                        CHR/DEA/DFS/FRR/FRM)
  datadir               absolute or relative path to data files

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity
  -m, --mergefiles      write merged data files (for independent data checks)
  -M MERGEDIR, --mergedir MERGEDIR
                        absolute or relative path for merged data files
                        (default=DATADIR/merged)
  -Z, --no-zip          do not create a zip file (eg. for preliminary testing
                        or usage at ITBC)
  -C, --cleanup         delete all files successfully added to the zip file
More detail on the -m (--mergefiles) option is available here.
Output files
file735 - results from the GEBV test for all traits tested (Format: File735) (Example)
gebvtest_log - summary of the calculations (Example)
Submission zip file - gebvtest.py generates a zip file including all input and output files which should be sent to the Interbull Centre as the official data submission for the GEBV test. The zip file will be named gtYYMM_POPBRD.zip, where YY and MM are year and month of test date, POP is the population code and BRD is the breed code.
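The naming rule for the submission file can be written out as a short function, which is handy for locating the zip file in a script and for listing its contents before sending it. This is a sketch only: the helper names are hypothetical, the test date is supplied by the caller, and the upper-case POP and BRD codes are an assumption based on the pattern gtYYMM_POPBRD.zip.

```python
import datetime
import zipfile


def submission_zip_name(test_date, pop, brd):
    """Build gtYYMM_POPBRD.zip from a date and the population and breed codes."""
    return "gt%s_%s%s.zip" % (test_date.strftime("%y%m"),
                              pop.upper(), brd.upper())


def zip_members(zip_path):
    """Return the sorted list of file names stored in the zip file."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())
```

For a test date of 5 February 2013, population ABC and breed HOL, submission_zip_name(datetime.date(2013, 2, 5), "abc", "hol") gives gt1302_ABCHOL.zip.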
GEBV test data submission
Interbull customers wishing to participate in the GEBV test must send the following files to the Interbull Centre at interbull@slu.se:
Submission zip file - generated by the gebvtest.py program.
Form GENO - one form for each trait group validated.
Troubleshooting/FAQ
- Double check your data files and make sure the file formats are ok.
- In some cases special characters in bull names make the program crash. If this happens, try blanking out the bull names, as they are not used anyway.
- All your files should contain a field of “country sending this information” and the code should be consistent for all files. Leaving a blank instead of a code for “country sending this information” has the effect that the file is not read.
- The record type (first three positions of each record) must correspond to the data. So if your file is a longevity file the record type must be ‘017’ and not ‘717’ or ‘019’ or anything else.
- Make sure to use the -v flag and check the log files carefully (look for files with 0 records, for example)
- If Python crashes with an error message:
  - If any "import" statement causes an error, Python or one of the modules is not correctly installed.
  - If the gtconvert.py program crashes with an error message that mentions invalid 'utf8' characters, try specifying the '-e iso-8859-1' option (or some other encoding listed on the web page indicated in the --help message). You may be able to identify input data problems by looking at the program in the area of the line number printed by the error message.
  - Otherwise, please prepare a zip file with all the inputs and partial outputs (including screen output from gtconvert.py) and email it to interbull@slu.se, clearly showing the exact command line used to launch the program and including the error message.
- If bulls seem to be missing or in excess in the candidate or test groups, use the -v and -m options, but not -C, and check the merged files.
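The record type check mentioned in the list above can be automated by counting the first three characters of every record in a legacy file. The following is a minimal sketch, assuming one record per line; the function names are hypothetical and the check is independent of the checks done by gtconvert.py itself.

```python
def record_type_counts(path):
    """Count records by their first three characters (the record type)."""
    counts = {}
    with open(path, "rb") as handle:
        for raw in handle:
            if not raw.strip():
                continue  # skip blank lines
            # Decode only the record type so odd bytes elsewhere cannot fail.
            code = raw[:3].decode("ascii", "replace")
            counts[code] = counts.get(code, 0) + 1
    return counts


def unexpected_record_types(path, expected):
    """Return the counts of record types other than `expected`."""
    return dict((code, n) for code, n in record_type_counts(path).items()
                if code != expected)
```

For a longevity file, unexpected_record_types(path, "017") should return an empty dictionary; any other key, such as '717', identifies records with the wrong record type.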
If you need assistance, please do not hesitate to contact us at interbull@slu.se .