TrendTest Software

The trend validation procedures are described in the Interbull Code of Practice, https://wiki.interbull.org/public/CoPAppendixIII?action=print.

This software consists of two programs that convert legacy file formats to the new formats (ttconvert1.py and ttconvert3.py), three programs that perform trend validation by methods 1 to 3 (trendtest1.py, trendtest2.py and trendtest3.py), a program that combines the results across methods and prepares a zip file for submission to the Interbull Centre (ttzip.py), and a utility module used by those programs (ibutils.py). The conversion programs process sets of legacy files (file01x and file04x) for all trait groups for a single breed and population of evaluation and create a single set of files in a trait-independent format. The remaining programs perform the trend validation tests for all traits for one breed and population and then create a zip file with the input and output files, ready for submission to the Interbull Centre.

Note: In the future, organizations may prefer to prepare the data for the trendtest1-3.py programs directly, bypassing the creation of the legacy file formats and the use of the ttconvert1/3.py programs.


Installation and testing

The programs have been tested under Python 2.6, 2.7, 3.2 and 3.3. At a minimum, you will need these extra Python modules installed on your system: NumPy and, for Python 2.6 only, argparse.
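
The requirement above can be checked before running any of the programs. The following is a minimal sketch, not part of the TrendTest distribution; the function name missing_modules is hypothetical.

```python
import sys


def missing_modules(version_info=sys.version_info):
    """Return the names of required extra modules that cannot be imported."""
    required = ["numpy"]
    # argparse is only an extra install on Python 2.6; later versions ship it.
    if tuple(version_info[:2]) == (2, 6):
        required.append("argparse")
    missing = []
    for name in required:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python {0}.{1}".format(*sys.version_info[:2]))
    print("missing modules: {0}".format(", ".join(missing_modules()) or "none"))
```

An empty list means the extra modules needed by the programs can be imported by the interpreter that ran the check.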

Download the attached trendtest20131017.zip file.

Create a working directory and unzip the zip file in that directory. Two subdirectories will be created, programs and sample_data. Typing, for example,

from a command line prompt, from within the programs directory, should print a brief help message if the installation has been successful.

Some sample data for breed HOL and population ABC are available in the sample_data directory. The two programs for method 1 can be run from the programs directory as follows:

python ttconvert1.py -v -s'.abc' hol abc ../sample_data
python trendtest1.py -v -m hol abc ../sample_data

In this example, the data, parameters and output are all in the sample_data directory. Files can also be read from, and output written to, other locations. Please see the following sections for further information.

The outputs should match those in the source zip file.
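
One way to check that the outputs match is to compare each produced file with its reference copy. The sketch below uses the standard filecmp module; the function name compare_outputs and the idea of keeping the reference files in a separate directory are assumptions made for illustration.

```python
import filecmp
import os


def compare_outputs(produced_dir, reference_dir, names):
    """Return the file names that are missing or whose contents differ."""
    mismatches = []
    for name in names:
        produced = os.path.join(produced_dir, name)
        reference = os.path.join(reference_dir, name)
        if not (os.path.isfile(produced) and os.path.isfile(reference)):
            mismatches.append(name)
        # shallow=False forces a comparison of the file contents.
        elif not filecmp.cmp(produced, reference, shallow=False):
            mismatches.append(name)
    return mismatches
```

Note that the result files contain a testdate column, so a byte-for-byte comparison may flag files that differ only in that date.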

Detailed descriptions of the individual programs are given in the following sections.


Control File

Shortly after the beginning of each test run, the Interbull Centre sends a control file, called file305_POPBRD, to every organization that has to provide validation results for a given population and trait. Validation is required when an organization is testing significant changes in its model, is participating for the first time, or conducted its last validation more than two years ago. The format of the file is described in APPENDIX I, and an example is presented below:

#grp trt evaldate  herit     siresd gm x mh md byr1 miny maxy corr preval chg
uder scs 20130630 0.2240    0.38579 B- N 10 20 1981 1999 2003 0.99 09-may N
work msp 20130719 0.0890   26.23801 B+ N 10 20 1981 1999 2003 0.99 ------ Y

Usage notes

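The siresd value contained in the file is the MACE sire standard deviation as calculated in the current test run evaluation.
The fields preval and chg together indicate why validation is required: preval gives the date of the previous validation (yy-mon) for traits last validated more than two years ago, and chg (Y/N) flags whether validation is required because the population is included for the first time or because large changes were introduced in the national evaluation for the trait (see APPENDIX I).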
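
As an illustration, the two fields can be turned into a readable reason. This is a sketch only: the function name validation_reasons is hypothetical, and treating a preval of "------" as "no previous validation date applies" is an assumption inferred from the example records.

```python
def validation_reasons(preval, chg):
    """Explain why a trait needs validation, based on preval and chg."""
    reasons = []
    if chg.upper() == "Y":
        reasons.append("first participation or large change in national evaluation")
    # Assumption: a value made up only of dashes means no date is given.
    if preval.strip("-"):
        reasons.append("last validated more than two years ago ({0})".format(preval))
    return reasons


print(validation_reasons("09-may", "N"))
print(validation_reasons("------", "Y"))
```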

Validation Method I

Definition: Comparison of genetic trends estimated using only first lactation versus all lactations in the routine national genetic evaluations.

Validation method I is handled by the program trendtest1.py. The program reads three files: file305_POPBRD (a control file sent by ITBC, see the section Control File above), file300_POPBRD (alias file01x, see APPENDIX IIa) and file300FL_POPBRD (a new file that follows the same format as file300 but pertains to first lactation only, see APPENDIX IIa). To make the file format transition as smooth as possible, the program ttconvert1.py converts the legacy 01x file format into the new file300_POPBRD format.
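
Before running trendtest1.py it can help to confirm that the three input files are in place. The sketch below builds the expected paths; the helper names are hypothetical, and the suffix is assumed to be the upper-case population code followed by the upper-case breed code, as in the zip file example tt1310_ABCHOL.zip.

```python
import os


def popbrd_suffix(brd, pop):
    """Build the POPBRD suffix (assumed: upper-case population, then breed)."""
    return "{0}{1}".format(pop.upper(), brd.upper())


def method1_inputs(brd, pop, datadir):
    """Return the paths of the three input files read for method I."""
    suffix = popbrd_suffix(brd, pop)
    names = ["file305_" + suffix, "file300_" + suffix, "file300FL_" + suffix]
    return [os.path.join(datadir, name) for name in names]


def missing_inputs(paths):
    """Return the paths that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]


for path in method1_inputs("hol", "abc", "../sample_data"):
    print(path)
```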

TTCONVERT1.PY

The program is located in the programs directory. Typing

will give you a small summary of the program usage:

usage: ttconvert1.py [-h] [-v] [-s SUFFIX] [-e ENCODING] [-o OUTDIR]
                     brd pop datadir

positional arguments:
  brd                   evaluation breed code (BSW/GUE/JER/HOL/RDC/SIM)
  pop                   population code (same as country code except for
                        CHR/DEA/DFS/FRR/FRM)
  datadir               absolute or relative path to data files

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity
  -s SUFFIX, --suffix SUFFIX
                        suffix to add to all input file names, eg. ".usa" if
                        file names are like file010.usa (default=none)
  -e ENCODING, --encoding ENCODING
                        input file encoding (default=utf-8; try also
                        iso-8859-1 or other values listed at
                        http://docs.python.org/2/library/codecs.html#standard-
                        encodings)
  -o OUTDIR, --outdir OUTDIR
                        directory for output files (default=DATADIR)

Warning

How to run the program

Go to the programs directory and type:

In this example

TRENDTEST1.PY

Typing

within the programs directory will give you a small summary of the program usage:

usage: trendtest1.py [-h] [-v] [-c CONTROLFILE] [-m] [-M MERGEDIR]
                     brd pop datadir
positional arguments:
  brd                   evaluation breed code (BSW/GUE/JER/HOL/RDC/SIM)
  pop                   population code (same as country code except for
                        CHR/DEA/DFS/FRR/FRM)
  datadir               absolute or relative path to data files
optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity
  -c CONTROLFILE, --controlfile CONTROLFILE
                        path/name of the control file
                        (default=DATADIR/file305_POPBRD)
  -m, --mergefiles      write merged data files (for independent data checks)
  -M MERGEDIR, --mergedir MERGEDIR
                        absolute or relative path for merged data files
                        (default=DATADIR/merged1)

Trendtest1 - How to run the program

Go to the programs directory and type:

In this example

Output files

The following files are written to the DATADIR or OUTDIR, if specified. All files have a _POPBRD suffix, so that multiple sets of output files for different breeds or populations can co-exist in the same directory.

Trendtest1 - Editings

The program reads the three input files, file305_POPBRD, file300_POPBRD and file300FL_POPBRD, and edits the data so that only the following bulls are selected for the test:

When the -m option is given, a merged file named after the trait code (trt.csv, for example mil.csv) is created and placed in the DATADIR/merged1 directory, unless otherwise specified. The file can be used for further investigation; its format is described in APPENDIX IIIa.

Trendtest1 - Statistical test

The statistical test for method I would be calculated as:

The criteria for passing the test will then be equal to:

Trendtest1 - Log and Result File

A log file called tt1_POPBRD.log is created and placed in DATADIR, unless otherwise specified. The file presents a summary of the information taken into consideration for all the traits analysed, such as

A result file called file311_POPBRD is created in DATADIR, unless otherwise specified. The file contains an overall summary of the traits analysed, the settings used and the final outcome of the validation. An example is presented below:

rec brd pop tgrp trt testdate pass testval      SDg bv    b_ALL    b_1ST bulls   stdALL   std1ST x byr1 mh md warnings
311 HOL ABC prod mil 20131024 FAIL   0.021  434.925 BV   48.409   39.246  5569  490.928  539.577 N 1986 10 20 LACT1_SCALE_WARNING
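
A result record such as the one above can be read by pairing the header names with the whitespace-separated values. This is a minimal sketch with a hypothetical function name; it assumes the trailing warnings field may be empty or may contain several tokens.

```python
HEADER_311 = ("rec brd pop tgrp trt testdate pass testval SDg bv b_ALL b_1ST "
              "bulls stdALL std1ST x byr1 mh md warnings")


def parse_result_record(header, line):
    """Map header names to the values of one result record."""
    names = header.split()
    values = line.split()
    last = len(names) - 1
    if len(values) > len(names):
        # Assumption: any extra tokens belong to the final (warnings) field.
        values = values[:last] + [" ".join(values[last:])]
    # Assumption: a record may omit the trailing warnings field.
    values += [""] * (len(names) - len(values))
    return dict(zip(names, values))


record = parse_result_record(
    HEADER_311,
    "311 HOL ABC prod mil 20131024 FAIL   0.021  434.925 BV   48.409   39.246"
    "  5569  490.928  539.577 N 1986 10 20 LACT1_SCALE_WARNING")
print(record["trt"], record["pass"], record["warnings"])
```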

In this example:

Validation Method II

Definition: Analysis of within bull yearly Daughter Deviations (e.g. Daughter Yield Deviations, DYD), hereafter referred to as DD

Validation method II is handled by the program trendtest2.py. The program reads three files: file305_POPBRD (a control file sent by ITBC, see the section Control File above), file300_POPBRD (alias file01x, see APPENDIX IIa) and file302_POPBRD (a new file format for submission of DD records, see APPENDIX IIb).

TRENDTEST2.PY

Typing

within the programs directory will give you a small summary of the program usage:

usage: trendtest2.py [-h] [-v] [-c CONTROLFILE] [-m] [-M MERGEDIR]
                     brd pop datadir

positional arguments:
  brd                   evaluation breed code (BSW/GUE/JER/HOL/RDC/SIM)
  pop                   population code (same as country code except for
                        CHR/DEA/DFS/FRR/FRM)
  datadir               absolute or relative path to data files

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity
  -c CONTROLFILE, --controlfile CONTROLFILE
                        path/name of the control file
                        (default=DATADIR/file305_POPBRD)
  -m, --mergefiles      write merged data files (for independent data checks)
  -M MERGEDIR, --mergedir MERGEDIR
                        absolute or relative path for merged data files
                        (default=DATADIR/merged2)

Trendtest2 - How to run the program

Go to the programs directory and type:

In this example

Output files

The following files are written to the DATADIR or OUTDIR, if specified. All files have a _POPBRD suffix, so that multiple sets of output files for different breeds or populations can co-exist in the same directory.

Trendtest2 - Editings

The program reads the three input files, file305_POPBRD, file300_POPBRD and file302_POPBRD, and edits the data so that only the following bulls are selected for the test:

When the -m option is given, a merged file named after the trait code (trt.csv, for example mil.csv) is created and placed in the DATADIR/merged2 directory, unless otherwise specified. The file can be used for further investigation; its format is described in APPENDIX IIIb.

Trendtest2 - Statistical test

The statistical test for method II would be calculated as:

The criteria for passing the test will then be equal to:

Trendtest2 - Log and Result File

A log file called tt2_POPBRD.log is created and placed in DATADIR, unless otherwise specified. The file presents a summary of the information taken into consideration for all the traits analysed, such as

A result file called file312_POPBRD is created in DATADIR, unless otherwise specified. The file contains an overall summary of the traits analysed, the settings used and the final outcome of the validation. An example is presented below:

rec brd pop tgrp trt testdate pass testval       b      SDg bv bulls   std_DD x byr1 mh md warnings
312 HOL ABC prod fat 20131017 PASS   0.009   0.185   21.496 BV   153   19.186 N 1986 10 20 none

In this example:

Validation Method III

Definition: Analysis of official national predicted genetic merit variation across evaluation runs.

Validation method III is handled by the program trendtest3.py. The program reads three files: file305_POPBRD (a control file sent by ITBC, see the section Control File above), file300_POPBRD (alias file01x, see APPENDIX IIa) and file303_POPBRD (alias file04x, see APPENDIX IIc). To make the file format transition as smooth as possible, the program ttconvert3.py converts the legacy file01x and file04x formats into the new file300_POPBRD and file303_POPBRD formats.

TTCONVERT3.PY

The program is located in the programs directory. Typing

will give you a small summary of the program usage:

usage: ttconvert3.py [-h] [-v] [-s SUFFIX] [-e ENCODING] [-o OUTDIR]
                     brd pop datadir

positional arguments:
  brd                   evaluation breed code (BSW/GUE/JER/HOL/RDC/SIM)
  pop                   population code (same as country code except for
                        CHR/DEA/DFS/FRR/FRM)
  datadir               absolute or relative path to data files

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity
  -s SUFFIX, --suffix SUFFIX
                        suffix to add to all input file names, eg. ".usa" if
                        file names are like fileC010f.usa (default=none)
  -e ENCODING, --encoding ENCODING
                        input file encoding (default=utf-8; try also
                        iso-8859-1 or other values listed at
                        http://docs.python.org/2/library/codecs.html#standard-
                        encodings)
  -o OUTDIR, --outdir OUTDIR
                        directory for output files (default=DATADIR)

Warning

How to run the program

Go to the programs directory and type:

In this example

TRENDTEST3.PY

Typing

within the programs directory will give you a small summary of the program usage:

usage: trendtest3.py [-h] [-v] [-s SAMPLES] [-c CONTROLFILE] [-m]
                     [-M MERGEDIR]
                     brd pop datadir

positional arguments:
  brd                   evaluation breed code (BSW/GUE/JER/HOL/RDC/SIM)
  pop                   population code (same as country code except for
                        CHR/DEA/DFS/FRR/FRM)
  datadir               absolute or relative path to data files

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity
  -s SAMPLES, --samples SAMPLES
                        number of bootstrap samples (default=1000)
  -c CONTROLFILE, --controlfile CONTROLFILE
                        path/name of the control file
                        (default=DATADIR/file305_POPBRD)
  -m, --mergefiles      write merged data files (for independent data checks)
  -M MERGEDIR, --mergedir MERGEDIR
                        absolute or relative path for merged data files
                        (default=DATADIR/merged3)

Trendtest3 - How to run the program

Go to the programs directory and type:

In this example

Output files

The following files are written to the DATADIR or OUTDIR, if specified. All files have a _POPBRD suffix, so that multiple sets of output files for different breeds or populations can co-exist in the same directory.

Trendtest3 - Editings

The program reads the three input files, file305_POPBRD, file300_POPBRD and file303_POPBRD, and edits the data so that only the following bulls are selected for the test:

When the -m option is given, a merged file named after the trait code (trt.csv, for example mil.csv) is created and placed in the DATADIR/merged3 directory, unless otherwise specified. The file can be used for further investigation; its format is described in APPENDIX IIIc.

Trendtest3 - Statistical test

The statistical test for method III would be calculated as:

The criteria for passing the test will then be equal to:

Trendtest3 - Log and Result File

A log file called tt3_POPBRD.log is created and placed in DATADIR, unless otherwise specified. The file presents a summary of the information taken into consideration for all the traits analysed, such as

A result file called file313_POPBRD is created in DATADIR, unless otherwise specified. The file contains an overall summary of the traits analysed, the settings used and the final outcome of the validation. An example is presented below:

rec brd pop tgrp trt testdate pass delta lower   upper stat testval biol      SDg  bv bulls    std_y    std_x x yyyy miny maxy  herit corr mh md nsamp warnings
313 HOL ABC conf sta 20131023 PASS 0.013 -0.001  0.027 PASS  0.023  FAIL     0.564 BV   581    0.463    0.328 N 2009 1991 1999 0.3700 0.86 10 20  1000

In this example:


Sending Results Back to Interbull Centre

Once you have finished running the validation for all the populations and traits you need, using one or all validation methods, the results need to be summarized and sent back to the Centre. The program ttzip.py takes care of that for you.

TTZIP.PY

Typing

within the programs directory will give you a small summary of the program usage:

usage: ttzip.py [-h] [-C] brd pop datadir

positional arguments:
  brd            evaluation breed code (BSW/GUE/JER/HOL/RDC/SIM)
  pop            population code (same as country code except for
                 CHR/DEA/DFS/FRR/FRM)
  datadir        absolute or relative path to data files

optional arguments:
  -h, --help     show this help message and exit
  -C, --cleanup  delete all files successfully added to the zip file

ttzip.py - How to run the program

Go to the programs directory and type:

In this example

Output file
The program creates a zip file called ttYYMM_POPBRD.zip (for example tt1310_ABCHOL.zip) containing the results for all validation methods for all populations available in DATADIR or OUTDIR, if specified. Please email the zip file ttYYMM_POPBRD.zip to the Interbull Centre (valentina.palucci@slu.se).
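
Before emailing the archive it may be worth confirming its name and contents. The sketch below uses the standard zipfile module. The helper names are hypothetical, and reading YYMM as a two-digit year followed by a two-digit month is an assumption based on the tt1310_ABCHOL.zip example; which date ttzip.py actually uses is not stated here.

```python
import zipfile


def expected_zip_name(brd, pop, year, month):
    """Build a name of the form ttYYMM_POPBRD.zip (assumed YY=year, MM=month)."""
    return "tt{0:02d}{1:02d}_{2}{3}.zip".format(
        year % 100, month, pop.upper(), brd.upper())


def list_zip_members(path):
    """Return the sorted member names after checking the archive integrity."""
    with zipfile.ZipFile(path) as archive:
        bad = archive.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise ValueError("corrupt member: {0}".format(bad))
        return sorted(archive.namelist())


print(expected_zip_name("hol", "abc", 2013, 10))
```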


APPENDIX I

Format305 for control files for the TrendTest software

The file305_POPBRD files are prepared by ITBC early in a test run and distributed to the NGECs that need to perform conventional validation for at least one trait in a given population (POP) and breed of evaluation (BRD).

Col  Name      Format  Description
1    tgrp      char    Trait group code (prod/conf/uder/long/calv/fert/work)
2    trt       char    Trait code (see here)
3    evaldate  int     National evaluation date (yyyymmdd; from param file uploaded to IDEA)
4    herit     float   Heritability (from param file uploaded to IDEA)
5    siresd    float   Sire SD estimated at ITBC in current test run
6    merit     char    Genetic merit definition (B+/B-/T+/T-)
7    type2x    char    Whether foreign bulls (with type of proof 21 or 22) should be included (Y/N)
8    min_hrd   int     Minimum herds to include a bull
9    min_dgh   int     Minimum daughters to include a bull
10   byr1      int     First birth year to include for method 1 (1986 for HOL, 1981 for others)
11   miny      int     First birth year to include for method 3
12   maxy      int     Last birth year to include for method 3
13   corr      float   Correlation between new and old evaluations for method 3 (R=0.99)
14   preval    char    Date of previous validation (yy-mon) for traits last validated more than two years ago
15   chg       char    Change code (Y/N): whether validation is required because the population is included for the first time or because large changes were introduced in national evaluations for this trait

Notes

  • BRD: breed of evaluation (BSW/GUE/HOL/JER/RDC/SIM)
  • POP: population code (see here)

  • there is a header line which will be skipped by the software
  • there is an extra space between all fields to allow the file to be easily parsed without needing to specify fixed column positions

Sample data records

#grp trt evaldate  herit     siresd gm x mh md byr1 miny maxy corr preval chg
prod mil 20120101 0.2800  543.07922 B+ N 10 20 1986 1998 2002 0.98 ------ N
prod fat 20120101 0.2800   21.49578 B+ N 10 20 1986 1998 2002 0.98 ------ N
prod pro 20120101 0.2800   15.76838 B+ N 10 20 1986 1998 2002 0.98 ------ N
uder scs 20120101 0.1750   11.52474 B+ N 10 20 1986 1998 2002 0.98 ------ Y
conf sta 20120101 0.4500    0.95646 B+ N 10 20 1986 1998 2002 0.99 99-may N
conf usu 20120101 0.2100    0.90437 B+ N 10 20 1986 1998 2002 0.99 99-may N
conf loc 20120101 0.1200    1.00971 B+ N 10 20 1986 1998 2002 0.99 99-may N
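
Because the fields are separated by blanks, a control file can be read without fixed column positions. The following is a minimal sketch, not the parser used by the TrendTest programs; the names are hypothetical, the header line is assumed to start with "#", and corr is read as a float because the sample records contain values such as 0.98.

```python
FIELDS_305 = ("tgrp", "trt", "evaldate", "herit", "siresd", "merit", "type2x",
              "min_hrd", "min_dgh", "byr1", "miny", "maxy", "corr", "preval", "chg")
INT_FIELDS = ("evaldate", "min_hrd", "min_dgh", "byr1", "miny", "maxy")
FLOAT_FIELDS = ("herit", "siresd", "corr")


def parse_control_lines(lines):
    """Parse file305 lines into a list of dictionaries, skipping the header."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or header line
        values = line.split()
        if len(values) != len(FIELDS_305):
            raise ValueError("expected {0} fields: {1!r}".format(len(FIELDS_305), line))
        rec = dict(zip(FIELDS_305, values))
        for name in INT_FIELDS:
            rec[name] = int(rec[name])
        for name in FLOAT_FIELDS:
            rec[name] = float(rec[name])
        records.append(rec)
    return records


sample = [
    "#grp trt evaldate  herit     siresd gm x mh md byr1 miny maxy corr preval chg",
    "prod mil 20120101 0.2800  543.07922 B+ N 10 20 1986 1998 2002 0.98 ------ N",
    "conf sta 20120101 0.4500    0.95646 B+ N 10 20 1986 1998 2002 0.99 99-may N",
]
for rec in parse_control_lines(sample):
    print(rec["tgrp"], rec["trt"], rec["siresd"], rec["preval"])
```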


APPENDIX IIa

APPENDIX I - Format File300-EBV and File700-GEBV

Col  Name      Start  Format  Description                                      Example
1    rec type  1      a3      Record type [1]                                  300
2    brd_eval  5      a3      Breed of evaluation [2]                          HOL
3    pop       9      a3      Population code [3]                              USA
4    trt       13     a3      Trait of evaluation [4]                          mil
5    brd_anim  17     a3      Breed of animal                                  HOL
6    cou_orig  20     a3      Country of first registration                    USA
7    sex       23     a1      Sex of animal                                    M
8    id_no     24     a12     Animal identification number                     003000336289
9    typ_prf   37     i2      Type of proof [5]                                11
10   off_pub   40     a1      Official publication of proof [6]                Y
11   status    42     i2      Animal status [7]                                10
12   ndau      44     i8      Number of daughters [8]                          115
13   nhrd      52     i8      Number of herds [9]                              75
14   edc       60     i8      Number of effective daughter contributions [10]  133
15   rel       69     f7.4    Repeatability/Reliability [11]                   82
16   ebv       76     f10.    National predicted genetic merit [12]            2.780

  IMPORTANT NOTE

  In the old file formats 01x-020 and 115, the national proofs were multiplied by a factor (prod=100; conf=100; udder=1000; long=1000; calv=1000; fert=1000; work=1000). This multiplication is no longer needed.
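
  To illustrate the note above: a value taken from a legacy file would be divided by the factor of its trait group to obtain the unscaled proof expected in the new format. This is a sketch only; the function name is hypothetical, and mapping the factor labelled "udder" to the trait group code uder is an assumption.

```python
# Factors by trait group code, taken from the note above.
LEGACY_FACTORS = {"prod": 100, "conf": 100, "uder": 1000, "long": 1000,
                  "calv": 1000, "fert": 1000, "work": 1000}


def legacy_to_unscaled(value, tgrp):
    """Undo the legacy multiplication for a proof from trait group tgrp."""
    return value / float(LEGACY_FACTORS[tgrp])


print(legacy_to_unscaled(278, "prod"))
```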
  • [1] Valid record types:

    • 300 for EBV
    • 700 for GEBV
  • [2] Breed codes accepted:

    • BSW = Brown Swiss type; GUE = Guernsey type; HOL = Holstein-Friesian (Black & White) type; JER = Jersey type; RDC = Red Dairy Cattle type; SIM = Simmental type.

  • [3] Valid population codes: ARG AUS BEL CAN CHE CZE DEA(a) DEU DFS(b) ESP EST FIN FRA FRM(c) GBR HUN IRL ISR ITA JPN LTU LVA NLD NZL POL PRT SVN SVK SWE USA URY ZAF

    • where: (a) Austria + Germany; (b) Denmark + Finland + Sweden; (c) France Montbeliarde.

  • [4] Accepted trait abbreviations:

    • Production ==> mil = milk; fat = fat; pro = protein

    • Conformation ==> sta = stature; cwi = chest width; bde = body depth; ang = angularity; ran = rump angle; rwi = rump width; rls = rear-leg set; rlr = rear-leg rear view; fan = foot angle; hde = heel depth/hoof height; fua = fore udder attachment; ruh = rear udder height; ruw = rear udder width; usu = udder support; ude = udder depth; ftp = front teat placement; ftl = (front) teat length; rtp = rear teat placement; ous = overall udder score; ofl = overall feet & legs score; ocs = overall conformation score; bcs = body condition score; loc = locomotion

    • Udder ==> scs = somatic cell; mas = mastitis

    • Longevity ==> dlo = direct longevity

    • Calving ==> dce = direct calving ease; mce = maternal calving ease; dsb = direct stillbirth; msb = maternal stillbirth

    • Female fertility ==> hco = heifer conception; crc = cow recycling; cc1 = lactating cow's ability to conceive (1); cc2 = lactating cow's ability to conceive (2); int = interval traits

    • Workability ==> msp = milking speed; tem = temperament

    • SNP Training ==> cma = clinical mastitis

  • [5] Accepted codes:

    • 00 (unknown);

    • 11 (based on first crop sampling daughters);

    • 12 (based on first and second crop daughters);

    • 13 (based on parent average and genomic information only);

    • 21 (based on imported semen of proven bull, second crop daughters only);

    • 22 (based on mostly, more than 50%, imported daughters or daughters born from imported embryos);

    • 23 (GEBV with foreign PA);

    • 24 (GEBV with foreign proof).

  • [6] Accepted abbreviations:

    • Y (if bull proof meets national standards for official publication in the country sending information);

    • P (if bull is part of a simultaneous progeny-testing program, but the proof does not yet meet national standards for official publication);

    • N (otherwise).

  • [7] Valid codes for status of bulls:

    • 00 (unknown);

    • 10 (bull randomly sampled through an official AI scheme);

    • 15 (young bull, genomically tested, not yet selected for AI);

    • 20 (other bull. Records with “20” in this file will be excluded from the international evaluation, unless type of proof is “21”).

  • [8] Field for number of daughters should be positive. For missing value put 0.

  • [9] Field for number of herds should be positive. For missing value put 0.

  • [10] Production, conformation, udder health, fertility, workability, and SNP training traits: the weighting factor used for these traits is the effective daughter contribution (EDC), which is described in the Interbull Code of Practice, Appendix IV, “Weighting factor for international genetic evaluation”, updated April 27, 2004. EDC values should be rounded to the nearest integer value.

    • Calving: The weighting factor used for calving traits is the total number of calvings for the direct effects and the number of daughters with calving for the maternal effect.

    • Longevity: The weighting factor used for longevity traits depends on the national genetic evaluation model. For linear models the weighting factor is the same as described above for conformation, fertility, production, udder health and workability traits. For survival models the number of culled daughters is used as the weighting factor.

  • [11] Reliability values are nationally calculated reliability values expressed in percent with 4 decimals. For missing value put 0.

  • [12] National predicted genetic merit values published domestically. For threshold models the submitted values are from the underlying scale. For missing values put 9999999999. Unlike the old file formats 01x-020 and 115, the values must not be multiplied by a trait group factor (see the IMPORTANT NOTE above).
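
The fixed column positions of Format 300 can be read by slicing each record. The sketch below is not the reader used by the TrendTest programs. The names are hypothetical, the key rec stands for the "rec type" column, and the ebv field is assumed to be 10 characters wide because only "f10." is given in the table.

```python
# (name, start column counted from 1, width), following the table above.
LAYOUT_300 = [
    ("rec", 1, 3), ("brd_eval", 5, 3), ("pop", 9, 3), ("trt", 13, 3),
    ("brd_anim", 17, 3), ("cou_orig", 20, 3), ("sex", 23, 1), ("id_no", 24, 12),
    ("typ_prf", 37, 2), ("off_pub", 40, 1), ("status", 42, 2), ("ndau", 44, 8),
    ("nhrd", 52, 8), ("edc", 60, 8), ("rel", 69, 7),
    ("ebv", 76, 10),  # assumed width
]
MISSING_EBV = 9999999999  # missing-value code for ebv, see note 12


def parse_record_300(line):
    """Split one Format 300 record into a dictionary of typed values."""
    rec = {}
    for name, start, width in LAYOUT_300:
        rec[name] = line[start - 1:start - 1 + width].strip()
    for name in ("ndau", "nhrd", "edc"):
        rec[name] = int(rec[name])
    rec["rel"] = float(rec["rel"])
    ebv = float(rec["ebv"])
    rec["ebv"] = None if ebv == MISSING_EBV else ebv
    return rec
```

The code fields typ_prf and status are kept as strings so that values such as "00" keep their leading zero.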


APPENDIX IIb

Format302 for Submission of validation method II

Col  Name      Start  Format  Description
1    rec       1      a3      Record type (302)
2    brd       5      a3      Breed of evaluation
3    pop       9      a3      Population code (see here)
4    trt       13     a3      Trait code (see here)
5    bullid    17     a19     International ID
6    calvyear  37     i4      Calving year (YYYY)
7    ndau      42     i5      Number of daughters
8    ave_DD    48     f10.4   Average Daughter Yield Deviation

  • brd and pop must be in upper case
  • trt must be in lower case
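
Organizations that prepare file302 records themselves could write each line with a single format string that reproduces the start columns above. This is a sketch under stated assumptions: the function name is hypothetical, and the example ID is assembled from the Format 300 example values purely for illustration.

```python
def format_record_302(brd, pop, trt, bullid, calvyear, ndau, ave_dd):
    """Return one Format 302 line with a blank between all fields."""
    if len(bullid) != 19:
        raise ValueError("bullid must be 19 characters (format a19)")
    return "{0:3s} {1:3s} {2:3s} {3:3s} {4:19s} {5:4d} {6:5d} {7:10.4f}".format(
        "302",
        brd.upper(),   # brd and pop must be in upper case
        pop.upper(),
        trt.lower(),   # trt must be in lower case
        bullid, calvyear, ndau, ave_dd)


print(format_record_302("hol", "abc", "mil", "HOLUSAM003000336289", 2010, 25, 12.5))
```

The resulting line is 57 characters long: ave_DD starts in column 48 and is 10 characters wide.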


APPENDIX IIc

Format303 for data file for validation method III

Record length = 90

Col  Name      Start  Format  Description
1    rec       1      a3      Record type (303)
2    brd       5      a3      Breed of evaluation (BSW/GUE/HOL/JER/RDC/SIM)
3    pop       9      a3      Population code (see here)
4    trt       13     a3      Trait code (see here)
5    bullid    17     a19     International ID
6    byear     37     i4      Bull's birth year
7    type_prf  42     i2      Type of proof
8    ndau      45     i7      Number of daughters in proof in YYYY-4
9    ebv       53     f9.3    National predicted genetic merit in YYYY-4
10   n1        63     i5      Number of daughters added in YYYY-3
11   n2        69     i5      Number of daughters added in YYYY-2
12   n3        75     i5      Number of daughters added in YYYY-1
13   n4        81     i5      Number of daughters added in YYYY
14   year1d    87     i4      Mean year of first calving of daughters in YYYY-4

Notes:

  1. starting columns allow for an extra blank between all fields
  2. brd and pop must be in upper case
  3. trt must be in lower case
  4. YYYY: year of the most recent routine genetic evaluation run whose results will be included in the international evaluation

  5. n1-n4: number of new (first calving) daughters considered in the last available national genetic evaluation in each year

  6. year1d: mean year of first calving of daughters on which the bull’s national evaluation in year YYYY-4 was based

    • This field is not currently used by the trendtest software because it is not uniformly supplied by all NGECs. The field can be set to '0000'. The software replaces year1d by byear+4.
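
A Format 303 record can be checked against the stated record length and read by column position. The sketch below is not the reader used by trendtest3.py. The names are hypothetical, and the year1d handling simply mirrors note 6 by replacing the field with byear + 4.

```python
RECORD_LENGTH_303 = 90
# (name, start column counted from 1, width), following the table above.
LAYOUT_303 = [
    ("rec", 1, 3), ("brd", 5, 3), ("pop", 9, 3), ("trt", 13, 3),
    ("bullid", 17, 19), ("byear", 37, 4), ("type_prf", 42, 2), ("ndau", 45, 7),
    ("ebv", 53, 9), ("n1", 63, 5), ("n2", 69, 5), ("n3", 75, 5), ("n4", 81, 5),
    ("year1d", 87, 4),
]


def parse_record_303(line):
    """Validate the length of one Format 303 record and return its fields."""
    line = line.rstrip("\r\n")
    if len(line) != RECORD_LENGTH_303:
        raise ValueError("expected {0} characters, got {1}".format(
            RECORD_LENGTH_303, len(line)))
    rec = {}
    for name, start, width in LAYOUT_303:
        rec[name] = line[start - 1:start - 1 + width].strip()
    for name in ("byear", "ndau", "n1", "n2", "n3", "n4"):
        rec[name] = int(rec[name])
    rec["ebv"] = float(rec["ebv"])
    # Note 6: the supplied year1d is not used and is replaced by byear + 4.
    rec["year1d"] = rec["byear"] + 4
    return rec
```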


APPENDIX IIIa