Interbull CoP - Appendix VIII - Interbull validation test for genomic evaluations - GEBV test

Document based on:

Sullivan, P.G. 2023. Updated Interbull software for genomic validation tests. Interbull Bulletin 58, p.7-16.

VanRaden, P.M. 2021. Improved genomic validation including extra regressions. Interbull Bulletin 56, p. 65-69.

Mäntysaari, E., Liu, Z and VanRaden P. 2010. Interbull Validation Test for Genomic Evaluations. Interbull Bulletin 41, p. 17-21.

Definitions:

EBV - Estimated Breeding Value (conventional national evaluations of the trait, free of genomic information, which are submitted to Interbull to be used in MACE evaluations)
DGV - Direct Estimated Genomic Value (genomic evaluations based on SNP prediction equations)
GEBV - Genomically Enhanced Estimated Breeding Value (evaluations that combine EBV and DGV)
dGEBV - De-regressed GEBV
GREL - Genomic reliability of the bull's GEBV
EDC - Effective Daughter Contribution
MACE - Multiple Trait Across Country Evaluation
PA - Parent Average
DD - Daughter Deviation
NGEC - National Genetic Evaluation Centre

Motivation

The inclusion of national genomic information in international comparisons for dairy breeds requires that national genomic breeding values (GEBV) be validated by Interbull, in the same way that conventional EBV are validated as a precondition for participating in MACE evaluations.

The GEBV test will be applied to validate the national models used to compute the GEBV that the national genetic evaluation centres (NGEC) publish and will eventually submit to Interbull for international genetic evaluations that include genomic information. The GEBV test can also be considered a quality assurance assessment for national genomic evaluations. GEBV from models that have been validated can be regarded as breeding value estimates with appropriate reliability, which can be converted to breeding values on other country scales using conversion equations derived by Interbull.

Rationale

The GEBV test evaluates:

the unbiasedness of the genomic evaluations through the evaluation of
1. the consistency of the genetic trend captured by GEBV,
2. the consistency of bull rankings before versus after having progeny, and
3. the consistency of the variation of GEBV relative to EBV;
the improvement in selection accuracy from the use of GEBV instead of EBV.

A time-oriented cross-validation is used to test how well genomic evaluations of young bull calves, using current models and phenotypic data from 4 years ago, can predict current progeny performance. The NGEC shall re-run their current evaluation software while excluding the most recent 4 years of daughter phenotypes, to obtain reduced-data genetic (EBVr) and genomic (GEBVr) evaluations. The software will then test if the ranking and variance of bull GEBVr match statistical and genetic expectations relative to ranking and variance of the bull comparisons based on current progeny differences, as an indication of unbiasedness. Furthermore, if the GEBVr are more highly correlated than EBVr with the current progeny phenotypes, it is an indication of accuracy improvement with GEBV.

Linear regression models are used for the validation test. The expected value of the regression slope equals 1 if the validation bulls are an unselected group, and is less than 1 if only a selected subgroup of the most recently proven bulls has been genotyped. The expected slope is lower with selective genotyping because selection affects the variances and covariances used to compute the validation slope. The software accounts for the effects of selective genotyping on expected slopes, using estimates of the selection differential obtained from the difference between the average EBV of the genotyped bulls and that of all proven bulls born in the period considered for validation testing. Bootstrapping is used for all significance testing, and Interbull uses a combination of statistical and biological limits of tolerance to assign an overall assessment of pass or fail.

Test data sets

Data formats are described at https://interbull.org/ib/gebvtest_software

Full data sets

Two sets of currently official evaluations for progeny-proven bulls shall be provided for the GEBV test. These will be the EBV and GEBV published or otherwise indirectly used by the NGEC for national selection programs. All bulls provided to Interbull in file300 for MACE shall be included in a conventional EBV file (file300Cf) for the GEBV test, and all of these bulls that are genotyped and have a national GEBV shall be included in the GEBV file (file300Gf).

Conventional national genetic evaluation file (file300Cf)

The national EBV sent by the NGEC as input for the most recent Interbull MACE evaluation will be used to identify validation test candidate bulls, estimate the intensity of selective genotyping, and check each bull's birth year and type of proof.

Official national genomic evaluation file (file300Gf)

The national GEBV of current MACE bulls will be used to derive target values reflecting unbiased estimates of average progeny performance for the validation test bulls. The official validation target is derived internally by the software, by applying to all NGEC the same standardized international method for dGEBV developed by VanRaden (2021).

Reduced data sets

The reduced data sets should be prepared by truncating the phenotypes used as input for both the conventional and the genomic evaluations. The NGEC must exclude phenotypic information from the most recent 4 years and re-run the current genetic and genomic evaluation models for the traits of interest. Only the phenotypes should be truncated, not the pedigree, because the validation test needs each validation bull's predicted genetic contribution to future progeny from the reduced-data evaluations: one prediction based solely on the bull's parent average (EBVr = PA), and one based on PA plus genomic prediction equations (GEBVr).

Reduced conventional genetic evaluation file (file300Cr)

The NGEC shall carry out a conventional genetic evaluation with no genotypes, while using the truncated data (only phenotypes up to 4 years prior to the date of analysis) but including in the analysis all animals present in the current official evaluations used in MACE (file300Cf).

At least the 10 most recent birth years of proven bulls included in file300Cf must also be included in file300Cr. The older proven bulls, whose progeny proofs are already present in the reduced data, are required as a comparative control group, so that evaluation changes for the younger bulls in the validation test group can be contrasted with those of the older control bulls.

Reduced genomic evaluation file (file300Gr)

The NGEC shall carry out a genomic evaluation that includes the genotypes, while using the truncated data (only phenotypes up to 4 years prior to the date of analysis) but including in the analysis all animals present in the current official evaluations used as input to MACE (file300Cf). All bulls included in the conventional file300Cr who are also genomically evaluated must be included in the genomic file300Gr.

If a significant number of foreign bulls are included in the reference population for national genomic evaluations, and the genomic prediction equations are estimated using de-regressed MACE values for these bulls as input, the reduced genomic evaluation can be achieved in three ways, listed below in descending order of preference:

1. The NGEC can participate in the Interbull truncated-MACE service. When the NGEC provides reduced-data national EBV to Interbull for truncated MACE, the results returned by Interbull are the ideal MACE input for the reduced-data national genomic evaluation.
2. The Interbull Centre can make historical files available upon request, which include the official MACE results published 4 years earlier. These MACE proofs are less ideal than truncated MACE proofs, because no country re-ran its current evaluation system on older data for the MACE proofs computed 4 years earlier.
3. The current MACE proofs can be used after excluding all recently proven bulls in MACE that did not have an official MACE proof 4 years earlier. This approach is an exception that should be used only if both preferred options above are impractical. The main concern is that the reduced-data genomic prediction equations will include contributions from phenotypes of the most recent 4 years, through sires of the recently proven bulls and, more generally, through the MACE proofs of all older bulls with any relationship to the validation test bulls whose MACE proofs are being excluded.

Specific instructions for data preparation:

The domestic bulls (type of proof ≠ 21 or 22) that have EDCf ≥ 20 and EDCr = 0 are called test bulls. Test bulls are likely to be included in the genomic reference population with full data, but not with reduced data. Interbull recommends that the reduction in size of the genomic reference population, due to the dropping of test bulls in reduced data, should not exceed 25%.
1. If the size of genomic reference population is reduced by too much, then the accuracy of GEBV calculated from truncated data becomes significantly lower than with full data. In that case, the country can use n<4 years as the time difference between full and reduced data sets.
2. If the number of test bulls is too small (<50), then the country may choose to also include foreign bulls that have been used locally (type of proof = 21 or 22) with EDCf ≥ 20 local progeny and EDCr = 0 as part of the validation group, to increase the number of test bulls.
3. In both exceptions above, the criteria used to define test bulls must be communicated to the Interbull Centre.
Appropriate time windows (birth years of test bulls) may vary depending on the trait to be validated, the speed of progeny test programs and other factors. The standard adopted for the GEBV test is to include progeny-proven bulls born since (YYYY-8) as test bulls. For instance, if the evaluation year is 2024 and the most recently proven bulls in file300Cf were born in 2020, then the test bulls would include bulls born between 2016 and 2020. Countries may include a wider window of test bulls, or may shift the window by one year, but the reasons must always be communicated to the Interbull Centre.
Include all available bulls of interest, as described below, in the respective files with their EBVf, EBVr, GEBVf and GEBVr, without editing based on EDCf or EDCr. These final edits, as required for the validation test, are applied within the GEBV test software.
If the GEBV are a combination of DGV and EBV, then both the DGVr and EBVr used to generate the GEBVr must be estimated from the truncated data.
Bulls with EBV in the full data sets only, having no progeny information four years ago (EDCr=0), should be included in the reduced-data files (300Cr and 300Gr). Additionally, a minimum of 10 years of bulls with progeny-based EBV in both the full and reduced data sets should be included in the reduced-data files. After recent updates to the software, bulls with progeny in the reduced data are now additionally required as a statistical control group used to improve statistical tests for bias in the evaluations of validation test bulls.
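The test-bull criteria above can be sketched in Python as follows. This is only an illustration of the rules as stated, not the logic of the GEBV test software, which applies its own final edits. The `Bull` record, the function names and the argument names are hypothetical, and the upper birth-year bound (YYYY-4) follows the range given later under "Validation regression models".

```python
from dataclasses import dataclass

# Types of proof 21 and 22 mark foreign bulls used locally (per the text above).
FOREIGN_PROOF_TYPES = {21, 22}


@dataclass
class Bull:
    bull_id: str
    birth_year: int
    type_of_proof: int
    edc_full: float     # EDCf, from the full-data evaluation
    edc_reduced: float  # EDCr, from the reduced-data evaluation


def is_test_bull(bull, eval_year, include_foreign=False,
                 min_edc_full=20.0, oldest_offset=8, youngest_offset=4):
    """Apply the standard test-bull criteria for evaluation year `eval_year`."""
    in_window = (eval_year - oldest_offset
                 <= bull.birth_year
                 <= eval_year - youngest_offset)
    domestic = bull.type_of_proof not in FOREIGN_PROOF_TYPES
    return (in_window
            and (domestic or include_foreign)
            and bull.edc_full >= min_edc_full
            and bull.edc_reduced == 0)


def reference_reduction(n_reference_full, n_reference_reduced):
    """Fraction of the full-data reference population lost in the reduced data."""
    return (n_reference_full - n_reference_reduced) / n_reference_full
```

Setting `include_foreign=True` corresponds to exception 2 above, and a `reference_reduction` value above 0.25 would suggest considering a shorter truncation period as in exception 1.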

Test description

Testing for bias in the GEBV

Methodology updates in 2024

The official Interbull GEBV test is now based on VanRaden's de-regressed GEBV (described in the 2021 Interbull Bulletin paper, https://journal.interbull.org/index.php/ib/article/view/82) as the official prediction target. The VanRaden dGEBV replaces the previously used dEBV target described by Mäntysaari et al. (2010). Predicting later GEBV or dGEBV from earlier GEBV is conceptually easier to understand and to verify than predicting dEBV. The new tests are also more suitable for validating single-step models, where genomic preselection effects are properly accounted for in GEBVf, GEBVr and dGEBV, whereas dEBV include genomic preselection bias.

With the implementation of a new validation target, the de-regression method has now been internationally standardized, because the dGEBV are derived in the same way for all countries, directly from values in file300Gf and file300Gr that follow official publication rules.

The software will make sure that full and reduced-data evaluations are on the same genetic base of expression by adjusting the mean and variance of reduced-data evaluations to match the base of expression of full-data evaluations. These adjustments to align the evaluation scales are based on bulls already progeny-proven in the reduced data who have expected changes in evaluations very close or equal to zero, due to either no new progeny or relatively few in the recent data. After aligning the evaluation scales, changes in evaluations for the validation test bulls, who have all their progeny in the recent data, are equivalent to contrasts of evaluation changes for validation bulls relative to previous generations of proven bulls who have expected changes of 0.
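One simple way to picture this alignment is a linear rescaling, sketched below. It shifts and scales all reduced-data evaluations so that the control bulls (those already progeny-proven in the reduced data) have the same mean and standard deviation in both data sets. The linear form, the function name and the arguments are assumptions made for illustration; the adjustment actually applied by the Interbull software may differ.

```python
from statistics import mean, pstdev


def align_reduced_to_full(control_full, control_reduced, reduced_values):
    """Rescale reduced-data evaluations onto the full-data base.

    control_full / control_reduced: evaluations of the same control bulls
    from the full and reduced data, in the same order.
    reduced_values: reduced-data evaluations to be adjusted (any bulls).
    """
    mu_full, mu_reduced = mean(control_full), mean(control_reduced)
    scale = pstdev(control_full) / pstdev(control_reduced)
    # After this step the control bulls have equal mean and SD in both sets.
    return [mu_full + scale * (v - mu_reduced) for v in reduced_values]
```

Under this sketch, the change GEBVf minus adjusted GEBVr for a validation bull is expressed relative to control bulls whose expected change is zero.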

Average changes in evaluation between reduced and full data will now have an expectation of 0 for any group of bulls, after the scales are aligned. Additional tests have been added that combine the intercept and slope estimates from the validation models, to detect probabilities of bias in below-average, average and above-average (top) bulls. A new user option to output base-adjusted evaluations from reduced data, for all or selected traits, can also be used to help isolate the reasons for biases detected in the evaluations of any traits failing the GEBV test.

Besides the application of the official GEBV test, the software also allows users to choose different validation targets for further internal research. Below is the list of available options:

file300Df_COUBRD (de-regressed EBV, as used previously in the old test)
file300Cf_COUBRD (EBV from the full-data evaluation)
file300Gf_COUBRD (GEBV from the full-data evaluation)
file300Vf_COUBRD (Any user-defined value, e.g. single step DD, new file)

The user must create whichever input file(s) above are needed for the requested validation target.

A bootstrapping approach has been implemented to replace the previous t-test for bias in validation slopes. This addresses the technical concern that the t-test was not valid because validation bulls are genetically related and the validation model residuals are correlated.
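As a rough illustration of the idea, the sketch below bootstraps a weighted regression slope and checks whether an expected slope falls inside a percentile interval. It is not the Interbull implementation: resampling individual bulls with replacement, the 95% percentile interval and all function names are assumptions, and the official software may resample and test differently.

```python
import random


def weighted_slope(x, y, w):
    """Slope of the weighted least-squares regression of y on x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def bootstrap_slopes(x, y, w, n_boot=1000, seed=1):
    """Slopes from n_boot resamples of the bulls, drawn with replacement."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if max(xs) == min(xs):
            continue  # no variation in x, slope undefined for this resample
        slopes.append(weighted_slope(xs, [y[i] for i in idx],
                                     [w[i] for i in idx]))
    return slopes


def percentile_interval(values, alpha=0.05):
    """Two-sided (1 - alpha) percentile interval of the bootstrap values."""
    s = sorted(values)
    lower = s[int((alpha / 2) * (len(s) - 1))]
    upper = s[int((1 - alpha / 2) * (len(s) - 1))]
    return lower, upper


def slope_is_consistent(slopes, expected=1.0, alpha=0.05):
    """True if the expected slope lies inside the bootstrap interval."""
    lower, upper = percentile_interval(slopes, alpha)
    return lower <= expected <= upper
```

Here `x` would hold GEBVr (or EBVr), `y` the validation target and `w` the regression weights of the validation test bulls.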

The overall validation result combines the PASS or FAIL outcomes of several sub-tests and takes one of three values: PASS, hiSE (high standard error) or FAIL. An overall PASS requires a PASS for the slope tests plus either a PASS or hiSE for the accuracy test. A FAIL for either the combined slope tests or the accuracy test causes an overall FAIL. The newly reported hiSE result indicates that there was too little data to conclusively establish PASS or FAIL for some traits and populations.

Validation regression models

Weighted linear regression models are used to test for bias in both the national genomic and the conventional evaluations. Passing the official Interbull GEBV test, however, requires only that the GEBVr are unbiased, not the EBVr. The test for bias in EBVr is provided as additional, comparative information only.

We first define a validation target variable φ that resembles phenotypic progeny averages, and which is based on the progeny contributions in current GEBVf of the validation test bulls. All progeny contributions for the test bulls were from the most recent 4-year period, and contributed to GEBVf and EBVf, but not GEBVr and EBVr. The validation regression models are:
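Written in a form consistent with the slopes b₁ (for GEBVr) and b₃ (for EBVr) that are tested later in this appendix, the two models can be expressed as below. The intercept symbols b₀ and b₂ and the residual symbols e₁ and e₂ are assumed here for illustration.

```latex
\begin{align}
\varphi &= b_0 + b_1\,\mathrm{GEBV}_r + e_1 \tag{1}\\
\varphi &= b_2 + b_3\,\mathrm{EBV}_r + e_2 \tag{2}
\end{align}
```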

As discussed in the previous section, the validation target φ in the official GEBV test is defined as the dGEBV of VanRaden (2021). The validation test bulls for both models must meet the following criteria: EDCf ≥ 20 and EDCr = 0, born within a pre-defined range of birth years, such as (YYYY-8) to (YYYY-4) inclusive, where YYYY is the current year of evaluation, and having both an EBVr and a GEBVr available. All validation test bulls with an observation in φ are therefore genotyped and are among the most recently progeny-proven bulls.

The reliability equivalent of information from progeny phenotypes, all of which were included in the full data but not in the reduced data for the test bulls, is used as the regression weight in both models. The progeny information, expressed as an EDC, is first derived from genomic reliabilities based on full (GRELf) and reduced data sets (GRELr), as shown below, and the EDC are then converted back to a reliability equivalent as the bull's regression weight (WT):

The constant λ is a function of the trait heritability but using any value for λ in the pair of equations above will result in the same WT, so these equations can be simplified by substituting λ=1 in both equations. The WT is thus a function of only GRELf and GRELr.
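The weight calculation can be sketched as follows, assuming the conventional conversions between reliability and EDC, EDC = λ·REL/(1 − REL) and REL = EDC/(EDC + λ). These conversions and the function name are assumptions made for illustration; they are consistent with the statement that λ cancels out of WT.

```python
def progeny_weight(grel_full, grel_reduced, lam=1.0):
    """Regression weight WT from full and reduced genomic reliabilities.

    grel_full, grel_reduced: GRELf and GRELr on a 0-1 scale (below 1).
    lam: the heritability-dependent constant; any positive value gives
    the same WT, so the default of 1 is sufficient.
    """
    edc_full = lam * grel_full / (1.0 - grel_full)
    edc_reduced = lam * grel_reduced / (1.0 - grel_reduced)
    edc_progeny = edc_full - edc_reduced   # information added by new progeny
    return edc_progeny / (edc_progeny + lam)
```

For example, with GRELf = 0.90 and GRELr = 0.60 the progeny information is 9.0 − 1.5 = 7.5 (in units of λ), giving WT = 7.5/8.5 ≈ 0.88 whatever value of λ is used.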

Effects of selective genotyping

The estimated regression coefficients, b₁ and b₃ from the two validation models, are compared with expected values to test H₀: b₁ = E(b₁) for GEBVr and H₀: b₃ = E(b₃) for EBVr. The expected values are equal to 1 for both models if all bulls most recently progeny-proven were also genotyped. The expected values will be lower than 1, however, if only a subset of bulls were genotyped, and the genotyped bulls were non-randomly selected with respect to the given trait. The software includes adjustment for the effects of selective genotyping on E(b₁) and E(b₃).

The first step in deriving the adjustments is to estimate selection differentials for each validated trait. Selection differential is the standardized difference in means between genotyped bulls (g) versus (all) progeny-proven bulls who otherwise qualify as members of the validation test group.

i = (µ_EBVg - µ_EBVall) / σ_EBVall [3]

Using normal distribution tables from quantitative genetics textbooks (e.g. page 379 of Falconer & Mackay, 1996), one can obtain the proportion selected by truncation (p) that generates the same selection differential as the observed i, together with the corresponding truncation point x that divides the standard normal density into the selected (p) and non-selected (1-p) proportions.
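Instead of a printed table, p and x can also be found numerically. The sketch below uses the standard result that selecting the top fraction p of a standard normal distribution gives an intensity equal to the density at the truncation point divided by p, and inverts it by bisection. The function names are hypothetical and this is not taken from the Interbull software.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def selection_differential(mean_genotyped, mean_all, sd_all):
    """Standardized selection differential i, as in equation [3]."""
    return (mean_genotyped - mean_all) / sd_all


def intensity_from_proportion(p):
    """Selection intensity when the top fraction p is selected."""
    x = _STD_NORMAL.inv_cdf(1.0 - p)   # truncation point
    return _STD_NORMAL.pdf(x) / p


def truncation_equivalent(i, tol=1e-10):
    """Return (p, x) for which truncation selection gives intensity i > 0."""
    lo, hi = 1e-9, 1.0 - 1e-9
    # Intensity decreases monotonically as the selected proportion grows.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if intensity_from_proportion(mid) > i:
            lo = mid
        else:
            hi = mid
    p = (lo + hi) / 2.0
    return p, _STD_NORMAL.inv_cdf(1.0 - p)
```

For i = 0.424 this gives p close to 0.75 and x close to -0.674, in line with tabulated values.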

From the equivalent proportion under truncation selection, the expected values of regression coefficients can be approximated using expected effects of truncation selection on the variances and covariance between φ and the independent variable X, where X is either GEBVr or EBVr. Denoting all variables after selection on φ with a superscript s, defining R²_b = R²_(φ,X) before selection, and following Bulmer (1971) and Henderson (1975):
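The standard truncation-selection results can be written as below. This is a reconstruction that is consistent with equation [5] and with the worked example (i = 0.424 and x = -0.674 give k = 0.466); the grouping of these expressions under the single label [4] is an assumption.

```latex
\begin{align}
k &= i\,(i - x) \notag\\
V^{s}(\varphi) &= (1 - k)\,V(\varphi) \notag\\
C^{s}(\varphi, X) &= (1 - k)\,C(\varphi, X) \notag\\
V^{s}(X) &= \left(1 - k\,R^{2}_{b}\right) V(X) \tag{4}
\end{align}
```

The ratio of the selected covariance to the selected variance of X then reproduces the factor (1 - k)/(1 - k·R²_b) in the expected slope.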

From the expected C^s(φ,X) and V^s(X) after selection, we get the expected b₁ after selection:

E^s(b₁) = ((1 - k) / (1-k*R²_b)) * E(b₁) [5]

From the observed R²_φ,x after selection, with the following expected value, we can derive the required R²_b in equation [5] as follows:
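A form consistent with equation [5] and with the worked example (R²_φ,x = 0.50 and k = 0.466 give R²_b = 0.652) is shown below. It follows from the usual effects of truncation selection on φ, which multiply V(φ) and C(φ,X) by (1 - k) and V(X) by (1 - k·R²_b); the exact layout of the original equation is assumed.

```latex
\begin{equation}
E\!\left(R^{2}_{\varphi,x}\right) = \frac{(1-k)\,R^{2}_{b}}{1 - k\,R^{2}_{b}}
\quad\Longrightarrow\quad
R^{2}_{b} = \frac{R^{2}_{\varphi,x}}{1 - k + k\,R^{2}_{\varphi,x}} \tag{6}
\end{equation}
```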

Substituting [6] into [5] and simplifying, we get the expected slope as a function of the observed R²_φ,x:
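The simplification can be reconstructed as follows, assuming that the before-selection value satisfies R²_b = R²_φ,x / (1 - k + k·R²_φ,x). The denominator of [5] then becomes 1 - k·R²_b = (1 - k)/(1 - k + k·R²_φ,x), so the factor (1 - k) cancels. The result agrees with the worked example (k = 0.466 and R²_φ,x = 0.50 give 0.767).

```latex
\begin{equation}
E^{s}(b_1) = \left[\,1 - k\left(1 - R^{2}_{\varphi,x}\right)\right] E(b_1) \tag{7}
\end{equation}
```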

Example: Let µ_EBVg = 16.00, µ_EBVall = 11.76, σ_EBVall = 10.00 and R²_φ,x = 0.50. Using equation [3], the selection differential (i) for genotyped bulls equals 0.424. For this value of i, the equivalent proportion selected by truncation for genotyping (p) is 0.75, and the deviation of the truncation point from the overall mean (x) is -0.674 (from the reference table). From equation [4] we get k = 0.466, from [6] and then [5] we get R²_b = 0.652 and E^s(b₁) = 0.767, and directly from [7] we also get E^s(b₁) = 0.767.
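The arithmetic of this example can be checked with a few lines of Python. The formulas used for k, R²_b and the direct expression of the expected slope are reconstructions chosen to agree with equation [5] and with the numbers quoted in the example; the function names are hypothetical, and E(b₁) = 1 is assumed for the unselected case.

```python
def selection_k(i, x):
    """Variance-reduction factor k = i(i - x) under truncation selection."""
    return i * (i - x)


def r2_before_selection(r2_observed, k):
    """R2_b implied by the R2 observed after selection."""
    return r2_observed / (1.0 - k + k * r2_observed)


def expected_slope_via_r2b(r2_b, k, slope_unselected=1.0):
    """Expected slope from equation [5]."""
    return (1.0 - k) / (1.0 - k * r2_b) * slope_unselected


def expected_slope_direct(r2_observed, k, slope_unselected=1.0):
    """Expected slope written directly in terms of the observed R2."""
    return (1.0 - k * (1.0 - r2_observed)) * slope_unselected


i, x, r2_obs = 0.424, -0.674, 0.50
k = selection_k(i, x)
r2_b = r2_before_selection(r2_obs, k)
print(round(k, 3), round(r2_b, 3),
      round(expected_slope_via_r2b(r2_b, k), 3),
      round(expected_slope_direct(r2_obs, k), 3))
```

Both routes to the expected slope give the same value, which is the point of the example.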

Table 1 - Examples of expected regression coefficients (E(b₁)) as functions of the selection intensity (i) and the coefficient of determination after selection (R²_φ,x).

Testing for accuracy improvement with genomics

The improvement in prediction of daughter performance due to the addition of genomic information (i.e. genotyping) is tested by bootstrapping the difference in validation model R² between models [1] and [2]. A positive difference (P<.05) indicates a significant increase in accuracy with GEBV and therefore results in a Pass. A negative difference (P<.05) results in a Fail. A non-significant difference (P>.05) indicates that the data were insufficient to conclude either way and therefore results in a designation of hiSE (high standard error). A Pass or hiSE result is one of the requirements for an overall PASS in the official GEBV test.
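The decision rule can be sketched as follows. The sketch resamples bulls with replacement and uses a two-sided 95% percentile interval of the R² difference; both choices, and the function names, are assumptions for illustration and may differ from the procedure in the Interbull software.

```python
import random


def weighted_r2(x, y, w):
    """Squared weighted correlation, i.e. R2 of the weighted regression of y on x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy * sxy / (sxx * syy)


def accuracy_test(phi, gebv_r, ebv_r, w, n_boot=1000, alpha=0.05, seed=1):
    """Return 'Pass', 'Fail' or 'hiSE' for the R2 difference (GEBVr minus EBVr)."""
    rng = random.Random(seed)
    n = len(phi)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        p = [phi[i] for i in idx]
        ww = [w[i] for i in idx]
        try:
            diffs.append(weighted_r2([gebv_r[i] for i in idx], p, ww)
                         - weighted_r2([ebv_r[i] for i in idx], p, ww))
        except ZeroDivisionError:
            continue  # resample without variation, R2 undefined
    if not diffs:
        return "hiSE"
    diffs.sort()
    lower = diffs[int((alpha / 2) * (len(diffs) - 1))]
    upper = diffs[int((1 - alpha / 2) * (len(diffs) - 1))]
    if lower > 0:
        return "Pass"   # GEBVr significantly more accurate than EBVr
    if upper < 0:
        return "Fail"   # GEBVr significantly less accurate than EBVr
    return "hiSE"       # interval includes zero, data inconclusive
```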

Description of National Genomic Evaluations

National Genetic Evaluation Centres shall provide a description of their national genomic evaluations to the Interbull Centre via the genomic electronic form in the PREP database: https://prep.interbull.org/

Updated descriptions shall be provided each time changes to the national genomic evaluations are introduced.

References

Sullivan, P.G. 2023. Updated Interbull software for genomic validation tests. Interbull Bulletin 58, p.7-16.

VanRaden, P.M. 2021. Improved genomic validation including extra regressions. Interbull Bulletin 56, p. 65-69.

Mäntysaari, E., Liu, Z and VanRaden P. 2010. Interbull Validation Test for Genomic Evaluations. Interbull Bulletin 41, p. 17-21.

Bulmer, M.G. 1971. The effect of selection on genetic variability. American Nat. 105:201.

Henderson, C.R. 1975. Best Linear Unbiased estimation and prediction under a selection model. Biometrics 31:423-447.

Falconer, D.S. & Mackay, T.F.C. 1996. Introduction to Quantitative Genetics, 4th ed. Longman.