Revision 68 as of 2025-05-19 09:53:25

= Interbull CoP - Appendix VIII - Interbull validation test for genomic evaluations - GEBV test =

== Definitions ==

== Motivation ==

The '''''GEBV test''''' will be applied to validate the national models used to compute the GEBV that national genetic evaluation centers (NGEC) publish and will eventually submit to Interbull for international genetic evaluations that include genomic information. The ''GEBV test'' can also be considered a quality assurance assessment for national genomic evaluations. GEBV from models that have been tested can be regarded as breeding value estimates with appropriate reliability, which can be converted to breeding values on other country scales using conversion equations derived by Interbull.

== Rationale ==

The '''GEBV test''' evaluates:

 1. the unbiasedness of the genomic evaluations, through the evaluation of:
  1. the consistency of the genetic trend captured by GEBV,
  1. the consistency of bull rankings before versus after having progeny, and
  1. the consistency of the variation of GEBV relative to EBV;
 1. the improvement in selection accuracy from the use of GEBV instead of EBV.

A time-oriented cross-validation is used to test how well genomic evaluations of young bull calves, computed with current models but only the phenotypic data available 4 years ago, can predict current progeny performance. The NGEC shall re-run their current evaluation software while excluding the most recent 4 years of daughter phenotypes, to obtain reduced-data genetic (EBVr) and genomic (GEBVr) evaluations. The software then tests whether the ranking and variance of the bulls' GEBVr match statistical and genetic expectations, relative to the ranking and variance of the same bulls based on current progeny differences, as an indication of unbiasedness. Furthermore, if the GEBVr are more highly correlated than the EBVr with current progeny phenotypes, this indicates an accuracy improvement from GEBV.

Linear regression models are used for the validation test. The expected regression slope equals 1 if the validation bulls are an unselected group, and is less than 1 if only a selected subgroup of the most recently proven bulls has been genotyped, because selection changes the variances and covariances used to compute the validation slope. The software accounts for the effects of selective genotyping on expected slopes by estimating the selection differential from the difference between the average EBV of the genotyped bulls and that of all proven bulls born in the period considered for validation testing. Bootstrapping is used for all significance testing, and Interbull uses a combination of statistical and biological tolerance limits to assign an overall assessment of pass or fail.

== Test data sets ==

Data formats are described at https://interbull.org/ib/gebvtest_software_2024

== Full data sets ==

Two sets of current official evaluations for progeny-proven bulls shall be provided for the GEBV test: the EBV and GEBV that the NGEC publish or otherwise use indirectly in national selection programs. All bulls provided to Interbull in file300 for MACE shall be included in a conventional EBV file (file300Cf) for the GEBV test, and all of these bulls that are genotyped and have a national GEBV shall be included in the GEBV file (file300Gf).

 * '''Conventional national genetic evaluation file (file300Cf)'''

The national EBV sent by the NGEC as input for the most recent Interbull MACE evaluation will be used to identify validation test candidate bulls, to estimate the intensity of selective genotyping, and to check the bulls' birth years and type of proof.

 * '''Official national genomic evaluation file (file300Gf)'''

The national GEBV of current MACE bulls will be used to derive target values that reflect unbiased estimates of average progeny performance for the validation test bulls. The official validation target is derived internally by the software, by applying the standardized international dGEBV method developed by !VanRaden (2021) consistently for all NGEC.

 * '''Reduced data sets'''

The reduced data sets should be prepared by truncating the phenotypes used as input for both the conventional and the genomic evaluations. The NGEC must exclude phenotypic information from the most recent 4 years and re-run the current models of genetic and genomic evaluation for the traits of interest. Only the phenotypes should be truncated, not the pedigree, because the validation test needs each validation bull's predicted genetic contribution to future progeny from the reduced-data evaluations, based solely on the bull's parent average (EBVr = PA) and on PA plus genomic prediction equations (GEBVr).

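The phenotype truncation described above can be sketched as follows. The record layout (`Phenotype`), the helper names and the inclusive cutoff ("up to 4 years prior") are assumptions for illustration, not an NGEC or Interbull format. The point shown is that only phenotypes are filtered by date, while the pedigree is passed through unchanged, so parent averages (EBVr = PA) remain available for the young bulls.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Phenotype:
    """Hypothetical phenotype record; real national layouts will differ."""
    animal_id: str
    record_date: date
    value: float


def shift_years(d: date, years: int) -> date:
    """Move a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def build_reduced_dataset(phenotypes, pedigree, analysis_date: date, years: int = 4):
    """Drop phenotypes from the last `years` years; return the pedigree untouched.

    Records dated on or before the cutoff are kept (assumed inclusive cutoff).
    """
    cutoff = shift_years(analysis_date, -years)
    kept = [p for p in phenotypes if p.record_date <= cutoff]
    return kept, pedigree  # the pedigree is deliberately NOT truncated
```

Which date field defines "the most recent 4 years" (for example calving or test date) is a national choice and is not specified here.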
 * '''Reduced conventional genetic evaluation file (file300Cr)'''

The NGEC shall carry out a conventional genetic evaluation with no genotypes, using the truncated data (only phenotypes up to 4 years prior to the date of analysis) but including in the analysis all animals present in the current official evaluations used in MACE (file300Cf).

Proven bulls from at least the 10 most recent birth years in file300Cf must also be included in file300Cr. The older proven bulls, whose progeny proofs are already in the reduced data, are required as a comparative control group, so that evaluation changes for the younger bulls in the validation test group can be contrasted with those of the older control bulls.

 * '''Reduced genomic evaluation file (file300Gr)'''

The NGEC shall carry out a genomic evaluation that includes the genotypes, using the truncated data (only phenotypes up to 4 years prior to the date of analysis) but including in the analysis all animals present in the current official evaluations used as input to MACE (file300Cf). All bulls in the conventional file300Cr that are also genomically evaluated must be included in the genomic file300Gr.

If a significant number of foreign bulls are included in the reference population for national genomic evaluations, and the estimation of genomic prediction equations uses de-regressed MACE values for these bulls as input, the reduced genomic evaluation can be achieved in three ways, listed below in descending order of preference:

 1. The NGEC can participate in the Interbull truncated-MACE service. When the NGEC provide reduced-data national EBV to Interbull for truncated MACE, the results returned by Interbull are the ideal MACE input for the reduced-data national genomic evaluation.
 1. The Interbull Centre can make historical files available upon request, including the official MACE results published 4 years earlier. These MACE proofs are less ideal than truncated-MACE proofs, because they were computed 4 years ago with the evaluation systems in use at that time, rather than by re-running the current evaluation systems on older data.
 1. The current MACE proofs can be used after excluding all recently proven bulls in MACE that did not have an official MACE proof 4 years earlier. This approach is an exception that should only be used if both preferred options above are impractical. The main concern is that the reduced-data genomic prediction equations will still include contributions from phenotypes of the most recent 4 years, through the sires of the recently proven bulls and, more generally, through the MACE proofs of all older bulls related to the validation test bulls whose MACE proofs are excluded.

== Specific instructions for data preparation ==

 A. The domestic bulls (type of proof ≠ 21 or 22) that have EDCf ≥ 20 and EDCr = 0 are called test bulls. Test bulls are likely to be included in the genomic reference population with full data, but not with reduced data. Interbull recommends that the reduction in size of the genomic reference population, due to dropping the test bulls from the reduced data, should not exceed 25%.
  i. If the genomic reference population is reduced too much, the accuracy of GEBV calculated from truncated data becomes significantly lower than with full data. In that case, the country can use a time difference of n < 4 years between the full and reduced data sets.
  i. If the number of test bulls is too small (< 50), the country may choose to also include foreign bulls that have been used locally (type of proof = 21 or 22) with EDCf ≥ 20 from local progeny and EDCr = 0 in the validation group, to increase the number of test bulls.
  i. In both exceptions above, the criteria used to define test bulls must be communicated to the Interbull Centre.
 A. Appropriate time windows (birth years of test bulls) may vary depending on the trait to be validated, the speed of progeny-test programs and other factors. The standard adopted for the GEBV test is to include progeny-proven bulls born since (YYYY-8) as test bulls, where YYYY is the evaluation year. For instance, if the evaluation year is 2024 and the most recently proven bulls in file300Cf were born in 2020, the test bulls would include bulls born between 2016 and 2020. Countries may use a wider window of test bulls, or may shift the window by one year, but the reasons must always be communicated to the Interbull Centre.
 A. Include all available bulls of interest, as described below, in the respective files with their EBVf, EBVr, GEBVf and GEBVr, without editing based on EDCf or EDCr. The final edits required for the validation test are applied within the GEBV test software.
 A. If the GEBV are a combination of DGV and EBV, then both the DGVr and the EBVr used to generate the GEBVr must be estimated from the truncated data.
 A. Bulls with EBV in the full data sets only, i.e. with no progeny information four years ago (EDCr = 0), should be included in the reduced-data files (file300Cr and file300Gr). Additionally, bulls from at least 10 birth years with progeny-based EBV in both the full and reduced data sets should be included in the reduced-data files. After recent updates to the software, these bulls with progeny in the reduced data are also required as a statistical control group that improves the tests for bias in the evaluations of validation test bulls.

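The test-bull criteria in the list above can be expressed as a small filter. This is only a sketch for checking a national data set before submission (for example, whether fewer than 50 domestic test bulls remain, or how much of the reference population would be dropped); as stated above, files should not be pre-edited on EDC, because the GEBV test software applies the final edits itself. The `Bull` record, its field names, the domestic proof-type code used in the example and all helper functions are hypothetical; the default birth-year window follows the (YYYY-8) standard.

```python
from dataclasses import dataclass

FOREIGN_LOCAL_PROOF_TYPES = {21, 22}  # "type of proof = 21 or 22" in the text


@dataclass(frozen=True)
class Bull:
    """Hypothetical bull record; field names are illustrative only."""
    bull_id: str
    birth_year: int
    proof_type: int
    edc_full: float     # EDCf (for foreign bulls: EDC from local progeny)
    edc_reduced: float  # EDCr


def birth_window(eval_year: int, last_proven_birth_year: int) -> tuple[int, int]:
    """Default window: born from (YYYY-8) up to the most recently proven birth year."""
    return eval_year - 8, last_proven_birth_year


def is_test_bull(b: Bull, window: tuple[int, int], include_foreign: bool = False) -> bool:
    first, last = window
    if not first <= b.birth_year <= last:
        return False
    if b.proof_type in FOREIGN_LOCAL_PROOF_TYPES and not include_foreign:
        return False
    return b.edc_full >= 20 and b.edc_reduced == 0


def select_test_bulls(bulls, eval_year, last_proven_birth_year, min_test_bulls=50):
    window = birth_window(eval_year, last_proven_birth_year)
    test = [b for b in bulls if is_test_bull(b, window)]
    if len(test) < min_test_bulls:
        # Optional exception from the text: add locally used foreign bulls.
        # The criteria used must then be reported to the Interbull Centre.
        test = [b for b in bulls if is_test_bull(b, window, include_foreign=True)]
    return test


def reference_reduction(n_reference_full: int, n_test_bulls_in_reference: int) -> float:
    """Fraction of the full-data reference population lost in the reduced data."""
    return n_test_bulls_in_reference / n_reference_full
```

A `reference_reduction` above 0.25 would correspond to the situation in which the text suggests using n < 4 years between the full and reduced data sets.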
== Test description ==

== Testing for bias in the GEBV ==

 * '''Methodology updates in 2024'''

The official Interbull GEBV test now uses !VanRaden's de-regressed GEBV (dGEBV; described in the 2021 Interbull Bulletin paper, https://journal.interbull.org/index.php/ib/article/view/82 ) as the official prediction target. The !VanRaden dGEBV replace the previously used dEBV target described by Mantysaari et al. (2010). Predicting later GEBV or dGEBV from earlier GEBV is conceptually easier to understand and to verify than predicting dEBV. The new tests are also more suitable for validating single-step models, where genomic preselection effects are properly accounted for in GEBVf, GEBVr and dGEBV, whereas dEBV include genomic preselection bias.

With the new validation target, the de-regression method is now internationally standardized, because the dGEBV are derived directly from values based on official publication rules (in the file300Gf and file300Gr files), in the same way for all countries.

The software ensures that full- and reduced-data evaluations are on the same genetic base of expression by adjusting the mean and variance of the reduced-data evaluations to match those of the full-data evaluations. These scale adjustments are based on bulls already progeny-proven in the reduced data, whose evaluations are expected to change very little or not at all because they have few or no new progeny in the recent data. After the scales are aligned, changes in evaluations of the validation test bulls, who have all their progeny in the recent data, are equivalent to contrasts of their evaluation changes relative to previous generations of proven bulls, whose expected changes are 0.

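A minimal sketch of the scale alignment, assuming a simple linear map that gives the control bulls (already proven in the reduced data) the same mean and population standard deviation in the reduced-data evaluations as in the full-data evaluations. The official software may weight bulls or estimate the adjustment differently; `align_reduced_to_full` and the toy values are hypothetical.

```python
from statistics import fmean, pstdev


def align_reduced_to_full(reduced, full, control_ids):
    """Rescale reduced-data evaluations onto the full-data base of expression.

    reduced, full: dicts mapping bull id -> evaluation.
    control_ids: bulls already progeny-proven in the reduced data, whose
    evaluations are expected not to change between the two runs.
    The linear map matches the control bulls' mean and population SD.
    """
    r = [reduced[i] for i in control_ids]
    f = [full[i] for i in control_ids]
    scale = pstdev(f) / pstdev(r)
    mean_r, mean_f = fmean(r), fmean(f)
    return {k: mean_f + (v - mean_r) * scale for k, v in reduced.items()}


reduced = {"c1": 1.0, "c2": 3.0, "t1": 2.0}
full = {"c1": 2.0, "c2": 6.0, "t1": 5.0}
aligned = align_reduced_to_full(reduced, full, ["c1", "c2"])
change_t1 = full["t1"] - aligned["t1"]  # evaluation change of a validation bull
```

With these toy values the aligned reduced-data evaluations are 2.0, 6.0 and 4.0, so the control bulls show no change and the validation bull t1 changes by 1.0 unit.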
After the scales are aligned, average changes in evaluation between reduced and full data now have an expectation of 0 for any group of bulls. Additional tests, which combine the intercept and slope estimates from the validation models, have been added to detect probabilities of bias in below-average, average and above-average (top) bulls. A new user option to output base-adjusted evaluations from reduced data, for all or selected traits, can also help isolate the reasons for biases detected in the evaluations of any traits that fail the GEBV test.

Besides the official GEBV test, the software also allows users to choose different validation targets for further internal research. The available options are:

 * file300Df_COUBRD (de-regressed EBV, as used in the old test)
 * file300Cf_COUBRD (EBV from the full-data evaluation)
 * file300Gf_COUBRD (GEBV from the full-data evaluation)
 * file300Vf_COUBRD (any user-defined value, e.g. single-step DD; new file)

The user must create whichever of the input files above are needed for the requested validation target.

A bootstrapping approach has replaced the previous t-test for bias in validation slopes. This addresses technical concerns that the t-test was not valid, because validation bulls are genetically related and the residuals of the validation model are therefore correlated.

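The bootstrap idea can be sketched as repeatedly resampling validation bulls with replacement, refitting the weighted slope, and reading a standard error and a percentile interval off the replicates. This sketch resamples individual bulls; the resampling scheme, number of replicates and decision rule used by the official software are not described here and may differ (for example, by resampling related bulls together). Function names are hypothetical.

```python
import random


def weighted_slope(x, y, w):
    """Slope of the weighted least-squares regression of y on x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def bootstrap_slope(x, y, w, n_boot=1000, seed=1):
    """Resample bulls with replacement; return (mean, SE, 95% percentile interval)."""
    n = len(x)
    if n < 3 or len(set(x)) < 2:
        raise ValueError("need at least 3 bulls and 2 distinct x values")
    rng = random.Random(seed)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # slope undefined when all x are equal
            continue
        slopes.append(weighted_slope(xs, [y[i] for i in idx], [w[i] for i in idx]))
    slopes.sort()
    mean = sum(slopes) / n_boot
    se = (sum((s - mean) ** 2 for s in slopes) / (n_boot - 1)) ** 0.5
    interval = (slopes[int(0.025 * n_boot)], slopes[int(0.975 * n_boot) - 1])
    return mean, se, interval
```

Unlike a t-test, this does not rely on a formula-based standard error that assumes independent residuals, which is the motivation given in the text.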
The overall validation result combines the outcomes of several sub-tests and takes one of three values: PASS, hiSE (i.e. high standard error) or FAIL. An overall PASS requires a PASS for the slope tests plus either a PASS or a hiSE for the accuracy test. A FAIL for either the combined slope tests or the accuracy test causes an overall FAIL. The new hiSE result indicates that there are too few data to conclusively establish a PASS or a FAIL for some traits and populations.

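The combination rule above can be written as a small function. The text does not spell out every case, so the final branch is an assumption: when no sub-test fails but at least one slope test is hiSE, the sketch returns hiSE. The function name and the way sub-test outcomes are passed in are illustrative.

```python
VALID = {"PASS", "hiSE", "FAIL"}


def overall_result(slope_results, accuracy_result):
    """Combine sub-test outcomes into PASS, hiSE or FAIL.

    slope_results: outcomes of the individual slope tests.
    accuracy_result: outcome of the accuracy test.
    """
    slope_results = list(slope_results)
    if not slope_results or any(r not in VALID for r in slope_results + [accuracy_result]):
        raise ValueError("unexpected sub-test results")
    if "FAIL" in slope_results or accuracy_result == "FAIL":
        return "FAIL"
    if all(r == "PASS" for r in slope_results):
        return "PASS"  # accuracy is PASS or hiSE at this point
    return "hiSE"  # assumption: a slope hiSE without any FAIL gives hiSE
```
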
 * '''Validation regression models'''

Weighted linear regression models are used to test for bias in both the national genomic and the conventional evaluations. To pass the official Interbull GEBV test, however, only the GEBVr are required to be unbiased, not the EBVr. The test for bias in EBVr is provided as comparative, additional information only.

We first define a validation target variable φ that resembles phenotypic progeny averages and is based on the progeny contributions to the current GEBVf of the validation test bulls. All progeny contributions for the test bulls come from the most recent 4-year period and contributed to GEBVf and EBVf, but not to GEBVr and EBVr. The validation regression models are:

{{attachment:formula_1_2.png||height="81",width="297"}}

 . φ = b,,0,, + b,,1,, × GEBVr + e,,1,, '''[1]'''
 . φ = b,,2,, + b,,3,, × EBVr + e,,2,, '''[2]'''

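Models [1] and [2] are weighted least-squares regressions of φ on GEBVr and on EBVr. The sketch below fits both with the closed-form WLS estimator, taking the regression weights WT (described next) as given; the function names are hypothetical, and no bias test against E(b,,1,,) or E(b,,3,,) is applied here.

```python
def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def validation_regressions(phi, gebv_r, ebv_r, wt):
    """Fit model [1] (phi on GEBVr) and model [2] (phi on EBVr) with weights WT."""
    b0, b1 = wls_line(gebv_r, phi, wt)
    b2, b3 = wls_line(ebv_r, phi, wt)
    return {"b0": b0, "b1": b1, "b2": b2, "b3": b3}
```
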
As discussed in the previous section, the validation target φ in the official GEBV test is defined as the dGEBV of !VanRaden (2021). The validation test bulls for both models must meet the following criteria: EDCf ≥ 20 and EDCr = 0; born within a pre-defined range of birth years, such as (YYYY-8) to (YYYY-4) inclusive, where YYYY is the current year of evaluation; and having both an EBVr and a GEBVr available. All validation test bulls with an observation in φ are therefore genotyped and among the most recently progeny-proven bulls.

The reliability equivalent of the information from progeny phenotypes, all of which were included in the full data but not in the reduced data for the test bulls, is used as the regression weight in both models. This progeny information is first expressed as an EDC derived from the genomic reliabilities based on the full (GRELf) and reduced (GRELr) data sets, as shown below, and the EDC are then converted back to a reliability equivalent, which is the bull's regression weight (WT):

{{attachment:EDC_formula.png}}

The constant λ is a function of the trait heritability, but any value of λ used in the pair of equations above results in the same WT, so the equations can be simplified by substituting λ = 1 in both. The WT is thus a function of only GRELf and GRELr.

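The EDC formula itself is only available here as an image. Assuming it follows the usual relationship REL = EDC / (EDC + λ), i.e. EDC = λ·REL / (1 - REL) (λ is commonly (4 - h²)/h² for progeny information), the weight can be sketched as below; the result does not depend on λ, consistent with the statement above. The function and argument names are hypothetical.

```python
def regression_weight(grel_full: float, grel_reduced: float, lam: float = 1.0) -> float:
    """Reliability equivalent (WT) of the progeny information added between runs.

    Assumes REL = EDC / (EDC + lam), i.e. EDC = lam * REL / (1 - REL).
    """
    if not 0.0 <= grel_reduced <= grel_full < 1.0:
        raise ValueError("expected 0 <= GRELr <= GRELf < 1")
    edc_full = lam * grel_full / (1.0 - grel_full)
    edc_reduced = lam * grel_reduced / (1.0 - grel_reduced)
    edc_progeny = edc_full - edc_reduced  # EDC from the recent progeny only
    return edc_progeny / (edc_progeny + lam)  # back to a reliability scale
```

For example, GRELf = 0.8 and GRELr = 0.5 give WT = 0.75 whatever value of λ is used.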
 * '''Effects of selective genotyping'''

The estimated regression coefficients b,,1,, and b,,3,, from the two validation models are compared with their expected values to test H,,0,,: b,,1,, = E(b,,1,,) for GEBVr and H,,0,,: b,,3,, = E(b,,3,,) for EBVr. The expected values are equal to 1 for both models if all of the most recently progeny-proven bulls were also genotyped. The expected values will be lower than 1, however, if only a subset of bulls was genotyped and the genotyped bulls were selected non-randomly with respect to the given trait. The software adjusts E(b,,1,,) and E(b,,3,,) for the effects of selective genotyping.

The first step in deriving the adjustments is to estimate the selection differential for each validated trait. The selection differential is the standardized difference between the mean EBV of the genotyped bulls (g) and the mean EBV of all progeny-proven bulls that otherwise qualify as members of the validation test group:

Line 147: | Line 153: |
. '''φ '''= b,,0,, + b,,1,,*GEBVr + e,,1,, '''[1]''' ''' ''' '''φ''' = b,,2,, + b,,3,,*EBVr + e,,2,, '''[2]''' ''' ''' |
. i = (µ,,EBVg,, - µ,,EBVall,,)/ σ,,EBVall ,,'''[3]''' |
Line 150: | Line 155: |
'''As discussed in the previous section, the validation target φ in the official GEBV test is defined as the dGEBV of !VanRaden (2021). The validation test bulls for both models must meet the following criteria: EDCf ≥ 20 and ''' | Using normal distribution tables from quantitative genetics books (e.g. page 379 from Falconer, D. S. & Mackay, T. F. C. ''Introduction to Quantitative Genetics'', Longman, 4^th^ ed. 1996) the proportion selected by truncation (''p'') to generate an equivalent selection differential as the observed ''i'', and the corresponding truncation point ''x'' that divides the standard normal density into the selected (''p)'' and non-selected (''1-p'') proportions, can be obtained. |
Line 152: | Line 157: |
'''EDCr = 0, born within a pre-defined range of birth years, such as (YYYY-8) to (YYYY-4) inclusive, where YYYY is the current year of evaluation, and having both an EBVr and a GEBVr available. All validation test bulls with an observation in φ are therefore genotyped, and the most recently progeny proven'''. ''' ''' | From the equivalent proportion under truncation selection, the expected values of regression coefficients can be approximated using expected effects of truncation selection on the variances and covariance between φ and the independent variable X, where X is either GEBVr or EBVr. Denoting all variables after selection on φ with a superscript '''s''', defining ''R^2^,,b,,'' = ''R^2^,,(φ,X),,'' before selection, and following Bulmer (1971) and Henderson (1975): |
Line 154: | Line 159: |
'''The reliability equivalent of information from progeny phenotypes, all of which were included in the full data but not in the reduced data for the test bulls, is used as the regression weight in both models. The progeny information, expressed as an EDC, is first derived from genomic reliabilities based on full (GRELf) and reduced data sets (GRELr), as shown below, and the EDC are then converted back to a reliability equivalent as the bull's regression weight (WT): ''' | {{attachment:formula_4.png||height="168",width="401"}} |
Line 156: | Line 161: |
ADD FORMULA EDC and WEIGHT ''' ''' | From the expected ''C^s^(φ,X) and ''V^s^(X) ''after selection, we get the expected b,,1,, after selection: '' |
Line 158: | Line 163: |
'''The constant '''DELTA''' is a function of the trait heritability but using any value for ? in the pair of equations above will result in the same WT, so these equations can be simplified by substituting '''DELTA'''=1 in both equations. The WT is thus a function of only GRELf and GRELr. ''' | ''E^s^(b,,1,,) = ((1 -'' k)'' / ''(''1-''k*R^2^,,b,,)) * ''E(b,,1,,),, ,, '''[5]''' '' |
Line 160: | Line 165: |
* Effects of selective genotyping ''' ''' | ''From the observed R^2^,,φ,x ,, after selection, with the following expected value, we can derive the required ''R^2^,,b,,'' in equation '''[5]''' as follows: '' |
Line 162: | Line 167: |
The estimated regression coefficients, b,,1,, and b,,3,, from the two validation models, are compared with expected values to test H,,0,,: b,,1,, = E(b,,1,,) for GEBVr and H,,0,,: b,,3,, = E(b,,3,,) for EBVr. The expected values are equal to 1 for both models if all bulls most recently progeny-proven were also genotyped. The expected values will be lower than 1, however, if only a subset of bulls were genotyped, and the genotyped bulls were non-randomly selected with respect to the given trait. The software includes adjustment for the effects of selective genotyping on E(b,,1,,) and E(b,,3,,). ''' ''' | '' {{attachment:formula_6.png||height="158",width="401"}} '' |
Line 164: | Line 169: |
'''The first step in deriving the adjustments is to estimate selection differentials for each validated trait. Selection differential is the standardized difference in means between genotyped bulls (g) versus (all) progeny-proven bulls who otherwise qualify as members of the validation test group. ''' | ''Substituting '''[6]''' into '''[5]''' and simplifying, we get expected slope as a function of observed R^2^,,φ,x ,, '' |
Line 166: | Line 171: |
. ''i ''= (µ,,EBVg,, - µ,,EBVall,,)/ '''σ''',,EBVall ,,'''[3]''' ''' ''' | . '' {{attachment:formula_7.png||height="155",width="536"}} '' |
Line 168: | Line 173: |
'''Using normal distribution tables from quantitative genetics books (e.g. page 379 from Falconer, D. S. & Mackay, T. F. C. ''Introduction to Quantitative Genetics'', Longman, 4^th^ ed. 1996) the proportion selected by truncation (''p'') to generate an equivalent selection differential as the observed ''i'', and the corresponding truncation point ''x'' that divides the standard normal density into the selected (''p)'' and non-selected (''1-p'') proportions, can be obtained. ''' | '''''Example:''' Let µ,,EBVg,, = 16.00, µ,,EBVall,, = 11.76, σ,,EBVall,, = 10.00, R^2^,,φ,x ,,= 0.50. Using equation [3], the selection differential (''i'') for genotyped bulls equals 0.424. For this value of ''i'', the equivalent proportion by truncation selection for genotyping (''p'') would be 0.75 and the mean deviation of the truncation point from the overall mean (''x'') would be -0.674 (from reference table). From equation [4] we get k=0.466, and from [6] then [5] we get R,,b ,,^2^ = 0.652 and then E^s^ (b,,1,,) = 0.767, or directly from [7] we also get E^s^ (b,,1,,) = 0.767. '' |
Line 170: | Line 175: |
'''From the equivalent proportion under truncation selection, the expected values of regression coefficients can be approximated using expected effects of truncation selection on the variances and covariance between ? and the independent variable X, where X is either GEBVr or EBVr. Denoting all variables after selection on ? with a superscript ''s'', defining ''Rb^2'' = ''R_(?,X)^2'' before selection, and following Bulmer (1971) and Henderson (1975): ''' | '''''Table 1 -''' Examples of expected regression coefficients (E(b,,1,,)) as functions of the selection intensity (''i'') and the coefficient of determination after selection (R^2^,,φ,x ,,).'' |
Line 172: | Line 177: |
''k = i * (i - x) '' '''[4] ''' | {{attachment:example_expected_regression.png}} |
Line 174: | Line 179: |
'''ADD FORMULA V, C''' ''' ''' | * '''Testing for accuracy improvement with genomics ''''' '' |
Line 176: | Line 181: |
From the expected ''C^s^(''?,X) and ''V^s^(X) ''after selection, we get the expected b,,1,, after selection: ''' ''' | The improvement in prediction of daughter performance due to the addition of genomic information (i.e. genotyping) is tested by bootstrapping the difference in validation model R^2^ for models [1] - [2]. A positive difference (P<.05) indicates a significance increase in accuracy with GEBV and therefore results in a Pass. A negative difference (P<.05) results in a Fail, and a non-significant difference (P>.05) indicates that data were insufficient to conclude either way, which therefore results in a designation of hiSE (high standard error). A Pass or hiSE result is required as one part of the overall requirements to PASS the official GEBV test. |
Line 178: | Line 183: |
E^s^(b,,1,,) = ((1 -'' k)'' / ''(''1-''k*R^2^,,b,,)) * ''E(b,,1,,),, ,, '''[5]''' ''' ''' | == Description of National Genomic Evaluations == National Genetic Centres shall provide a description of their national genomic evaluations to Interbull Centre, for now by using the GENO forms, and in the near future, by electronic forms within the PREP database that will be replacing GENO forms. |
Line 180: | Line 186: |
From the observed R^2^ ,,?,x ,, after selection, with the following expected value, we can derive the required ''R^2^,,b,,'' in equation '''[5]''' as follows: ''' ''' | Updated descriptions shall be provided each time changes to the national genomic evaluations are introduced. |
Line 182: | Line 188: |
ADD FORMULA ''R^2^'',, ?,x ,, and ''R^2^,,b,,'' ''' ''' | == References == Sullivan, P.G. 2023. Updated Interbull software for genomic validation tests. Interbull Bulletin 58, p.7-16. |
Line 184: | Line 191: |
Substituting '''[6]''' into '''[5]''' and simplifying, we get expected slope as a function of observed ''R^2^'',, ?,x :,, ''' ''' | !VanRaden, P.M. 2021. Improved genomic validation including extra regressions. Interbull bulletin 56: 65-69. |
Line 186: | Line 193: |
ADD FORMULA E^s^(b1) ''' ''' | Mäntysaari, E., Liu, Z and !VanRaden P. 2010. Interbull Validation Test for Genomic Evaluations. Interbull Bulletin 41, p. 17-21. |
Line 188: | Line 195: |
which can also be written: ''' ''' | Bulmer, M.G. 1971. The effect of selection on genetic variability. American Nat. 105:201. |
Line 190: | Line 197: |
ADD FORMULA E^s^(b1) ''' ''' | Henderson, C.R. 1975. Best Linear Unbiased estimation and prediction under a selection model. Biometrics 31:423-447. |
Line 192: | Line 199: |
'''Example:''' Let µ,,EBVg,, = 16.00, µ,,EBVall,, = 11.76, ?,,EBVall,, = 10.00, ''R^2^'',, ?,x ,,= 0.50. Using equation [3], the selection differential (''i'') for genotyped bulls equals 0.424. For this value of ''i'', the equivalent proportion by truncation selection for genotyping (''p'') would be 0.75 and the mean deviation of the truncation point from the overall mean (''x'') would be -0.674 (from reference table). From equation [4] we get k=0.466, and from [6] then [5] we get R,,b,,^2^ = 0.652 and then E^s^ (b,,1,,) = 0.767, or directly from [7] we also get E^s^ (b,,1,,) = 0.767. ''' ''' '''Table 1 -''' Examples of expected regression coefficients (E(b,,1,,)) as functions of the selection intensity (''i'') and the coefficient of determination after selection (''R_(?,X)^2''). || || || ||<#FFD900 style="text-align:center;vertical-align:bottom">''R^2^'',, ?,x,, ||<#FFD900 style="text-align:center;vertical-align:bottom">0.30 ||<#FFD900 style="text-align:center;vertical-align:bottom">0.40 ||<#FFD900 style="text-align:center;vertical-align:bottom">0.50 ||<#FFD900 style="text-align:center;vertical-align:bottom">0.60 ||<#FFD900 style="text-align:center;vertical-align:bottom">0.70 || ||<style="text-align:center;vertical-align:bottom">''i'' ||<style="text-align:center;vertical-align:bottom">''p'' ||<style="text-align:center;vertical-align:bottom">''x'' ||<#FFD900 style="text-align:center;vertical-align:bottom">''k=i*(i-x)'' ||<style="text-align:center;vertical-align:bottom">E(b,,1,,)=1-(1''- R^2^'',, ?,x,,'')*k'' || ||<style="text-align:center;vertical-align:bottom">0.800 ||<style="text-align:center;vertical-align:bottom">50 ||<style="text-align:center;vertical-align:bottom">0.000 ||<#FFD900 style="text-align:center;vertical-align:bottom">0.640 ||<style="text-align:center;vertical-align:top">0.552 ||<style="text-align:center;vertical-align:top">0.616 ||<style="text-align:center;vertical-align:top">0.680 ||<style="text-align:center;vertical-align:top">0.744 
||<style="text-align:center;vertical-align:top">0.808 || ||<style="text-align:center;vertical-align:bottom">0.424 ||<style="text-align:center;vertical-align:bottom">75 ||<style="text-align:center;vertical-align:bottom">-0.674 ||<#FFD900 style="text-align:center;vertical-align:bottom">0.466 ||<style="text-align:center;vertical-align:top">0.674 ||<style="text-align:center;vertical-align:top">0.721 ||<#FFD900 style="text-align:center;vertical-align:top">0.767 ||<style="text-align:center;vertical-align:top">0.814 ||<style="text-align:center;vertical-align:top">0.860 || ||<style="text-align:center;vertical-align:bottom">0.350 ||<style="text-align:center;vertical-align:bottom">80 ||<style="text-align:center;vertical-align:bottom">-0.842 ||<#FFD900 style="text-align:center;vertical-align:bottom">0.417 ||<style="text-align:center;vertical-align:top">0.708 ||<style="text-align:center;vertical-align:top">0.750 ||<style="text-align:center;vertical-align:top">0.791 ||<style="text-align:center;vertical-align:top">0.833 ||<style="text-align:center;vertical-align:top">0.875 || ||<style="text-align:center;vertical-align:bottom">0.274 ||<style="text-align:center;vertical-align:bottom">85 ||<style="text-align:center;vertical-align:bottom">-1.036 ||<#FFD900 style="text-align:center;vertical-align:bottom">0.359 ||<style="text-align:center;vertical-align:top">0.749 ||<style="text-align:center;vertical-align:top">0.785 ||<style="text-align:center;vertical-align:top">0.821 ||<style="text-align:center;vertical-align:top">0.856 ||<style="text-align:center;vertical-align:top">0.892 || ||<style="text-align:center;vertical-align:bottom">0.195 ||<style="text-align:center;vertical-align:bottom">90 ||<style="text-align:center;vertical-align:bottom">-1.282 ||<#FFD900 style="text-align:center;vertical-align:bottom">0.288 ||<style="text-align:center;vertical-align:top">0.798 ||<style="text-align:center;vertical-align:top">0.827 ||<style="text-align:center;vertical-align:top">0.856 
||<style="text-align:center;vertical-align:top">0.885 ||<style="text-align:center;vertical-align:top">0.914 || ||<style="text-align:center;vertical-align:bottom">0.109 ||<style="text-align:center;vertical-align:bottom">95 ||<style="text-align:center;vertical-align:bottom">-1.645 ||<#FFD900 style="text-align:center;vertical-align:bottom">0.191 ||<style="text-align:center;vertical-align:top">0.866 ||<style="text-align:center;vertical-align:top">0.885 ||<style="text-align:center;vertical-align:top">0.904 ||<style="text-align:center;vertical-align:top">0.924 ||<style="text-align:center;vertical-align:top">0.943 || ||<style="text-align:center;vertical-align:bottom">0.000 ||<style="text-align:center;vertical-align:bottom">100 ||<style="text-align:center;vertical-align:bottom">- ||<#FFD900 style="text-align:center;vertical-align:bottom">0.000 ||<style="text-align:center;vertical-align:top">1.000 ||<style="text-align:center;vertical-align:top">1.000 ||<style="text-align:center;vertical-align:top">1.000 ||<style="text-align:center;vertical-align:top">1.000 ||<style="text-align:center;vertical-align:top">1.000 || ''' ''' * Testing for accuracy improvement with genomics ''' ''' '''The improvement in prediction of daughter performance due to the addition of genomic information (i.e. genotyping) is tested by bootstrapping the difference in validation model R^2^ for models [1] - [2]. A positive difference (P<.05) indicates a significance increase in accuracy with GEBV and therefore results in a Pass. A negative difference (P<.05) results in a Fail, and a non-significant difference (P>.05) indicates that data were insufficient to conclude either way, which therefore results in a designation of hiSE (high standard error). A Pass or hiSE result is required as one part of the overall requirements to PASS the official GEBV test. 
''' * Description of National Genomic Evaluations ''' ''' '''National Genetic Centres shall provide a description of their national genomic evaluations to Interbull Centre, for now by using the GENO forms, and in the near future, by electronic forms within the PREP database that will be replacing GENO forms. ''' '''Updated descriptions shall be provided each time changes to the national genomic evaluations are introduced. ''' * References ''' ''' '''Sullivan, P.G. 2023. Updated Interbull software for genomic validation tests. Interbull Bulletin 58, p.7-16. ''' '''!VanRaden''', '''P.M. 2021. Improved genomic validation including extra regressions. Interbull bulletin 56: 65-69. ''' '''Mäntysaari, E., Liu, Z and''' !'''VanRaden''' '''P. 2010. Interbull Validation Test for Genomic Evaluations. Interbull Bulletin 41, p. 17-21. ''' '''Bulmer, M.G. 1971. The effect of selection on genetic variability. American Nat. 105:201. ''' '''Henderson, C.R. 1975. Best Linear Unbiased estimation and prediction under a selection model. Biometrics 31:423-447. ''' '''Falconer, D. S. & Mackay, T. F. C. Introduction to Quantitative Genetics, Longman, 4^th^ ed. 1996 ''' |
Falconer, D. S. & Mackay, T. F. C. Introduction to Quantitative Genetics, Longman, 4^th^ ed. 1996 |
Interbull CoP - Appendix VIII - Interbull validation test for genomic evaluations - GEBV test
Document based on:
Sullivan, P.G. 2023. Updated Interbull software for genomic validation tests. Interbull Bulletin 58, p. 7-16.
VanRaden, P.M. 2021. Improved genomic validation including extra regressions. Interbull Bulletin 56, p. 65-69.
Mäntysaari, E., Liu, Z. and VanRaden, P. 2010. Interbull Validation Test for Genomic Evaluations. Interbull Bulletin 41, p. 17-21.
Definitions:
- EBV - Estimated Breeding Value (conventional national evaluations of the trait, free of genomic information, which are submitted to Interbull to be used in MACE evaluations)
- DGV - Direct Estimated Genomic Value (genomic evaluations based on SNP prediction equations)
- GEBV - Genomically Enhanced Estimated Breeding Value (evaluations that combine EBV and DGV)
- dGEBV - De-regressed GEBV
- GREL - Genomic reliability of the bull's GEBV
- EDC - Effective Daughter Contribution
- MACE - Multiple Trait Across Country Evaluation
- PA - Parent Average
- DD - Daughter Deviation
- NGEC - National Genetic Evaluation Centre
Motivation
The inclusion of national genomic information in international comparisons for dairy breeds requires that national genomic breeding values (GEBV) be validated by Interbull, in the same way that conventional EBV are validated as a precondition for participating in MACE evaluations.
The GEBV test will be applied to validate the national models used to compute the GEBV that national genetic evaluation centers (NGEC) publish and will eventually submit to Interbull for international genetic evaluations including genomic information. The GEBV test can also be considered a quality assurance assessment for national genomic evaluations. GEBV from models that have been tested can be referred to as breeding value estimates with appropriate reliability, and can be converted to the breeding value scales of other countries using conversion equations derived by Interbull.
Rationale
The GEBV test evaluates:
- the unbiasedness of the genomic evaluations through the evaluation of
- the consistency of the genetic trend captured by GEBV,
- the consistency of bull rankings before versus after having progeny, and
- the consistency of the variation of GEBV relative to EBV;
- the improvement in selection accuracy from the use of GEBV instead of EBV.
A time-oriented cross-validation is used to test how well genomic evaluations of young bull calves, computed with current models but with phenotypic data from 4 years ago, predict current progeny performance. The NGEC shall re-run their current evaluation software while excluding the most recent 4 years of daughter phenotypes, to obtain reduced-data genetic (EBVr) and genomic (GEBVr) evaluations. As an indication of unbiasedness, the software then tests whether the ranking and variance of bull GEBVr match statistical and genetic expectations relative to the ranking and variance of bull comparisons based on current progeny differences. Furthermore, if the GEBVr are more highly correlated than the EBVr with current progeny phenotypes, this indicates an improvement in accuracy with GEBV.
Linear regression models are used for the validation test. The expected regression slope equals 1 if the validation bulls are an unselected group, and is less than 1 if only a selected subgroup of the most recently proven bulls has been genotyped. The expected slope is lower with selective genotyping because selection changes the variances and covariances used to compute the validation slope. The software accounts for the effects of selective genotyping on expected slopes, using an estimate of the selection differential obtained from the difference between the average EBV of the genotyped bulls and of all proven bulls born in the period considered for validation testing. Bootstrapping is used for all significance testing, and Interbull combines statistical and biological limits of tolerance to assign an overall assessment of pass or fail.
Test data sets
Data formats are described at https://interbull.org/ib/gebvtest_software_2024
Full data sets
Two sets of currently official evaluations for progeny-proven bulls shall be provided for the GEBV test. These will be the EBV and GEBV published or otherwise indirectly used by the NGEC for national selection programs. All bulls provided to Interbull in file300 for MACE shall be included in a conventional EBV file (file300Cf) for the GEBV test, and all these same bulls who are genotyped and have a national GEBV shall be included in the GEBV file (file300Gf).
Conventional national genetic evaluation file (file300Cf)
The national EBV sent by the NGEC as input for the most recent Interbull MACE evaluation will be used to identify candidate validation test bulls, estimate the intensity of selective genotyping, and check the bulls' birth years and type of proof.
Official national genomic evaluation file (file300Gf)
The national GEBV of current MACE bulls will be used to derive target values reflecting unbiased estimates of average progeny performance for the validation test bulls. The official validation target is derived internally by the software, by applying a standardized international method for dGEBV, developed by VanRaden (2021), consistently to all NGEC.
Reduced data sets
The reduced data sets should be prepared by truncating the phenotypes used as input for both the conventional and the genomic evaluations. The NGEC must exclude phenotypic information from the most recent 4 years and re-run the current models of genetic and genomic evaluation for the traits of interest. The pedigree should not be truncated, just the phenotypes, because each validation bull's predicted genetic contributions in future progeny, based solely on the bull's parent average (EBVr=PA) and on PA plus genomic prediction equations (GEBVr) from the reduced-data evaluations will be needed for the validation test.
Reduced conventional genetic evaluation file (file300Cr)
The NGEC shall carry out a conventional genetic evaluation with no genotypes, while using the truncated data (only phenotypes up to 4 years prior to the date of analysis) but including in the analysis all animals present in the current official evaluations used in MACE (file300Cf).
Proven bulls from at least the 10 most recent birth years in file300Cf must also be included in file300Cr. These older proven bulls, whose progeny proofs are already in the reduced data, are required as a comparative control group, to contrast the evaluation changes of younger bulls in the validation test group with those of the older control bulls.
Reduced genomic evaluation file (file300Gr)
The NGEC shall carry out a genomic evaluation that includes the genotypes, while using the truncated data (only phenotypes up to 4 years prior to the date of analysis) but including in the analysis all animals present in the current official evaluations used as input to MACE (file300Cf). All bulls included in the conventional file300Cr who are also genomically evaluated must be included in the genomic file300Gr.
If a significant number of foreign bulls are included in the reference population for national genomic evaluations, and the estimation of genomic prediction equations uses de-regressed MACE values for these bulls as input, the reduced genomic evaluation can be achieved in three ways, listed below in descending order of preference:
- The NGEC can participate in the Interbull truncated-MACE service. By providing reduced-data national EBV to Interbull for truncated MACE, the results returned by Interbull will be the ideal MACE input for reduced-data national genomic evaluation.
- The Interbull Centre can make historical files available upon request, which shall include the official MACE results published 4 years earlier. These MACE proofs will be less ideal than truncated MACE proofs, because current evaluation systems were not re-run with older data by any country for the MACE proofs already computed 4 years earlier.
- The current MACE proofs can be used after excluding all recently proven bulls in MACE who did not have an official MACE proof 4 years earlier. This approach is an exception that should only be used if both preferred options above are impractical. The main concern with this approach is that reduced-data genomic prediction equations will still include contributions from phenotypes in the most recent 4 years, through the sires of the recently proven bulls and, more generally, through the MACE proofs of all older bulls related to the validation test bulls whose MACE proofs are being excluded.
Specific instructions for data preparation:
- The domestic bulls (type of proof ≠ 21 or 22) that have EDCf ≥ 20 and EDCr = 0 are called test bulls. Test bulls are likely to be included in the genomic reference population with full data, but not with reduced data. Interbull recommends that the reduction in size of the genomic reference population, due to the dropping of test bulls in reduced data, should not exceed 25%.
If the genomic reference population is reduced too much, the accuracy of GEBV calculated from truncated data becomes significantly lower than with full data. In that case, the country can use a time difference of n < 4 years between the full and reduced data sets.
If the number of test bulls is too small (< 50), the country may also include, as part of the validation group, foreign bulls that have been used locally (type of proof = 21 or 22) with EDCf ≥ 20 from local progeny and EDCr = 0, to increase the number of test bulls.
- In both exceptions above, the criteria used to define test bulls must be communicated to the Interbull Centre.
- Appropriate time windows (birth years of test bulls) may vary depending on the trait to be validated, the speed of progeny test programs and other factors. The standard adopted for the GEBV test is to include progeny-proven bulls born since (YYYY-8) as test bulls. For instance, if the evaluation year is 2024 and the most recently proven bulls in file300Cf were born in 2020, then the test bulls would include bulls born between 2016 and 2020. Countries may include a wider window of test bulls, or may shift the window by one year, but the reasons must always be communicated to the Interbull Centre.
- Include all available bulls of interest, as described below, in the respective files with their EBVf, EBVr, GEBVf and GEBVr, without editing based on EDCf or EDCr. These final edits, as required for the validation test, are applied within the GEBV test software.
- If the GEBV are a combination of DGV and EBV, then both the DGVr and EBVr used to generate the GEBVr must be estimated from the truncated data.
- Bulls with EBV in the full data sets only, having no progeny information four years ago (EDCr=0), should be included in the reduced-data files (300Cr and 300Gr). Additionally, a minimum of 10 years of bulls with progeny-based EBV in both the full and reduced data sets should be included in the reduced-data files. After recent updates to the software, bulls with progeny in the reduced data are now additionally required as a statistical control group used to improve statistical tests for bias in the evaluations of validation test bulls.
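The two numerical guidelines in the list above (at most a 25% reduction of the genomic reference population, and at least 50 test bulls) can be checked before the reduced-data runs are started. The sketch below is a hypothetical helper, not part of the Interbull software: the function name, argument names and message wording are assumptions, and only the two thresholds come from this appendix.

```python
from dataclasses import dataclass, field


@dataclass
class DesignCheck:
    """Outcome of the pre-run checks on a planned validation design."""
    reduction: float                          # fraction of the full-data reference dropped
    n_test: int                               # number of test bulls
    notes: list = field(default_factory=list)


def check_validation_design(n_ref_full, n_test_in_ref, n_test,
                            max_reduction=0.25, min_test=50):
    """Hypothetical helper applying the CoP guidelines on test bull numbers.

    n_ref_full:    size of the genomic reference population with full data
    n_test_in_ref: test bulls in that reference population (dropped in reduced data)
    n_test:        total number of test bulls (EDCf >= 20 and EDCr = 0)
    """
    reduction = n_test_in_ref / n_ref_full
    check = DesignCheck(reduction=reduction, n_test=n_test)
    if reduction > max_reduction:
        check.notes.append("reference population reduced by more than 25%: "
                           "consider n < 4 years between full and reduced data")
    if n_test < min_test:
        check.notes.append("fewer than 50 test bulls: consider adding foreign bulls "
                           "used locally (type of proof 21 or 22)")
    return check
```

For example, a reference population of 4000 bulls of which 1200 are test bulls gives a reduction of 0.30 and one note. Whichever exception is then applied, the criteria used must still be reported to the Interbull Centre.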
Test description
Testing for bias in the GEBV
Methodology updates in 2024
The official Interbull GEBV test is now based on VanRaden's de-regressed GEBV (described in the 2021 Interbull Bulletin paper, https://journal.interbull.org/index.php/ib/article/view/82) as the official prediction target. The VanRaden dGEBV replaces the previously used dEBV target described by Mäntysaari et al. (2010). Predicting later GEBV or dGEBV from earlier GEBV is conceptually easier to understand and to verify than predicting dEBV. The new tests are also more suitable for validating single-step models, where genomic preselection effects are properly accounted for in GEBVf, GEBVr and dGEBV, whereas dEBV include genomic preselection bias.
With the implementation of a new validation target, the de-regression method has now been internationally standardized because the dGEBV are derived directly from values based on official publication rules, in file300Gf and file300Gr files, in the same way for all countries.
The software will make sure that full and reduced-data evaluations are on the same genetic base of expression by adjusting the mean and variance of reduced-data evaluations to match the base of expression of full-data evaluations. These adjustments to align the evaluation scales are based on bulls already progeny-proven in the reduced data who have expected changes in evaluations very close or equal to zero, due to either no new progeny or relatively few in the recent data. After aligning the evaluation scales, changes in evaluations for the validation test bulls, who have all their progeny in the recent data, are equivalent to contrasts of evaluation changes for validation bulls relative to previous generations of proven bulls who have expected changes of 0.
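The appendix does not spell out how the scales are aligned. One plausible reading, sketched below as an assumption, is a linear transformation of the reduced-data evaluations chosen so that, among the control bulls already progeny-proven in the reduced data, their mean and standard deviation match the full-data evaluations. The function and argument names are hypothetical, and the Interbull software may use a different control set, weighting, or a regression-based alignment instead.

```python
from statistics import mean, pstdev


def align_reduced_scale(reduced, full, control_ids):
    """Rescale reduced-data evaluations to the full-data base of expression.

    Assumption: alignment by matching the mean and SD of control bulls.
    reduced, full: dicts mapping bull ID to evaluation (EBV or GEBV)
    control_ids:   bulls already progeny-proven in the reduced data, whose
                   evaluations are expected to change very little
    """
    r = [reduced[b] for b in control_ids]
    f = [full[b] for b in control_ids]
    scale = pstdev(f) / pstdev(r)        # match the control-group SD
    mean_r, mean_f = mean(r), mean(f)    # and the control-group mean
    # Apply the same linear map to every bull, including validation test bulls.
    return {b: mean_f + (v - mean_r) * scale for b, v in reduced.items()}
```

Because the same linear map is applied to all bulls, any remaining change for the validation test bulls is a contrast against control bulls whose expected change is zero, which is the property the text relies on.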
After the scales are aligned, average changes in evaluation between reduced and full data have an expectation of 0 for any group of bulls. Additional tests, which use the intercept and slope estimates of the validation models jointly, have been added to detect the probability of bias in below-average, average and above-average (top) bulls. A new user option to output base-adjusted reduced-data evaluations, for all or selected traits, can also help isolate the reasons for detected biases in any traits failing the GEBV test.
Besides the application of the official GEBV test, the software also allows users to choose different validation targets for further internal research. Below is the list of available options:
- file300Df_COUBRD (de-regressed EBV, as used previously in the old test)
- file300Cf_COUBRD (EBV from the full-data evaluation)
- file300Gf_COUBRD (GEBV from the full-data evaluation)
- file300Vf_COUBRD (Any user-defined value, e.g. single step DD, new file)
The user must create whichever input file(s) above are needed for the requested validation target.
A bootstrapping approach has replaced the previous t-test for bias in validation slopes, because the t-test was not valid: validation bulls are genetically related, and the validation model residuals are correlated.
The overall validation result combines the PASS or FAIL results of several sub-tests and takes one of three values: PASS, hiSE (high standard error) or FAIL. An overall PASS requires a PASS for the combined slope tests plus either a PASS or hiSE for the accuracy test. A FAIL for either the combined slope tests or the accuracy test causes an overall FAIL. The new hiSE category indicates that there are too few data to conclusively establish a PASS or FAIL for some traits and populations.
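The appendix does not describe the resampling scheme used by the Interbull software. The sketch below shows one generic way to bootstrap a weighted regression slope and test H0: slope = expected value. By default it resamples bulls independently; if optional cluster labels are supplied (for example hypothetical sire IDs), whole clusters are resampled together, which is one common way to keep related bulls together. All names are illustrative assumptions.

```python
import random


def wls_slope(x, y, w):
    """Slope of the weighted least-squares regression of y on x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def bootstrap_slope_test(x, y, w, expected=1.0, groups=None, n_boot=2000, seed=1):
    """Two-sided percentile-bootstrap test of H0: slope == expected.

    groups: optional cluster labels (hypothetical, e.g. sire IDs); whole
            clusters are resampled so that related bulls stay together.
    Returns (slope estimate, bootstrap p-value).
    """
    rng = random.Random(seed)
    if groups is None:
        groups = range(len(x))                # each bull is its own cluster
    clusters = {}
    for i, g in enumerate(groups):
        clusters.setdefault(g, []).append(i)
    labels = list(clusters)

    boots = []
    for _ in range(n_boot):
        idx = []
        for _draw in labels:                  # draw as many clusters as observed
            idx.extend(clusters[rng.choice(labels)])
        try:
            boots.append(wls_slope([x[i] for i in idx], [y[i] for i in idx],
                                   [w[i] for i in idx]))
        except ZeroDivisionError:             # degenerate resample: no spread in x
            continue

    below = sum(b <= expected for b in boots) / len(boots)
    above = sum(b >= expected for b in boots) / len(boots)
    return wls_slope(x, y, w), min(1.0, 2.0 * min(below, above))
```

In this sketch, x would hold GEBVr (or EBVr), y the validation target and w the regression weights. The expected value would be the slope adjusted for selective genotyping rather than always 1.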
Validation regression models
Weighted linear regression models are used to test for bias in both the national genomic and the conventional evaluations, respectively. To pass the official Interbull GEBV test, however, requires only that the GEBVr are unbiased, and not the EBVr. The test for bias in EBVr is provided as comparative and additional information only.
We first define a validation target variable φ that resembles phenotypic progeny averages and is based on the progeny contributions of the validation test bulls to their current GEBVf. All progeny contributions for the test bulls come from the most recent 4-year period and contribute to GEBVf and EBVf but not to GEBVr and EBVr. The validation regression models are:

$$ \varphi = b_0 + b_1\,\mathrm{GEBV}_r + e \qquad [1] $$

$$ \varphi = b_2 + b_3\,\mathrm{EBV}_r + e \qquad [2] $$
As discussed in the previous section, the validation target φ in the official GEBV test is defined as the dGEBV of VanRaden (2021). The validation test bulls for both models must meet the following criteria: EDCf ≥ 20 and EDCr = 0; born within a pre-defined range of birth years, such as (YYYY-8) to (YYYY-4) inclusive, where YYYY is the current year of evaluation; and both an EBVr and a GEBVr available. All validation test bulls with an observation in φ are therefore genotyped and are the most recently progeny-proven bulls.
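A sketch of the test-bull selection in Python, assuming one record per bull holding the quantities named in the text. The `Bull` fields and the `validation_bulls` function are hypothetical, and the birth-year window is a parameter because the text gives (YYYY-8) to (YYYY-4) only as an example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bull:
    """One candidate bull; field names are hypothetical."""
    bull_id: str
    birth_year: int
    edc_full: float               # EDCf: EDC in the full-data evaluation
    edc_reduced: float            # EDCr: EDC in the reduced-data evaluation
    ebv_reduced: Optional[float]  # EBVr, None if not available
    gebv_reduced: Optional[float] # GEBVr, None if not available


def validation_bulls(bulls: List[Bull], eval_year: int,
                     oldest_offset: int = 8, youngest_offset: int = 4,
                     min_edc_full: float = 20.0) -> List[Bull]:
    """Keep bulls with EDCf >= 20, EDCr == 0, birth year between
    eval_year - 8 and eval_year - 4 inclusive, and both EBVr and GEBVr."""
    first, last = eval_year - oldest_offset, eval_year - youngest_offset
    return [
        b for b in bulls
        if b.edc_full >= min_edc_full
        and b.edc_reduced == 0
        and first <= b.birth_year <= last
        and b.ebv_reduced is not None
        and b.gebv_reduced is not None
    ]
```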
The reliability equivalent of the information from progeny phenotypes, all of which were included in the full data but not in the reduced data for the test bulls, is used as the regression weight in both models. The progeny information, expressed as an EDC, is first derived from the genomic reliabilities based on the full (GRELf) and reduced (GRELr) data sets, as shown below, and the EDC is then converted back to a reliability equivalent that serves as the bull's regression weight (WT):

$$ \mathrm{EDC} = \lambda \left( \frac{\mathrm{GREL}_f}{1 - \mathrm{GREL}_f} - \frac{\mathrm{GREL}_r}{1 - \mathrm{GREL}_r} \right) $$

$$ \mathrm{WT} = \frac{\mathrm{EDC}}{\mathrm{EDC} + \lambda} $$
The constant λ is a function of the trait heritability, but any value of λ in the pair of equations above yields the same WT, so both equations can be simplified by setting λ = 1. WT is thus a function of GRELf and GRELr only.
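The weight computation can be sketched as follows. The sketch assumes the usual conversions EDC = λ·REL/(1 - REL) and REL = EDC/(EDC + λ) between reliability and EDC. This is an assumption here, with λ = (4 - h²)/h² as a common choice for daughter contributions. The function names are hypothetical. The usage lines illustrate that λ cancels, as the text states.

```python
def edc_from_rel(rel, lam):
    """Convert a reliability to an EDC: EDC = lam * REL / (1 - REL)."""
    if not 0.0 <= rel < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return lam * rel / (1.0 - rel)


def regression_weight(grel_full, grel_reduced, lam=1.0):
    """WT: EDC gained from progeny between reduced and full data,
    converted back to a reliability equivalent, EDC / (EDC + lam)."""
    edc = edc_from_rel(grel_full, lam) - edc_from_rel(grel_reduced, lam)
    return edc / (edc + lam)


# lam cancels: a heritability-based lam gives the same WT as lam = 1
h2 = 0.30
lam_h2 = (4.0 - h2) / h2
print(regression_weight(0.90, 0.70), regression_weight(0.90, 0.70, lam_h2))
```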
Effects of selective genotyping
The estimated regression coefficients b1 and b3 from the two validation models are compared with their expected values to test H0: b1 = E(b1) for GEBVr and H0: b3 = E(b3) for EBVr. Both expected values equal 1 if all of the most recently progeny-proven bulls were also genotyped. They are lower than 1, however, if only a subset of bulls was genotyped and the genotyped bulls were selected non-randomly with respect to the given trait. The software adjusts E(b1) and E(b3) for these effects of selective genotyping.
The first step in deriving the adjustments is to estimate the selection differential for each validated trait. The selection differential (i) is the standardized difference in mean EBV between genotyped bulls (g) and all progeny-proven bulls (all) that otherwise qualify as members of the validation test group:
$$ i = \frac{\mu_{\mathrm{EBV},g} - \mu_{\mathrm{EBV},\mathrm{all}}}{\sigma_{\mathrm{EBV},\mathrm{all}}} \qquad [3] $$
Normal-distribution tables in quantitative genetics textbooks (e.g. Falconer and Mackay 1996, p. 379) give the proportion selected by truncation (p) that produces a selection differential equal to the observed i, and the corresponding truncation point x that divides the standard normal density into the selected (p) and non-selected (1 - p) proportions.
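As an alternative to a printed table, p and x can be found numerically. The sketch below uses hypothetical function names and the standard library only. It assumes the genotyped bulls are selected from the upper tail, so i > 0. It solves z(x)/P(Z > x) = i for the truncation point x by bisection, where z is the standard normal density, and then returns p = P(Z > x).

```python
import math


def upper_tail(x):
    """P(Z > x) for a standard normal Z (erfc avoids cancellation in the tail)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def normal_pdf(x):
    """Standard normal density z(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def truncation_equivalent(i, lo=-10.0, hi=10.0, iters=200):
    """Return (p, x): proportion selected and truncation point whose
    selection intensity z(x) / p equals the observed i."""
    if i <= 0:
        raise ValueError("this sketch assumes a positive selection differential")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # z(x) / P(Z > x) (the inverse Mills ratio) increases with x
        if normal_pdf(mid) / upper_tail(mid) < i:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return upper_tail(x), x


p, x = truncation_equivalent(0.424)
print(round(p, 3), round(x, 3))
```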
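From the equivalent proportion under truncation selection, the expected values of the regression coefficients can be approximated from the expected effects of truncation selection on the variances of, and the covariance between, φ and the independent variable X, where X is either GEBVr or EBVr. Denoting all quantities after selection on φ with a superscript s, defining $R^2_b = R^2(\varphi, X)$ before selection, and following Bulmer (1971) and Henderson (1975):

$$ k = i\,(i - x), \qquad V^s(\varphi) = (1 - k)\,V(\varphi), \qquad C^s(\varphi, X) = (1 - k)\,C(\varphi, X), \qquad V^s(X) = (1 - k R^2_b)\,V(X) \qquad [4] $$

Dividing the expected $C^s(\varphi, X)$ by the expected $V^s(X)$ gives the expected b1 after selection:

$$ E^s(b_1) = \frac{1 - k}{1 - k R^2_b}\,E(b_1) \qquad [5] $$

The observed $R^2_{\varphi,X}$ after selection has the expected value below, which can be solved for the $R^2_b$ required in equation [5]:

$$ E\left(R^2_{\varphi,X}\right) = \frac{(1 - k)\,R^2_b}{1 - k R^2_b} \quad\Rightarrow\quad R^2_b = \frac{R^2_{\varphi,X}}{1 - k\,(1 - R^2_{\varphi,X})} \qquad [6] $$

Substituting [6] into [5] and simplifying gives the expected slope as a function of the observed $R^2_{\varphi,X}$:

$$ E^s(b_1) = \left[1 - k\,(1 - R^2_{\varphi,X})\right] E(b_1) \qquad [7] $$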
Example: Let $\mu_{\mathrm{EBV},g} = 16.00$, $\mu_{\mathrm{EBV},\mathrm{all}} = 11.76$, $\sigma_{\mathrm{EBV},\mathrm{all}} = 10.00$ and $R^2_{\varphi,X} = 0.50$. From equation [3], the selection differential (i) for genotyped bulls equals 0.424. For this value of i, the reference table gives an equivalent proportion selected by truncation for genotyping of p = 0.75, with the truncation point at x = -0.674 standard deviations from the overall mean. Equation [4] gives k = 0.466; equations [6] and then [5] give $R^2_b = 0.652$ and $E^s(b_1) = 0.767$, and equation [7] gives the same $E^s(b_1) = 0.767$ directly.
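The worked example can be reproduced in a few lines. This sketch assumes that equations [4], [6] and [7] take the forms k = i(i - x), R²b = R²/(1 - k(1 - R²)) and Es(b1) = (1 - k(1 - R²))·E(b1), with R² the observed coefficient of determination after selection. These forms follow from the standard variance reduction under truncation selection and reproduce the numbers quoted in the example. The function name is hypothetical, and x = -0.674 is taken from the table, as in the example.

```python
def expected_slope_after_selection(i, x, r2_obs, e_b1=1.0):
    """Expected regression slope after selective genotyping.

    i:      selection differential of genotyped bulls, equation [3]
    x:      truncation point equivalent to i (from tables or a numeric solve)
    r2_obs: observed R^2 between phi and X after selection
    e_b1:   expected slope without selection (1 for an unbiased evaluation)
    """
    k = i * (i - x)                                     # variance reduction factor, [4]
    r2_before = r2_obs / (1.0 - k * (1.0 - r2_obs))     # R^2 before selection, [6]
    es_b1 = (1.0 - k) / (1.0 - k * r2_before) * e_b1    # [5]
    es_b1_direct = (1.0 - k * (1.0 - r2_obs)) * e_b1    # [7], same value
    return k, r2_before, es_b1, es_b1_direct


i = (16.00 - 11.76) / 10.00                             # equation [3]
k, r2b, es, es_direct = expected_slope_after_selection(i, -0.674, 0.50)
print(round(i, 3), round(k, 3), round(r2b, 3), round(es, 3), round(es_direct, 3))
# prints 0.424 0.466 0.652 0.767 0.767
```

Computing [5] and [7] separately is deliberate. Their agreement is a quick check that the substitution of [6] into [5] was carried out correctly.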
Table 1 - Examples of expected regression coefficients (E(b1)) as functions of the selection intensity (i) and the coefficient of determination after selection ($R^2_{\varphi,X}$).
Testing for accuracy improvement with genomics
The improvement in prediction of daughter performance due to the addition of genomic information (i.e. genotyping) is tested by bootstrapping the difference in validation-model $R^2$ between models [1] and [2] ($R^2$ of model [1] minus $R^2$ of model [2]). A significantly positive difference (P < .05) indicates a significant increase in accuracy with GEBV and results in a PASS. A significantly negative difference (P < .05) results in a FAIL. A non-significant difference (P > .05) indicates that the data were insufficient to conclude either way and results in hiSE (high standard error). A PASS or hiSE result is one of the requirements for an overall PASS in the official GEBV test.
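The accuracy comparison can be sketched in the same hypothetical style. The sketch assumes case resampling of bulls, a weighted $R^2$ from simple weighted regressions of φ on GEBVr and on EBVr, and a two-sided 95% percentile interval for the difference as the P < .05 criterion. The official software's exact procedure may differ, and `weighted_r2` and `accuracy_test` are illustrative names only.

```python
import random


def weighted_r2(x, y, w):
    """Weighted R^2 of a simple WLS regression of y on x (with intercept)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy * sxy / (sxx * syy)


def accuracy_test(phi, gebv_r, ebv_r, w, n_boot=1000, alpha=0.05, seed=1):
    """Bootstrap R^2(model [1]) - R^2(model [2]); return PASS, FAIL or hiSE."""
    rng = random.Random(seed)
    n = len(phi)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample bulls with replacement
        p = [phi[j] for j in idx]
        ww = [w[j] for j in idx]
        diffs.append(weighted_r2([gebv_r[j] for j in idx], p, ww)
                     - weighted_r2([ebv_r[j] for j in idx], p, ww))
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    if lo > 0:
        return "PASS"   # GEBV significantly more accurate than EBV
    if hi < 0:
        return "FAIL"   # GEBV significantly less accurate than EBV
    return "hiSE"       # interval covers zero: too little data to decide
```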
Description of National Genomic Evaluations
National Genetic Centres shall provide a description of their national genomic evaluations to the Interbull Centre, currently by using the GENO forms and, in the near future, by using electronic forms in the PREP database, which will replace the GENO forms.
Updated descriptions shall be provided each time changes to the national genomic evaluations are introduced.
References
Sullivan, P.G. 2023. Updated Interbull software for genomic validation tests. Interbull Bulletin 58: 7-16.
VanRaden, P.M. 2021. Improved genomic validation including extra regressions. Interbull Bulletin 56: 65-69.
Mäntysaari, E., Liu, Z. and VanRaden, P. 2010. Interbull validation test for genomic evaluations. Interbull Bulletin 41: 17-21.
Bulmer, M.G. 1971. The effect of selection on genetic variability. American Naturalist 105: 201.
Henderson, C.R. 1975. Best linear unbiased estimation and prediction under a selection model. Biometrics 31: 423-447.
Falconer, D.S. and Mackay, T.F.C. 1996. Introduction to Quantitative Genetics. 4th ed. Longman.