Differences between revisions 1 and 13 (spanning 12 versions)
Revision 1 as of 2019-03-28 16:15:35
Size: 4991
Editor: JanErik
Comment:
Revision 13 as of 2019-04-04 10:06:51
Size: 8180
Editor: Valentina
Comment:

General information

Interbull Centre is providing a conversion program to help member organizations transition smoothly from the old file formats to the new XML-based file format, which is now used when uploading performance data to the Interbull Centre:

  • convert_performance.py ==> converts a set of files in the old flat-file formats (602, 603, 604 and 605) to the corresponding new XML-based performance file

The program will be available for download in IDEA and can be run by an overall script if so desired.

Note: the aim of the conversion program is to help participating countries move from the old system (based on flat files) to the new one (based on XML). Interbull Centre will be happy to provide assistance and support with the program until the end of 2019, in order to encourage countries to develop their own procedures for producing XML files. (In this regard, please refer to the description of how the XML file for performance data should be constructed at https://wiki.interbull.org/public/public/FilePerformanceXML?action=print.)
The program will remain available for download, but no support will be provided after the end of 2019.

Before Running the Program

  • a. Install Python (either Python3, version 3.4.8 or later, or Python2, version 2.7.5 or later) if necessary

    • (If you have a choice, prefer Python3 over Python2)
  • b. Create a working directory/folder
  • c. Download the convert_performance program from https://idea.interbull.org/software into the new directory

  • d. Copy the files you want to convert to the working directory (this is optional; the path to each file can instead be given on the command line as /path/filename)
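
The version requirement in point a can be checked before going any further. The following is a minimal sketch that assumes only the minimum versions stated above; the helper name python_is_supported is hypothetical and is not part of convert_performance.py.

```python
import sys

# Minimum versions taken from point a above.
MIN_PY3 = (3, 4, 8)
MIN_PY2 = (2, 7, 5)


def python_is_supported(version=None):
    """Return True if a (major, minor, micro) tuple meets the stated minimum.

    Hypothetical helper: defaults to the interpreter running this script.
    """
    if version is None:
        version = tuple(sys.version_info[:3])
    if version[0] >= 3:
        return version >= MIN_PY3
    return version >= MIN_PY2


if __name__ == "__main__":
    print("Python %d.%d.%d supported: %s"
          % (tuple(sys.version_info[:3]) + (python_is_supported(),)))
```

Run the script with the same interpreter you intend to use for the conversion, since python3 and python may point to different installations.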

Executing the Program

The commands shown below assume that the working directory created in point b above is the current directory, i.e. run cd <name of directory> once before running them.

First, let's look at a simple example of how to run the program, which is sufficient in most cases (replace the filenames with whatever is suitable in your situation):

python3 convert_performance.py -o MyPerformanceFile.xml fileawwlim.cze para_aww_lim.cze et_aww_lim.cze 605awwDlim.cze

Synopsis of how to run the program:

python3 convert_performance.py [-t] [-v] [-o destfile] [-C | -N] 602file 603file 604file [D605file [M605file]]

Arguments within square brackets (i.e. []) are optional to specify.
Arguments are meant to be specified in the order indicated.
All arguments whose names end in "file" are filenames, which may be either relative (i.e. relative to the current directory) or absolute (i.e. starting with a slash). The 604file and D605file arguments may refer to an empty file.

Use the -t argument when no XML file is desired, i.e. to only test the input files for correctness.

The -v argument shows additional information that is sometimes useful, but for normal usage it is recommended not to use it.

If neither the -t argument nor the -o argument has been provided, the generated XML data is put in a file in the current directory named out.xml (i.e. it behaves as if -o out.xml was specified on the command line).

The program needs to behave differently depending on whether or not a trait is a calving trait, so a mechanism (the -C and -N arguments) has been provided to explicitly control which behaviour is used. If neither of these arguments has been provided, the behaviour is selected based on the name of the trait (which is retrieved from the 603 file).
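
When the program is called from a script, the argument order in the synopsis can be encoded once and reused. The sketch below is only an illustration: build_command and its parameter names are hypothetical, and it assumes that -C selects the calving behaviour and -N the non-calving behaviour, which the text above implies but does not state outright.

```python
def build_command(file602, file603, file604, d605=None, m605=None,
                  destfile=None, test_only=False, verbose=False,
                  calving=None, python="python3"):
    """Return the argument list for one run, in the order given by the synopsis.

    calving: True adds -C, False adds -N (assumed meaning of the flags),
    None adds neither, so the program decides from the trait name.
    """
    if m605 is not None and d605 is None:
        # The synopsis nests M605file inside the optional D605file.
        raise ValueError("M605file can only be given after a D605file")
    cmd = [python, "convert_performance.py"]
    if test_only:
        cmd.append("-t")
    if verbose:
        cmd.append("-v")
    if destfile is not None:
        cmd += ["-o", destfile]
    if calving is True:
        cmd.append("-C")
    elif calving is False:
        cmd.append("-N")
    cmd += [file602, file603, file604]
    if d605 is not None:
        cmd.append(d605)
    if m605 is not None:
        cmd.append(m605)
    return cmd


# Reproduces the simple example command shown earlier.
example = build_command("fileawwlim.cze", "para_aww_lim.cze", "et_aww_lim.cze",
                        d605="605awwDlim.cze", destfile="MyPerformanceFile.xml")
print(" ".join(example))
```

The returned list can be passed as is to subprocess.call from the Python standard library, which avoids quoting problems with filenames that contain spaces.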

If you need a reminder, the following command shows a brief description of how to run the program:

python3 convert_performance.py -h

Output From the Program

If no problems are discovered in the input files, the following message will be written:

Running convert_performance.py version 2019-04-01 v1.19

No problems detected.

and an output file will be produced (unless -t is specified), either out.xml in the working directory or in the destfile specified.

If problems are discovered, the last line of the above message is replaced by one or more messages, each describing what the problem is and where it was discovered. If too many problems are discovered, a message to that effect is shown and the program terminates immediately. At some points in the execution, any previously detected problem will cause immediate termination. No XML file will be created until all problems are fixed.
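
A wrapper script can decide whether a run was clean by looking for the final line of the success message. This is a minimal sketch that assumes the program output has already been captured as text; whether the messages are written to standard output or standard error is not stated here, so capturing both is the safer assumption. The function name no_problems_detected is hypothetical.

```python
def no_problems_detected(output_text):
    """Return True if the captured output contains the success line."""
    lines = [line.strip() for line in output_text.splitlines()]
    return "No problems detected." in lines


# Sample taken from the success message shown above.
sample = """Running convert_performance.py version 2019-04-01 v1.19

No problems detected.
"""
print(no_problems_detected(sample))
```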

Here is a list of the most common errors found in the data and the actions needed to correct them.

Example message 1

Syntax of the 605 file, 605awwDlim.cou(1), do not follow the specification
Data    : 605 LIMFRAM008795002134 89 COU.
Expected: 605 ................... .. .. ...

This message specifies that the 605awwDlim.cou file (on line 1) has data that do not follow the expected format of this type of file (i.e. 605 format files).
The "Data" line shows what is there, and the "Expected" line shows what is expected to be found.
In this case it shows that one column of data is missing in the file.
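
When many messages have to be processed, the file type, filename and line number can be extracted from the first line of a syntax message. The pattern below is a sketch derived only from the message wording shown in these examples; the wording could differ between program versions, and parse_syntax_message is a hypothetical name.

```python
import re

# Matches the first line of a syntax message as shown in the examples.
SYNTAX_RE = re.compile(
    r"^Syntax of the (\d+) file, (.+)\((\d+)\), do not follow the specification$"
)


def parse_syntax_message(first_line):
    """Return (file_type, filename, line_number), or None if no match."""
    match = SYNTAX_RE.match(first_line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3))


print(parse_syntax_message(
    "Syntax of the 605 file, 605awwDlim.cou(1), do not follow the specification"))
```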

Example message 2

Syntax of the 605 file, 605caeDlim.cou(1425), do not follow the specification
Data    : 605 LIMFRAM001295161096 10010 COU.
Expected: 605 ................... .. .. ...

This message specifies that the 605caeDlim.cou file (on line 1425) has data that do not follow the expected format of this type of file (i.e. 605 format files).
In this case it shows that one column delimiter is "0" instead of " ".
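
The position of the offending character can be found by comparing the "Data" line with the "Expected" line character by character. The sketch below assumes that every "." in the "Expected" line stands for "any character" and that all other characters must match exactly; this reading fits both 605 examples above but is not confirmed by the program documentation. The name find_mismatches is hypothetical.

```python
def find_mismatches(data, expected):
    """Return (position, wanted, found) for each fixed character that differs.

    Positions are 0-based. A "." in expected is treated as a wildcard
    (assumption). Only the overlapping part of the two strings is compared.
    """
    mismatches = []
    for pos, (found, wanted) in enumerate(zip(data, expected)):
        if wanted != "." and found != wanted:
            mismatches.append((pos, wanted, found))
    return mismatches


# "Expected" pattern of a 605 record as printed in the messages above:
# record type, 19-character animal ID, two 2-character fields, 3-letter code.
expected_605 = "605 " + "." * 19 + " .. .. ..."

# Example message 2: a "0" sits where a column delimiter should be.
print(find_mismatches("605 LIMFRAM001295161096 10010 COU.", expected_605))
# Example message 1: a column is missing, so later fields are shifted left.
print(find_mismatches("605 LIMFRAM008795002134 89 COU.", expected_605))
```

For example message 2 this reports position 26, where a space was expected and "0" was found. For example message 1 it reports position 29, where the country code has shifted into the place of a delimiter because a column is missing.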

Example message 3

Syntax of the 602 file, fileawwlim.cou(1), do not follow the specification
Data    : 602 AWW LIM COU LIMFRAF008609359309 1 0 COU000360047467 308         27 7992                 2                    1550                 302500               220                  0                    0                    0                    0                    0                    0                    0                    100                  0                    0                    0                    0                    0                    0                    0                    0                    0                    100                  0                    0                    0                    0.
Expected: tail too short, is 548, expected 567

This shows that the records in the file fileawwlim.cou are shorter than expected given the overall number of effects reported (total number of effects * 21).
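
The expected length in this message can be reproduced by hand. The sketch below assumes that the value 27 in the record is the total number of effects; this is consistent with the expected length of 567 (27 * 21) but is an inference from the example, not something stated in the message. The names used are hypothetical.

```python
FIELD_WIDTH = 21  # characters per effect, from "total number of effects * 21"


def expected_tail_length(n_effects):
    """Return the tail length the program expects for n_effects effects."""
    return n_effects * FIELD_WIDTH


n_effects = 27     # assumption: read from the example record
actual_tail = 548  # "is 548" in the message
expected_tail = expected_tail_length(n_effects)
print(expected_tail)                # 567, as in the message
print(expected_tail - actual_tail)  # 19 characters are missing
```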

Example message 4

The 602 file, filebwtlim.cou(1720), animal LIMCZEF000000104735 has inconsistent (illegal - discarded) name of calf: 'UUUUUUUUUUUUUUUUUUU'

This shows that the file filebwtlim.cou, on line 1720, specifies a calf with no useful name of its own, although the name of the dam is provided. This specification has thus been discarded.
This message is not really a problem report but rather an informational message and thus will not cause termination due to too many problems found.

Example message 5

Duplicate: 602 CAE BSM COU BSMCZEF000002764617 1 0 CZE003200397411          1   9                 1368 BSMDNKM000415240598                     1             19990828 BSMCZEM000110893384                     1
          43                 1849                 1999
Duplicate: 602 CAE BSM COU BSMCZEF000002764617 1 0 CZE003200397411          1   9                 1368 BSMDNKM000415240598                     1             19990828 BSMCZEM000110893384                     1
          43                 1849                 1999
                   1

This shows the presence of a duplicate record in a given file (in this case a 602 format file). Duplicate records are ignored by the program, so they are not really a problem per se, but a warning like this is issued.
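
Duplicates can be located in an input file before running the conversion. This is a minimal sketch that assumes a duplicate is a line identical to an earlier line; the criterion actually used by the program may differ. The name find_duplicates is hypothetical.

```python
def find_duplicates(lines):
    """Return (line_number, first_line_number) for each repeated line.

    Line numbers are 1-based. Only exact repeats count (assumption).
    """
    first_seen = {}
    duplicates = []
    for line_no, line in enumerate(lines, start=1):
        record = line.rstrip("\n")
        if record in first_seen:
            duplicates.append((line_no, first_seen[record]))
        else:
            first_seen[record] = line_no
    return duplicates


# Invented records, for illustration only.
print(find_duplicates(["record A\n", "record B\n", "record A\n"]))
```

To check a real file, pass the open file object, for example find_duplicates(open("fileawwlim.cze")) with your own filename.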

public/performance_convert_program (last edited 2020-11-05 16:47:56 by JanErik)