XML tutorials and examples
Introduction
When fixed-width data file formats become too complex, for example when the number of columns depends on values in previous columns or on data in other files, writing parsers that read and write the information and interpret it correctly becomes very time- and resource-consuming. One possible solution is to use an XML format for the data exchange files, which makes it possible to draw on the large number of packages and libraries already written to parse such documents. This page collects links and examples that will help you become familiar with the XML format and with writing tools that use it.
XML Overview
See the Wikipedia entry for XML for a description of the XML markup language. Take particular note of the terminology used for the different parts of the XML structure.
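To make the terminology concrete, the following minimal Python sketch parses a single animal record with the standard-library xml.etree.ElementTree module and prints its parts by name. The record is copied from the example further down this page; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# One animal record in the format used by the example on this page.
SAMPLE = ('<animal id="AANFINF000006314316" name="Muhmuh">'
          '<adata><offspring>5</offspring></adata></animal>')

animal = ET.fromstring(SAMPLE)              # the root element of this fragment
print(animal.tag)                           # the element name (tag)
print(animal.attrib)                        # the attributes as a dictionary
offspring = animal.find('adata/offspring')  # a child element, located by path
print(offspring.tag, offspring.text)        # child tag and its text content
```

Here animal is an element, id and name are its attributes, and the text inside offspring is the content of that child element.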
XML tutorials
An XML tutorial focusing on the web aspect: W3 Schools XML tutorial
A video XML tutorial on YouTube: XML basics
Advanced, in-depth tutorial on XML: http://www.xmlmaster.org/en/article/d01/
XML concrete example
In our hypothetical example we have two source files: one with the IDs and names of animals, and another with associated data. The information in these two files should be merged into a single XML file. The structure of each source file is very simple.
Our source files
Animal ID file
This is a fixed-width file with two columns: the animal ID (19 characters) followed by the animal name (30 characters). Every line contains exactly one animal, and each animal may appear only once in the file. Our example source file, which we can call source_id.dat, looks like this:
LIMFRAF001521469226 Rosa
HOLUSAF000017059414 Bossy
AANFINF000006314316 Muhmuh
HERFINF000003266465 Greta
LIMFRAF001930958553 Linda
CHAFINM000008365662 Cowlin
LIMFRAM003150038969 Bryan
CHAFRAF002350102162 Linda
SIMDEUF000922204654 Angel
HOLDEUF001006117458 Hermione
CHAFRAF004303055320 Samantha
HERFINM000008405652 Frodo
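As a minimal sketch of how such a fixed-width record can be read, the Python snippet below slices each line at the ID width and rejects repeated IDs. The function names parse_id_line and read_animals are hypothetical helpers introduced here, and the sketch assumes the name is simply whatever follows the 19-character ID.

```python
ID_WIDTH = 19  # width of the animal ID column, as described above


def parse_id_line(line):
    """Split one fixed-width record into (animal ID, animal name)."""
    aid = line[:ID_WIDTH]
    name = line[ID_WIDTH:].strip()  # drop the separator, padding and newline
    return aid, name


def read_animals(lines):
    """Build an ID -> name dictionary and refuse duplicate IDs."""
    animals = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        aid, name = parse_id_line(line)
        if aid in animals:
            raise ValueError('duplicate animal ID: ' + aid)
        animals[aid] = name
    return animals


sample = ['LIMFRAF001521469226 Rosa\n', 'HOLUSAF000017059414 Bossy\n']
animals = read_animals(sample)
print(animals)
```

Raising an error on a duplicate is one possible policy; a real tool might prefer to log the problem and continue.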
Associated data file
This is a three-column, comma-separated file: the first column is the animal ID, the second is the name of the associated data item, and the third is the value itself. An animal may have many pieces of data associated with it. Our example source file, called source_ad.dat, is:
HOLUSAF000017059414,recommendation,medium
CHAFRAF002350102162,recommendation,rare
HOLUSAF000017059414,height,180
CHAFINM000008365662,height,200
LIMFRAM003150038969,height,205
CHAFRAF002350102162,height,175
SIMDEUF000922204654,height,190
HOLDEUF001006117458,height,180
CHAFRAF004303055320,height,175
HERFINM000008405652,recommendation,med-rare
HOLUSAF000017059414,offspring,4
AANFINF000006314316,offspring,5
HERFINF000003266465,offspring,2
LIMFRAF001930958553,offspring,0
CHAFINM000008365662,offspring,45
LIMFRAM003150038969,offspring,30
CHAFRAF002350102162,offspring,3
SIMDEUF000922204654,offspring,10
HOLDEUF001006117458,offspring,7
CHAFRAF004303055320,offspring,3
HERFINM000008405652,offspring,18
LIMFRAF001930958553,attitude,friendly
SIMDEUF000922204654,attitude,touchy
CHAFRAF002350102162,attitude,classy
AANFINF000006314316,attitude,angry
HOLDEUF001006117458,attitude,curious
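Because one animal can appear on several lines, a reader has to group the rows by animal ID. The sketch below shows one way to do that with Python's standard csv module; read_associated_data is a hypothetical helper name, and the three sample rows are taken from the file above.

```python
import csv
from collections import defaultdict


def read_associated_data(lines):
    """Group the (name, value) pairs of each row under its animal ID."""
    data = defaultdict(dict)
    for row in csv.reader(lines):
        if not row:
            continue  # skip empty lines
        aid, adname, adval = row
        data[aid][adname] = adval
    return data


sample = [
    'HOLUSAF000017059414,recommendation,medium',
    'HOLUSAF000017059414,height,180',
    'AANFINF000006314316,offspring,5',
]
grouped = read_associated_data(sample)
print(dict(grouped))
```

All values stay strings here; converting heights or offspring counts to numbers is left to the code that uses them.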
Our final XML file
This is the final XML file after the information in the two source files has been merged:
<interbeef><animal id="LIMFRAF001521469226" name="Rosa"><adata /></animal><animal id="HOLDEUF001006117458" name="Hermione"><adata><attitude>curious</attitude><offspring>7</offspring><height>180</height></adata></animal><animal id="AANFINF000006314316" name="Muhmuh"><adata><attitude>angry</attitude><offspring>5</offspring></adata></animal><animal id="HERFINM000008405652" name="Frodo"><adata><offspring>18</offspring><recommendation>med-rare</recommendation></adata></animal><animal id="SIMDEUF000922204654" name="Angel"><adata><attitude>touchy</attitude><offspring>10</offspring><height>190</height></adata></animal><animal id="HOLUSAF000017059414" name="Bossy"><adata><height>180</height><offspring>4</offspring><recommendation>medium</recommendation></adata></animal><animal id="CHAFRAF002350102162" name="Linda"><adata><attitude>classy</attitude><height>175</height><offspring>3</offspring><recommendation>rare</recommendation></adata></animal><animal id="CHAFINM000008365662" name="Cowlin"><adata><offspring>45</offspring><height>200</height></adata></animal><animal id="LIMFRAF001930958553" name="Linda"><adata><attitude>friendly</attitude><offspring>0</offspring></adata></animal><animal id="CHAFRAF004303055320" name="Samantha"><adata><offspring>3</offspring><height>175</height></adata></animal><animal id="LIMFRAM003150038969" name="Bryan"><adata><offspring>30</offspring><height>205</height></adata></animal><animal id="HERFINF000003266465" name="Greta"><adata><offspring>2</offspring></adata></animal></interbeef>
After the compact version above has been pretty-printed for easier viewing (the structure is unchanged), it looks like this:
<interbeef>
  <animal id="LIMFRAF001521469226" name="Rosa">
    <adata />
  </animal>
  <animal id="HOLDEUF001006117458" name="Hermione">
    <adata>
      <attitude>curious</attitude>
      <offspring>7</offspring>
      <height>180</height>
    </adata>
  </animal>
  <animal id="AANFINF000006314316" name="Muhmuh">
    <adata>
      <attitude>angry</attitude>
      <offspring>5</offspring>
    </adata>
  </animal>
  <animal id="HERFINM000008405652" name="Frodo">
    <adata>
      <offspring>18</offspring>
      <recommendation>med-rare</recommendation>
    </adata>
  </animal>
  <animal id="SIMDEUF000922204654" name="Angel">
    <adata>
      <attitude>touchy</attitude>
      <offspring>10</offspring>
      <height>190</height>
    </adata>
  </animal>
  <animal id="HOLUSAF000017059414" name="Bossy">
    <adata>
      <height>180</height>
      <offspring>4</offspring>
      <recommendation>medium</recommendation>
    </adata>
  </animal>
  <animal id="CHAFRAF002350102162" name="Linda">
    <adata>
      <attitude>classy</attitude>
      <height>175</height>
      <offspring>3</offspring>
      <recommendation>rare</recommendation>
    </adata>
  </animal>
  <animal id="CHAFINM000008365662" name="Cowlin">
    <adata>
      <offspring>45</offspring>
      <height>200</height>
    </adata>
  </animal>
  <animal id="LIMFRAF001930958553" name="Linda">
    <adata>
      <attitude>friendly</attitude>
      <offspring>0</offspring>
    </adata>
  </animal>
  <animal id="CHAFRAF004303055320" name="Samantha">
    <adata>
      <offspring>3</offspring>
      <height>175</height>
    </adata>
  </animal>
  <animal id="LIMFRAM003150038969" name="Bryan">
    <adata>
      <offspring>30</offspring>
      <height>205</height>
    </adata>
  </animal>
  <animal id="HERFINF000003266465" name="Greta">
    <adata>
      <offspring>2</offspring>
    </adata>
  </animal>
</interbeef>
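The page does not say which tool produced the pretty-printed version. One possible way, sketched here under that caveat, is the xml.dom.minidom module from the Python standard library. The input string is a shortened, one-animal copy of the compact file.

```python
import xml.dom.minidom

# Shortened copy of the compact XML file (one animal only).
compact = ('<interbeef><animal id="LIMFRAF001521469226" name="Rosa">'
           '<adata /></animal></interbeef>')

# Parse the compact text and serialise it again with two-space indentation.
pretty = xml.dom.minidom.parseString(compact).toprettyxml(indent='  ')
print(pretty)
```

Note that toprettyxml also adds an XML declaration as the first line of its output, which the listing above does not show.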
Fortran example of creating our XML file from the source files
program merge
    implicit none

    character(30) :: aid, aname
    character(30) :: aidad, adname, adval

    integer :: ios=0

    ! declare an associated data structure as a linked list structure
    TYPE Adata
        character(30) :: aname
        character(30) :: avalue
        TYPE(Adata), POINTER :: next
    END TYPE Adata

    ! declare an animal data structure as a linked list structure
    TYPE :: Animal
        character(30) :: aname
        character(19) :: aid
        TYPE(Animal), POINTER :: next
        TYPE(Adata), POINTER :: adata
    END TYPE Animal

    TYPE(Animal), POINTER :: head, tmpanimal
    TYPE(Adata), POINTER :: tmpadata

    ! open the source files
    open(unit=2, file='source_id.dat')
    open(unit=3, file='source_ad.dat')

    ! read animal data into the animal data structure
    ! (each new animal is inserted at the head of the list)
    NULLIFY(head)
    do while (ios==0)
        read (2,*, iostat=ios) aid, aname
        if (ios==0) then
            ALLOCATE(tmpanimal)
            tmpanimal%aname=aname
            tmpanimal%aid=aid
            NULLIFY(tmpanimal%adata)
            if (ASSOCIATED(head)) then
                tmpanimal%next => head
            else
                NULLIFY(tmpanimal%next)
            end if
            head => tmpanimal
        end if
    end do

    ! read associated data into the animal data structure
    ios=0
    do while (ios==0)
        read (3,*, iostat=ios) aidad, adname, adval
        if (ios==0) then
            ! search the list for the animal this record belongs to
            tmpanimal => head
            do while (ASSOCIATED(tmpanimal))
                if (aidad==tmpanimal%aid) then
                    exit
                end if
                tmpanimal => tmpanimal%next
            end do
            ! records for unknown animals are silently skipped
            if (ASSOCIATED(tmpanimal)) then
                ALLOCATE(tmpadata)
                tmpadata%aname=adname
                tmpadata%avalue=adval
                if (ASSOCIATED(tmpanimal%adata)) then
                    tmpadata%next => tmpanimal%adata
                else
                    NULLIFY(tmpadata%next)
                end if
                tmpanimal%adata => tmpadata
            end if
        end if
    end do

    ! write the xml structure to a file
    ! (unit 4 is used because unit 5 is conventionally preconnected to standard input)
    open(unit=4, file='output.xml')
    tmpanimal => head
    write (4,*) "<interbeef>"
    do while (ASSOCIATED(tmpanimal))
        write (4,*) "<animal id="""//TRIM(tmpanimal%aid)//""" name="""//TRIM(tmpanimal%aname)//""">"
        tmpadata => tmpanimal%adata
        if (ASSOCIATED(tmpadata)) then
            write (4,*) "<adata>"
            do while (ASSOCIATED(tmpadata))
                write (4,*) "<"//TRIM(tmpadata%aname)//">"//TRIM(tmpadata%avalue)//"</"//TRIM(tmpadata%aname)//">"
                tmpadata => tmpadata%next
            end do
            write (4,*) "</adata>"
        else
            write (4,*) "<adata/>"
        end if
        write (4,*) "</animal>"
        tmpanimal => tmpanimal%next
    end do
    write (4,*) "</interbeef>"

    ! close all files
    close(2)
    close(3)
    close(4)

end program merge
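The Fortran program builds its tags by plain string concatenation. That works for the sample data, but a name or value containing a character such as & or < would produce a file that XML parsers reject. The sketch below, written in Python with made-up sample values, shows the escaping a hand-written XML writer would need to apply; it uses escape and quoteattr from the standard xml.sax.saxutils module and then parses the result to confirm it is well-formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

name = 'Salt & Pepper'   # hypothetical animal name with a special character
value = '<unknown>'      # hypothetical data value with special characters

# quoteattr() escapes the text and adds the surrounding quotes;
# escape() escapes text that goes between tags.
line = ('<animal id="X1" name=' + quoteattr(name) + '>'
        '<adata><note>' + escape(value) + '</note></adata></animal>')
print(line)

# The escaped line is well-formed, and parsing it returns the original text.
node = ET.fromstring(line)
print(node.get('name'), node.find('adata/note').text)
```

Libraries such as ElementTree perform this escaping automatically, which is one of the reasons to prefer an XML library over manual string building when one is available.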
Python example of creating our XML file from the source files
This Python program reads the two source files and produces the merged XML file from them.
#!/usr/bin/env python3

import xml.etree.ElementTree as ET # ElementTree is a good XML parser from the Python standard library

# build a data structure with animals from the animal ID file

animals={}
with open('source_id.dat','r') as s1:
    for line in s1:
        if not line.strip():
            continue # skip blank lines
        aid=line[0:19]
        name=line[20:].strip()

        animals[aid]=[{}, name] # the aid is tied to an empty dictionary for associated data and the name of the animal

# add associated data to the data structure

with open('source_ad.dat','r') as s2:
    for line in s2:
        if not line.strip():
            continue # skip blank lines
        aid, adname, adval = line.strip().split(',')
        if aid not in animals:
            print('Warning: animal {aid} not in animal ID file!'.format(aid=aid))
            continue # skip the record instead of failing with a KeyError

        animals[aid][0][adname]=adval

# create the xml tree from the data structure

root = ET.Element('interbeef') # create the root XML element (called interbeef)
for animal,adata in animals.items():
    xmlanimal=ET.SubElement(root,'animal') # add the animal element to the root element
    xmlanimal.attrib['id']=animal # add the id as an attribute to the animal element
    xmlanimal.attrib['name']=adata[1] # add the name of the animal as an attribute to the animal element

    xmladata = ET.SubElement(xmlanimal, 'adata') # add an associated data element (called adata) to the animal element

    # add all the associated data as child elements to the animal's adata element
    for name, value in adata[0].items():
        xmladatavalue = ET.SubElement(xmladata, name) # add the name of the associated data as an element
        xmladatavalue.text = value # add the value as the content/text of the element

# write out the xml file
ET.ElementTree(root).write('output.xml', encoding='UTF-8')
Python example of reading our XML file
#!/usr/bin/env python3

import sys
from xml.etree import ElementTree

# the animal ID to look up is given as the only command line argument
if len(sys.argv) != 2:
    sys.exit("Usage: {prog} ANIMAL_ID".format(prog=sys.argv[0]))

# open the xml file and parse it
tree = ElementTree.parse('output.xml') # parse the xmlfile

node=tree.find('animal[@id="{aid}"]'.format(aid=sys.argv[1])) # use an xpath to find the correct animal in the xml file

if node is None:
    print("Animal {aid} not found".format(aid=sys.argv[1]))
else: # print out the animal data if the animal was found
    print("Animal ID: {aid}".format(aid=node.attrib['id']))
    print("Animal name: {name}".format(name=node.attrib['name']))
    print("Animal associated data:")

    # get all associated data
    for adata in node.findall('adata/*'): # loop through all child-elements of the adata element
        print("  {adataname} : {adatavalue}".format(adataname=adata.tag, adatavalue=adata.text))
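Reading is not limited to looking up one animal. As a further minimal sketch, the snippet below walks over every animal element and collects one data item (height) for those animals that have it. To keep it self-contained it parses a shortened, in-memory copy of the XML file rather than output.xml.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the merged XML file (two animals only).
XML = ('<interbeef>'
       '<animal id="LIMFRAF001521469226" name="Rosa"><adata /></animal>'
       '<animal id="HOLDEUF001006117458" name="Hermione">'
       '<adata><height>180</height></adata></animal>'
       '</interbeef>')

root = ET.fromstring(XML)
heights = {}
for animal in root.iter('animal'):        # visit every animal element
    height = animal.find('adata/height')  # None if the animal has no height
    if height is not None:
        heights[animal.get('id')] = int(height.text)
print(heights)
```

Animals without the requested data item, such as Rosa here, are simply left out of the result.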
