XML tutorials and examples

Introduction

When fixed-width data file formats become too complex, for example when the number of columns depends on values in previous columns or on data in other files, writing parsers that read and write the information and interpret it correctly becomes very time- and resource-consuming. One possible solution is to use an XML format for the data exchange files, which makes it possible to draw on the large number of packages and libraries already written to parse such documents. This page contains links and examples to help you become familiar with the XML format and with writing tools that use it.

XML Overview

See the Wikipedia entry for XML for a description of the XML markup language. Take particular note of the terminology used for the different parts of an XML document.

XML tutorials

An XML tutorial focusing on the web aspect: W3Schools XML tutorial

A video XML tutorial on YouTube: XML basics

Advanced, in-depth tutorial on XML: http://www.xmlmaster.org/en/article/d01/

XML concrete example

In our hypothetical example we have two source files: one with the IDs and names of animals, and another with data associated with those animals. The information in these two files should be merged into a single XML file. The structure of both source files is very simple.

Our source files

Animal ID file

This is a fixed-width file with two columns: the animal ID (19 characters) followed by the animal name (30 characters). Each line describes exactly one animal, and each animal ID must be unique within the file. Our example source file, which we can call source_id.dat, looks like this:

LIMFRAF001521469226 Rosa
HOLUSAF000017059414 Bossy
AANFINF000006314316 Muhmuh
HERFINF000003266465 Greta
LIMFRAF001930958553 Linda
CHAFINM000008365662 Cowlin
LIMFRAM003150038969 Bryan
CHAFRAF002350102162 Linda
SIMDEUF000922204654 Angel
HOLDEUF001006117458 Hermione
CHAFRAF004303055320 Samantha
HERFINM000008405652 Frodo
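
As a minimal sketch, a fixed-width line of this kind could be split in Python by slicing on the column widths. The function names parse_id_line and parse_id_lines are hypothetical, and the single space between the two columns is an assumption read off the sample file rather than something stated in the format description.

```python
# Column width taken from the description above: a 19-character animal ID.
# Assumption: one separating space follows the ID, then the name.
ID_WIDTH = 19


def parse_id_line(line):
    """Split one fixed-width line into (animal_id, name)."""
    animal_id = line[:ID_WIDTH]
    name = line[ID_WIDTH + 1:].strip()  # drop padding and the newline
    return animal_id, name


def parse_id_lines(lines):
    """Build {animal_id: name}, rejecting IDs that occur more than once."""
    animals = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        animal_id, name = parse_id_line(line)
        if animal_id in animals:
            raise ValueError("duplicate animal ID: " + animal_id)
        animals[animal_id] = name
    return animals


sample = [
    "LIMFRAF001521469226 Rosa\n",
    "HOLUSAF000017059414 Bossy\n",
]
print(parse_id_lines(sample))
```

Raising an error on a duplicate ID is one way to enforce the uniqueness requirement; a real tool might prefer to log the problem and continue.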

Associated data file

This is a three-column, comma-separated file. The first column is the animal ID, the second is the name of the associated data item and the third is the value itself. An animal may have several pieces of data associated with it. Our example source file, called source_ad.dat, is:

HOLUSAF000017059414,recommendation,medium
CHAFRAF002350102162,recommendation,rare
HOLUSAF000017059414,height,180
CHAFINM000008365662,height,200
LIMFRAM003150038969,height,205
CHAFRAF002350102162,height,175
SIMDEUF000922204654,height,190
HOLDEUF001006117458,height,180
CHAFRAF004303055320,height,175 
HERFINM000008405652,recommendation,med-rare
HOLUSAF000017059414,offspring,4
AANFINF000006314316,offspring,5
HERFINF000003266465,offspring,2
LIMFRAF001930958553,offspring,0
CHAFINM000008365662,offspring,45
LIMFRAM003150038969,offspring,30
CHAFRAF002350102162,offspring,3
SIMDEUF000922204654,offspring,10
HOLDEUF001006117458,offspring,7
CHAFRAF004303055320,offspring,3
HERFINM000008405652,offspring,18
LIMFRAF001930958553,attitude,friendly
SIMDEUF000922204654,attitude,touchy
CHAFRAF002350102162,attitude,classy
AANFINF000006314316,attitude,angry
HOLDEUF001006117458,attitude,curious
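
A file of this shape could be read with the csv module from the Python standard library and grouped per animal, as in the sketch below. The function name read_associated_data is hypothetical. Stripping whitespace around the values is a choice made for this sketch, not something the format description requires.

```python
import csv
import io
from collections import defaultdict


def read_associated_data(handle):
    """Return {animal_id: {data_name: value}} from a 3-column CSV stream."""
    data = defaultdict(dict)
    for row in csv.reader(handle):
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        animal_id, name, value = row
        # Assumption: surrounding whitespace in a value is not meaningful.
        data[animal_id][name] = value.strip()
    return dict(data)


# A few lines from the example file, wrapped in a file-like object.
sample = io.StringIO(
    "HOLUSAF000017059414,recommendation,medium\n"
    "HOLUSAF000017059414,height,180\n"
    "CHAFINM000008365662,height,200\n"
)
grouped = read_associated_data(sample)
print(grouped["HOLUSAF000017059414"])
```

If the same data name occurs twice for one animal, the later value overwrites the earlier one in this sketch.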

Our final XML file

This is the final XML file, produced by merging the information in the two source files. Note that the animals are not listed in the same order as in the source files:

<interbeef><animal id="LIMFRAF001521469226" name="Rosa"><adata /></animal><animal id="HOLDEUF001006117458" name="Hermione"><adata><attitude>curious</attitude><offspring>7</offspring><height>180</height></adata></animal><animal id="AANFINF000006314316" name="Muhmuh"><adata><attitude>angry</attitude><offspring>5</offspring></adata></animal><animal id="HERFINM000008405652" name="Frodo"><adata><offspring>18</offspring><recommendation>med-rare</recommendation></adata></animal><animal id="SIMDEUF000922204654" name="Angel"><adata><attitude>touchy</attitude><offspring>10</offspring><height>190</height></adata></animal><animal id="HOLUSAF000017059414" name="Bossy"><adata><height>180</height><offspring>4</offspring><recommendation>medium</recommendation></adata></animal><animal id="CHAFRAF002350102162" name="Linda"><adata><attitude>classy</attitude><height>175</height><offspring>3</offspring><recommendation>rare</recommendation></adata></animal><animal id="CHAFINM000008365662" name="Cowlin"><adata><offspring>45</offspring><height>200</height></adata></animal><animal id="LIMFRAF001930958553" name="Linda"><adata><attitude>friendly</attitude><offspring>0</offspring></adata></animal><animal id="CHAFRAF004303055320" name="Samantha"><adata><offspring>3</offspring><height>175 </height></adata></animal><animal id="LIMFRAM003150038969" name="Bryan"><adata><offspring>30</offspring><height>205</height></adata></animal><animal id="HERFINF000003266465" name="Greta"><adata><offspring>2</offspring></adata></animal></interbeef>

Pretty-printed for easier viewing (the structure is unchanged), the compact version above looks like this:

<interbeef>
  <animal id="LIMFRAF001521469226" name="Rosa">
    <adata />
  </animal>
  <animal id="HOLDEUF001006117458" name="Hermione">
    <adata>
      <attitude>curious</attitude>
      <offspring>7</offspring>
      <height>180</height>
    </adata>
  </animal>
  <animal id="AANFINF000006314316" name="Muhmuh">
    <adata>
      <attitude>angry</attitude>
      <offspring>5</offspring>
    </adata>
  </animal>
  <animal id="HERFINM000008405652" name="Frodo">
    <adata>
      <offspring>18</offspring>
      <recommendation>med-rare</recommendation>
    </adata>
  </animal>
  <animal id="SIMDEUF000922204654" name="Angel">
    <adata>
      <attitude>touchy</attitude>
      <offspring>10</offspring>
      <height>190</height>
    </adata>
  </animal>
  <animal id="HOLUSAF000017059414" name="Bossy">
    <adata>
      <height>180</height>
      <offspring>4</offspring>
      <recommendation>medium</recommendation>
    </adata>
  </animal>
  <animal id="CHAFRAF002350102162" name="Linda">
    <adata>
      <attitude>classy</attitude>
      <height>175</height>
      <offspring>3</offspring>
      <recommendation>rare</recommendation>
    </adata>
  </animal>
  <animal id="CHAFINM000008365662" name="Cowlin">
    <adata>
      <offspring>45</offspring>
      <height>200</height>
    </adata>
  </animal>
  <animal id="LIMFRAF001930958553" name="Linda">
    <adata>
      <attitude>friendly</attitude>
      <offspring>0</offspring>
    </adata>
  </animal>
  <animal id="CHAFRAF004303055320" name="Samantha">
    <adata>
      <offspring>3</offspring>
      <height>175</height>
    </adata>
  </animal>
  <animal id="LIMFRAM003150038969" name="Bryan">
    <adata>
      <offspring>30</offspring>
      <height>205</height>
    </adata>
  </animal>
  <animal id="HERFINF000003266465" name="Greta">
    <adata>
      <offspring>2</offspring>
    </adata>
  </animal>
</interbeef>
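
One possible way to produce such a pretty-printed view from compact XML is the xml.dom.minidom module in the Python standard library, sketched below on a shortened version of the file. This is only an illustration and not necessarily the tool used for the listing above. minidom also emits an XML declaration on the first line and writes empty elements without a space before the slash.

```python
import xml.dom.minidom

# A shortened compact document with the same structure as the full file.
compact = (
    '<interbeef>'
    '<animal id="LIMFRAF001521469226" name="Rosa"><adata /></animal>'
    '<animal id="HERFINF000003266465" name="Greta">'
    '<adata><offspring>2</offspring></adata>'
    '</animal>'
    '</interbeef>'
)

# Parse the compact text and re-serialize it with two-space indentation.
pretty = xml.dom.minidom.parseString(compact).toprettyxml(indent="  ")
print(pretty)
```

This works best on compact input; if the input already contains indentation, toprettyxml tends to add extra blank lines.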

Fortran example of creating our XML file from the source files

Python example of creating our XML file from the source files

This Python program reads the two source files and produces the merged XML file from them.

#!/usr/bin/env python3

import xml.etree.ElementTree as ET  # ElementTree is the XML library in the Python standard library

# build a data structure with animals from the animal ID file
# (19-character ID, one space, then the name)

animals = {}
with open('source_id.dat', 'r') as s1:
    for line in s1:
        if not line.strip():
            continue  # skip blank lines
        aid = line[0:19]
        name = line[20:].strip()
        # the aid is tied to an empty dictionary for associated data and the name of the animal
        animals[aid] = [{}, name]

# add associated data to the data structure

with open('source_ad.dat', 'r') as s2:
    for line in s2:
        if not line.strip():
            continue  # skip blank lines
        aid, adname, adval = line.strip('\n').split(',')
        if aid not in animals:
            print('Warning: animal {aid} not in animal ID file!'.format(aid=aid))
            continue  # skip data for unknown animals instead of failing with a KeyError
        animals[aid][0][adname] = adval

# create the XML tree from the data structure
# (dictionaries keep insertion order in Python 3.7+, so elements follow the order of the source files)

root = ET.Element('interbeef')  # create the root XML element (called interbeef)
for animal, adata in animals.items():
    xmlanimal = ET.SubElement(root, 'animal')  # add the animal element to the root element
    xmlanimal.attrib['id'] = animal  # add the id as an attribute to the animal element
    xmlanimal.attrib['name'] = adata[1]  # add the name of the animal as an attribute to the animal element

    xmladata = ET.SubElement(xmlanimal, 'adata')  # add an associated data element (called adata) to the animal element

    # add all the associated data as child elements to the animal's adata element
    for name, value in adata[0].items():
        xmladatavalue = ET.SubElement(xmladata, name)  # add the name of the associated data as an element
        xmladatavalue.text = value  # add the value as the content/text of the element

# write out the XML file; ET.tostring returns bytes when an encoding is given, so open in binary mode
with open('output.xml', 'wb') as xmlfile:
    xmlfile.write(ET.tostring(root, 'UTF-8'))

Python example of reading our XML file
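
As a minimal sketch of the reading direction, the merged file can be parsed back into a Python dictionary with ElementTree. To keep the example self-contained it parses a short excerpt from a string; for the real file, ET.parse('output.xml').getroot() would take the place of ET.fromstring. The layout of the resulting animals dictionary is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET

# A short excerpt with the same structure as the merged file.
xml_text = """
<interbeef>
  <animal id="HOLDEUF001006117458" name="Hermione">
    <adata>
      <attitude>curious</attitude>
      <offspring>7</offspring>
      <height>180</height>
    </adata>
  </animal>
  <animal id="LIMFRAF001521469226" name="Rosa">
    <adata />
  </animal>
</interbeef>
"""

root = ET.fromstring(xml_text)  # root is the <interbeef> element

animals = {}
for animal in root.findall('animal'):  # every <animal> child of the root
    adata = animal.find('adata')       # the <adata> child, or None if absent
    values = {}
    if adata is not None:
        # element tag -> element text, e.g. 'height' -> '180'
        values = {child.tag: child.text for child in adata}
    # attributes are read with get(); each ID maps to (name, values)
    animals[animal.get('id')] = (animal.get('name'), values)

for animal_id, (name, values) in animals.items():
    print(animal_id, name, values)
```

All values come back as strings, so numeric fields such as height or offspring need an explicit conversion (for example with int) before they are used in calculations.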