#pragma supplementation-page on
= XML tutorials and examples =
<<TableOfContents>>

== Introduction ==
Fixed-width data file formats can become too complex, for example when the number of columns depends on values in previous columns or on data in other files. At that point, writing parsers that read and write the information, and making sure it is correctly understood, becomes very time- and resource-consuming. One possible solution is to use XML for the data exchange files, so that the large number of existing packages and libraries for parsing XML documents can be reused.

On this page you will find links and examples that will help you get familiar with the XML format and with writing tools that use it.

== XML Overview ==
See the [[https://en.wikipedia.org/wiki/Xml|Wikipedia entry for XML]] for a description of the XML markup language. Take particular note of the terminology used for the different parts of an XML document.

== XML tutorials ==
An XML tutorial focusing on the web aspect: [[http://www.w3schools.com/xml/xml_whatis.asp|W3 Schools XML tutorial]]

A video XML tutorial on YouTube: [[http://www.youtube.com/watch?v=qgZVAznwX38|XML basics]]

An advanced, in-depth tutorial on XML: http://www.xmlmaster.org/en/article/d01/

=== XPath ===
XPath is a standard language for selecting parts of an XML document, much like a query selects records from a database. For a tutorial on XPath, see the [[http://www.w3schools.com/XPath/xpath_intro.asp|W3 Schools XPath tutorial]].

== XML concrete example ==
In our hypothetical example there are two source files: one with the IDs and names of animals, and another with data associated with those animals. The information from these two files is to be merged into one single XML file. The structures of the source files are deliberately very simple.

=== Our source files ===
==== Animal ID file ====
This is a fixed-width file with two columns: the animal ID (19 characters) and, after a separating space, the animal name (30 characters).
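Each line holds exactly one animal, and an animal may appear only once in the file. Our example source file, which we call '''source_id.dat''', looks like this:
{{{
LIMFRAF001521469226 Rosa
HOLUSAF000017059414 Bossy
AANFINF000006314316 Muhmuh
HERFINF000003266465 Greta
LIMFRAF001930958553 Linda
CHAFINM000008365662 Cowlin
LIMFRAM003150038969 Bryan
CHAFRAF002350102162 Linda
SIMDEUF000922204654 Angel
HOLDEUF001006117458 Hermione
CHAFRAF004303055320 Samantha
HERFINM000008405652 Frodo
}}}

==== Associated data file ====
This is a comma-separated file with three columns: the animal ID, the name of the associated data item, and the value of that item. An animal may have several associated data items. Our example source file, '''source_ad.dat''', is:
{{{
HOLUSAF000017059414,recommendation,medium
CHAFRAF002350102162,recommendation,rare
HOLUSAF000017059414,height,180
CHAFINM000008365662,height,200
LIMFRAM003150038969,height,205
CHAFRAF002350102162,height,175
SIMDEUF000922204654,height,190
HOLDEUF001006117458,height,180
CHAFRAF004303055320,height,175
HERFINM000008405652,recommendation,med-rare
HOLUSAF000017059414,offspring,4
AANFINF000006314316,offspring,5
HERFINF000003266465,offspring,2
LIMFRAF001930958553,offspring,0
CHAFINM000008365662,offspring,45
LIMFRAM003150038969,offspring,30
CHAFRAF002350102162,offspring,3
SIMDEUF000922204654,offspring,10
HOLDEUF001006117458,offspring,7
CHAFRAF004303055320,offspring,3
HERFINM000008405652,offspring,18
LIMFRAF001930958553,attitude,friendly
SIMDEUF000922204654,attitude,touchy
CHAFRAF002350102162,attitude,classy
AANFINF000006314316,attitude,angry
HOLDEUF001006117458,attitude,curious
}}}

=== Our final XML file ===
This is the final XML file, after the information in the two source files has been merged:
{{{#!highlight xml
<?xml version='1.0' encoding='UTF-8'?>
<interbeef><animal id="HOLDEUF001006117458" name="Hermione"><adata><attitude>curious</attitude><offspring>7</offspring><height>180</height></adata></animal><animal id="AANFINF000006314316" name="Muhmuh"><adata><attitude>angry</attitude><offspring>5</offspring></adata></animal><animal id="HERFINM000008405652" name="Frodo"><adata><offspring>18</offspring><recommendation>med-rare</recommendation></adata></animal><animal id="SIMDEUF000922204654" name="Angel"><adata><attitude>touchy</attitude><offspring>10</offspring><height>190</height></adata></animal><animal id="HOLUSAF000017059414" name="Bossy"><adata><height>180</height><offspring>4</offspring><recommendation>medium</recommendation></adata></animal><animal id="CHAFRAF002350102162" name="Linda"><adata><attitude>classy</attitude><height>175</height><offspring>3</offspring><recommendation>rare</recommendation></adata></animal><animal id="CHAFINM000008365662" name="Cowlin"><adata><offspring>45</offspring><height>200</height></adata></animal><animal id="LIMFRAF001930958553" name="Linda"><adata><attitude>friendly</attitude><offspring>0</offspring></adata></animal><animal id="CHAFRAF004303055320" name="Samantha"><adata><offspring>3</offspring><height>175</height></adata></animal><animal id="LIMFRAM003150038969" name="Bryan"><adata><offspring>30</offspring><height>205</height></adata></animal><animal id="HERFINF000003266465" name="Greta"><adata><offspring>2</offspring></adata></animal><animal id="LIMFRAF001521469226" name="Rosa"><adata /></animal></interbeef>
}}}
After the compact version above has been pretty-printed for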
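easier viewing (with the structure intact), it looks like this:
{{{#!highlight xml
<?xml version='1.0' encoding='UTF-8'?>
<interbeef>
  <animal id="HOLDEUF001006117458" name="Hermione">
    <adata>
      <attitude>curious</attitude>
      <offspring>7</offspring>
      <height>180</height>
    </adata>
  </animal>
  <animal id="AANFINF000006314316" name="Muhmuh">
    <adata>
      <attitude>angry</attitude>
      <offspring>5</offspring>
    </adata>
  </animal>
  <animal id="HERFINM000008405652" name="Frodo">
    <adata>
      <offspring>18</offspring>
      <recommendation>med-rare</recommendation>
    </adata>
  </animal>
  <animal id="SIMDEUF000922204654" name="Angel">
    <adata>
      <attitude>touchy</attitude>
      <offspring>10</offspring>
      <height>190</height>
    </adata>
  </animal>
  <animal id="HOLUSAF000017059414" name="Bossy">
    <adata>
      <height>180</height>
      <offspring>4</offspring>
      <recommendation>medium</recommendation>
    </adata>
  </animal>
  <animal id="CHAFRAF002350102162" name="Linda">
    <adata>
      <attitude>classy</attitude>
      <height>175</height>
      <offspring>3</offspring>
      <recommendation>rare</recommendation>
    </adata>
  </animal>
  <animal id="CHAFINM000008365662" name="Cowlin">
    <adata>
      <offspring>45</offspring>
      <height>200</height>
    </adata>
  </animal>
  <animal id="LIMFRAF001930958553" name="Linda">
    <adata>
      <attitude>friendly</attitude>
      <offspring>0</offspring>
    </adata>
  </animal>
  <animal id="CHAFRAF004303055320" name="Samantha">
    <adata>
      <offspring>3</offspring>
      <height>175</height>
    </adata>
  </animal>
  <animal id="LIMFRAM003150038969" name="Bryan">
    <adata>
      <offspring>30</offspring>
      <height>205</height>
    </adata>
  </animal>
  <animal id="HERFINF000003266465" name="Greta">
    <adata>
      <offspring>2</offspring>
    </adata>
  </animal>
  <animal id="LIMFRAF001521469226" name="Rosa">
    <adata />
  </animal>
</interbeef>
}}}

=== Programming examples of creating/reading these XML files ===
The following programs are simple examples of how to write and read the data above as XML. They are not full-fledged applications for a production environment: they lack error handling, efficiency optimizations and so on. Their only purpose is to show the fundamental logic of working with XML. Except for the first program, the examples use existing XML modules/libraries, which do most of the heavy lifting.

Note that even for a data structure as simple as this one, the code needed to read the XML is no longer than the code needed to read the same data from the flat files. As the data structure becomes more complex, the benefits of the XML format become even clearer.

==== Fortran example of creating our XML file from the source files ====
This program uses plain Fortran, without any XML library, to read the source files and write the data out as an XML file.
{{{#!highlight fortran
program merge
  implicit none
  character(30) :: aid, aname
  character(30) :: aidad, adname, adval
  integer :: ios = 0

  ! declare an associated data structure as a linked list
  type :: Adata
    character(30) :: aname
    character(30) :: avalue
    type(Adata), pointer :: next
  end type Adata

  ! declare an animal data structure as a linked list;
  ! each animal also points to its own list of associated data
  type :: Animal
    character(30) :: aname
    character(19) :: aid
    type(Animal), pointer :: next
    type(Adata), pointer :: adata
  end type Animal

  type(Animal), pointer :: head, tmpanimal
  type(Adata), pointer :: tmpadata

  ! open the source files
  open(unit=2, file='source_id.dat', status='old')
  open(unit=3, file='source_ad.dat', status='old')

  ! read the animals into the animal data structure
  ! (each new animal is added at the head of the list, so the output order is reversed)
  nullify(head)
  do while (ios == 0)
    read (2, *, iostat=ios) aid, aname
    if (ios == 0) then
      allocate(tmpanimal)
      tmpanimal%aname = aname
      tmpanimal%aid = aid
      nullify(tmpanimal%adata)
      if (associated(head)) then
        tmpanimal%next => head
      else
        nullify(tmpanimal%next)
      end if
      head => tmpanimal
    end if
  end do

  ! read the associated data into the animal data structure
  ios = 0
  do while (ios == 0)
    read (3, *, iostat=ios) aidad, adname, adval
    if (ios == 0) then
      ! find the animal with this ID
      tmpanimal => head
      do while (associated(tmpanimal))
        if (aidad == tmpanimal%aid) exit
        tmpanimal => tmpanimal%next
      end do
      ! attach the data item to the animal (data for unknown animals is skipped)
      if (associated(tmpanimal)) then
        allocate(tmpadata)
        tmpadata%aname = adname
        tmpadata%avalue = adval
        if (associated(tmpanimal%adata)) then
          tmpadata%next => tmpanimal%adata
        else
          nullify(tmpadata%next)
        end if
        tmpanimal%adata => tmpadata
      end if
    end if
  end do
  close(2)
  close(3)

  ! write the XML structure to a file, one element per line;
  ! the '(A)' format avoids the leading blank that list-directed output (*) adds
  open(unit=10, file='output.xml')
  write (10, '(A)') "<interbeef>"
  tmpanimal => head
  do while (associated(tmpanimal))
    write (10, '(A)') '<animal id="'//trim(tmpanimal%aid)//'" name="'//trim(tmpanimal%aname)//'">'
    tmpadata => tmpanimal%adata
    if (associated(tmpadata)) then
      write (10, '(A)') "<adata>"
      do while (associated(tmpadata))
        write (10, '(A)') "<"//trim(tmpadata%aname)//">"//trim(tmpadata%avalue)//"</"//trim(tmpadata%aname)//">"
        tmpadata => tmpadata%next
      end do
      write (10, '(A)') "</adata>"
    else
      write (10, '(A)') "<adata/>"
    end if
    write (10, '(A)') "</animal>"
    tmpanimal => tmpanimal%next
  end do
  write (10, '(A)') "</interbeef>"
  close(10)
end program merge
}}}

==== Python example of creating our XML file from the source files ====
This Python program reads the two source files and produces the XML file from them. It uses the ElementTree XML module (part of the Python standard library since version 2.5) to build the XML structure in memory, which is then written out to a file.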
{{{#!highlight python
#!/usr/bin/env python3
import xml.etree.ElementTree as ET  # ElementTree: the XML module in the Python standard library

# build a data structure with the animals from the animal ID file:
# each animal ID maps to [dict of associated data, name of the animal]
animals = {}
with open('source_id.dat') as s1:
    for line in s1:
        if not line.strip():
            continue  # skip blank lines
        aid = line[0:19]
        name = line[20:].strip()
        animals[aid] = [{}, name]

# add the associated data to the data structure
with open('source_ad.dat') as s2:
    for line in s2:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        aid, adname, adval = line.split(',')
        if aid not in animals:
            print('Warning: animal {aid} not in animal ID file!'.format(aid=aid))
            continue  # skip data for unknown animals instead of failing with a KeyError
        animals[aid][0][adname] = adval

# create the XML structure from the data structure
root = ET.Element('interbeef')  # the root XML element (called interbeef)
for animal, adata in animals.items():
    xmlanimal = ET.SubElement(root, 'animal')  # add an animal element to the root element
    xmlanimal.attrib['id'] = animal            # the animal ID as an attribute
    xmlanimal.attrib['name'] = adata[1]        # the animal name as an attribute
    xmladata = ET.SubElement(xmlanimal, 'adata')  # associated data element (called adata)
    # add each associated data item as a child element of the animal's adata element
    for name, value in adata[0].items():
        xmladatavalue = ET.SubElement(xmladata, name)  # element named after the data item
        xmladatavalue.text = value                     # the value as the element's text

# write out the XML file (UTF-8, with an XML declaration)
ET.ElementTree(root).write('output.xml', encoding='UTF-8', xml_declaration=True)
}}}

==== Fortran example of reading our XML file ====
This Fortran program uses the xmlparse module from [[http://xml-fortran.sourceforge.net/|XML Fortran]] to read and parse the XML file. It then loops through the elements to find the requested one.
The program prints the animal whose ID is given as the first command-line argument, together with that animal's associated data. To build the program, compile it together with the xmlparse.f90 file from the XML Fortran project.
{{{#!highlight fortran
program readxml
  use xmlparse  ! the XML parsing module from XML Fortran
  implicit none
  character(50) :: aid, tag, xmlaid, xmlname
  type(XML_PARSE) :: info
  character(len=80), dimension(1:2,1:20) :: attribs
  integer :: no_attribs
  logical :: endtag
  character(len=200), dimension(1:100) :: data
  integer :: no_data
  integer :: i

  call get_command_argument(1, aid)  ! the first argument: the animal ID to search for
  call xml_open(info, "output.xml", .true.)  ! open the XML document for reading
  call xml_options(info, ignore_whitespace = .true.)  ! ignore whitespace between elements
  xmlaid = ""
  xmlname = ""

  do
    ! get the next element (start or end tag) from the XML structure
    call xml_get(info, tag, endtag, attribs, no_attribs, data, no_data)
    if (tag == "animal" .and. .not. endtag) then
      do i = 1, no_attribs  ! get the animal attributes: id and name
        if (attribs(1,i) == "id") xmlaid = attribs(2,i)
        if (attribs(1,i) == "name") xmlname = attribs(2,i)
      end do
      if (xmlaid == aid) then  ! is this the animal we are searching for?
        write (*,*) "Animal ID: "//trim(xmlaid)
        write (*,*) "Animal name: "//trim(xmlname)
        ! get the animal's associated data element
        call xml_get(info, tag, endtag, attribs, no_attribs, data, no_data)
        if (tag == "adata" .and. .not. endtag) then
          do  ! loop through the associated data elements and display them
            call xml_get(info, tag, endtag, attribs, no_attribs, data, no_data)
            if ((tag == "adata" .and. endtag) .or. (.not. xml_ok(info))) exit
            if (.not. endtag) write (*,*) "  "//trim(tag)//" : "//trim(data(1))
          end do
        end if
      end if
      xmlaid = ""
      xmlname = ""
    end if
    if (.not. xml_ok(info)) exit  ! exit the loop at the end of the XML structure
  end do
  call xml_close(info)
end program readxml
}}}

==== Python example of reading our XML file ====
This Python program does the same thing as the Fortran program above. Thanks to the design of the ElementTree module and its (limited) support for XPath expressions to locate elements, it needs much less code. It prints the data of the animal whose ID is given as the first command-line argument.
{{{#!highlight python
#!/usr/bin/env python3
import sys
from xml.etree import ElementTree

# open the XML file and parse it
with open('output.xml', 'rt') as f:
    tree = ElementTree.parse(f)

# use an XPath expression to find the animal with the requested ID
node = tree.find('animal[@id="{aid}"]'.format(aid=sys.argv[1]))
if node is not None:  # print the animal data if the animal was found
    print("Animal ID: {aid}".format(aid=node.attrib['id']))
    print("Animal name: {name}".format(name=node.attrib['name']))
    print("Animal associated data:")
    for adata in node.findall('adata/*'):  # loop through all child elements of the adata element
        print("  {adataname} : {adatavalue}".format(adataname=adata.tag, adatavalue=adata.text))
}}}
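ElementTree implements only a subset of XPath, but that subset already covers many everyday queries on a file like ours. The minimal sketch below runs a few such queries. So that it runs without `output.xml`, it embeds a trimmed, hypothetical excerpt of the merged file as a string (`SAMPLE`). The helper names `names_with_value`, `names_with_item` and `total_offspring` are illustrative only and are not part of any library. The queries rely on the `[tag='text']` and `[tag]` predicates, the `..` (parent) step, and `Element.iter()`, all of which are described in the Python `xml.etree.ElementTree` documentation.

{{{#!highlight python
import xml.etree.ElementTree as ET

# Hypothetical trimmed excerpt in the same layout as output.xml: an <interbeef> root,
# <animal id=... name=...> elements, each with an <adata> child holding
# one element per associated data item.
SAMPLE = """<interbeef>
  <animal id="HOLUSAF000017059414" name="Bossy">
    <adata><height>180</height><offspring>4</offspring><recommendation>medium</recommendation></adata>
  </animal>
  <animal id="CHAFRAF004303055320" name="Samantha">
    <adata><offspring>3</offspring><height>175</height></adata>
  </animal>
  <animal id="CHAFRAF002350102162" name="Linda">
    <adata><attitude>classy</attitude><height>175</height><offspring>3</offspring><recommendation>rare</recommendation></adata>
  </animal>
  <animal id="LIMFRAF001521469226" name="Rosa">
    <adata />
  </animal>
</interbeef>"""


def names_with_value(root, item, value):
    """Names of animals whose <adata> has an <item> child whose text equals value.

    [item='value'] selects the matching adata elements, and '..' steps back up to the animal.
    """
    return [a.get("name") for a in root.findall(f"animal/adata[{item}='{value}']/..")]


def names_with_item(root, item):
    """Names of animals that have the given associated data item at all ([tag] predicate)."""
    return [a.get("name") for a in root.findall(f"animal/adata[{item}]/..")]


def total_offspring(root):
    """Sum of every <offspring> value in the document, found with iter()."""
    return sum(int(e.text) for e in root.iter("offspring"))


root = ET.fromstring(SAMPLE)
print(names_with_value(root, "height", "175"))  # ['Samantha', 'Linda']
print(names_with_item(root, "recommendation"))  # ['Bossy', 'Linda']
print(total_offspring(root))                    # 10
# an animal without associated data still has an (empty) adata element
print(len(root.find("animal[@id='LIMFRAF001521469226']/adata")))  # 0
}}}

To run the same queries on the real file, replace `ET.fromstring(SAMPLE)` with `ET.parse('output.xml').getroot()`. Queries that need more than this subset, such as arbitrary paths inside a predicate, require a library with full XPath 1.0 support, for example lxml.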