XML tutorials and examples

Introduction

When fixed-width data file formats become too complex, for example when the number of columns depends on values in previous columns or on data in other files, writing parsers that read and write the information and making sure it is correctly understood becomes very time- and resource-consuming. One possible solution is to use an XML format for the data exchange files, which makes it possible to draw on the large number of existing packages and libraries for parsing such documents. On this page you will find links and examples that will help you get familiar with the XML format and with writing tools that use it.

XML Overview

See the Wikipedia entry for XML for a description of the XML markup language. Pay particular attention to the terminology for the different parts of an XML structure (elements, attributes, content and so on).
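To make the terminology concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module (the module the Python examples on this page use) on one animal element taken from the example data on this page. Here animal is an element with two attributes (id and name), adata is a child element, and angry is the text content of the attitude element.

```python
import xml.etree.ElementTree as ET

# one <animal> element from this page's example, used as a tiny document
snippet = ('<animal id="AANFINF000006314316" name="Muhmuh">'
           '<adata><attitude>angry</attitude><offspring>5</offspring></adata>'
           '</animal>')

animal = ET.fromstring(snippet)        # the root element of the snippet
print(animal.tag)                      # the element name
print(animal.attrib)                   # the attributes, as a dictionary
adata = animal.find('adata')           # the first child element named adata
for child in adata:                    # the child elements of <adata>
    print(child.tag, '=', child.text)  # element name and its text content
```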

XML tutorials

An XML tutorial with a focus on web use: W3 Schools XML tutorial

A video XML tutorial on YouTube: XML basics

Advanced, in-depth tutorial on XML: http://www.xmlmaster.org/en/article/d01/

XPath

XPath is a powerful standard for querying XML files, much like querying a database. For a tutorial on XPath, see the W3 Schools XPath tutorial.
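Python's xml.etree.ElementTree module implements only a limited subset of XPath: child paths, // for descendants, .. for the parent, and simple predicates such as [@attr='value'] and [tag]. That is enough for the kind of queries used on this page. Here is a small sketch on a cut-down version of the example XML file:

```python
import xml.etree.ElementTree as ET

# a cut-down version of this page's final XML file
doc = ET.fromstring(
    '<interbeef>'
    '<animal id="LIMFRAF001930958553" name="Linda"><adata><offspring>0</offspring></adata></animal>'
    '<animal id="CHAFRAF002350102162" name="Linda"><adata><height>175</height></adata></animal>'
    '<animal id="HOLUSAF000017059414" name="Bossy"><adata><height>180</height></adata></animal>'
    '</interbeef>')

# IDs of all animals whose name attribute is "Linda"
lindas = [a.get('id') for a in doc.findall("animal[@name='Linda']")]

# animals whose <adata> has a <height> child: select those adata elements,
# then step up to their parent with ..
with_height = [a.get('name') for a in doc.findall('animal/adata[height]/..')]

# text of every <height> element anywhere in the document
heights = [h.text for h in doc.iterfind('.//height')]

print(lindas, with_height, heights)
```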

XML concrete example

In our hypothetical example we have two source files: one with the IDs and names of animals, and another with associated data. The information in these two files should be combined into a single XML file. The structures of the source files are very simple.

Our source files

Animal ID file

This is a fixed-width file with two columns: the animal ID (19 characters), followed after a separating space by the animal name (up to 30 characters). Each line holds exactly one animal, and each animal must appear only once in the file. Our example source file, which we call source_id.dat, looks like this:

```
LIMFRAF001521469226 Rosa
HOLUSAF000017059414 Bossy
AANFINF000006314316 Muhmuh
HERFINF000003266465 Greta
LIMFRAF001930958553 Linda
CHAFINM000008365662 Cowlin
LIMFRAM003150038969 Bryan
CHAFRAF002350102162 Linda
SIMDEUF000922204654 Angel
HOLDEUF001006117458 Hermione
CHAFRAF004303055320 Samantha
HERFINM000008405652 Frodo
```
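A fixed-width file like this is read by slicing each line at known column positions. The following is a minimal Python sketch, assuming the layout shown above (ID in the first 19 characters, one separating space, then a name of up to 30 characters). parse_id_file is a hypothetical helper, not part of any library. It also enforces the rule that each animal ID appears only once.

```python
def parse_id_file(lines):
    """Parse source_id.dat-style lines into {animal ID: animal name}.

    Assumed layout: 19-character ID, one separating space, then the name
    (up to 30 characters, possibly padded with trailing blanks).
    """
    animals = {}
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue                      # ignore blank lines
        aid = line[0:19]                  # fixed-width ID field
        name = line[20:50].strip()        # name field without padding/newline
        if aid in animals:
            raise ValueError('line {0}: duplicate animal ID {1}'.format(lineno, aid))
        animals[aid] = name
    return animals


sample = ['LIMFRAF001521469226 Rosa\n',
          'HOLUSAF000017059414 Bossy\n']
print(parse_id_file(sample))
```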

Associated data file

This is a 3-column comma-separated file: the first column is the animal ID, the second the name of the associated data item and the third the data value itself. An animal may have many pieces of associated data. Our example source file, called source_ad.dat, is:

```
HOLUSAF000017059414,recommendation,medium
CHAFRAF002350102162,recommendation,rare
HOLUSAF000017059414,height,180
CHAFINM000008365662,height,200
LIMFRAM003150038969,height,205
CHAFRAF002350102162,height,175
SIMDEUF000922204654,height,190
HOLDEUF001006117458,height,180
CHAFRAF004303055320,height,175
HERFINM000008405652,recommendation,med-rare
HOLUSAF000017059414,offspring,4
AANFINF000006314316,offspring,5
HERFINF000003266465,offspring,2
LIMFRAF001930958553,offspring,0
CHAFINM000008365662,offspring,45
LIMFRAM003150038969,offspring,30
CHAFRAF002350102162,offspring,3
SIMDEUF000922204654,offspring,10
HOLDEUF001006117458,offspring,7
CHAFRAF004303055320,offspring,3
HERFINM000008405652,offspring,18
LIMFRAF001930958553,attitude,friendly
SIMDEUF000922204654,attitude,touchy
CHAFRAF002350102162,attitude,classy
AANFINF000006314316,attitude,angry
HOLDEUF001006117458,attitude,curious
```
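Because this file is comma-separated, Python's standard csv module can split the lines. Unlike a plain split(','), it also copes with quoted fields that contain commas. Below is a sketch that groups the data items per animal. parse_ad_file is a hypothetical helper name, and letting a repeated data name for the same animal overwrite the earlier value is an assumption, since the file format above does not say what should happen in that case.

```python
import csv
from collections import defaultdict


def parse_ad_file(lines):
    """Group source_ad.dat-style rows into {animal ID: {data name: value}}."""
    adata = defaultdict(dict)
    for row in csv.reader(lines):
        if not row:
            continue                              # csv yields [] for blank lines
        aid, name, value = (field.strip() for field in row)
        adata[aid][name] = value                  # a later duplicate name overwrites
    return dict(adata)


sample = ['HOLUSAF000017059414,recommendation,medium\n',
          'HOLUSAF000017059414,height,180\n',
          'CHAFINM000008365662,height,200\n']
print(parse_ad_file(sample))
```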

Our final XML file

This is the final XML file, produced by merging the information in the two source files:

```xml
<interbeef><animal id="LIMFRAF001521469226" name="Rosa"><adata /></animal><animal id="HOLDEUF001006117458" name="Hermione"><adata><attitude>curious</attitude><offspring>7</offspring><height>180</height></adata></animal><animal id="AANFINF000006314316" name="Muhmuh"><adata><attitude>angry</attitude><offspring>5</offspring></adata></animal><animal id="HERFINM000008405652" name="Frodo"><adata><offspring>18</offspring><recommendation>med-rare</recommendation></adata></animal><animal id="SIMDEUF000922204654" name="Angel"><adata><attitude>touchy</attitude><offspring>10</offspring><height>190</height></adata></animal><animal id="HOLUSAF000017059414" name="Bossy"><adata><height>180</height><offspring>4</offspring><recommendation>medium</recommendation></adata></animal><animal id="CHAFRAF002350102162" name="Linda"><adata><attitude>classy</attitude><height>175</height><offspring>3</offspring><recommendation>rare</recommendation></adata></animal><animal id="CHAFINM000008365662" name="Cowlin"><adata><offspring>45</offspring><height>200</height></adata></animal><animal id="LIMFRAF001930958553" name="Linda"><adata><attitude>friendly</attitude><offspring>0</offspring></adata></animal><animal id="CHAFRAF004303055320" name="Samantha"><adata><offspring>3</offspring><height>175 </height></adata></animal><animal id="LIMFRAM003150038969" name="Bryan"><adata><offspring>30</offspring><height>205</height></adata></animal><animal id="HERFINF000003266465" name="Greta"><adata><offspring>2</offspring></adata></animal></interbeef>
```

Pretty-printed for easier reading (with the structure unchanged), the compact version above looks like this:

```xml
<interbeef>
  <animal id="LIMFRAF001521469226" name="Rosa">
    <adata />
  </animal>
  <animal id="HOLDEUF001006117458" name="Hermione">
    <adata>
      <attitude>curious</attitude>
      <offspring>7</offspring>
      <height>180</height>
    </adata>
  </animal>
  <animal id="AANFINF000006314316" name="Muhmuh">
    <adata>
      <attitude>angry</attitude>
      <offspring>5</offspring>
    </adata>
  </animal>
  <animal id="HERFINM000008405652" name="Frodo">
    <adata>
      <offspring>18</offspring>
      <recommendation>med-rare</recommendation>
    </adata>
  </animal>
  <animal id="SIMDEUF000922204654" name="Angel">
    <adata>
      <attitude>touchy</attitude>
      <offspring>10</offspring>
      <height>190</height>
    </adata>
  </animal>
  <animal id="HOLUSAF000017059414" name="Bossy">
    <adata>
      <height>180</height>
      <offspring>4</offspring>
      <recommendation>medium</recommendation>
    </adata>
  </animal>
  <animal id="CHAFRAF002350102162" name="Linda">
    <adata>
      <attitude>classy</attitude>
      <height>175</height>
      <offspring>3</offspring>
      <recommendation>rare</recommendation>
    </adata>
  </animal>
  <animal id="CHAFINM000008365662" name="Cowlin">
    <adata>
      <offspring>45</offspring>
      <height>200</height>
    </adata>
  </animal>
  <animal id="LIMFRAF001930958553" name="Linda">
    <adata>
      <attitude>friendly</attitude>
      <offspring>0</offspring>
    </adata>
  </animal>
  <animal id="CHAFRAF004303055320" name="Samantha">
    <adata>
      <offspring>3</offspring>
      <height>175</height>
    </adata>
  </animal>
  <animal id="LIMFRAM003150038969" name="Bryan">
    <adata>
      <offspring>30</offspring>
      <height>205</height>
    </adata>
  </animal>
  <animal id="HERFINF000003266465" name="Greta">
    <adata>
      <offspring>2</offspring>
    </adata>
  </animal>
</interbeef>
```
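One way to produce a pretty-printed version from the compact one is ElementTree's indent() function. It adds whitespace-only text between elements without changing which elements exist or how they nest. This is a sketch assuming Python 3.9 or later (ET.indent does not exist in earlier versions); pretty_print is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def pretty_print(compact_xml, indent='  '):
    """Return an indented copy of a compact XML string (needs Python 3.9+)."""
    root = ET.fromstring(compact_xml)
    ET.indent(root, space=indent)       # whitespace-only text/tails for nesting
    return ET.tostring(root, encoding='unicode')


compact = ('<interbeef><animal id="LIMFRAF001521469226" name="Rosa"><adata /></animal>'
           '<animal id="HERFINF000003266465" name="Greta"><adata><offspring>2</offspring>'
           '</adata></animal></interbeef>')
print(pretty_print(compact))
```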

Programming examples of creating/reading these XML files

The following programs are simple coding examples of how to use XML to write and read the data above. They are not meant to be full-fledged applications for a production environment: they lack error handling, performance optimization and other refinements. They are only here to show the fundamental logic of working with XML.

Except for the first program, the examples use existing modules/libraries to handle the XML files, so much of the heavy lifting has already been done. Note that even for this very simple data structure, the code needed to read the XML is no longer than the code needed to read the data from the flat files. The more complex the data structure, the more the benefits of the XML format show.

Fortran example of creating our XML file from the source files

This program uses plain Fortran to read the source files and write out an XML file with the data.

```fortran
program merge
    implicit none

    ! associated data (name/value pair), stored as a singly linked list
    type :: Adata
        character(30) :: aname
        character(30) :: avalue
        type(Adata), pointer :: next
    end type Adata

    ! animal record with its associated data, also stored as a singly linked list
    type :: Animal
        character(30) :: aname
        character(19) :: aid
        type(Animal), pointer :: next
        type(Adata), pointer :: adata
    end type Animal

    character(30) :: aid, aname
    character(30) :: aidad, adname, adval
    integer :: ios
    type(Animal), pointer :: head, tmpanimal
    type(Adata), pointer :: tmpadata

    ! open the source files (unit 5 is avoided: it is usually standard input)
    open(unit=10, file='source_id.dat', status='old', action='read')
    open(unit=11, file='source_ad.dat', status='old', action='read')

    ! read the animals into the animal list (new animals are prepended)
    nullify(head)
    do
        read (10, *, iostat=ios) aid, aname
        if (ios /= 0) exit                     ! end of file (or read error)
        allocate(tmpanimal)
        tmpanimal%aname = aname
        tmpanimal%aid = aid
        nullify(tmpanimal%adata)
        tmpanimal%next => head                 ! also correct while head is still null
        head => tmpanimal
    end do

    ! read the associated data and attach each item to its animal
    do
        read (11, *, iostat=ios) aidad, adname, adval
        if (ios /= 0) exit
        tmpanimal => head                      ! linear search for the animal ID
        do while (associated(tmpanimal))
            if (aidad == tmpanimal%aid) exit
            tmpanimal => tmpanimal%next
        end do
        if (associated(tmpanimal)) then
            allocate(tmpadata)
            tmpadata%aname = adname
            tmpadata%avalue = adval
            tmpadata%next => tmpanimal%adata   ! prepend to this animal's data list
            tmpanimal%adata => tmpadata
        else
            write (*, '(a)') 'Warning: animal '//trim(aidad)//' not in animal ID file!'
        end if
    end do

    ! write the XML structure; format '(a)' avoids the leading blank of list-directed output
    ! note: values are not escaped, so &, < or " in the data would break the XML
    open(unit=12, file='output.xml', status='replace', action='write')
    write (12, '(a)') '<interbeef>'
    tmpanimal => head
    do while (associated(tmpanimal))
        write (12, '(a)') '<animal id="'//trim(tmpanimal%aid)//'" name="'//trim(tmpanimal%aname)//'">'
        tmpadata => tmpanimal%adata
        if (associated(tmpadata)) then
            write (12, '(a)') '<adata>'
            do while (associated(tmpadata))
                write (12, '(a)') '<'//trim(tmpadata%aname)//'>'//trim(tmpadata%avalue)// &
                                  '</'//trim(tmpadata%aname)//'>'
                tmpadata => tmpadata%next
            end do
            write (12, '(a)') '</adata>'
        else
            write (12, '(a)') '<adata/>'
        end if
        write (12, '(a)') '</animal>'
        tmpanimal => tmpanimal%next
    end do
    write (12, '(a)') '</interbeef>'

    close(10)
    close(11)
    close(12)
end program merge
```
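The Fortran program above writes the XML by joining strings, so a name or value containing &, < or " would produce a malformed file. ElementTree, used in the Python examples, escapes such characters automatically. The sketch below shows the escaping that a hand-written XML generator needs, using escape() and quoteattr() from Python's standard xml.sax.saxutils module. animal_start_tag and data_element are hypothetical helper names.

```python
from xml.sax.saxutils import escape, quoteattr


def animal_start_tag(aid, name):
    """Build an <animal> start tag with safely quoted attribute values."""
    # quoteattr() escapes &, < and > and adds the surrounding quotes itself
    return '<animal id={0} name={1}>'.format(quoteattr(aid), quoteattr(name))


def data_element(tag, value):
    """Build <tag>value</tag> with the text content escaped."""
    # escape() replaces &, < and > with entity references; the tag name is
    # assumed to already be a valid XML name (it is not checked here)
    return '<{0}>{1}</{0}>'.format(tag, escape(value))


print(animal_start_tag('HOLUSAF000017059414', 'Bossy & Co'))
print(data_element('recommendation', 'medium <draft>'))
```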

Python example of creating our XML file from the source files

This Python program reads the two source files and produces the XML file from them. It uses the ElementTree XML module (part of the Python standard library since version 2.5) to build an XML structure, which is then written to a file.

```python
#!/usr/bin/env python3

import xml.etree.ElementTree as ET  # XML module from the Python standard library

# build a data structure from the animal ID file:
# animal ID -> [dictionary of associated data, animal name]
animals = {}
with open('source_id.dat') as s1:
    for line in s1:
        if not line.strip():
            continue                  # skip blank lines
        aid = line[0:19]              # the 19-character animal ID
        name = line[20:].strip()      # the name, after the separating space
        animals[aid] = [{}, name]

# add the associated data to the data structure
with open('source_ad.dat') as s2:
    for line in s2:
        if not line.strip():
            continue
        aid, adname, adval = line.strip().split(',')
        if aid not in animals:
            print('Warning: animal {aid} not in animal ID file!'.format(aid=aid))
            continue                  # skip it; animals[aid] would raise KeyError
        animals[aid][0][adname] = adval

# create the XML structure from the data structure
root = ET.Element('interbeef')        # the root element, called interbeef
for aid, (adata, name) in animals.items():
    xmlanimal = ET.SubElement(root, 'animal')     # an animal element under the root
    xmlanimal.set('id', aid)                      # the ID as an attribute
    xmlanimal.set('name', name)                   # the name as an attribute
    xmladata = ET.SubElement(xmlanimal, 'adata')  # the associated data element

    # each associated data item becomes a child element of adata
    for adname, adval in adata.items():
        xmladatavalue = ET.SubElement(xmladata, adname)  # element named after the data
        xmladatavalue.text = adval                       # the value is the element text

# write out the XML file
ET.ElementTree(root).write('output.xml', encoding='utf-8')
```
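The program above loops over a plain dictionary, so the order of the animals (and of their data items) in output.xml follows the dictionary's iteration order. In Python 2 that order is arbitrary, which is most likely why the final XML file above lists the animals in a different order than source_id.dat. If reproducible output is wanted, one option is to sort the keys. Below is a Python 3 sketch, where build_interbeef is a hypothetical helper that takes the two parsed data sets as plain dictionaries.

```python
import xml.etree.ElementTree as ET


def build_interbeef(names, adata):
    """Build the <interbeef> tree with animals and data items in sorted order.

    names: {animal ID: animal name}
    adata: {animal ID: {data name: value}}; animals without data may be absent
    """
    root = ET.Element('interbeef')
    for aid in sorted(names):                       # animals in ID order
        xmlanimal = ET.SubElement(root, 'animal', id=aid, name=names[aid])
        xmladata = ET.SubElement(xmlanimal, 'adata')
        data = adata.get(aid, {})
        for dname in sorted(data):                  # data items in name order
            ET.SubElement(xmladata, dname).text = data[dname]
    return root


root = build_interbeef(
    {'HOLUSAF000017059414': 'Bossy', 'AANFINF000006314316': 'Muhmuh'},
    {'HOLUSAF000017059414': {'offspring': '4', 'height': '180'}})
print(ET.tostring(root, encoding='unicode'))
```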

Fortran example of reading our XML file

This Fortran program uses the xmlparse module from XML Fortran to read and parse the XML file, and then loops through the elements to find the right one. It prints the animal whose ID is given as the first command-line argument, together with its associated data. To build the program, compile it together with the xmlparse.f90 file from the XML Fortran project.

```fortran
program readxml
    use xmlparse   ! XML parsing module from the XML Fortran project (xmlparse.f90)
    implicit none

    character(len=50) :: aid, tag, xmlaid, xmlname
    type(XML_PARSE) :: info
    character(len=80), dimension(1:2,1:20) :: attribs
    integer :: no_attribs
    logical :: endtag
    character(len=200), dimension(1:100) :: data
    integer :: no_data
    integer :: i

    if (command_argument_count() < 1) then
        write (*,*) "usage: readxml ANIMAL_ID"
        stop
    end if
    call get_command_argument(1, aid)                  ! the animal ID to search for
    call xml_open(info, "output.xml", .true.)          ! open the XML document for reading
    call xml_options(info, ignore_whitespace = .true.) ! ignore whitespace between elements

    xmlaid = ""
    xmlname = ""
    do
        call xml_get(info, tag, endtag, attribs, no_attribs, data, no_data) ! read the next tag

        if (tag == "animal" .and. .not. endtag) then
            do i = 1, no_attribs ! get the animal attributes: id and name
                if (attribs(1,i) == "id") xmlaid = attribs(2,i)
                if (attribs(1,i) == "name") xmlname = attribs(2,i)
            end do

            if (xmlaid == aid) then ! is this the animal we are searching for?
                write (*,*) "Animal ID: "//trim(xmlaid)
                write (*,*) "Animal name: "//trim(xmlname)

                call xml_get(info, tag, endtag, attribs, no_attribs, data, no_data) ! the adata element
                if (tag == "adata" .and. .not. endtag) then
                    do ! loop through the associated data elements and display them
                        call xml_get(info, tag, endtag, attribs, no_attribs, data, no_data)
                        if ((tag == "adata" .and. endtag) .or. (.not. xml_ok(info))) exit
                        if (.not. endtag .and. no_data > 0) &
                            write (*,*) "  "//trim(tag)//" : "//trim(data(1))
                    end do
                end if
            end if
            xmlaid = ""
            xmlname = ""
        end if

        if (.not. xml_ok(info)) exit ! stop at the end of the XML document
    end do

    call xml_close(info)
end program readxml
```
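The Fortran program reads the document one tag at a time with xml_get. The closest Python counterpart is ElementTree's iterparse(), which reports each element as soon as it has been parsed. This lets a very large file be processed without keeping most of its data in memory, provided finished elements are cleared as below. This is a sketch; find_animal_streaming is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET


def find_animal_streaming(source, wanted_id):
    """Return (name, {data name: value}) for wanted_id, or None if absent.

    Reads the document incrementally and inspects each <animal> element
    when its end tag has been parsed (its children are complete by then).
    """
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag != 'animal':
            continue
        if elem.get('id') == wanted_id:
            adata = elem.find('adata')
            data = {child.tag: child.text for child in adata} if adata is not None else {}
            return elem.get('name'), data
        elem.clear()   # drop the finished animal's contents to save memory
    return None


xml_bytes = (b'<interbeef>'
             b'<animal id="LIMFRAF001521469226" name="Rosa"><adata /></animal>'
             b'<animal id="HERFINF000003266465" name="Greta"><adata><offspring>2</offspring></adata></animal>'
             b'</interbeef>')
print(find_animal_streaming(io.BytesIO(xml_bytes), 'HERFINF000003266465'))
```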

Python example of reading our XML file

This Python program does the same thing as the Fortran program above, but thanks to the design of the ElementTree module and the use of an XPath expression to locate the right elements, it needs much less code. It finds and prints the data of the animal whose ID is given as the first command-line argument.

```python
#!/usr/bin/env python3

import sys
import xml.etree.ElementTree as ET

if len(sys.argv) < 2:
    sys.exit('usage: {prog} ANIMAL_ID'.format(prog=sys.argv[0]))

# open the XML file written by the creation example and parse it
with open('output.xml', 'rb') as f:
    tree = ET.parse(f)

# XPath-style query: the <animal> element whose id attribute matches the argument
node = tree.find('animal[@id="{aid}"]'.format(aid=sys.argv[1]))

if node is not None:  # print out the animal data if the animal was found
    print('Animal ID: {aid}'.format(aid=node.get('id')))
    print('Animal name: {name}'.format(name=node.get('name')))
    print('Animal associated data:')

    # loop through all child elements of the adata element
    for adata in node.findall('adata/*'):
        print('  {adataname} : {adatavalue}'.format(adataname=adata.tag, adatavalue=adata.text))
```

public/XMLdigest (last edited 2014-05-06 10:48:58 by Carl Wasserman)