XML tutorials and examples
Introduction
When fixed-width data file formats become too complex, for example when the number of columns depends on values in earlier columns or on data in other files, writing parsers that read and write the information correctly becomes very time- and resource-consuming. One possible solution is to use an XML format for the data exchange files and draw on the many packages and libraries already written to parse such documents. This page collects links and examples to help you get familiar with the XML format and with writing tools that use it.
XML Overview
See the Wikipedia entry for XML for a description of the XML markup language. Take particular note of the terminology used for the different parts of the XML structure, such as elements, attributes and text content.
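To make the terminology concrete, here is a minimal sketch using ElementTree from the Python standard library. The snippet and its ID value are made up for illustration and are not part of the example files on this page.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up document used only to illustrate the terminology.
snippet = '<animal id="A1" name="Rosa"><height>180</height></animal>'

element = ET.fromstring(snippet)   # parse the text; returns the root element
print(element.tag)                 # the element name
print(element.attrib)              # the attributes, as a dictionary

child = element.find('height')     # a child element nested inside <animal>
print(child.tag, child.text)       # the child's name and its text content
```

Here `animal` and `height` are elements, `id` and `name` are attributes of the `animal` element, and `180` is the text content of the `height` element.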
XML tutorials
An XML tutorial focusing on the web aspect: W3 Schools XML tutorial
A video XML tutorial on YouTube: XML basics
Advanced, in-depth tutorial on XML: http://www.xmlmaster.org/en/article/d01/
XML concrete example
In our hypothetical example we have two source files, one with the IDs and names of animals, and another with associated data. In the final XML file the information from these two files should be combined into a single XML file. The structures of the source files are very simple.
Our source files
Animal ID file
This file is a fixed-width file with two columns, the first being the animal ID (19 characters) and the second being the animal name (30 characters). Each line contains exactly one animal, and each animal ID must be unique within the file. Our example source file, which we can call source_id.dat, looks like this:
LIMFRAF001521469226 Rosa
HOLUSAF000017059414 Bossy
AANFINF000006314316 Muhmuh
HERFINF000003266465 Greta
LIMFRAF001930958553 Linda
CHAFINM000008365662 Cowlin
LIMFRAM003150038969 Bryan
CHAFRAF002350102162 Linda
SIMDEUF000922204654 Angel
HOLDEUF001006117458 Hermione
CHAFRAF004303055320 Samantha
HERFINM000008405652 Frodo
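The fixed-width layout can be split by slicing each line at known offsets. The sketch below assumes a single separator character between the ID and the name, which is how the example records appear; the function names `parse_id_line` and `load_animals` are hypothetical helpers, not part of any library.

```python
def parse_id_line(line):
    """Split one fixed-width record into (animal_id, name)."""
    animal_id = line[0:19]         # first 19 characters: the animal ID
    name = line[20:50].strip()     # up to 30 characters after the assumed separator
    return animal_id, name


def load_animals(lines):
    """Build {animal_id: name}, rejecting IDs that occur more than once."""
    animals = {}
    for line in lines:
        if not line.strip():
            continue               # skip blank lines
        animal_id, name = parse_id_line(line)
        if animal_id in animals:
            raise ValueError('duplicate animal ID: ' + animal_id)
        animals[animal_id] = name
    return animals


sample = [
    'LIMFRAF001521469226 Rosa\n',
    'HOLUSAF000017059414 Bossy\n',
]
animals = load_animals(sample)
print(animals)
```

Keying the dictionary on the ID rather than the name matters here, since two different animals in the example share the name Linda.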
Associated data file
This is a three-column, comma-separated file, with the first column being the animal ID, the second the name of the associated data and the third the data itself. An animal may have many pieces of data associated with it. Our example source file, called source_ad.dat, is:
HOLUSAF000017059414,recommendation,medium
CHAFRAF002350102162,recommendation,rare
HOLUSAF000017059414,height,180
CHAFINM000008365662,height,200
LIMFRAM003150038969,height,205
CHAFRAF002350102162,height,175
SIMDEUF000922204654,height,190
HOLDEUF001006117458,height,180
CHAFRAF004303055320,height,175
HERFINM000008405652,recommendation,med-rare
HOLUSAF000017059414,offspring,4
AANFINF000006314316,offspring,5
HERFINF000003266465,offspring,2
LIMFRAF001930958553,offspring,0
CHAFINM000008365662,offspring,45
LIMFRAM003150038969,offspring,30
CHAFRAF002350102162,offspring,3
SIMDEUF000922204654,offspring,10
HOLDEUF001006117458,offspring,7
CHAFRAF004303055320,offspring,3
HERFINM000008405652,offspring,18
LIMFRAF001930958553,attitude,friendly
SIMDEUF000922204654,attitude,touchy
CHAFRAF002350102162,attitude,classy
AANFINF000006314316,attitude,angry
HOLDEUF001006117458,attitude,curious
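Because an animal may have several rows in this file, a natural in-memory shape is one dictionary of data per animal ID. The following is a minimal sketch using the standard `csv` module on a few of the example rows, held in a string instead of a file so that it runs on its own.

```python
import csv
import io
from collections import defaultdict

# A few rows from the example file, kept in memory for the sketch.
sample = io.StringIO(
    'HOLUSAF000017059414,recommendation,medium\n'
    'HOLUSAF000017059414,height,180\n'
    'CHAFINM000008365662,height,200\n'
)

# Map each animal ID to a dictionary {data name: data value}.
adata = defaultdict(dict)
for animal_id, name, value in csv.reader(sample):
    adata[animal_id][name] = value

print(dict(adata))
```

All values stay strings at this stage, which is sufficient when they are only written out again as XML text.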
Our final XML file
This is the final XML file after the information in the two source files has been merged:
<interbeef><animal id="LIMFRAF001521469226" name="Rosa"><adata /></animal><animal id="HOLDEUF001006117458" name="Hermione"><adata><attitude>curious</attitude><offspring>7</offspring><height>180</height></adata></animal><animal id="AANFINF000006314316" name="Muhmuh"><adata><attitude>angry</attitude><offspring>5</offspring></adata></animal><animal id="HERFINM000008405652" name="Frodo"><adata><offspring>18</offspring><recommendation>med-rare</recommendation></adata></animal><animal id="SIMDEUF000922204654" name="Angel"><adata><attitude>touchy</attitude><offspring>10</offspring><height>190</height></adata></animal><animal id="HOLUSAF000017059414" name="Bossy"><adata><height>180</height><offspring>4</offspring><recommendation>medium</recommendation></adata></animal><animal id="CHAFRAF002350102162" name="Linda"><adata><attitude>classy</attitude><height>175</height><offspring>3</offspring><recommendation>rare</recommendation></adata></animal><animal id="CHAFINM000008365662" name="Cowlin"><adata><offspring>45</offspring><height>200</height></adata></animal><animal id="LIMFRAF001930958553" name="Linda"><adata><attitude>friendly</attitude><offspring>0</offspring></adata></animal><animal id="CHAFRAF004303055320" name="Samantha"><adata><offspring>3</offspring><height>175</height></adata></animal><animal id="LIMFRAM003150038969" name="Bryan"><adata><offspring>30</offspring><height>205</height></adata></animal><animal id="HERFINF000003266465" name="Greta"><adata><offspring>2</offspring></adata></animal></interbeef>
Pretty-printed for easier viewing (with the structure unchanged), the compact version above looks like this:
<interbeef>
  <animal id="LIMFRAF001521469226" name="Rosa">
    <adata />
  </animal>
  <animal id="HOLDEUF001006117458" name="Hermione">
    <adata>
      <attitude>curious</attitude>
      <offspring>7</offspring>
      <height>180</height>
    </adata>
  </animal>
  <animal id="AANFINF000006314316" name="Muhmuh">
    <adata>
      <attitude>angry</attitude>
      <offspring>5</offspring>
    </adata>
  </animal>
  <animal id="HERFINM000008405652" name="Frodo">
    <adata>
      <offspring>18</offspring>
      <recommendation>med-rare</recommendation>
    </adata>
  </animal>
  <animal id="SIMDEUF000922204654" name="Angel">
    <adata>
      <attitude>touchy</attitude>
      <offspring>10</offspring>
      <height>190</height>
    </adata>
  </animal>
  <animal id="HOLUSAF000017059414" name="Bossy">
    <adata>
      <height>180</height>
      <offspring>4</offspring>
      <recommendation>medium</recommendation>
    </adata>
  </animal>
  <animal id="CHAFRAF002350102162" name="Linda">
    <adata>
      <attitude>classy</attitude>
      <height>175</height>
      <offspring>3</offspring>
      <recommendation>rare</recommendation>
    </adata>
  </animal>
  <animal id="CHAFINM000008365662" name="Cowlin">
    <adata>
      <offspring>45</offspring>
      <height>200</height>
    </adata>
  </animal>
  <animal id="LIMFRAF001930958553" name="Linda">
    <adata>
      <attitude>friendly</attitude>
      <offspring>0</offspring>
    </adata>
  </animal>
  <animal id="CHAFRAF004303055320" name="Samantha">
    <adata>
      <offspring>3</offspring>
      <height>175</height>
    </adata>
  </animal>
  <animal id="LIMFRAM003150038969" name="Bryan">
    <adata>
      <offspring>30</offspring>
      <height>205</height>
    </adata>
  </animal>
  <animal id="HERFINF000003266465" name="Greta">
    <adata>
      <offspring>2</offspring>
    </adata>
  </animal>
</interbeef>
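One way to produce an indented view like this from the compact form is `xml.dom.minidom` from the Python standard library. This is only a sketch of one possible tool, not necessarily the one used to produce the listing above, and it works on a shortened two-animal copy of the document.

```python
from xml.dom import minidom

# A shortened, two-animal version of the compact document.
compact = (
    '<interbeef>'
    '<animal id="LIMFRAF001521469226" name="Rosa"><adata /></animal>'
    '<animal id="HERFINF000003266465" name="Greta">'
    '<adata><offspring>2</offspring></adata></animal>'
    '</interbeef>'
)

# Parse the text and serialise it again with two-space indentation.
pretty = minidom.parseString(compact).toprettyxml(indent='  ')
print(pretty)
```

The output also starts with an XML declaration line, and details such as how an empty element is written may differ slightly from the listing above. The element structure itself is unchanged.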
Fortran example of creating our XML file from the source files
program merge

  implicit none

  character(30) :: aid, aname
  character(30) :: aidad, adname, adval

  integer :: ios

  ! declare an associated data structure as a linked list structure
  TYPE Adata
    character(30) :: aname
    character(30) :: avalue
    TYPE(Adata), POINTER :: next
  END TYPE Adata

  ! declare an animal data structure as a linked list structure
  TYPE :: Animal
    character(30) :: aname
    character(19) :: aid
    TYPE(Animal), POINTER :: next
    TYPE(Adata), POINTER :: adata
  END TYPE Animal

  TYPE(Animal), POINTER :: head, tmpanimal
  TYPE(Adata), POINTER :: tmpadata

  ! open the source files
  open(unit=2, file='source_id.dat', status='old')
  open(unit=3, file='source_ad.dat', status='old')

  ! read animal data into the animal data structure
  ! (each new animal is pushed onto the front of the list)
  NULLIFY(head)
  ios=0
  do while (ios==0)
    read (2,*, iostat=ios) aid, aname
    if (ios==0) then
      ALLOCATE(tmpanimal)
      tmpanimal%aname=aname
      tmpanimal%aid=aid
      NULLIFY(tmpanimal%adata)
      if (ASSOCIATED(head)) then
        tmpanimal%next => head
      else
        NULLIFY(tmpanimal%next)
      end if
      head => tmpanimal
    end if
  end do

  ! read associated data into the animal data structure
  ios=0
  do while (ios==0)
    read (3,*, iostat=ios) aidad, adname, adval
    if (ios==0) then
      ! search the list for the animal this piece of data belongs to
      tmpanimal => head
      do while (ASSOCIATED(tmpanimal))
        if (aidad==tmpanimal%aid) then
          exit
        end if
        tmpanimal => tmpanimal%next
      end do
      ! data for an unknown animal is silently skipped
      if (ASSOCIATED(tmpanimal)) then
        ALLOCATE(tmpadata)
        tmpadata%aname=adname
        tmpadata%avalue=adval
        if (ASSOCIATED(tmpanimal%adata)) then
          tmpadata%next => tmpanimal%adata
        else
          NULLIFY(tmpadata%next)
        end if
        tmpanimal%adata => tmpadata
      end if
    end if
  end do

  ! write the xml structure to a file
  ! (the '(a)' format writes each string as is, without a leading blank)
  open(unit=5, file='output.xml')
  tmpanimal => head
  write (5,'(a)') "<interbeef>"
  do while (ASSOCIATED(tmpanimal))
    write (5,'(a)') "<animal id="""//TRIM(tmpanimal%aid)//""" name="""//TRIM(tmpanimal%aname)//""">"
    tmpadata => tmpanimal%adata
    if (ASSOCIATED(tmpadata)) then
      write (5,'(a)') "<adata>"
      do while (ASSOCIATED(tmpadata))
        write (5,'(a)') "<"//TRIM(tmpadata%aname)//">"//TRIM(tmpadata%avalue)//"</"//TRIM(tmpadata%aname)//">"
        tmpadata => tmpadata%next
      end do
      write (5,'(a)') "</adata>"
    else
      write (5,'(a)') "<adata/>"
    end if
    write (5,'(a)') "</animal>"
    tmpanimal => tmpanimal%next
  end do
  write (5,'(a)') "</interbeef>"

  close(2)
  close(3)
  close(5)

end program merge
Python example of creating our XML file from the source files
This Python program reads the two source files and produces the corresponding XML file from them.
#!/usr/bin/env python3

import xml.etree.ElementTree as ET  # ElementTree is the XML library in the Python standard library

# build a data structure with animals from the animal ID file

animals = {}
with open('source_id.dat', 'r') as s1:
    for line in s1:
        if not line.strip():
            continue  # skip blank lines
        aid = line[0:19]
        name = line[20:].strip()

        # the aid is tied to an empty dictionary for associated data and the name of the animal
        animals[aid] = [{}, name]

# add associated data to the data structure

with open('source_ad.dat', 'r') as s2:
    for line in s2:
        if not line.strip():
            continue  # skip blank lines
        aid, adname, adval = line.strip('\n').split(',')
        if aid not in animals:
            print('Warning: animal {aid} not in animal ID file!'.format(aid=aid))
            continue  # skip data that belongs to an unknown animal

        animals[aid][0][adname] = adval

# create the xml structure from the data structure

root = ET.Element('interbeef')  # create the root XML element (called interbeef)
for animal, adata in animals.items():
    xmlanimal = ET.SubElement(root, 'animal')  # add the animal element to the root element
    xmlanimal.attrib['id'] = animal  # add the id as an attribute to the animal element
    xmlanimal.attrib['name'] = adata[1]  # add the name of the animal as an attribute to the animal element

    xmladata = ET.SubElement(xmlanimal, 'adata')  # add an associated data element (called adata) to the animal element

    # add all the associated data as child elements to the animal's adata element
    for name, value in adata[0].items():
        xmladatavalue = ET.SubElement(xmladata, name)  # add the name of the associated data as an element
        xmladatavalue.text = value  # add the value as the content/text of the element

# write out the xml file (tostring returns bytes here, so the file is opened in binary mode)
with open('output.xml', 'wb') as xmlfile:
    xmlfile.write(ET.tostring(root, 'UTF-8'))
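One practical difference between building the document with ElementTree and concatenating strings by hand is that ElementTree escapes special characters such as `&` and `<` when it serialises. The sketch below uses a made-up ID, name and `note` element purely to show this; none of them occur in the example files.

```python
import xml.etree.ElementTree as ET

root = ET.Element('interbeef')
animal = ET.SubElement(root, 'animal')
animal.attrib['id'] = 'XXXTEST000000000001'  # made-up ID
animal.attrib['name'] = 'Tom & Jerry'         # made-up name containing a special character
note = ET.SubElement(animal, 'note')         # hypothetical element, not in the example files
note.text = 'height < 200'

# Serialise to text; & and < are written as entities.
xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)

# Parsing the text again gives back the original strings.
parsed = ET.fromstring(xml_text)
print(parsed.find('animal').attrib['name'])
```

A writer that pastes values directly between tags would need to perform this escaping itself to keep the file well-formed for such values.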
Fortran example of reading our XML file
program readxml

  use xmlparse ! load the xml parsing module

  implicit none

  character(50) :: aid, tag, xmlaid, xmlname
  type(XML_PARSE) :: info
  character(len=80),dimension(1:2,1:20) :: attribs
  integer :: no_attribs
  logical :: endtag
  character(len=200),dimension(1:100) :: data
  integer :: no_data
  integer :: i

  call get_command_argument(1, aid) ! get the first argument, i.e. the animal id to search for
  call xml_open(info,"output.xml", .true.) ! open the xml document for reading
  call xml_options(info, ignore_whitespace = .true.) ! set xml options to ignore whitespace

  do

    call xml_get(info,tag,endtag,attribs,no_attribs,data,no_data) ! get an element from xml structure

    if (tag=="animal") then
      do i=1,no_attribs ! get the animal attributes; id and name
        if (attribs(1,i)=="id") xmlaid=attribs(2,i)
        if (attribs(1,i)=="name") xmlname=attribs(2,i)
      end do

      if (xmlaid==aid) then ! check if the animal is the one we search for
        write (*,*) "Animal ID: "//trim(xmlaid)
        write (*,*) "Animal name: "//trim(xmlname)

        call xml_get(info,tag,endtag,attribs,no_attribs,data,no_data) ! get the animal's associated data element
        if (tag=="adata" .and. .not. endtag) then
          do ! loop through the associated data elements and display them
            call xml_get(info,tag,endtag,attribs,no_attribs,data,no_data)
            if ((tag=="adata" .and. endtag) .or. (.not. xml_ok(info))) exit
            if (.not. endtag) write (*,*) " " // trim(tag) // " : " // trim(data(1))
          end do
        end if
      end if
      xmlaid=""
      xmlname=""

    end if

    if (.not. xml_ok(info)) exit ! exit the loop at the end of the xml structure

  end do

  call xml_close(info) ! close the xml document

end program readxml
Python example of reading our XML file
#!/usr/bin/env python3

from sys import argv
from xml.etree import ElementTree

# open the xml file and parse it
tree = ElementTree.parse('output.xml')

# use an xpath to find the correct animal in the xml file
node = tree.find('animal[@id="{aid}"]'.format(aid=argv[1]))

if node is not None:  # print out the animal data if the animal was found
    print("Animal ID: {aid}".format(aid=node.attrib['id']))
    print("Animal name: {name}".format(name=node.attrib['name']))
    print("Animal associated data:")

    # get all associated data
    for adata in node.findall('adata/*'):  # loop through all child elements of the adata element
        print(" {adataname} : {adatavalue}".format(adataname=adata.tag, adatavalue=adata.text))
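The attribute predicate used for the lookup can also be tried without any files. The sketch below parses a one-animal document from a string and wraps the search in a function; `lookup` is a hypothetical helper name, and `NOSUCHANIMAL` is a made-up ID used to show the not-found case.

```python
import xml.etree.ElementTree as ET

# A one-animal document in the same structure as output.xml.
xml_text = (
    '<interbeef>'
    '<animal id="HERFINF000003266465" name="Greta">'
    '<adata><offspring>2</offspring></adata>'
    '</animal>'
    '</interbeef>'
)
root = ET.fromstring(xml_text)


def lookup(root, aid):
    """Return (name, {data name: value}) for the animal, or None if it is absent."""
    node = root.find('animal[@id="{aid}"]'.format(aid=aid))
    if node is None:
        return None
    data = {child.tag: child.text for child in node.findall('adata/*')}
    return node.attrib['name'], data


found = lookup(root, 'HERFINF000003266465')
missing = lookup(root, 'NOSUCHANIMAL')
print(found)
print(missing)
```

Since `find` returns `None` when nothing matches, the caller can tell an unknown ID apart from an animal that simply has no associated data, which comes back as an empty dictionary.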