Reading data from an XML file

ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am


Post by ThilSe »

Hi,

I want to read the data from an XML file.

I created a job like the one below.

Folder Stage ---> XMLInput ---> Transformer ---> SeqFile

The input XML file is


<?xml version="1.0" encoding="UTF-8" ?>
<root>
<a> ASK </a>
<b> BSK </b>
</root>
I need to extract the values enclosed in the tags and write them to the file.
Required output:
  • ASK
    BSK
When I imported the metadata from this file, I got the following metadata:

Column   SQLType        Description
root     Unknown        /root
a        Varchar(255)   /root/a/#PCDATA
b        Varchar(255)   /root/b/#PCDATA

I have set 'b' as key.

When I execute the job, I get the following error:
Unexpected token!pattern = '#PCDATA'(Unknown URI, 50, 34)
Remaining tokens: ('#PCDATA')
Please guide me on this issue.

Thanks
Senthil
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not an XML expert by any stretch, but the XPath information it imported looks wrong. Try a couple of things.

Change the XPath in the Description field from #PCDATA to just text() and see if that works. Also, only mark a field as a Key if it is a repeating element; if you always get simple pairs like that, you shouldn't need to mark either of them as a key.

Give that a shot.
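
For anyone wanting to check the expected values outside DataStage, here is a minimal Python sketch using only the standard library. It is not how the XML Input stage works internally. #PCDATA is a DTD content-model keyword rather than an XPath step, which is presumably why the parser rejects it, while text() selects an element's text nodes. ElementTree has no text() step, so the sketch reads the same text through findtext(). The names XML_DOC and element_text are made up for this example.

```python
import xml.etree.ElementTree as ET

# The sample document from the first post, as bytes so the
# encoding declaration is honoured by the parser.
XML_DOC = b"""<?xml version="1.0" encoding="UTF-8" ?>
<root>
<a> ASK </a>
<b> BSK </b>
</root>"""


def element_text(root, path):
    """Return the stripped text of the first element matching path, or ''."""
    # findtext() reads the text node that /root/a/text() would select in XPath.
    return (root.findtext(path) or "").strip()


root = ET.fromstring(XML_DOC)

# Paths are relative to <root>, so "a" corresponds to /root/a.
values = [element_text(root, "a"), element_text(root, "b")]
print("\n".join(values))
```

The strip() call is an assumption about the desired output: the source values carry leading and trailing spaces (" ASK "), and the required output shows them without.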
-craig

"You can never have too many knives" -- Logan Nine Fingers
gpatton
Premium Member
Posts: 47
Joined: Mon Jan 05, 2004 8:21 am

Post by gpatton »

You should use the #PCDATA tag.

Make sure the tag is fully qualified.

Do not set the key until you write the file in the output of the transformer.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

gpatton wrote:Make sure the tag is fully qualified.
You should probably explain what that means, g.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

Chulett,
chulett wrote:Change the XPath bits in the Description field from #PCDATA to just text()
I tried using text() instead of #PCDATA. It runs successfully.

I thank all of you for your inputs and time!

Thanks
Senthil
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Still curious what the #PCDATA tag is supposed to mean. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

Hi,

PCDATA means parsed character data.

It is the text found between the start tag and the end tag of an XML element. This text will be parsed by the XML parser.

For example:
<Details>
<name>Senthil</name>
<address>
<street>10 th main road</street>
<city>Chennai</city>
</address>
</Details>

If <address> is defined as #PCDATA and the tags <street> and <city> are defined, then the tags <street> and <city> will be parsed by the XML parser and expanded.

If <address> is defined as CDATA, then
<street>10 th main road</street><city>Chennai</city>
will be treated as text. The tags <street> and <city> will not be identified by the XML parser.

Hope this clarifies.

More info can be found at
http://www.w3schools.com/dtd/dtd_building.asp


Thanks
Senthil
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

Senthil,
People here may think I'm not a big advocate of the DataStage XML system, and they'd probably be right :). I like it for very small amounts of data, or for data coming in as a feed rather than from a static source.

My preference in your situation would be an XSLT translator. This allows you to create your sequential files directly from the XML without DataStage at all. Saxon is one I have used, and there are others out there for most platforms.

If you do need to use the D/S XML then the previous suggestions should get you going. The metadata handling is one of the reasons I really dislike the D/S XML.

<<end of transmission>>
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

You are spot on: the XML Input and XML Output stages sit in the "Real Time" folder for a reason; they are better at handling small volumes. Good suggestion on the XSLT translator, I will have to check it out.
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

Vince,
Have a look at the XSLT processors on the Apache web site. I think they were supplied in part by IBM. The licence allows commercial use so long as no money is charged further on (it's either the GPL or the Apache one, I can't remember).

The XML stages are great for an MQ feed, like you said, real time.

In my last gig I changed 45 jobs that used D/S XML and ran for 2.5 - 3 hours to Java-based XSLT (I used some awk scripts and the DDL to create the XSLT files), running 15 at a time for an end-to-end time of 20 minutes out of 3 XML files. CPU ran at about 95% (a wasted CPU cycle is a lost CPU cycle). This could have been reduced if I had used the C++ version, but I couldn't get the admins to load the libraries I needed, whereas with the Java one I could do it myself.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.