Need help with reading XML file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Bostonian
Premium Member
Premium Member
Posts: 33
Joined: Fri Nov 20, 2009 3:32 pm

Need help with reading XML file

Post by Bostonian »

Hi,
I need some help as I have never done this before.

DS Version: 9.1 and OS is Windows

I am trying to create a job to read an XML file and load into a NZ table. The XML file is actually set as a .txt file from outside vendors. It looks something like mentioned below.

My question is how do I design the job and what stages shall I use in this scenario?


<?xml version="1.0" encoding="UTF-8"?>
<SOPDataLoad>
<BatchInfo>
<Environment>P</Environment>
<BatchId>00018566</BatchId>
<ReplyTo>status_can@list.volts.com</ReplyTo>
<Date>09/10/2014</Date>
<TableName>sales</TableName>
<SourceSystem>CASALES</SourceSystem>
<Replaces></Replaces>
</BatchInfo>
<Records>
<R>|D9002116|LFF-DSP-S06S| | | |PTF9|ea|002|0|0||no| |NO|0| |0| |0| |hydrofrc|110019|15| | | | | | |002|3110| | | | | | | |222|P|DORMAN|09/25/13| | | | |CDN|1|</R>
.
.
.
</Records>
<RecordCount>
<Count>5034</Count>
</RecordCount>
</SOPDataLoad>
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Just to clarify the challenge here - you've never handled XML in DataStage before or you've never done anything with XML before?

At a very high level, the job design is basically:

External Source stage to pass in the filename to process
XML stage to read the file and parse the input into records
NZ target

You also should have an "xsd" from whomever is providing this file which holds the definitions / metadata of the XML so the stage will know (can be told) how to parse it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Bostonian
Premium Member
Premium Member
Posts: 33
Joined: Fri Nov 20, 2009 3:32 pm

Post by Bostonian »

Chulett -Thank you very much for the information. No, I have never done anything with XML files before.

The design that you proposed is exactly what I have in my job design. But I was not aware of the XSD file. I will get that as the first thing from the vendor sending this file and see how it goes. Thank you again for the response and help!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Once you have the XSD file, you need to import it as a table definition.
In Designer, Import > Table Definitions > XML Table Definitions
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Bostonian
Premium Member
Premium Member
Posts: 33
Joined: Fri Nov 20, 2009 3:32 pm

Post by Bostonian »

Excellent. Thank you Ray. I will give it a shot tomorrow and update this thread with the results.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Being new I'd suggest you stick with the XML Input stage, this rather than the fancy new XML 'Composer' stage - save that one for when you get more experience under your belt. XML Input is pretty straight-forward.

And what the xsd's metadata will get you are what is known as the XPath Expressions, what the parser uses to break the elements down into fields for your target. There's plenty of information out there in the wild on this as it's not specific to DataStage, found this reference with a quick search, many others are out there for your reading pleasure.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Also wanted to say that if you've got time for some reading, then it would behoove you to hit Ray's site http://www.raywurlod.com/ and download the XML Best Practices document hosted there. It's old and uses Server jobs to illustrate the lessons but the techniques are all sound and just as valid today.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Bostonian
Premium Member
Premium Member
Posts: 33
Joined: Fri Nov 20, 2009 3:32 pm

Post by Bostonian »

Thank you, Chulett. I did download the Best practices doc from Ray's website and I am reading it now.

Here is an update on my issue (and it still exists :( )
Got the XSD file layout and imported to the XML definitions as Ray suggested. Then used External Source--> XML Input --> Peek design to process the file. It just fails with the error message "Import Error at Record 0". Another thing to note is that the data in the XML file is coming as 1 single column. In other words, every record in the XML file is considered as a single column per the XSD file.


XSD File layout:

Code: Select all

<?xml version="1.0"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="SOPDataLoad">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" name="BatchInfo">
          <xs:complexType>
            <xs:sequence>
              <xs:element minOccurs="0" name="Environment" type="xs:string" />
              <xs:element minOccurs="0" name="BatchId" type="xs:unsignedInt" />
             <xs:element minOccurs="0" name="ReplyTo" type="xs:string" />
              <xs:element minOccurs="0" name="Date" type="xs:string" />
              <xs:element minOccurs="0" name="TableName" type="xs:string" />
              <xs:element minOccurs="0" name="SourceSystem" type="xs:string" />
              <xs:element minOccurs="0" name="Replaces" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element minOccurs="0" name="Records">
          <xs:complexType>
            <xs:sequence>
              <xs:element minOccurs="0" maxOccurs="unbounded" name="R" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element minOccurs="0" name="RecordCount">
          <xs:complexType>
            <xs:sequence>
              <xs:element minOccurs="0" name="Count" type="xs:unsignedInt" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

As Craig recommends above regarding the xmlInput Stage, here's what you want to do:

Get a formal "instance" of the document. The xsd is nice, but not needed here.

Go to import....tabledefs...xml table definitions......completely expand the tree.......then check the lowest level boxes with "T" next to them...this generates a list of columns at the bottom of the screen...Save the tabledefinition somewhere. Do this for ONE repeating node for now, so you can see how things work. Mark any one of the columns on that node as "key". This is called a "repetition element".

This will go on the "output" link of the xmlInput Stage...and your single column on the input link.

You should be well on your way....

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Huh... always thought it was best to use the xsd rather than the actual file for metadata as the file may not have all occurrences of elements in it. I guess you solve that with asking for a 'formal instance' of the file? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Bostonian
Premium Member
Premium Member
Posts: 33
Joined: Fri Nov 20, 2009 3:32 pm

Post by Bostonian »

Thank you, Guys. But I am definitely doing something wrong as I still can't make it work.

Here is what I did.

- Imported the XML file into XML Table definition. In the XML Meta Data Importer window, I expanded all the sections. Under "Records", there was "R" and under that there was "TEXT" with T next to it. I selected only that.

- On the job palette, Put 3 stages. External_Source ---> XML_Input--->Peek

- On External Source stage, set Source Method = Program File. Source_Program = C:\test.xml. On format page, I set the Delimiter = None, Quote = None and Final Delimiter = End. On Column section, I added just 1 row as "file_name" and declared it as varchar(255).

- On XML_Input stage, I let the Stage tab with all default values. Under Input tab, selected "file_name" as XML_Source_Column. Selected XML Document as column content. Under output tab, loaded the imported XML Definition from step 1. There is only column with "R" and key is set to yes.

When I run the job, it fails with the below message,

Warning: External_Source_31,14: Source subproc: syntax error: got EOF, expecting Word
Error: External_Source_31,14: Filter status 1;
filter process failed: 1;
import error at record 0.
External_Source_31,14: Import error at record 0.

Trying to find out what am I doing wrong here... :shock:

I can put some screen shot if it can help anyone understand the issue better.
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

I have a detailed blog post on how to use the external source stage at www.dsrealtime.com. find the table of contents and look for posts on xml.

In short the ext sourxe stage needa a program.......put one column on the output lonk called fikenames with varchar and kength of 250. Use the unix lust command in the stage. Like..... ls /tmp/*.xml.

The select the url option on the input link to the xmlinput syage. That will get you closer to your goal but also read that blog post.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Bostonian
Premium Member
Premium Member
Posts: 33
Joined: Fri Nov 20, 2009 3:32 pm

Post by Bostonian »

Hello Eostic,
I did read your post with the title "Reading XML Content as a Source".

Are you suggesting to use unix ls command on Windows system which is my OS? If so, I tried that too.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The ls command ought to work, as you have MKS Toolkit installed. But there's no reason you couldn't use the DIR command, ideally with the /B switch.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Bostonian
Premium Member
Premium Member
Posts: 33
Joined: Fri Nov 20, 2009 3:32 pm

Post by Bostonian »

Actually, when I tried to give "ls C:\TEST.xml" without quotes, I get the following error. The source method is Program Files

External_Source_31: Could not open source list file "ls C: est\dim_Item-TEST.xml"

should I give it in quotes?

{EDIT}: The path I give is "ls C:\XML\TEST.xml" without quotes and the error is External_Source_31: Could not open source list file "ls C:\XML\TEST.xml"
Post Reply