Record format error when converting XML data

NigeGriff
Premium Member
Posts: 46
Joined: Mon Nov 24, 2003 5:46 am

Post by NigeGriff »

I am converting XML data using a Sequential File stage that feeds the XML Input stage, but I am getting a record format error: "Sequential_File_10,0: Missing record delimiter "\n", saw EOF instead".

When I use a Folder stage in a Server job to describe the same data, the job runs successfully.

Below is the XML data:

<?xml version="1.0"?><customers><customer id="55000"><name>Charter Group</name><address><street>100 Main</street><city>Framingham</city><state>MA</state><zip>01701</zip></address><address><street>720 Prospect</street><city>Framingham</city><state>MA</state><zip>01701</zip></address><address><street>120 Ridge</street><state>MA</state><zip>01760</zip></address></customer></customers>

Any ideas?
Thanks
Nigel
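
The error text hints at the cause: the XML above is a single line, and the stage reports reaching end of file without seeing a final "\n". The following is only a sketch for checking whether a file ends with a newline. The helper name ends_with_newline and the temporary files are made up for illustration; nothing here is part of DataStage.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a non-empty file's last byte is a newline.
ends_with_newline() {
    # Command substitution strips a trailing newline, so for a non-empty
    # file an empty result means the last byte was "\n".
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Throwaway files standing in for the real XML file.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '<?xml version="1.0"?><customers></customers>' > "$tmp/no_newline.xml"
printf '<?xml version="1.0"?><customers></customers>\n' > "$tmp/with_newline.xml"

for f in "$tmp/no_newline.xml" "$tmp/with_newline.xml"; do
    if ends_with_newline "$f"; then
        echo "$(basename "$f"): ends with newline"
    else
        echo "$(basename "$f"): no trailing newline"
    fi
done
```

If the real file reports no trailing newline, that would be consistent with the message from the Sequential File stage.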
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Replace the Sequential stage with an External Source stage that just passes in the full XML file path and let the XML Input stage consume the file directly via the "URL/File path" option.
-craig

"You can never have too many knives" -- Logan Nine Fingers
NigeGriff
Premium Member
Posts: 46
Joined: Mon Nov 24, 2003 5:46 am

Post by NigeGriff »

Firstly, I'm not familiar with the External Source stage.

It references 'source program' and 'source method', but I'm not sure how the full XML path is defined using these options.

Secondly, where in the XML Input stage is the file path value specified?
Thanks
Nigel
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Our resident XML guru Ernie Ostic has posted the details in his blog; everything should be there:

http://dsrealtime.wordpress.com/2007/12 ... -a-source/

Basically you use it to do an "ls" to capture what the XML stage needs and then tell that stage where to find the file.
-craig

"You can never have too many knives" -- Logan Nine Fingers
NigeGriff
Premium Member
Posts: 46
Joined: Mon Nov 24, 2003 5:46 am

Post by NigeGriff »

Thanks for that Craig.

Ernie Ostic's document was very informative.

I tried the 'ls' command and it passed just the name of the file without the full path, so I used 'echo' instead, i.e. echo /opt/dstransfer/dev/credit_sanctioning/xml_input/*, which provided the full path name for the file as input to the XML Input stage using the URL/File path option.

This worked when there was just one file in the source directory, but when there were two I got the following message:

XML_Input_0,0: Warning: CreditSanctioningXMLTest.XML_Input_0: XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 0, column: 0): An exception occurred! Type:RuntimeException, Message:The primary document entity could not be opened. Id=/opt/dstransfer/dev/credit_sanctioning/xml_input/address_details_xml_file.txt /opt/dstransfer/dev/credit_sanctioning/xml_input/address_details_xml_file2.txt

The XML Input stage does not seem to recognize that there are two file pathnames; both arrive in a single Id value, separated by a space.
Thanks
Nigel
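
The Id in the warning shows both paths joined by a space, which is what echo does with a glob: every match lands on one line. As a sketch (the temporary directory and the variable names below are assumptions standing in for the real xml_input directory), printf can emit one full path per line instead:

```shell
#!/bin/sh
# Throwaway directory standing in for the real xml_input directory.
xml_dir=$(mktemp -d)
trap 'rm -rf "$xml_dir"' EXIT
: > "$xml_dir/address_details_xml_file.txt"
: > "$xml_dir/address_details_xml_file2.txt"

# echo joins every match with spaces on ONE line, so a consumer that
# reads one path per row sees both paths as a single value.
echo_lines=$(echo "$xml_dir"/* | wc -l | tr -d ' ')

# printf reuses its format for each argument: one full path per line.
printf_lines=$(printf '%s\n' "$xml_dir"/* | wc -l | tr -d ' ')

echo "echo produced $echo_lines line(s); printf produced $printf_lines line(s)"
printf '%s\n' "$xml_dir"/*
```

Note that if the directory is empty, the shell leaves the pattern unexpanded and the literal text ending in /* is printed, so an empty source directory would still need handling.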
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

"echo"? Perhaps Ernie will stumble across this and help explain his method. How about trying "find" instead?

Code:

find /opt/dstransfer/dev/credit_sanctioning/xml_input -name '*.xml' -print

This, however, may have issues if there are sub-directories containing files that match that pattern.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Most find commands have an option to limit the number of levels of search.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
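
As a sketch of limiting the search depth: -maxdepth is an extension found in GNU and BSD find and may not exist on every platform, so a variant using -prune is shown alongside it. The directory layout and variable names are invented for the demo, and whether a given system's find supports -maxdepth or -path is worth checking before relying on either.

```shell
#!/bin/sh
# Throwaway tree: one file at the top level, one in a sub-directory.
xml_dir=$(mktemp -d)
trap 'rm -rf "$xml_dir"' EXIT
mkdir "$xml_dir/archive"
: > "$xml_dir/current.xml"
: > "$xml_dir/archive/old.xml"

# GNU/BSD extension: do not descend below the starting directory.
top_only=$(find "$xml_dir" -maxdepth 1 -type f -name '*.xml' -print)

# Variant without -maxdepth: prune every directory other than the
# starting point, then print matching regular files.
top_only_prune=$(find "$xml_dir" ! -path "$xml_dir" -type d -prune -o -type f -name '*.xml' -print)

echo "$top_only"
echo "$top_only_prune"
```

Both commands list only current.xml and skip archive/old.xml. The files named in the earlier warning end in .txt, so the -name pattern would need to match whatever extension the real files use.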
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Use the following:
ls /opt/dstransfer/dev/credit_sanctioning/xml_input/* | sort

Also, make sure that address_details_xml_file.txt and address_details_xml_file2.txt are in the same format.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That just gets the filename; we need to supply the full path to the stage.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote: Most find commands have an option to limit the number of levels of search.
One would think so, but I couldn't find one that worked for me. Perhaps I just need to look more better.
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

I'd have to look thru some more examples to see how/where/why it works and what I may have done to avoid a problem, but if the filename by itself is coming thru correctly on each row, why not just add a Transformer that concatenates it with the value of a Job Parameter supplying the rest of the path?

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
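
For comparison with the Transformer approach, the same concatenation can be sketched in the source command itself. This is not Ernie's method, only a shell illustration; XML_DIR is a hypothetical variable standing in for the job parameter, and the file names are taken from the warning earlier in the thread.

```shell
#!/bin/sh
# XML_DIR is a hypothetical stand-in for the directory job parameter.
XML_DIR=$(mktemp -d)
trap 'rm -rf "$XML_DIR"' EXIT
: > "$XML_DIR/address_details_xml_file.txt"
: > "$XML_DIR/address_details_xml_file2.txt"

# ls on a directory prints bare file names; prepend the directory to
# each name so every output row is a full path.
full_paths=$(ls "$XML_DIR" | while IFS= read -r name; do
    printf '%s/%s\n' "$XML_DIR" "$name"
done)

echo "$full_paths"
```

This relies on one file name per line of ls output, so names containing newlines would break it.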
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

D'oh. I was trying to come up with a simple way to include the path, since it more than likely was a job parameter, but couldn't for the life of me figure out where to combine that with the filename in the XML Input or External Source world. Transformer, bah. [face palm]
-craig

"You can never have too many knives" -- Logan Nine Fingers
John Smith
Charter Member
Posts: 193
Joined: Tue Sep 05, 2006 8:01 pm
Location: Australia

Post by John Smith »

NigeGriff wrote: Thanks for that Craig.

Ernie Ostic's document was very informative.

I tried the 'ls' command and it passed just the name of the file without the full path, so I used 'echo' instead, i.e. echo /opt/dstransfer/dev/credit_sanctioning/xml_input/*, which provided the full path name for the file as input to the XML Input stage using the URL/File path option.

This worked when there was just one file in the source directory, but when there were two I got the following message:

XML_Input_0,0: Warning: CreditSanctioningXMLTest.XML_Input_0: XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 0, column: 0): An exception occurred! Type:RuntimeException, Message:The primary document entity could not be opened. Id=/opt/dstransfer/dev/credit_sanctioning/xml_input/address_details_xml_file.txt /opt/dstransfer/dev/credit_sanctioning/xml_input/address_details_xml_file2.txt

The XML Input stage does not seem to recognize that there are two file pathnames.

Use the find command to list all the filenames. It's better than using ls because ls may fail (especially on AIX) if the number of files is high.
Send the output of your find command into a file. In the External source stage, all you need to do then is a cat command to read through your file containing the full path name.
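
The two steps described above can be sketched as follows. The temporary directory, the list file, and the '*.txt' pattern are assumptions chosen for the demo; in a real job the cat command would be what the External Source stage runs.

```shell
#!/bin/sh
# Throwaway directory and list file standing in for the real locations.
xml_dir=$(mktemp -d)
list_file=$(mktemp)
trap 'rm -rf "$xml_dir" "$list_file"' EXIT
: > "$xml_dir/address_details_xml_file.txt"
: > "$xml_dir/address_details_xml_file2.txt"

# Step 1 (before the job runs): write one full path per line.
find "$xml_dir" -type f -name '*.txt' -print > "$list_file"

# Step 2 (the source command): emit the saved list, one path per row.
cat "$list_file"
```

Keeping the list file outside the searched directory avoids find listing the list file itself.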