Page 1 of 2

XML Input stage

Posted: Tue Dec 17, 2013 12:23 pm
by suryadev
Hello,

I am trying to parse an XML document using
External source--->XML_input---->Sequentialfile

Gave all the properties in the stages and I see no warnings or errors but the output looks different.
Input is <ENTITY>172</ENTITY>
<ACCOUNT>0000000006196</ACCOUNT>
<TRACK>0</TRACK>
<USER_CD>1101</USER_CD>
Output I get is ENTITY:<ENTITY>172</ENTITY>ACCOUNT:<ACCOUNT>0000000006196</ACCOUNT>TRACK:<TRACK>0</TRACK>USER_CD:<USER_CD>1101</USER_CD>

I was expecting ENTITY:172
ACCOUNT:0000000006196
TRACK:0
USER_CD:1101

can someone help me why am I getting the tags also in the output?

Thank you!

Posted: Tue Dec 17, 2013 2:42 pm
by ray.wurlod
Describe the XPath expressions and other settings you have used in the XML Input stage.

Posted: Tue Dec 17, 2013 3:28 pm
by suryadev
XML input stage:
Under stage->transformation Settings->selected repetition element required
Input->XMLsource->gave filename and selected URL/Filepath
-->columns--->gave filename with varchar (no) an length 255
output->transformation Settings->selected repetition element required
---> columns -gave all the columns except the top one

Now after trying some options the output looks like
ENTITY
<ENTITY>172</ENTITY>
ACCOUNT
<ACCOUNT>0000000006196</ACCOUNT>
TRACK
<TRACK>0</TRACK>
USER_CD
<USER_CD>1101</USER_CD>
which is the column name is good with entity or account or track or user_code and the records are also under the corresponding column but tags are not removed. Not sure what the reason could be :?

Posted: Tue Dec 17, 2013 6:20 pm
by suryadev
There is some issue with the main tag which is the ID. Now after using it as ID I got the right output when pulling the XML from file which has its data loaded from a table.

When reading the same CLOB field from the DB directly, it is throwing an error which says "Xalan fatal error" due to Invalid document structure"
The job design is DBstage--->Transformer---->XMLinput----->Sequentialfile

I am not using the external source stage.....as there is no path for the table...

Please suggest,Thank you

Posted: Tue Dec 17, 2013 6:53 pm
by eostic
The result you are getting is by design. Sometimes it is useful to retrieve whole tags.

Open your output link and scroll all the way to the right and review the Description property.

The contents are formal "xpath". When it ends in a slash, you get the whole tag. When it ends in /text(), you get the "contents" of the tag.

Ernie

Posted: Tue Dec 17, 2013 7:16 pm
by suryadev
Thank you Ernie!

So if the whole tags are retrieved , I can get the output when pulled from the DB. How do I get the whole tags, when importing do I need to select something?
as you said when I see the description it ends in /text(), please suggest how to retrieve the whole tags?

Posted: Tue Dec 17, 2013 7:22 pm
by ray.wurlod
That's where I was heading when I asked you for your xpath expressions. Ernie pre-empted me. But that's OK - the more people answering questions well, the better.

Posted: Tue Dec 17, 2013 7:52 pm
by chulett
suryadev wrote:please suggest how to retrieve the whole tags?
eostic wrote:When it ends in a slash, you get the whole tag.

Posted: Tue Dec 17, 2013 8:09 pm
by suryadev
I removed the text() after / which made the derivation ending with / but still I get the warning and the records are not passing after XML input stage.

Below are the warnings
XML input document parsing failed.
Xalan fatal error (publicId: , systemId: , line: 2, column: 2): Invalid character (Unicode: 0x0)
XML input document parsing failed.
Xalan fatal error (publicId: , systemId: , line: 1, column: 6): There are more end tags than start tags
XML input document parsing failed.
Xalan fatal error (publicId: , systemId: , line: 2, column: 1): Invalid document structure


Thanks again

Posted: Tue Dec 17, 2013 8:47 pm
by eostic
Which problem are we helping to resolve? Take my post and apply it to your working job in the very first post. That will show you how to get whole tags (what your very first post is getting) vs just the "value" of the tags (what your first post says you wanted).

Ernie

Posted: Tue Dec 17, 2013 10:21 pm
by suryadev
Actually when waiting for the solution for the issue, I tried some options and was updating the output each time I made changes.

Now I get the output from the job in first post as wanted but only when I use External source.My requirement is to get the same output when I read the CLOB field information from the table instead of reading from a sequential file.

Please let me know if I need to open a new post for this, I felt its the same parsing XML from file Vs parsing XML from table so continued....

Posted: Wed Dec 18, 2013 5:53 am
by eostic
First write a job that sends your clob to a sequential file without the xml stage. Use "none" for delimeter and quote character. Look at the resulting xml. Make sure it is clean and can be opened in your browser as well formed xml.

Once you have that working, send into your xmlInput but use the "content" option instead of the url option.

Ernie

Posted: Wed Dec 18, 2013 9:27 am
by suryadev
Thank you very much, will implement the same with two jobs.

So my assumption is XML cannot be parsed directly from the CLOB field of Table?

Posted: Wed Dec 18, 2013 9:31 am
by chulett
Ernie said no such thing. Out of curiosity, has your premium membership expired?

Posted: Wed Dec 18, 2013 10:07 am
by suryadev
The way which Ernie suggested worked for me, loaded the XML CLOB into a file and then parsed it with XML input stage but I have around 15 more transformations to do which should include CLOB column and other columns of the table.

So it needs to be a single job, that is the reason I was asking for parsing technique from the table directly.

Sorry for the confusion, hope you understand!