
XML input stage

Posted: Mon Aug 10, 2015 5:38 pm
by karry450
Hi Friends

I am reading an XML file using

sequential file --> xml_input --> sequential file

The job runs fine, but all the output records are identical.

I have 422 records in the source XML file, and when I run this job I get one single record repeated 422 times.

Any suggestions as to where I am going wrong?

Regards
Karry

Posted: Mon Aug 10, 2015 7:09 pm
by eostic
Hard to say. For xmlInput, share a snippet of your XML document and the XPath (the descriptions on the output link) for some of the columns that are duplicating and that you think shouldn't be.

Ernie

Posted: Tue Aug 11, 2015 10:41 am
by karry450
Source XML:

<?xml version="1.0" encoding="UTF-8" ?>
<root>
    <ID>1</ID>
    <ID>2</ID>
    <ID>3</ID>
    <UIS>Yes</UIS>
    <UIS></UIS>
    <UIS>Yes</UIS>
    <clubColor>red</clubColor>
    <clubColor>blue</clubColor>
    <clubColor>yellow</clubColor>
    <playerName>Josh R</playerName>
    <playerName>Jeff K</playerName>
    <playerName>Graham S</playerName>
    <codeVersion>1.0.3</codeVersion>
    <username>willekra</username>
    <buildDate>04-Aug-2015 11:09:48</buildDate>
</root>

XPath:
/root/ID/text()
/root/UIS/text()
/root/clubColor/text()
/root/playerName/text()
/root/codeVersion/text()
/root/username/text()
/root/buildDate/text()

The repeating key is on ID.

The output I'm getting is below, which is wrong:

ID,"UIS","clubColor","playerName","codeVersion","username","buildDate"
1,"Yes","red","Josh R","1.0.3","willekra","04-Aug-2015 11:09:48"
2,"Yes","red","Josh R","1.0.3","willekra","04-Aug-2015 11:09:48"
3,"Yes","red","Josh R","1.0.3","willekra","04-Aug-2015 11:09:48"

My job design is
sequential file ---> xmlinput ---> seqfile

Please help. Am I missing anything?

Posted: Tue Aug 11, 2015 11:32 am
by eostic
Sadly, that is a pretty poor XML design. There is structure, but it is merely implied, not enforced as it could or should be. There ought to be a "player" element containing ID, UIS, clubColor, etc., and then multiple of those player elements.
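To illustrate the nested design described above, here is a minimal Python sketch (standard library only, run outside DataStage) that rebuilds a shortened copy of the posted document with one "player" element per position. The function name `nest_players`, the `FIELDS` list, and the assumption that the n-th ID, UIS, clubColor, and playerName belong together are illustrative and not confirmed by the thread.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the posted document (two players, XML declaration omitted).
FLAT = """<root>
    <ID>1</ID>
    <ID>2</ID>
    <UIS>Yes</UIS>
    <UIS></UIS>
    <clubColor>red</clubColor>
    <clubColor>blue</clubColor>
    <playerName>Josh R</playerName>
    <playerName>Jeff K</playerName>
</root>"""

# Assumption: these four groups repeat in step, one value per player.
FIELDS = ["ID", "UIS", "clubColor", "playerName"]


def nest_players(flat_xml):
    """Return a new <root> holding one <player> element per position."""
    src = ET.fromstring(flat_xml)
    # One list of elements per repeating group, in document order.
    columns = [src.findall(tag) for tag in FIELDS]
    out = ET.Element("root")
    # zip() pairs the n-th element of every group.
    for members in zip(*columns):
        player = ET.SubElement(out, "player")
        for el in members:
            child = ET.SubElement(player, el.tag)
            child.text = el.text  # stays None for an empty element such as <UIS></UIS>
    return out


nested = nest_players(FLAT)
print(ET.tostring(nested, encoding="unicode"))
```

With a document shaped like this, the repeating unit is explicit: each player is one element, so the grouping no longer depends on the order of sibling elements.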

As it stands now, there are multiple, entirely independent repeating units: ID, UIS, clubColor, playerName, etc.

You may be able to play with the XPath or use XSLT in the stage, but the quickest and simplest thing is probably just to have one output link (from the same stage) for each of those groups, with one column each, and then work out a way to combine them downstream into a single row.

Ernie
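The "combine them downstream" idea can be sketched outside DataStage as a positional join: read each repeating group as its own list, then pair the n-th value of every group into one row. This is a rough Python illustration using only the standard library, not the stage's own behavior; `combine_by_position`, `REPEATING`, and `SINGLE` are hypothetical names, and it assumes the groups are meant to line up by position.

```python
import xml.etree.ElementTree as ET

# The document posted in this thread (XML declaration omitted).
XML_DOC = """<root>
    <ID>1</ID>
    <ID>2</ID>
    <ID>3</ID>
    <UIS>Yes</UIS>
    <UIS></UIS>
    <UIS>Yes</UIS>
    <clubColor>red</clubColor>
    <clubColor>blue</clubColor>
    <clubColor>yellow</clubColor>
    <playerName>Josh R</playerName>
    <playerName>Jeff K</playerName>
    <playerName>Graham S</playerName>
    <codeVersion>1.0.3</codeVersion>
    <username>willekra</username>
    <buildDate>04-Aug-2015 11:09:48</buildDate>
</root>"""

REPEATING = ["ID", "UIS", "clubColor", "playerName"]  # one value per record
SINGLE = ["codeVersion", "username", "buildDate"]     # shared by every record


def combine_by_position(xml_text):
    """Build one row per position across the independent repeating groups."""
    root = ET.fromstring(xml_text)
    # Empty elements become "" so they still occupy their position in the list.
    groups = [[(el.text or "") for el in root.findall(tag)] for tag in REPEATING]
    shared = [root.findtext(tag, default="") for tag in SINGLE]
    # zip() stops at the shortest group, so unequal group lengths lose rows.
    return [list(values) + shared for values in zip(*groups)]


rows = combine_by_position(XML_DOC)
for row in rows:
    print(",".join(row))
```

Note that the empty `<UIS></UIS>` is kept as an empty string here. If a group were read in a way that skips empty elements, the remaining values would shift up and pair with the wrong record, so it is worth checking that every group has the same count before joining.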

Posted: Tue Aug 11, 2015 11:49 am
by karry450
Sorry, I don't have a premium membership to look into it. Can you please help?

Posted: Tue Aug 11, 2015 4:48 pm
by ray.wurlod
karry450 wrote: Sorry, I don't have a premium membership to look into it. Can you please help?

Why not get one? You're over 200 posts, so clearly benefiting from DSXchange. Premium memberships make up the funding mechanism that keeps DSXchange alive.