XML input stage
Posted: Mon Aug 10, 2015 5:38 pm
by karry450
Hi friends,
I am reading an XML file using this job design:
Sequential File --> XML Input --> Sequential File
The job runs fine, but all the records in the output are identical.
I have 422 records in the source XML file, and when I run this job I get one single record repeated 422 times.
Any suggestions as to where I am going wrong?
Regards
Karry
Posted: Mon Aug 10, 2015 7:09 pm
by eostic
Hard to say... For the XML Input stage, share a snippet of your XML document and the XPath (the descriptions on the output link) for some of the columns that are duplicating when you think they shouldn't be.
Ernie
Posted: Tue Aug 11, 2015 10:41 am
by karry450
Source XML:
<?xml version="1.0" encoding="UTF-8" ?>
<root>
<ID>1</ID>
<ID>2</ID>
<ID>3</ID>
<UIS>Yes</UIS>
<UIS></UIS>
<UIS>Yes</UIS>
<clubColor>red</clubColor>
<clubColor>blue</clubColor>
<clubColor>yellow</clubColor>
<playerName>Josh R</playerName>
<playerName>Jeff K</playerName>
<playerName>Graham S</playerName>
<codeVersion>1.0.3</codeVersion>
<username>willekra</username>
<buildDate>04-Aug-2015 11:09:48</buildDate>
</root>
XPATH:
/root/ID/text()
/root/UIS/text()
/root/clubColor/text()
/root/playerName/text()
/root/codeVersion/text()
/root/username/text()
/root/buildDate/text()
The repeating key is on ID.
The output I'm getting is below, which is wrong: only the ID changes from row to row.
ID,"UIS","clubColor","playerName","codeVersion","username","buildDate"
1,"Yes","red","Josh R","1.0.3","willekra","04-Aug-2015 11:09:48"
2,"Yes","red","Josh R","1.0.3","willekra","04-Aug-2015 11:09:48"
3,"Yes","red","Josh R","1.0.3","willekra","04-Aug-2015 11:09:48"
My job design is:
Sequential File --> XML Input --> Sequential File
Please help. Am I missing anything?
Posted: Tue Aug 11, 2015 11:32 am
by eostic
Sadly, that is a pretty poor XML design. There is structure, but it is merely implied, not enforced as it could or should be. There ought to be a "player" element with ID, UIS, clubColor, etc. within it, and then multiple of those player elements.
As it stands now, there are multiple, entirely independent repeating units: ID, UIS, clubColor, playerName, etc. Nothing in the document ties the second ID to the second playerName, which is why a repeating key on ID picks up the first value of every other column on each row.
You may be able to play with the XPath or use XSLT in the stage, but the quickest and simplest thing is probably just to have one output link (from the same stage) for each of those groups, one column each, and then work out a way to combine them downstream into a single row.
Ernie
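For anyone reading later, here is a minimal sketch of the "one list per repeating group, then combine downstream" idea, written in Python outside DataStage purely as an illustration. It assumes (the XML itself does not guarantee this) that the Nth ID, UIS, clubColor and playerName belong to the same player, and that codeVersion, username and buildDate occur once and apply to every row. The function name rows_by_position is made up for this example and is not part of any stage.

```python
import xml.etree.ElementTree as ET

# Sample document from the post above (XML declaration omitted).
SOURCE = """<root>
<ID>1</ID>
<ID>2</ID>
<ID>3</ID>
<UIS>Yes</UIS>
<UIS></UIS>
<UIS>Yes</UIS>
<clubColor>red</clubColor>
<clubColor>blue</clubColor>
<clubColor>yellow</clubColor>
<playerName>Josh R</playerName>
<playerName>Jeff K</playerName>
<playerName>Graham S</playerName>
<codeVersion>1.0.3</codeVersion>
<username>willekra</username>
<buildDate>04-Aug-2015 11:09:48</buildDate>
</root>"""

REPEATING = ("ID", "UIS", "clubColor", "playerName")
SINGLE = ("codeVersion", "username", "buildDate")


def rows_by_position(xml_text):
    """Pair the independent repeating groups by position (an ASSUMPTION)."""
    root = ET.fromstring(xml_text)
    # One list per repeating group, like one output link per group.
    groups = [[e.text or "" for e in root.findall(tag)] for tag in REPEATING]
    # Values that occur once are copied onto every row.
    shared = [root.findtext(tag, default="") for tag in SINGLE]
    # zip() stops at the shortest list, so uneven groups lose trailing rows.
    return [list(values) + shared for values in zip(*groups)]


if __name__ == "__main__":
    for row in rows_by_position(SOURCE):
        print(",".join(row))
```

In a job, the equivalent would be to add a row number to each single-column output link and join the links on that number. If the groups can have different counts, positional pairing is not safe and the source XML really needs fixing.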
Posted: Tue Aug 11, 2015 11:49 am
by karry450
Sorry, I don't have a premium membership to look into this. Can you please help?
Posted: Tue Aug 11, 2015 4:48 pm
by ray.wurlod
karry450 wrote: Sorry, I don't have a premium membership to look into this. Can you please help?
Why not get one? You're over 200 posts, so clearly benefiting from DSXchange. Premium memberships make up the funding mechanism that keeps DSXchange alive.