XML Stage - Rejecting Input XML files from a single director

Nini
Participant
Posts: 6
Joined: Tue Oct 14, 2014 12:30 pm

Post by Nini »

I am reading multiple XML files from a single directory using an External Source stage, and I want to reject every XML file that does not conform to the XSD specified in the XML stage that follows.

Scenario:
There are 5 test XML files: 3 conform to the XSD and 2 do not. The External Source stage passes 5 rows to the XML stage, which then outputs 3 rows. I want to capture the file names of the 2 dropped XML files in a flat file.

Approaches tried:

1) Adding structure validation to the Parser step that feeds the Output step
Result: Does not work. Non-conforming rows are dropped at the Parser step before they ever reach the Output step, so no information about them is passed on.

2) Adding structure validation to the Parser step, plus an output link coming from the XML stage
Result: Does not work. The job aborts with the following error message:
XML_PARSE:java.lang.NullPointerException at com.ibm.e2.connector.datastage.cc.CC_E2Adapter.getRejectDataSetProducer (CC_E2Adapter.java:406)
I can't find any resolution for this error message :(
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

I have to dig up some notes and look at some old jobs of mine that do this sort of thing with the XML Stage...

...but in the interim, if your XML documents are not huge (<200 MB) and they contain the XSD location, you should be able to do this fairly easily with the xmlInput Stage. It lets you establish a reject link (there's a check box); the incoming filename (use the same column name as on the input link) and the reason for the rejection can then be sent down that reject link to your sequential file.

You don't need to "parse" the file in the xmlInput Stage if you don't want to; you can use it just for this validation.
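
To make the shape of the reject link concrete, here is a rough sketch in Python of what ends up on each link. It is not DataStage code and does not use any DataStage API. The function names are made up for illustration, and because the Python standard library has no XSD validator, a plain well-formedness check stands in for the real schema validation. The point is only that every rejected row carries the filename plus a reason.

```python
import xml.etree.ElementTree as ET


def split_by_validity(rows):
    """Split (filename, xml_text) rows into accepted and rejected lists.

    Assumption: a well-formedness check stands in for XSD validation,
    which the real stage would perform against the schema.
    """
    accepted, rejected = [], []
    for filename, xml_text in rows:
        try:
            ET.fromstring(xml_text)
        except ET.ParseError as exc:
            # Reject row: same filename column as the input, plus the reason.
            rejected.append({"filename": filename, "reason": str(exc)})
        else:
            accepted.append({"filename": filename})
    return accepted, rejected


def format_reject_lines(rejected):
    """Render reject rows as pipe-delimited lines for a flat file."""
    return [f'{row["filename"]}|{row["reason"]}' for row in rejected]
```

Feeding it five rows of which two are broken gives three accepted rows and two reject rows, which matches the 5-in, 3-out scenario described above.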

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

I like the xmlInput Stage for this: it is simpler to implement, and you can easily "turn it on" or "turn it off". It can also be done in the XML Stage, but the Assembly has to be developed specifically for this purpose.

Take a look at the right-side frames after you choose REJECT for one of the options in the validation dialog of the XML Parser step. You will see a new entry there: the XML Parser result. It has a boolean "success" property and also a message.

Now add a Switch step. In the Switch step you can identify various results: define one called "badXML", which applies when "success" above is "false", and one called "goodXML", which applies when "success" is "true". You now have two entirely different structures that you can map later on.

Presumably you will have (at least) two output links from the Stage. In the Output step, map the structures from "badXML" to the link for your rejects, and the structures from "goodXML" to the output link for successfully parsed results. The filename on the input (from External Source) can then be mapped to the reject link and sent to the external file along with the message explaining why validation failed.
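
The routing described above can be sketched roughly as follows. This is plain Python that only models the logic; it is not an Assembly definition or any DataStage API. `ParserResult`, `switch_step` and `output_step` are hypothetical names, and the `payload` field is an assumed placeholder for whatever the parser extracts from a valid document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParserResult:
    """Hypothetical stand-in for the XML Parser result of one input file."""
    filename: str
    success: bool
    message: str
    payload: Optional[dict] = None  # assumed: parsed fields, valid files only


def switch_step(results):
    """Route each result to the goodXML or badXML branch on its success flag."""
    branches = {"goodXML": [], "badXML": []}
    for result in results:
        target = "goodXML" if result.success else "badXML"
        branches[target].append(result)
    return branches


def output_step(branches):
    """Map goodXML to the output link and badXML to the reject link."""
    output_link = [
        {"filename": r.filename, **(r.payload or {})}
        for r in branches["goodXML"]
    ]
    reject_link = [
        {"filename": r.filename, "message": r.message}
        for r in branches["badXML"]
    ]
    return output_link, reject_link
```

Every input row lands in exactly one branch, so nothing is silently dropped, and the reject rows keep the filename from the External Source alongside the validation message.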

Ernie