XML Output stage in Parallel Job - bad performance

Posted: Sun Oct 27, 2013 3:22 pm
by richieRich
Hi,

We are on DataStage version 8.0. I've just got through the pain of writing a nice little job that reads from a database and then creates an XML document using multiple XML output stages.

I've now been told by someone on my team that it is not best practice to use the XML Output stage in a parallel job, because it loads a sequence job per instance of the stage and hence degrades performance.

Is this correct? I've got my doubts because the XML Output stage is in the Parallel job developer guide.

thanks

Posted: Sun Oct 27, 2013 3:59 pm
by ray.wurlod
I don't believe it is correct, but I am struggling to remember back to 8.0. As a general rule, though, a stage that is marked as parallel executes in multiple processes, one per processing node defined in the configuration file.
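To make the "one process per processing node" idea concrete, here is a rough Python analogy, not DataStage code and not how the engine is actually implemented. It assumes a hypothetical node count of 2 and round-robin partitioning (both are illustrative assumptions), and the function names and the customers/customer element names are invented for the example. Each partition is turned into its own XML fragment, much as each instance of a parallel stage only sees the rows for its node.

```python
import xml.etree.ElementTree as ET


def partition_round_robin(rows, node_count):
    """Deal rows out to node_count partitions (hypothetical helper)."""
    partitions = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        partitions[index % node_count].append(row)
    return partitions


def rows_to_xml(rows):
    """Stand-in for the work one stage instance does on its own partition."""
    root = ET.Element("customers")
    for row in rows:
        customer = ET.SubElement(root, "customer", id=str(row["id"]))
        customer.text = row["name"]
    return ET.tostring(root, encoding="unicode")


# Made-up sample rows standing in for the database input.
rows = [{"id": i, "name": f"name{i}"} for i in range(1, 6)]

# Assumption: two processing nodes, so two partitions.
partitions = partition_round_robin(rows, node_count=2)

# One XML fragment per partition; a real engine would run these concurrently.
documents = [rows_to_xml(partition) for partition in partitions]
```

The point is only that the rows are split up front and each instance works independently on its share; nothing here launches a separate job per stage.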

Posted: Sun Oct 27, 2013 7:01 pm
by eostic
What? ...xml brings in a Sequence Job?

No. The xmlOutput Stage is a regular Stage like any other...it can be parallelized and behaves like any other Stage. It's not a screamer --- no xml processing usually is, but it certainly doesn't launch a Sequence.

As you move up in releases, you will be happy to start using the new xml Stage (8.5+), which dramatically simplifies the number of steps/stages needed to write a complex, multi-node xml document.

Ernie

Posted: Sun Oct 27, 2013 10:02 pm
by richieRich
Whoops, sorry. Rather than saying a sequence job is loaded per instance of the XML stage, I meant to say "server job". I don't think this changes the answer though.

Posted: Sun Oct 27, 2013 11:18 pm
by chulett
It doesn't... both statements are incorrect.

Posted: Mon Oct 28, 2013 4:51 am
by eostic
The xmlOutput Stage was written a long time ago, before Enterprise Edition, and like all the stages at that time, it had to be adapted to parallel jobs... and that meant that a special wrapper had to be created in order for it to run in parallel... but even so, it didn't mean launching a server job or anything like that. The closest the architecture comes to that is with the BASIC Transformer Stage, but we're talking here about xmlOutput. The xmlOutput Stage, like the xmlInput Stage, uses XSLT in its processing, whether in an EE Job or a Server Job --- perhaps that is where the initial confusion got started...

Ernie

Posted: Tue Oct 29, 2013 4:05 pm
by richieRich
Hi,

Yes, what Ernie describes sounds very much like what was suggested to me: a wrapper that lets the stage run in a parallel job. However, from all the responses it sounds like this is not a concern, so I'll go back to my parallel job and continue with that.

Thanks all for the answers. I'm going to mark this thread off as resolved.