XML Output stage in Parallel Job - bad performance

richieRich
Premium Member
Posts: 27
Joined: Tue Jan 05, 2010 12:04 am

Post by richieRich »

Hi,

We are on DataStage version 8.0. I've just got through the pain of writing a nice little job that reads from a database and then creates an XML document using multiple XML output stages.

I've now been told by someone on my team that it is not best practice to use the XML Output stage in a parallel job, because it loads a sequence job for each instance of the stage and so degrades performance.

Is this correct? I've got my doubts because the XML Output stage is in the Parallel job developer guide.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I don't believe it is correct, but I am struggling to remember back to 8.0. As a general rule, though, a stage that is marked as parallel executes in multiple processes, one per processing node defined in the configuration file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
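To make the "one process per processing node" rule concrete, here is a conceptual Python sketch. It is only an analogy, not how the DataStage engine is implemented: the node names are hypothetical stand-ins for the nodes in a configuration file, round-robin partitioning is chosen purely for illustration, and the per-row work is a placeholder. In a real parallel job each stage instance runs as its own OS process; the sketch runs the partitions sequentially to stay simple.

```python
# Conceptual analogy only; not the DataStage engine's implementation.
# NODES are hypothetical stand-ins for the nodes defined in a configuration file.
NODES = ["node1", "node2", "node3"]


def partition_round_robin(rows, node_count):
    """Deal rows out to node_count partitions in round-robin order."""
    partitions = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        partitions[index % node_count].append(row)
    return partitions


def run_stage_instance(node, rows):
    """Stand-in for one stage instance processing only its own partition.

    The upper-casing is placeholder work; a real stage would do its own logic.
    """
    return {"node": node, "rows_processed": len(rows), "output": [r.upper() for r in rows]}


def run_parallel_stage(rows, nodes=NODES):
    """One stage instance per node, each seeing a disjoint slice of the data.

    In the real engine these instances run as separate processes; here they
    run one after another to keep the sketch simple.
    """
    partitions = partition_round_robin(rows, len(nodes))
    return [run_stage_instance(node, part) for node, part in zip(nodes, partitions)]


if __name__ == "__main__":
    for result in run_parallel_stage(["a", "b", "c", "d", "e"]):
        print(result)
```

With three nodes and five rows, each instance sees only its own share of the data (two, two, and one rows), which is the point of the rule: adding nodes to the configuration file adds stage instances, not extra jobs.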
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

What? XML brings in a Sequence Job?

No. The xmlOutput Stage is a regular Stage like any other: it can be parallelized and behaves like any other Stage. It's not a screamer (XML processing rarely is), but it certainly doesn't launch a Sequence.

As you move up in releases, you will be happy to start using the new XML Stage (8.5+), which dramatically reduces the number of steps/stages needed to write a complex, multi-node XML document.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
richieRich
Premium Member
Posts: 27
Joined: Tue Jan 05, 2010 12:04 am

Post by richieRich »

Oops, sorry. Rather than saying a sequence job is loaded per instance of the XML stage, I meant to say "server job". I don't think this changes the answer, though.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It doesn't... both statements are incorrect.
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

The xmlOutput Stage was written a long time ago, before Enterprise Edition, and like all the stages from that time it had to be adapted to parallel jobs. That meant a special wrapper had to be created so it could run in parallel, but even so, it didn't mean launching a server job or anything like that. The closest the architecture comes to that is the BASIC Transformer Stage, but we're talking here about xmlOutput. The xmlOutput Stage, like the xmlInput Stage, uses XSLT in its processing, whether in an EE Job or a Server Job; perhaps that is where the initial confusion started.

Ernie
richieRich
Premium Member
Posts: 27
Joined: Tue Jan 05, 2010 12:04 am

Post by richieRich »

Hi,

Yes, what Ernie describes sounds very much like what was suggested happens: a wrapper for a parallel job. However, from the sound of all the responses I don't think it is a concern, so I'll go back to my parallel job and continue with it.

Thanks all for the answers. I'm going to mark this thread off as resolved.