Running a stage within a job with a different config file

Posted: Sat Jul 27, 2019 3:13 pm
by abc123
I have a job with several stages, the last of which is an Oracle Connector stage. I would like that last stage to run with a different config file than the other stages. I looked into node map constraints but they weren't much help. Any ideas?

Posted: Sat Jul 27, 2019 4:31 pm
by chulett
Why don't we start with the why of this... what would be different about the configuration for this last stage?

Posted: Mon Jul 29, 2019 11:56 am
by abc123
Nothing would be different. I am just wondering whether the Oracle Connector stage alone can be run with a higher number of nodes to improve performance. The SQL takes a long time due to the volume of data, and I was hoping to speed it up.

Posted: Mon Jul 29, 2019 1:10 pm
by UCDI
If you run into something that cannot be set as a parameter, and there are very few things like this, you can make two copies of the parallel job with the different setups and call the one you want dynamically from the sequence job.

That said, I can't recall whether this is something you can stuff into a parameter or not.

From memory only (I don't do a lot of this kind of tampering), I think you can put these config file names into parameter sets and pick the one you want at the sequence level and pass it down from there. I know we have a couple of jobs where we can pick 1 CPU or 4 this way.
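
For what it's worth, the usual knob here is the $APT_CONFIG_FILE environment variable, which can be added to the job as a parameter (or carried in a parameter set) and overridden from the Job Activity in the sequence. A minimal sketch of a one-node file, with the host name and paths invented:

    {
        node "node1"
        {
            fastname "etlhost"
            pools ""
            resource disk "/ds/datasets" {pools ""}
            resource scratchdisk "/ds/scratch" {pools ""}
        }
    }

A four-node version would repeat the node block as node2 through node4; the sequence then sets $APT_CONFIG_FILE to whichever file suits the run.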

Posted: Mon Jul 29, 2019 9:40 pm
by chulett
Ah, okay. Perhaps you can handle this on the database side, possibly with a parallel hint or by setting a default degree of parallelism on the tables involved. Each node would get the portion it has to work on, and that work could be further "parallel-ized" by Oracle.
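
If you go that route, it could look something like this (table name and the degree of 8 are just placeholders):

    -- Statement-level hint, e.g. in the connector's user-defined SQL
    SELECT /*+ PARALLEL(t, 8) */ *
      FROM big_table t;

    -- Or give the table a default degree of parallelism
    ALTER TABLE big_table PARALLEL 8;

Note the default degree affects every query against the table, so it's worth clearing with the DBA first.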

Posted: Wed Jul 31, 2019 7:21 am
by ray.wurlod
You could run the whole job using a configuration file with the higher number of nodes, but use node pool(s) to constrain the other stages to execute on a smaller subset of those nodes.
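
A sketch of such a file (host name and paths invented): all four nodes are in the default pool, and node1 and node2 also belong to a named pool "small". Constrain every stage except the Oracle Connector to "small" on its Advanced tab, and the unconstrained connector runs four-way while the rest of the job runs two-way.

    {
        node "node1"
        {
            fastname "etlhost"
            pools "" "small"
            resource disk "/ds/datasets" {pools ""}
            resource scratchdisk "/ds/scratch" {pools ""}
        }
        node "node2"
        {
            fastname "etlhost"
            pools "" "small"
            resource disk "/ds/datasets" {pools ""}
            resource scratchdisk "/ds/scratch" {pools ""}
        }
        node "node3"
        {
            fastname "etlhost"
            pools ""
            resource disk "/ds/datasets" {pools ""}
            resource scratchdisk "/ds/scratch" {pools ""}
        }
        node "node4"
        {
            fastname "etlhost"
            pools ""
            resource disk "/ds/datasets" {pools ""}
            resource scratchdisk "/ds/scratch" {pools ""}
        }
    }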

Posted: Wed Jul 31, 2019 10:45 am
by PaulVL
More is not always better.

Look to see how you are accessing that target Oracle system.

Are you doing updates?
Are you using bulk load?
Is the table truncated before you are loading it?
Is your data pre-sorted based upon the keys of your table?
Any locks on that table during your load time?

Have you spoken to your Oracle DBA to see if runstats/reorg needs to be done on the table, so that you don't waste time?
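
(Runstats is DB2-speak; for an Oracle target the equivalent housekeeping is gathering optimizer statistics, e.g., with schema and table names invented:)

    -- Refresh optimizer statistics on the load target (SQL*Plus syntax)
    EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'ETL', tabname => 'BIG_TABLE');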

I have seen where the target Database was the bottleneck because it was super busy servicing customer queries at the time of load. So if you put even MORE work on it by upping the parallelism, you could be impacting customers.


=============

I have NEVER seen a need to push out any target Connector stage to its own APT file.

You'd be repartitioning data left and right if your parallelism changed.

You'd potentially be crossing host names if your APT file pushed it to a different host (TCP/IP traffic is slower than memory-to-memory traffic).

Your DataStage admin would slap you if you were running on a grid install and hard coded an APT file just for that stage.

Posted: Thu Aug 01, 2019 7:58 am
by chulett
All good points. :wink:

Came back here after realizing my simplistic suggestion wasn't really appropriate; I guess I was thinking more from a source standpoint than a target one. D'oh. I too should have asked about the nature of the target load you are doing - updates versus inserts versus a mixture of the two - and what seems to be the issue/bottleneck. Are you sure the target is being fed fast enough? Have you tried replacing the target Oracle Connector with a Sequential File stage to check?