RULES TO RETRIEVE DA FROM MORE TABLES IN ONE JOB

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sangi1981
Participant
Posts: 99
Joined: Fri Jun 13, 2008 8:10 am

RULES TO RETRIEVE DA FROM MORE TABLES IN ONE JOB

Post by sangi1981 »

Hello everyone,
I am writing to ask an explanation about how to compose a job DS that retrieves data from multiple tables.
My infrastructure is Exadata Machine, Datastage 8.7 on Aix server(10 cores).

My case is as follow.
In production environment, there is a job that reads 8 tables as input.
This job uses 8 connectors, one for each table/view.
Links from these connectors go into a funnel, and then some business rule is applied.....

My doubt is whether it is more efficient to remain in this configuration or collapse the 8 queries in a single query using the union operator, and therefore have only one connector.
For efficient I mean less time or an optimal way to retrieve all data.
The number of rows read from DB is certainly greater than 30M.

What is your experience?
What can drive the choice between the two approach described earlier?
Is there any de facto rule?

Thanks in advance,
Sandro
qt_ky
Premium Member
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Use the same general rule as deciding when to sort using the Sort stage vs. using an ORDER BY clause in your query.

It depends mainly on the processing power, the degree of parallelism, and the amount of resource contention at the time the job runs on your DataStage server vs. on the database server.

Is the Exadata Machine and DataStage running on the same server as each other? You could modify a copy of the job, having the funnel output to a Copy stage. Then make another copy of that job and combine the source stages and compare results to see.
Choose a job you love, and you will never have to work a day in your life. - Confucius
sangi1981
Participant
Posts: 99
Joined: Fri Jun 13, 2008 8:10 am

Post by sangi1981 »

Exadata DB Machine and AIX are on the same network.
Is there any guide line on choosing between the two way: 8 connector with 1 query per connector, vs 1 connector with 8 query in union?
Actually we use four node on aix, having ten dedicated phisical core.

Thank you
sangi1981
Participant
Posts: 99
Joined: Fri Jun 13, 2008 8:10 am

Post by sangi1981 »

Exadata DB Machine and AIX are on the same network.
Is there any guide line on choosing between the two way: 8 connector with 1 query per connector, vs 1 connector with 8 query in union?
Actually we use four node on aix, having ten dedicated phisical core.

Thank you
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's no guideline, per se. As noted, it's all about where you can best afford the processing power. Unless those eight connectors come from different instances, I would pretty much always push this back on the database server rather than do this "in job".

Unless you've got a mandate to "do no harm" to the source system, i.e. get in and get out with a minimal of resource usage - then I could see streaming the eight flows in and doing the union in the ETL job.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sangi1981
Participant
Posts: 99
Joined: Fri Jun 13, 2008 8:10 am

Post by sangi1981 »

Thanks for your answers.
I completed some tests on the two alternatives, among which I should choose.
The evidence is that, in my case, I get the shortest processing time using 8 connectors.
I thought that push back query to DB Exadata using unique connector could improve performance, but the prior solution seem to be more efficient.
Clearly, this consideration applies to my application environment.

Greetings
Post Reply