Hello everyone,
I am writing to ask for an explanation of how to compose a DataStage job that retrieves data from multiple tables.
My infrastructure is an Exadata machine, with DataStage 8.7 on an AIX server (10 cores).
My case is as follows.
In the production environment, there is a job that reads 8 tables as input.
This job uses 8 connectors, one for each table/view.
Links from these connectors go into a Funnel stage, and then some business rules are applied.
My question is whether it is more efficient to keep this configuration, or to collapse the 8 queries into a single query using the UNION operator and therefore have only one connector.
By efficient I mean less time, or an optimal way to retrieve all the data.
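For illustration, the single-connector alternative would replace the eight per-table SELECTs with one compound query. A minimal sketch using Python's sqlite3 as a stand-in database (table and column names are hypothetical, just to show the shape of the combined query); note that UNION ALL simply concatenates the result sets, while plain UNION would add a duplicate-eliminating sort on the database side:

```python
import sqlite3

# In-memory database standing in for the real source system; the table
# and column names below are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for t in ("orders_2023", "orders_2024"):
    cur.execute(f"CREATE TABLE {t} (id INTEGER, amount REAL)")
cur.executemany("INSERT INTO orders_2023 VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
cur.executemany("INSERT INTO orders_2024 VALUES (?, ?)", [(3, 30.0)])

# One connector, one query: UNION ALL concatenates the per-table result
# sets without the sort/dedup step that plain UNION would force.
combined = cur.execute(
    "SELECT id, amount FROM orders_2023 "
    "UNION ALL "
    "SELECT id, amount FROM orders_2024"
).fetchall()
print(len(combined))  # 3 rows in total
```

With eight tables the pattern is the same, just with eight SELECT branches joined by UNION ALL; this moves the funnel work from the ETL engine onto the database.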
The number of rows read from the DB is certainly greater than 30 million.
What is your experience?
What could drive the choice between the two approaches described above?
Is there any de facto rule?
Thanks in advance,
Sandro
RULES TO RETRIEVE DATA FROM MULTIPLE TABLES IN ONE JOB
Use the same general rule as deciding when to sort using the Sort stage vs. using an ORDER BY clause in your query.
It depends mainly on the processing power, the degree of parallelism, and the amount of resource contention at the time the job runs on your DataStage server vs. on the database server.
Is the Exadata Machine and DataStage running on the same server as each other? You could modify a copy of the job, having the funnel output to a Copy stage. Then make another copy of that job and combine the source stages and compare results to see.
Choose a job you love, and you will never have to work a day in your life. - Confucius
There's no guideline, per se. As noted, it's all about where you can best afford the processing power. Unless those eight connectors come from different instances, I would pretty much always push this back on the database server rather than do this "in job".
Unless you've got a mandate to "do no harm" to the source system, i.e. get in and get out with a minimum of resource usage - then I could see streaming the eight flows in and doing the union in the ETL job.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Thanks for your answers.
I completed some tests on the two alternatives I had to choose between.
The evidence is that, in my case, I get the shortest processing time using 8 connectors.
I thought that pushing the query back to the Exadata DB through a single connector could improve performance, but the former solution seems to be more efficient.
Clearly, this conclusion applies to my application environment.
Greetings