Hello everyone,
I am writing to ask for an explanation of how to compose a DataStage job that retrieves data from multiple tables.
My infrastructure is an Exadata machine, with DataStage 8.7 on an AIX server (10 cores).
My case is as follows.
In the production environment, there is a job that reads 8 tables as input.
This job uses 8 connectors, one for each table/view.
Links from these connectors go into a Funnel stage, and then some business rules are applied.
My question is whether it is more efficient to keep this configuration, or to collapse the 8 queries into a single query using the UNION operator and therefore have only one connector.
By efficient I mean less time, or an optimal way to retrieve all the data.
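For illustration, the single-connector alternative would replace the eight per-table SELECTs with one compound query. A minimal sketch using Python's sqlite3 as a stand-in database (table and column names are hypothetical, just to show the shape of the combined query); note that UNION ALL simply concatenates the result sets, while plain UNION would add a duplicate-eliminating sort on the database side:

```python
import sqlite3

# In-memory database standing in for the real source system; the table
# and column names below are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for t in ("orders_2023", "orders_2024"):
    cur.execute(f"CREATE TABLE {t} (id INTEGER, amount REAL)")
cur.executemany("INSERT INTO orders_2023 VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
cur.executemany("INSERT INTO orders_2024 VALUES (?, ?)", [(3, 30.0)])

# One connector, one query: UNION ALL concatenates the per-table result
# sets without the sort/dedup step that plain UNION would force.
combined = cur.execute(
    "SELECT id, amount FROM orders_2023 "
    "UNION ALL "
    "SELECT id, amount FROM orders_2024"
).fetchall()
print(len(combined))  # 3 rows in total
```

With eight tables the pattern is the same, just with eight SELECT branches joined by UNION ALL; this moves the funnel work from the ETL engine onto the database.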
The number of rows read from the DB is certainly greater than 30 million.
What is your experience?
What could drive the choice between the two approaches described above?
Is there any de facto rule?
Thanks in advance,
Sandro
RULES TO RETRIEVE DATA FROM MULTIPLE TABLES IN ONE JOB
Use the same general rule as deciding when to sort using the Sort stage vs. using an ORDER BY clause in your query.
It depends mainly on the processing power, the degree of parallelism, and the amount of resource contention at the time the job runs on your DataStage server vs. on the database server.
Is the Exadata Machine and DataStage running on the same server as each other? You could modify a copy of the job, having the funnel output to a Copy stage. Then make another copy of that job and combine the source stages and compare results to see.
Choose a job you love, and you will never have to work a day in your life. - Confucius
There's no guideline, per se. As noted, it's all about where you can best afford the processing power. Unless those eight connectors come from different instances, I would pretty much always push this back on the database server rather than do this "in job".
Unless you've got a mandate to "do no harm" to the source system, i.e. get in and get out with a minimum of resource usage - then I could see streaming the eight flows in and doing the union in the ETL job.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Thanks for your answers.
I completed some tests on the two alternatives I had to choose between.
The evidence is that, in my case, I get the shortest processing time using 8 connectors.
I thought that pushing the query back to the Exadata DB through a single connector could improve performance, but the former solution seems to be more efficient.
Clearly, this conclusion applies to my application environment.
Greetings