Page 1 of 1

Better Performance

Posted: Fri Mar 07, 2008 4:10 am
by georgesebastian
Hi Everyone,
I have a design issue.
In the source DRS I need to get data from 28 tables.The structure of the tables are same but having very huge amount of data. (all the rows are distinct)

I would kindly like to know which one will give me a better performance
My options are

1.Using a UNION in the source DRS
2.Using a UNION ALL in the source DRS
3.Using 28 sepreate DRS in the source and using LINK COLLECTOR to combine the data.

I thought it would be the UNION ALL since it merges data from the Database itself but
I read in one of the DSxchange post that its LINK COLLECTOR which will give a better performance than UNION ALL

viewtopic.php?t=113306&highlight=Link+Collector
(please refer to the post from balasumit21 )

Please advise

Thanks
George Sebastian
:wink:

Posted: Fri Mar 07, 2008 4:32 am
by ray.wurlod
Define "performance" in an ETL context. Only then might one be able to address whether whatever it is can be made "better" (whatever that is).

Please do not cite rows/sec as your performance metric. Search DSXchange for many rants from me about why it is meaningless.

Posted: Fri Mar 07, 2008 4:34 am
by WoMaWil
union all is more performant than UNION because a UNION is a distinct.

you have a lot of screws for to find the optimal performance, there are many ways and there is not best of all for each situation. To find the best way for your case you need some investigation. Sometimes the bottleneck for performance is not the way of reading but the way of writing and its syncronisation.

Posted: Fri Mar 07, 2008 4:35 am
by georgesebastian
Hi all,

What about using 28 DRS and after that the using a Sequencial file(with append option) insted of using Link collector.
which one will be a better design

Please advise

Thanks
George

Posted: Fri Mar 07, 2008 7:37 am
by chulett
WoMaWil wrote:union all is more performant than UNION because a UNION is a distinct.
Specifically, a UNION ALL simply returns the results of all queries. A UNION must first get the results of all queries and then so a 'SORT DISTINCT' on the results, so it obviously will take longer and more processing resources.

You make the choice based on whether you need duplicates removed from the result set, not on which one is 'faster'.