Page 2 of 2

Posted: Thu Jun 08, 2006 10:45 am
by Tatiana
rwierdsm wrote:The database will be much, much faster then DataStage.

Rob
If I'm pulling data directly from a table I'll need to switch to pull the data from a sorted view, right?

Posted: Thu Jun 08, 2006 10:52 am
by DSguru2B
You dont need to switch to anything. The orderby clause will give you the data in the sorted order.
Is that what you meant or am i going cukoo?

Posted: Thu Jun 08, 2006 12:14 pm
by Tatiana
Thanks, DSguru for all your help. I did add order by clause in the select statement in Oracle stage. In my question I was referring to rwierdsm's statement that it will be much faster in database then DataStage ("If you can do the sort as part of your database read, do it there! The database will be much, much faster then DataStage.").

Posted: Thu Jun 08, 2006 12:59 pm
by rwierdsm
Tatiana,

No view required, simply add an order by clause, as below

Before:

Code: Select all

Select ColumnA, ColumnB, ColumnC
 From Table1
After

Code: Select all

Select ColumnA, ColumnB, ColumnC
 From Table1
 order by ColumnA, ColumnB
You can do this by setting the database stage to either 'User Defined SQL', or 'Generated SQL' and specifying your sort criteria in the proper place.

Craig,
Actually, asserting the 'sort' order in the Aggregator is required or else all your database sorting will be wasted.
?!?!? I've used the aggregator after a sort without specifying any sort criteria. Am I writing buggy code?!?

Rob

Posted: Thu Jun 08, 2006 1:10 pm
by DSguru2B
Well by specifying the sort order to the aggregator, the aggregator becomes much happier as it can blindly group by those keys without actually storing much info in the temp files. The aggregator becomes much faster (kinda like a super charged car 8) ) when it knows how the data is coming in.
And no my name is not Chuck. :twisted:

Posted: Thu Jun 08, 2006 2:01 pm
by chulett
Not buggy, per se - but you are making it do more work than necessary. :P

If you do not assert a sort order in the Aggregator stage, it has no idea what order the incoming data is in. It then errs on the side of caution and so needs to take all of the incoming data into the stage, order and then aggregate it. That's why you see everything go in before anything starts coming out.

When you mark those sort fields in the Aggregator, you aren't telling it how to sort the data but how the incoming data is sorted - hence my use of the term 'assert'. And much like a normal assertion, if you lie to the stage it will bust you for it with an Abort for 'row out of sequence'.

If you get it right - presort in a manner that supports the aggregation being done (another key point) and assert that sort order in the stage - it is no longer a 'bottleneck'. It basically only needs to hold one 'aggregation group' in memory at a time and when the first record from the new group comes in the old group is flushed through.

Hope that makes sense...

Posted: Thu Jun 08, 2006 2:04 pm
by DSguru2B
Isnt that what i said :roll:
Well, Craig gave more grace to it. Well explained.

Posted: Thu Jun 08, 2006 2:35 pm
by rwierdsm
chulett wrote:Hope that makes sense...
Perfect sense!

As it turns out, the places where I've done this has pretty low volumes, so it will be OK in the interim.

Thanks, guys (as I mutter something about old dogs learning new tricks...)

Posted: Fri Jun 09, 2006 7:12 am
by Tatiana
Thanks for your help with the sort suggestion. After I added sorting, the job did run without the mutex error message. However, now with the sort it takes almost twice as long as before. Do you have any ideas on improving the performance?

Posted: Fri Jun 09, 2006 7:16 am
by ray.wurlod
Is there an index on the sorting column?

Because a B-tree is stored in sorted order, it can facilitate sorting in an unconstrained query, and in certain classes of constrained query.

Posted: Fri Jun 09, 2006 8:32 am
by Tatiana
ray.wurlod wrote:Is there an index on the sorting column?
Does it matter if I'm getting data from a view or table? I'm pulling the data from an Oracle view.

Posted: Fri Jun 09, 2006 8:37 am
by kumar_s
View should point the table while retriving data, and inturns the index.