Preserving the Sorting after FILTER stage

Posted: Tue Dec 23, 2008 11:03 pm
by parag.s.27
We are using Oracle 10g as our source database. When we extract the data with the OCI stage, we sort it by adding an ascending ORDER BY on the keys and the date at the end of the query. After extraction, we split the incoming data into two streams by date: records with Date < 01-01-2005 00:00:00 go to "Stream 1", and records with Date >= 01-01-2005 00:00:00 go to "Stream 2".

Now in Stream 1 we apply a Remove Duplicates stage on the keys and retain the last record.

Is it possible that, after splitting the data, Stream 1 receives records in unsorted order, so that the Remove Duplicates stage yields incorrect results?

Posted: Wed Dec 24, 2008 12:45 am
by nani0907
Use sequential execution mode, which ensures that the data flows on a single node.

Posted: Wed Dec 24, 2008 3:07 am
by dhanashreepanse
It is always better to sort the data before passing it to the Remove Duplicates stage.

Posted: Wed Dec 24, 2008 3:22 am
by mahadev.v
The sort order might remain, but I would have removed the sort from the SQL and done it in DataStage on Stream 1, before the Remove Duplicates stage.

Posted: Wed Dec 24, 2008 3:35 am
by parag.s.27
Actually, we are getting review comments about removing the Sort stage from external contractors who work as DataStage professionals. They say that the Sort stage is not needed because the partitioning will be preserved.

Re: Preserving the Sorting after FILTER stage

Posted: Wed Dec 24, 2008 3:43 am
by ray.wurlod
parag.s.27 wrote: Is it possible that, after splitting the data, Stream 1 receives records in unsorted order, so that the Remove Duplicates stage yields incorrect results?
No.
There is nothing in what you are doing that would affect the sorted order.
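
To illustrate the point outside DataStage, here is a minimal Python sketch of the same flow on a single node. It is not DataStage code: the sample rows and the helper names split_by_date and retain_last are made up for the example. The split only routes each record to one output in arrival order, so each stream is still sorted by key and date, and "retain last" on Stream 1 picks the latest pre-2005 record per key.

```python
from datetime import datetime
from itertools import groupby

CUTOFF = datetime(2005, 1, 1)

# Hypothetical sample rows (key, date), already sorted by key and then date,
# as the ORDER BY in the source query would deliver them.
rows = [
    ("A", datetime(2004, 3, 1)),
    ("A", datetime(2004, 9, 1)),
    ("A", datetime(2005, 2, 1)),
    ("B", datetime(2003, 1, 1)),
    ("B", datetime(2006, 1, 1)),
    ("C", datetime(2004, 12, 31)),
]


def split_by_date(records, cutoff):
    """Route each record to one of two streams, keeping arrival order."""
    stream1, stream2 = [], []
    for rec in records:
        (stream1 if rec[1] < cutoff else stream2).append(rec)
    return stream1, stream2


def retain_last(sorted_records):
    """Keep the last record of each run of adjacent records with the same key."""
    return [list(group)[-1]
            for _, group in groupby(sorted_records, key=lambda r: r[0])]


stream1, stream2 = split_by_date(rows, CUTOFF)
deduped = retain_last(stream1)

for key, date in deduped:
    print(key, date.date())
```

The sketch models one sequential flow only; it says nothing about what happens if the data is repartitioned between the split and the Remove Duplicates stage.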

Posted: Wed Dec 24, 2008 3:45 am
by ray.wurlod
parag.s.27 wrote: Actually, we are getting review comments about removing the Sort stage from external contractors who work as DataStage professionals. They say that the Sort stage is not needed because the partitioning will be preserved.
Partitioning and sorting are not the same thing. The comments are correct, but miss the point that a Remove Duplicates stage expects sorted input for most efficient operation.
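
A rough Python sketch of the difference, under stated assumptions: hash_partition and retain_last are hypothetical helpers, the rows are invented, and zlib.crc32 merely stands in for whatever hash the engine uses. Key-based partitioning puts all records for a key in the same partition, but it does nothing about their order inside that partition. An adjacency-based "retain last" therefore only returns the latest record per key once each partition is sorted.

```python
import zlib
from itertools import groupby


def hash_partition(records, n_partitions):
    """Route each record to a partition chosen by a hash of its key."""
    partitions = [[] for _ in range(n_partitions)]
    for rec in records:
        idx = zlib.crc32(rec[0].encode()) % n_partitions
        partitions[idx].append(rec)
    return partitions


def retain_last(records):
    """Keep the last record of each run of adjacent records with the same key."""
    return [list(group)[-1]
            for _, group in groupby(records, key=lambda r: r[0])]


# Hypothetical unsorted input: (key, ISO date string).
rows = [
    ("A", "2004-09-01"),
    ("B", "2003-01-01"),
    ("A", "2004-03-01"),
    ("B", "2002-06-01"),
]

parts = hash_partition(rows, 2)

# Partitioned but not sorted: "last" is just whichever record arrived last.
unsorted_result = sorted(r for p in parts for r in retain_last(p))

# Partitioned and sorted within each partition: "last" is the latest date.
sorted_result = sorted(r for p in parts for r in retain_last(sorted(p)))

print("without sort:", unsorted_result)
print("with sort:   ", sorted_result)
```

In the original job the data already arrives sorted from the ORDER BY, so this only shows why preserved partitioning by itself is not a substitute for sorted input.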