Preserving the Sorting after FILTER stage

parag.s.27
Participant
Posts: 221
Joined: Fri Feb 17, 2006 3:38 am
Location: India

Post by parag.s.27 »

We are using Oracle 10g as our source database. When we extract the data with the OCI stage, we sort it by adding an ascending ORDER BY on the keys and the date at the end of the query. After extraction, we split the incoming data into two streams by date: records with Date < 01-01-2005 00:00:00 go to "Stream 1", and records with Date >= 01-01-2005 00:00:00 go to "Stream 2".

In Stream 1 we then apply a Remove Duplicates stage on the keys and retain the last record.

Is it possible that, after splitting the data, Stream 1 receives records in unsorted order, so that the Remove Duplicates stage yields incorrect results?
Thanks & Regards
Parag Saundattikar
Certified for Infosphere DataStage v8.0
nani0907
Participant
Posts: 155
Joined: Wed Apr 18, 2007 10:30 am

Post by nani0907 »

Use sequential execution mode, which makes sure that the data flows on one node.
thanks n regards
nani
dhanashreepanse
Participant
Posts: 25
Joined: Fri Jan 11, 2008 12:49 am
Location: Pune, India

Post by dhanashreepanse »

It is always better to sort the data before passing it to the Remove Duplicates stage.
mahadev.v
Participant
Posts: 111
Joined: Tue May 06, 2008 5:29 am
Location: Bangalore

Post by mahadev.v »

The sort order might remain, but I would have removed the sort from the SQL and done it in DataStage on Stream 1, before the Remove Duplicates stage.
"given enough eyeballs, all bugs are shallow" - Eric S. Raymond
parag.s.27
Participant
Posts: 221
Joined: Fri Feb 17, 2006 3:38 am
Location: India

Post by parag.s.27 »

Actually, we are getting review comments from external contractors, who work as DataStage professionals, about removing the Sort stage. They say that the Sort stage is not needed because the partitioning will be preserved.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Re: Preserving the Sorting after FILTER stage

Post by ray.wurlod »

parag.s.27 wrote: Is it possible that, after splitting the data, Stream 1 receives records in unsorted order, so that the Remove Duplicates stage yields incorrect results?
No.
There is nothing in what you are doing that would affect the sorted order.
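
To illustrate the point, here is a minimal single-process sketch in Python. It is not DataStage code and does not model the parallel engine. The sample rows, the `CUTOFF` constant and the `keep_last_per_key` helper are all hypothetical. It only shows that a split which visits rows in their incoming order leaves each output stream in the same relative order, so a "keep last" de-duplication on adjacent keys still sees the latest date last.

```python
from datetime import datetime
from itertools import groupby

CUTOFF = datetime(2005, 1, 1)  # split boundary taken from the question

# Hypothetical rows, already sorted by (key, date) as the ORDER BY would deliver them.
rows = [
    ("A", datetime(2004, 3, 1)),
    ("A", datetime(2004, 9, 1)),
    ("A", datetime(2005, 2, 1)),
    ("B", datetime(2003, 5, 1)),
    ("B", datetime(2006, 1, 1)),
    ("C", datetime(2005, 7, 1)),
]

# Split on the date, visiting rows in their incoming order.
stream1 = [r for r in rows if r[1] < CUTOFF]
stream2 = [r for r in rows if r[1] >= CUTOFF]


def keep_last_per_key(sorted_rows):
    """Keep the last row of each run of adjacent rows that share a key."""
    return [list(group)[-1] for _, group in groupby(sorted_rows, key=lambda r: r[0])]


deduped = keep_last_per_key(stream1)

# Both streams are still in (key, date) order after the split.
assert stream1 == sorted(stream1)
assert stream2 == sorted(stream2)
```

In this sketch `deduped` holds the latest pre-2005 row for each key that has one, because the filter never reorders rows within a stream.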
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

parag.s.27 wrote: Actually, we are getting review comments from external contractors, who work as DataStage professionals, about removing the Sort stage. They say that the Sort stage is not needed because the partitioning will be preserved.
Partitioning and sorting are not the same thing. The comments are correct, but miss the point that a Remove Duplicates stage expects sorted input for most efficient operation.
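
The difference between partitioning and sorting can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how DataStage implements either. `toy_partition` is a hypothetical stand-in for key-based partitioning, and `dedupe_adjacent_keep_last` assumes a de-duplication that compares only adjacent rows. Partitioning decides which node a row goes to. It does not order the rows within that node.

```python
from itertools import groupby


def toy_partition(rows, n_parts):
    """Toy key-based partitioner: rows with the same key always share a partition."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        # Sum of the key's byte values, used here as a deterministic toy hash.
        parts[sum(row[0].encode()) % n_parts].append(row)
    return parts


def dedupe_adjacent_keep_last(rows):
    """Keep the last row of each run of adjacent rows that share a key."""
    return [list(group)[-1] for _, group in groupby(rows, key=lambda r: r[0])]


# Hypothetical unsorted input: (key, sequence number).
unsorted_rows = [("A", 1), ("C", 1), ("A", 2), ("B", 1), ("C", 2)]

parts = toy_partition(unsorted_rows, 2)

# Keys "A" and "C" share partition 1, still interleaved as they arrived.
without_sort = dedupe_adjacent_keep_last(parts[1])
with_sort = dedupe_adjacent_keep_last(sorted(parts[1]))

assert len(without_sort) == 4              # nothing removed: equal keys were never adjacent
assert with_sort == [("A", 2), ("C", 2)]   # one row per key once the partition is sorted
```

In this sketch the partitioning is intact in both cases. Only the sorted partition de-duplicates as intended, which is why the two properties need to be considered separately.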