Is Sort stage before Remove-Duplicate stage mandatory?
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 38
- Joined: Fri Apr 22, 2005 6:07 am
Hi All,
We have tested the Remove-Duplicate stage without any Sort stage before it. It works fine and gives the expected results. However, the DS help specifically states that the data should be presorted and that a Sort stage should be used before the Remove-Duplicate stage. During development we are using only one node, and the partition type is Auto. Will running the Remove-Duplicate stage on data that is not presorted cause any issues with more nodes or with other partition types?
Thanks in advance.
-Amit
Hi,
If you know the input stream is already sorted (for example, from a DB stage using ORDER BY), then you might not need to sort the data again.
The same goes for select statements that happen to come back sorted, as Oracle sometimes gives.
If you're not sure, then you need the Sort stage; using the RD stage's link option for sorting the data won't do!
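To show why the sort order matters, here is a minimal Python sketch. It assumes the duplicate removal only compares each record with the one immediately before it, which is why the help asks for presorted input. This is a conceptual model, not DataStage's actual implementation; the function name and sample rows are made up for illustration.

```python
from itertools import groupby


def remove_adjacent_duplicates(rows, key):
    """Keep the first row of each run of consecutive rows that share a key."""
    return [next(group) for _, group in groupby(rows, key=key)]


def by_id(row):
    return row["id"]


# Hypothetical input: ids 1 and 2 each appear twice, but never side by side.
rows = [
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b2"},
    {"id": 1, "name": "a2"},
]

# Without a sort, no two equal keys are adjacent, so nothing is removed.
unsorted_result = remove_adjacent_duplicates(rows, by_id)

# After sorting on the key, equal keys are adjacent and one row per key survives.
sorted_result = remove_adjacent_duplicates(sorted(rows, key=by_id), by_id)

print(len(unsorted_result))              # 4
print([r["id"] for r in sorted_result])  # [1, 2]
```

If the data happens to arrive in key order, both calls give the same answer, which would explain a test that "works fine" without a Sort stage.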
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
For reasonable amounts of data (< 2 GB) I've always been partial to a sort -u before the job.
However, as people have probably noticed by now, I'm not a DataStage purist; I think there are other ways of doing things.
Andrew the Heretic
Andrew
Think outside the Datastage you work in.
There is no True Way, but there are true ways.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
It's not partitioning. Look at the score. Note that DataStage has inserted some tsort operators (and probably some buffer operators also) on the inputs. So, if you don't specify sorting, DataStage will insert sorting. You might prefer a Sort stage so you can tell it "don't sort (previously sorted)" explicitly.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 612
- Joined: Thu May 03, 2007 4:59 am
- Location: Melbourne
Set APT_NO_SORT_INSERTION to True in your job and run to see the difference.
Joshy George
-
- Premium Member
- Posts: 783
- Joined: Mon Jan 16, 2006 10:17 pm
- Location: Sydney, Australia
-
- Participant
- Posts: 612
- Joined: Thu May 03, 2007 4:59 am
- Location: Melbourne
You don't have to do an explicit 'Hash partitioning'! Look at the score and you can see why. If you include
$APT_NO_PART_INSERTION = True
$APT_NO_SORT_INSERTION = True
in the job, run it, and look at the score again, you can see the difference.
DataStage inserts what is required on the inputs even if you have not specified it.
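As a rough illustration of what would go wrong on more than one node if the key-based partitioning were missing, here is a small Python sketch. It is only a conceptual model: each "node" sorts and de-duplicates its own partition independently, and the function names, the sample keys, and the modulo stand-in for hash partitioning are all assumptions made for the example, not DataStage internals.

```python
from itertools import groupby


def dedup_sorted(partition):
    """Sort one partition, then keep one record per key."""
    return [key for key, _ in groupby(sorted(partition))]


def run(keys, nodes, assign):
    """Distribute keys over `nodes` partitions and de-duplicate each one separately."""
    partitions = [[] for _ in range(nodes)]
    for position, key in enumerate(keys):
        partitions[assign(position, key, nodes)].append(key)
    output = []
    for partition in partitions:
        output.extend(dedup_sorted(partition))
    return sorted(output)


def round_robin(position, key, nodes):
    # Ignores the key, so equal keys can land on different nodes.
    return position % nodes


def hash_on_key(position, key, nodes):
    # Stand-in for hash partitioning: equal keys always land on the same node.
    return key % nodes


keys = [7, 4, 7, 5, 4, 7]  # hypothetical key values

one_node = run(keys, 1, round_robin)
two_nodes_round_robin = run(keys, 2, round_robin)
two_nodes_hash = run(keys, 2, hash_on_key)

print(one_node)               # [4, 5, 7]
print(two_nodes_round_robin)  # [4, 4, 5, 7, 7]
print(two_nodes_hash)         # [4, 5, 7]
```

On one node the result is correct whatever the partitioning, which matches a single-node development test. On two nodes, duplicates survive unless equal keys are sent to the same partition.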
Joshy George