Sort option

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dslearner07
Participant
Posts: 14
Joined: Wed Feb 01, 2012 5:26 am
Location: Hyd

Sort option

Post by dslearner07 »

Hello All

I am new to DataStage, though I have been following this forum for some time now. I have a simple question on the Sort stage and would appreciate it if someone could answer with a simple example; an example alone will be enough to show the difference. I have gone through the forum history but am a bit confused by all the answers.

"What is the difference between the cluster key change and key change options?"

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Cluster is where the rows are clustered (contiguous) but not necessarily sorted.
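
To make "clustered but not necessarily sorted" concrete, here is a minimal sketch in plain Python (not DataStage code; the helper names `is_clustered` and `is_sorted` are made up for illustration). It checks a list of key values for each property separately.

```python
def is_clustered(keys):
    """True if every key value appears as one unbroken (contiguous) run."""
    seen = set()
    previous = object()  # sentinel that never equals a real key
    for key in keys:
        if key != previous:
            if key in seen:
                return False  # this value already had a run earlier
            seen.add(key)
            previous = key
    return True


def is_sorted(keys):
    """True if the keys are in ascending order."""
    return all(a <= b for a, b in zip(keys, keys[1:]))


grouped_only = [30, 30, 10, 50, 50, 50]  # grouped, but not in order
scattered = [10, 30, 10]                 # neither grouped nor sorted

print(is_clustered(grouped_only), is_sorted(grouped_only))
print(is_clustered(scattered), is_sorted(scattered))
```

Sorted data is always clustered, but as `grouped_only` shows, the reverse does not have to hold.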
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dslearner07
Participant
Posts: 14
Joined: Wed Feb 01, 2012 5:26 am
Location: Hyd

Post by dslearner07 »

If I have the input data below:

Col1 Col2
10 Delhi
10 Mumbai
30 Adelade
40 NewYork
50 Sydney
50 LA
50 HK

What would the output be for both cluster key change and key change? Assume both the cluster and the sort are on Col1. I'd appreciate your response.
Satwika
Participant
Posts: 45
Joined: Mon Jan 02, 2012 11:29 pm

Cluster key change column

Post by Satwika »

Hi,

According to my understanding:

For the cluster key change column: the Sort stage itself creates the clusterKeyChange column in the output, and it takes the default 'tinyint' datatype. The 'Sort Key Mode' should be either 'Don't Sort (Previously Sorted)' or 'Don't Sort (Previously Grouped)'. The cluster key change option does not apply to a key that the stage itself sorts.

For the key change column: the stage sorts the data in the specified mode and creates the keyChange column in the output.

With a single key column, both cluster key change and key change give the same output, shown below.

For your input,

Col1 Col2
10 Delhi
10 Mumbai
30 Adelade
40 NewYork
50 Sydney
50 LA
50 HK


Expected output:

Col1 Col2 Clusterkey
10 Delhi 1
10 Mumbai 0
30 Adelade 1
40 NewYork 1
50 Sydney 1
50 LA 0
50 HK 0
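
As a rough illustration of how such a flag column can be derived (plain Python, not DataStage code; `change_flags` is a hypothetical helper), the sketch below walks the rows in order and emits 1 whenever the key differs from the previous row's key, otherwise 0. It assumes the rows are already sorted or grouped on Col1.

```python
rows = [
    (10, "Delhi"),
    (10, "Mumbai"),
    (30, "Adelade"),
    (40, "NewYork"),
    (50, "Sydney"),
    (50, "LA"),
    (50, "HK"),
]


def change_flags(rows, key_index=0):
    """Return 1 for the first row of each run of equal keys, else 0."""
    flags = []
    previous = object()  # sentinel that never equals a real key
    for row in rows:
        key = row[key_index]
        flags.append(1 if key != previous else 0)
        previous = key
    return flags


for (col1, col2), flag in zip(rows, change_flags(rows)):
    print(col1, col2, flag)
```

The flags come out as 1, 0, 1, 1, 1, 0, 0, matching the table above.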

Regards
Satwika
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If there's only one key column, then cluster key and sort key tend to be the same. In your example, Col1=50 forms a cluster even if Col2 is not a sort key.
dslearner07
Participant
Posts: 14
Joined: Wed Feb 01, 2012 5:26 am
Location: Hyd

Post by dslearner07 »

Thanks, Satwika. Could you give an example where the cluster key change and key change column outputs would be different?
Satwika
Participant
Posts: 45
Joined: Mon Jan 02, 2012 11:29 pm

Post by Satwika »

Hi,

Try defining two key columns in your input: sort on one key column and cluster on the other. That will give different output.
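
A hedged sketch of that two-key case, in plain Python rather than DataStage. It assumes Col1 arrives already grouped (a "Don't Sort" key) and Col2 is sorted within each Col1 group. It models the cluster flag as "Col1 changed" and the key change flag as "any key column changed". That is my reading of the two options, not confirmed stage behaviour; the sample rows and the `flags` helper are made up for illustration.

```python
# Col1 is pre-grouped; Col2 is sorted within each Col1 group.
rows = [
    (10, "Delhi"),
    (10, "Delhi"),
    (10, "Mumbai"),
    (50, "HK"),
    (50, "LA"),
    (50, "LA"),
]


def flags(rows, key_of):
    """Return 1 where key_of(row) differs from the previous row's key."""
    out = []
    previous = object()  # sentinel that never equals a real key
    for row in rows:
        key = key_of(row)
        out.append(1 if key != previous else 0)
        previous = key
    return out


# Assumption: the cluster flag looks only at the pre-grouped key (Col1).
cluster_key_change = flags(rows, lambda r: r[0])
# Assumption: the key change flag looks at all key columns (Col1, Col2).
key_change = flags(rows, lambda r: (r[0], r[1]))

print("Col1 Col2 clusterKeyChange keyChange")
for (col1, col2), c, k in zip(rows, cluster_key_change, key_change):
    print(col1, col2, c, k)
```

Under these assumptions the cluster flags are 1, 0, 0, 1, 0, 0 and the key change flags are 1, 0, 1, 1, 1, 0, so the two columns differ on the rows where only Col2 changes.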