percent_rank() function in Datastage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Rank - and any derivative of it - is a display function, not a calculated function.

DataStage only takes one pass through the data, unless you explicitly design it to take two. Without two passes you can not calculate percent, rank or any other value that depends on the total count or value.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Silly question, but why not just continue to do that in Oracle?
-craig

"You can never have too many knives" -- Logan Nine Fingers
srini.dw
Premium Member
Premium Member
Posts: 186
Joined: Fri Aug 18, 2006 1:59 am
Location: Chennai

Post by srini.dw »

We do not have Oracle in this project, its Teradata and Datastage. We are converting from Oracle to Teradata
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ah... one of those 'gory details' people don't seem to want to burden us with. :wink: Perhaps the question is: is there an equivalent in TeraData?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If Teradata sql doesn't allow this function you could try running the data through an aggregator to get the total, then use that result to compute the rank?
srini.dw
Premium Member
Premium Member
Posts: 186
Joined: Fri Aug 18, 2006 1:59 am
Location: Chennai

Post by srini.dw »

Without two passes you can not calculate percent, rank or any other value that depends on the total count or value.

How to do this, still not able to figure out
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

No need for 2 passes.

1. Read the data, split the stream into two links, A and B
2. Link A does a aggregation on a dummy column to produce one row with the sum.
3. Link B does an inner join to the aggregator output of A
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Keep going - you need more than the sum to get rank.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

To complete the design - put in a transform stage that keeps a running total and compute the percentage rank there.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

At some point you are going to need to SORT the data (at least for the second pass) to get a rank.
Then you need to implement your business rule for tied ranks. For example 1,2,2,4,...

What Arnd called "percentage rank" is simply "percentage".
Last edited by ray.wurlod on Wed Mar 12, 2008 11:48 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
balajisr
Charter Member
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

As Craig pointed out why not you do this in teradata itself when it is available?

Check out this URL.

http://readvitamin.com/2007/08/22/analy ... -teradata/
srini.dw
Premium Member
Premium Member
Posts: 186
Joined: Fri Aug 18, 2006 1:59 am
Location: Chennai

Post by srini.dw »

Can you please elaborate more, not able to get any idea..

DATASET--> COPY-----> 2 links>
1 As it is
2 Sort-->Aggr> Join-->1---->Dataset
Post Reply