Page 1 of 1

SQL Logic in Datastage

Posted: Wed Sep 01, 2010 8:16 pm
by kpsita
Hi,

How can I implement this following logic in datastage.

select
B.AMT,
rank ()
OVER (PARTITION BY B.FIELD_ID
ORDER BY B.AMT DESC)
ROW_RANK
FROM TABLE B

I wanted to implement this in datastage and here my table represents a dataset.

Appriciate your help.

Thanks

Posted: Wed Sep 01, 2010 9:40 pm
by chulett
Do you understand what the sql does? Can you spell out in words what kind of information you need from the dataset in question?

Posted: Wed Sep 01, 2010 10:49 pm
by ray.wurlod
RANK is a display function, not a selection function. As a general rule it's not something I'd ask an ETL tool to do - it's something I'd ask a reporting (or business intelligence tool) to do.

While it can be done in ETL, you need to decide in advance how you're going to handle tied values.

SQL Logic in Datastage

Posted: Thu Sep 02, 2010 5:23 am
by ajay.vaidyanathan
How can you write an SQL Query on a DataSet in the first place? You either need to load the DataSet into a tabel and then try out whatever query you were planning to implement.

Posted: Thu Sep 02, 2010 2:28 pm
by kpsita
My input dataset has two fileds, FIELD1 and a AMOUNT field. I have to select maximum amout from the dataset corresponding to FIELD1.

Example:
Field1 amt
1 0
1 15
1 10
2 6
2 20

and my output should be
1 15
2 20

Thanks,

Posted: Thu Sep 02, 2010 2:35 pm
by anbu
Use remove duplicates to retain the maximum. Use hash partition and sort on the input to remove duplicates

Posted: Thu Sep 02, 2010 2:59 pm
by kpsita
Hi Anbu,

Thanks for your reply. I didn't understand your post. Should I first hash partition, and then remove the duplicates retaining the first value.

Thanks

Posted: Thu Sep 02, 2010 3:07 pm
by anbu
Yes do the hash partition first and then use remove duplicates

Posted: Fri Sep 03, 2010 2:28 am
by hemanthakumar
Hi,
you can use Aggregator stage. In Aggregator group by on field1 and find max(Amount).

Posted: Fri Sep 03, 2010 3:03 am
by way246
anbu wrote:Use remove duplicates to retain the maximum. Use hash partition and sort on the input to remove duplicates
If I am not wrong it shud be retain Last in remove duplicates if the sort is
asc

Posted: Fri Sep 03, 2010 6:02 am
by ray.wurlod
What ever happened to the requirement for RANK?

Posted: Fri Sep 03, 2010 2:33 pm
by DSShishya
Got lost in the crowd I guess :lol:

Posted: Mon Sep 06, 2010 8:36 am
by way246
Ray you are correct, RANK() is used to display records when one or more rows have same value in reporting.e.g when filter applied to display top 10 ranks(rows) and when rank 10 shared among 3 in this case 13 rows will be displayed even though filter is applied to display top 10 rows. Hope aggregate and filter can be applied to achive this in datastage.

Posted: Mon Sep 06, 2010 7:06 pm
by prakashdasika
While using Remove duplicates sort by Field1 (ASC) and Amount (DESC) on the input link and use Field1 as the Key. This should work.

Posted: Mon Sep 06, 2010 11:04 pm
by ray.wurlod
Try it. It doesn't handle tied ranks.

To handle rank properly in ETL - and I continue to maintain that it's not an ETL task but a display task - you can use stage variables that detect change (perhaps from a Sort stage key change column) and decide whether to preserve or increment the rank. At the same time another stage variable, always incrementing, is needed to determine what the next value needs to be after tied values are encountered.