DSXchange

Posted: **Mon Apr 19, 2010 1:49 am**

I have to build the job as per the below query. Can anyone tell me

select a.act_id,a.create_DATE,a.Pay_TOTAL from payment_tab a where 1= (select count(distinct b.create_DATE) from payment_tab b where b.act_id=a.act_id and
b.create_DATE>=a.create_DATE)

Posted: **Mon Apr 19, 2010 2:05 am**

You've posted a SQL query, but haven't actually specified if this is the source or target. What is your source data and what is your intended target and what form do both take?

Posted: **Mon Apr 19, 2010 2:18 am**

I have source with many duplicate of act_id's with different pay_date and amounts.

what i want in the target is need act_id, latest pay_date and amount that customer has paid.

We can put the above query in Source and get the data but it is taking long time and still not getting the results.

So i am trying to achive this in Datastage Job level by taking the complete data from source and want to restrict in Tranformer or Aggregator.

Posted: **Mon Apr 19, 2010 2:57 am**

Of course it takes a long time - you have a correlated subquery here.

The only way you will get it to run faster in DataStage is to have both query results available - then DataStage can do a lookup on the unique instances.

Posted: **Mon Apr 19, 2010 7:14 am**

select a.act_id,a.create_DATE,a.Pay_TOTAL from payment_tab a order by create_date.....You a remove duplicate stage with retaining the max creat_datte

Posted: **Mon Apr 19, 2010 8:50 am**

Hi

I am using Server Edition. in this Remove duplicate stage is not there.

Posted: **Mon Apr 19, 2010 9:04 am**

You can do the same using stage variables in a transformer, if that is really what you need to do.

Posted: **Mon Apr 19, 2010 9:07 am**

Hi Chulet,

Can you please give me some explanation. how to do this as i am new to Datastage

Posted: **Mon Apr 19, 2010 12:05 pm**

One of the three Tech Tips in the zip file here should get you going. Like some of us that hang out here, they may be old but they still work good. Some days more gooder than others, of course.

Posted: **Thu Apr 29, 2010 3:52 am**

prasad v wrote:I have to build the job as per the below query. Can anyone tell me

select a.act_id,a.create_DATE,a.Pay_TOTAL from payment_tab a where 1= (select count(distinct b.create_DATE) from payment_tab b where b.act_id=a.act_id and
b.create_DATE>=a.create_DATE)

Hi

Use Hash file to remove duplicates

and then proceeed to get the latest date amounts

and use aggreate to get he max dates

Posted: **Thu Apr 29, 2010 3:24 pm**

Ragupathy wrote: Hi

Use Hash file to remove duplicates

and then proceeed to get the latest date amounts

and use aggreate to get he max dates

Why do they want to remove duplicates? If you make two payments to your credit card on the same day, do you want one of them to be removed as a duplicate?

How big is the table? How many rows are expected after the qeury is run?
If we are talking 100+ million rows and only a small percent relevant then you may be better off trying to find why the query is slow.

Being new, can your colleagues not guide you on the requirement rather than finding an answer on DSX?

DSXchange

How to achive this

How to achive this

Re: How to achive this

Re: How to achive this