How to achive this

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
prasad v
Participant
Posts: 174
Joined: Mon Mar 30, 2009 2:18 am

How to achive this

Post by prasad v »

I have to build the job as per the below query. Can anyone tell me

select a.act_id,a.create_DATE,a.Pay_TOTAL from payment_tab a where 1= (select count(distinct b.create_DATE) from payment_tab b where b.act_id=a.act_id and
b.create_DATE>=a.create_DATE)
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

You've posted a SQL query, but haven't actually specified if this is the source or target. What is your source data and what is your intended target and what form do both take?
prasad v
Participant
Posts: 174
Joined: Mon Mar 30, 2009 2:18 am

Post by prasad v »

I have source with many duplicate of act_id's with different pay_date and amounts.

what i want in the target is need act_id, latest pay_date and amount that customer has paid.

We can put the above query in Source and get the data but it is taking long time and still not getting the results.

So i am trying to achive this in Datastage Job level by taking the complete data from source and want to restrict in Tranformer or Aggregator.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Of course it takes a long time - you have a correlated subquery here.

The only way you will get it to run faster in DataStage is to have both query results available - then DataStage can do a lookup on the unique instances.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
z035011
Premium Member
Premium Member
Posts: 26
Joined: Thu Nov 15, 2007 8:51 am

Post by z035011 »

select a.act_id,a.create_DATE,a.Pay_TOTAL from payment_tab a order by create_date.....You a remove duplicate stage with retaining the max creat_datte
prasad v
Participant
Posts: 174
Joined: Mon Mar 30, 2009 2:18 am

Post by prasad v »

Hi

I am using Server Edition. in this Remove duplicate stage is not there.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You can do the same using stage variables in a transformer, if that is really what you need to do.
-craig

"You can never have too many knives" -- Logan Nine Fingers
prasad v
Participant
Posts: 174
Joined: Mon Mar 30, 2009 2:18 am

Post by prasad v »

Hi Chulet,

Can you please give me some explanation. how to do this as i am new to Datastage
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

One of the three Tech Tips in the zip file here should get you going. Like some of us that hang out here, they may be old but they still work good. Some days more gooder than others, of course. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Ragupathy
Participant
Posts: 9
Joined: Thu Feb 11, 2010 11:14 pm

Re: How to achive this

Post by Ragupathy »

prasad v wrote:I have to build the job as per the below query. Can anyone tell me

select a.act_id,a.create_DATE,a.Pay_TOTAL from payment_tab a where 1= (select count(distinct b.create_DATE) from payment_tab b where b.act_id=a.act_id and
b.create_DATE>=a.create_DATE)

Hi

Use Hash file to remove duplicates

and then proceeed to get the latest date amounts

and use aggreate to get he max dates
Ragu
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Re: How to achive this

Post by Kryt0n »

Ragupathy wrote: Hi

Use Hash file to remove duplicates

and then proceeed to get the latest date amounts

and use aggreate to get he max dates
Why do they want to remove duplicates? If you make two payments to your credit card on the same day, do you want one of them to be removed as a duplicate?

How big is the table? How many rows are expected after the qeury is run?
If we are talking 100+ million rows and only a small percent relevant then you may be better off trying to find why the query is slow.

Being new, can your colleagues not guide you on the requirement rather than finding an answer on DSX?
Post Reply