Row with Max Date

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
ghostraider
Premium Member
Premium Member
Posts: 49
Joined: Wed Dec 13, 2006 6:31 pm

Row with Max Date

Post by ghostraider »

Hi all, I have two fields named PNumber and TDate coming from the source(Sql Server). The data is similar as shown below

PNumber TDate
1100 2003-12-04
1100 2004-12-05
1200 2004-11-06
1200 2004-11-01

Now i need to do a lookup with another file which has a Description and Location Field. The Description and Location should be populated for the PNumber with maximum date and Null should be displayed for the PNumber with lesser dates. Can someone let me know as to how i should go about in solving this. Target also is Sql Server by the way.

Thanks
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Like most problems, I'm sure this has multiple solutions. Off the top of my head, you could run the data through a sort stage or use an 'order by' in your sourcing sql to get the data coming in by 'TDate desc' - that way you know the first row you encounter will have the 'max date' on it.

Then take the output from your lookup and use it only when @INROWNUM=1 and for all other rows pass out NULLs for 'the PNumber with lesser dates'.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ghostraider
Premium Member
Premium Member
Posts: 49
Joined: Wed Dec 13, 2006 6:31 pm

Post by ghostraider »

I used the sort stage, now how do i get the PNumber with the max date each time. The first row will have the max TDate and Pnumber, but how do i use @INROWNUM=1 for the other Pnumber values with max date. Could you please explain.

Thanks
chulett wrote:Like most problems, I'm sure this has multiple solutions. Off the top of my head, you could run the data through a sort stage or use an 'order by' in your sourcing sql to get the data coming in by 'TDate desc' - that way you know the first row you encounter will have the 'max date' on it.

Then take the output from your lookup and use it only when @INROWNUM=1 and for all other rows pass out NULLs for 'the PNumber with lesser dates'.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

For example, for the derivation of the 'Description' field:

Code: Select all

If @INROWNUM = 1 Then Lookup.Description Else @NULL
Now, that assumes a single row will have the 'maximum date' and is all you need to populate. If you are saying that multiple rows could have this max date and every 'max date' row needs do this, then you'll need to play the game differently.

First create a hashed file from your source by selecting 'MAX(TDate)' from your source. Either that or use your Sorted data and only write the @OUTROWNUM=1 record to the hashed file.

Then as you stream in your input data (which no longer needs to be sorted btw), do a reference lookup to this hashed file with your incoming date and if you get a hit then you know this is one of the rows that needs the secondary lookup values.

I'd suggest two transformers. The first does the max lookup and adds a field to the link showing if the lookup succeeded. Then in the second transformer, do the description/location lookup and only pass it out on the rows where it shows the first lookup was successful.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ghostraider
Premium Member
Premium Member
Posts: 49
Joined: Wed Dec 13, 2006 6:31 pm

Post by ghostraider »

PNumber TDate
1100 2003-12-04
1100 2004-12-05
1200 2004-11-06
1200 2004-11-01

so for the above data, if i sort date by dsc order, and then use 'If @Inrownum=1 Then Lookup. Description Else @Null' will i get the following output.

PNumber TDate Descritption
1100 2004-12-05 Description
1100 2003-12-04 Null
1200 2004-11-06 Description
1200 2004-11-01 Null

I am thinking only the first row will have Description if i use @Inrownum=1.

Please advise.

Thanks
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:? Will you get the following output? No.

Argh! It always helps to give as complete a description of what you need to do as possible from the beginning. It will cut down on the amount of thrashing people trying to help you will go through. Your examples show that neither of the two techniques I listed will work for you. [sigh]

You need to do this lookup on the max date for every discrete PNumber value. Sort your data by PNumber ascending then TDate descending. Use stage variables to do standard 'group change detection' so that you know when you are on the first row of any particular group where you will use the results of your lookup. For all other records, pass nulls.

If stage variables are something new to you, go to our resident Guru Kim Duke's tip page and download the TwoTechTips.zip file.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply