stage to get the max time for each date

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kennyapril
Participant
Posts: 248
Joined: Fri Jul 30, 2010 9:04 am

stage to get the max time for each date

Post by kennyapril »

I have records coming in as duplicated and have same dates and different times.
I also have a uniquekey personnumber which is unique for each person.

Can anyone suggest me a stage where I can get the max time for that particular day and pick that single record and eliminate the records of other timings.

I tried using aggregator but it does not do the max for a date.


Thanks
Regards,
Kenny
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Convert the date to a Julian date (an integer) and get the max of that. Convert back to a date when needed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kennyapril
Participant
Posts: 248
Joined: Fri Jul 30, 2010 9:04 am

Post by kennyapril »

converting to Julian to transformer is using the function julianday fromdate.
and I have timestamp, so do I need to change it to date and then julian.
how do I do the max julian date in the transformer?




Also, can I truncate the timestamp field and make it a new field after that I will sort the newfield in decending order, personnumber as keys in sort stage

pass the original date field also with no change and remove the duplicates from the sort stage by making them false.will the new field descending does the descending for original field?

please suggest me will that work

Thanks
Regards,
Kenny
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

Or....

Use Remove Duplicate Stage.
Keep the Last Record for Max value.
In Input link, SORT and PARTITION on PersonNumber and use only SORT for TimeStamp field in ascending order.

You will get one unique row for each personnumber with max date/time value.

Hope this helps.
kennyapril
Participant
Posts: 248
Joined: Fri Jul 30, 2010 9:04 am

Post by kennyapril »

I can do the same as you said for getting the max date for each personnumber but my requirement was to get max time for each day and the result should have all the dates which the personnumber has but should be only max time of each day.
and eliminated the other times of that particular date.

hope you understood it


thank you
Regards,
Kenny
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

Sure..
Then what you do is strip out Date and Time from your input timestamp field using TimestampToDate and TimestampToTime function.

Now you will have three fields as personnumber, date and time..

Now use Remove Duplicate Stage and keep the last row and SORT and PARTITION on PERSONNUMBER and DATE and SORT (only) with ascending order on TIME.

In this case you will get the result as follows:

PersonNumber Date Time
1 01-01-2011 22:00:00
1 01-02-2011 23:59:00
1 01-03-2011 16:16:16
2 01-01-2011 18:16:00
2 01-02-2011 06:06:06
and so on...

Down the line if you want to convert the different date and time fields into one timestamp field you can use TimestampFromDateTime function.

Hope this helps
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

It's fairly straightforward when you express it as:

There are duplicates across columns 1 and 2. I need to keep unique values across those columns, keeping the row with the max value of column 3.

It doesn't matter what type of data it is. This is a classic problem and the solution has been covered several times in the forum.

split date and time into different columns if not already. So now you have three columns.
1) partition on personnumber, or personnumber and date (columns 1 & 2)
2) sort on personnumber, date and time all ascending (columns 1, 2 & 3)
3) remove duplicates on personnumber and date, keeping the last row

In IS 8.5, you can also do this with a transformer loop, but there's no real need to unless you need to perform transformations as well.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
kennyapril
Participant
Posts: 248
Joined: Fri Jul 30, 2010 9:04 am

Post by kennyapril »

After the source in the transformer stage I split the timestamp into two fields date and time.
After transformer I used remove duplicates stage, In that stage I hash partitioned and sorted personnumber ascending order.
And the rest date and time fields only sorting and ascending order.
In the same stage I removed duplicates using personnumber and date as keys.

In this the way ou suggested, please correct me If i am wrong


Thanks for your time!!
Regards,
Kenny
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

except

just make sure to hash partition and sort DATE field also with personnumber. Time field will be sort only.

rest is looking fine
kennyapril
Participant
Posts: 248
Joined: Fri Jul 30, 2010 9:04 am

Post by kennyapril »

Yes now I hash partitioned and sorted the date field along with the person number.
Time field only sort but all fields ascending. and retain the last in remove duplicates.

Thanks for your detailed explanation.
Regards,
Kenny
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So... resolved?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kennyapril
Participant
Posts: 248
Joined: Fri Jul 30, 2010 9:04 am

Post by kennyapril »

Yes resolved!!
Regards,
Kenny
Post Reply