Page 1 of 1

stage to get the max time for each date

Posted: Thu Oct 13, 2011 2:37 pm
by kennyapril
I have records coming in as duplicated and have same dates and different times.
I also have a uniquekey personnumber which is unique for each person.

Can anyone suggest me a stage where I can get the max time for that particular day and pick that single record and eliminate the records of other timings.

I tried using aggregator but it does not do the max for a date.


Thanks

Posted: Thu Oct 13, 2011 3:28 pm
by ray.wurlod
Convert the date to a Julian date (an integer) and get the max of that. Convert back to a date when needed.

Posted: Thu Oct 13, 2011 5:06 pm
by kennyapril
converting to Julian to transformer is using the function julianday fromdate.
and I have timestamp, so do I need to change it to date and then julian.
how do I do the max julian date in the transformer?




Also, can I truncate the timestamp field and make it a new field after that I will sort the newfield in decending order, personnumber as keys in sort stage

pass the original date field also with no change and remove the duplicates from the sort stage by making them false.will the new field descending does the descending for original field?

please suggest me will that work

Thanks

Posted: Thu Oct 13, 2011 7:57 pm
by mobashshar
Or....

Use Remove Duplicate Stage.
Keep the Last Record for Max value.
In Input link, SORT and PARTITION on PersonNumber and use only SORT for TimeStamp field in ascending order.

You will get one unique row for each personnumber with max date/time value.

Hope this helps.

Posted: Thu Oct 13, 2011 8:43 pm
by kennyapril
I can do the same as you said for getting the max date for each personnumber but my requirement was to get max time for each day and the result should have all the dates which the personnumber has but should be only max time of each day.
and eliminated the other times of that particular date.

hope you understood it


thank you

Posted: Thu Oct 13, 2011 9:41 pm
by mobashshar
Sure..
Then what you do is strip out Date and Time from your input timestamp field using TimestampToDate and TimestampToTime function.

Now you will have three fields as personnumber, date and time..

Now use Remove Duplicate Stage and keep the last row and SORT and PARTITION on PERSONNUMBER and DATE and SORT (only) with ascending order on TIME.

In this case you will get the result as follows:

PersonNumber Date Time
1 01-01-2011 22:00:00
1 01-02-2011 23:59:00
1 01-03-2011 16:16:16
2 01-01-2011 18:16:00
2 01-02-2011 06:06:06
and so on...

Down the line if you want to convert the different date and time fields into one timestamp field you can use TimestampFromDateTime function.

Hope this helps

Posted: Thu Oct 13, 2011 9:43 pm
by jwiles
It's fairly straightforward when you express it as:

There are duplicates across columns 1 and 2. I need to keep unique values across those columns, keeping the row with the max value of column 3.

It doesn't matter what type of data it is. This is a classic problem and the solution has been covered several times in the forum.

split date and time into different columns if not already. So now you have three columns.
1) partition on personnumber, or personnumber and date (columns 1 & 2)
2) sort on personnumber, date and time all ascending (columns 1, 2 & 3)
3) remove duplicates on personnumber and date, keeping the last row

In IS 8.5, you can also do this with a transformer loop, but there's no real need to unless you need to perform transformations as well.

Regards,

Posted: Fri Oct 14, 2011 8:33 am
by kennyapril
After the source in the transformer stage I split the timestamp into two fields date and time.
After transformer I used remove duplicates stage, In that stage I hash partitioned and sorted personnumber ascending order.
And the rest date and time fields only sorting and ascending order.
In the same stage I removed duplicates using personnumber and date as keys.

In this the way ou suggested, please correct me If i am wrong


Thanks for your time!!

Posted: Fri Oct 14, 2011 9:11 am
by mobashshar
except

just make sure to hash partition and sort DATE field also with personnumber. Time field will be sort only.

rest is looking fine

Posted: Fri Oct 14, 2011 9:22 am
by kennyapril
Yes now I hash partitioned and sorted the date field along with the person number.
Time field only sort but all fields ascending. and retain the last in remove duplicates.

Thanks for your detailed explanation.

Posted: Fri Oct 14, 2011 9:28 am
by chulett
So... resolved?

Posted: Fri Oct 14, 2011 9:31 am
by kennyapril
Yes resolved!!