How to reject older version of records
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 23
- Joined: Tue May 14, 2013 6:34 am
How to reject older version of records
Hi,
My input is like below.
Name,Version_No,Timestamp,detail
AAA,03,9/10/2013 03:51:57,qwer
AAA,02,9/10/2013 02:40:17,rfgt
AAA,04,9/10/2013 03:55:40,asdf1
BBB,01,9/9/2013 03:25:57,uyhj0
BBB,02,9/9/2013 03:51:45,wert
CCC,02,9/9/2013 02:30:50,uyhj12
CCC,03,9/9/2013 02:35:40,uyhj23
I need to process the only the records which has maximun version no and timestamp.
I did use create key change of sort stage based on key 'Name'. However,
It does work for only a few data.
Can I get any input?
My input is like below.
Name,Version_No,Timestamp,detail
AAA,03,9/10/2013 03:51:57,qwer
AAA,02,9/10/2013 02:40:17,rfgt
AAA,04,9/10/2013 03:55:40,asdf1
BBB,01,9/9/2013 03:25:57,uyhj0
BBB,02,9/9/2013 03:51:45,wert
CCC,02,9/9/2013 02:30:50,uyhj12
CCC,03,9/9/2013 02:35:40,uyhj23
I need to process the only the records which has maximun version no and timestamp.
I did use create key change of sort stage based on key 'Name'. However,
It does work for only a few data.
Can I get any input?
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
If you sort your input by Name, then by desceding Version_No, then by descending Timestamp thenyour first record in each group will be the one you wish to keep and you could set your sort stage to not allow duplicates and get your desired result.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 23
- Joined: Tue May 14, 2013 6:34 am
My job design is
Table -----Sort-----File
While extracting, in a query I added name in ascending, version and timestamp in descending.
in sort stage, I applied hash partition and sort on 'Name'. And gave key as 'Name'.
For a few records I got '1' for first record (which are max of version & timestamp). For a few, i got '0', which are max of ver and timestamp.
Table -----Sort-----File
While extracting, in a query I added name in ascending, version and timestamp in descending.
in sort stage, I applied hash partition and sort on 'Name'. And gave key as 'Name'.
For a few records I got '1' for first record (which are max of version & timestamp). For a few, i got '0', which are max of ver and timestamp.
-
- Participant
- Posts: 23
- Joined: Tue May 14, 2013 6:34 am
If you go back to your sample data, can you have multiple records with identical "Name" and "Version" columns but with different "Timstamp" values?
If you use the sort as you've defined, with the "Name" column used to partition the data then add a remove duplicates stage using "Name" and "Version" as the key and keep the first record. If a given "Version" contains more than one timestamp, then the prior sort descending ensures that only the one with the highest value is kept.
If you use the sort as you've defined, with the "Name" column used to partition the data then add a remove duplicates stage using "Name" and "Version" as the key and keep the first record. If a given "Version" contains more than one timestamp, then the prior sort descending ensures that only the one with the highest value is kept.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 23
- Joined: Tue May 14, 2013 6:34 am
Sorry, just duplicate on the single key "Name". If the data is sorted correctly you will get one record per "Name" containing the most recent, highest numbered "Version".
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
Seems your partitioning/sort order is messed up during transfer from table to sort stage.
Can you try sorting and then remove duplicate keeping same partitioning between sort and remove duplicate. Keep first or last based on your sort order?
Or a fork join will also work. These will have some overhead but just to try if it relates to data or partitioning/sorting.
By the way which DataStage version you have there.
Can you try sorting and then remove duplicate keeping same partitioning between sort and remove duplicate. Keep first or last based on your sort order?
Or a fork join will also work. These will have some overhead but just to try if it relates to data or partitioning/sorting.
By the way which DataStage version you have there.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
Genius may have its limitations, but stupidity is not thus handicapped.
-
- Participant
- Posts: 23
- Joined: Tue May 14, 2013 6:34 am
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI