How to reject older version of records

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sathyak098
Participant
Posts: 23
Joined: Tue May 14, 2013 6:34 am

How to reject older version of records

Post by sathyak098 »

Hi,

My input is like below.

Name,Version_No,Timestamp,detail
AAA,03,9/10/2013 03:51:57,qwer
AAA,02,9/10/2013 02:40:17,rfgt
AAA,04,9/10/2013 03:55:40,asdf1
BBB,01,9/9/2013 03:25:57,uyhj0
BBB,02,9/9/2013 03:51:45,wert
CCC,02,9/9/2013 02:30:50,uyhj12
CCC,03,9/9/2013 02:35:40,uyhj23

I need to process the only the records which has maximun version no and timestamp.
I did use create key change of sort stage based on key 'Name'. However,
It does work for only a few data.
Can I get any input?
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

I have to admit I did not get what you are trying to say. What problem you got while using create key change column?

Did you ensure it is correctly partitioned and sorted?
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you sort your input by Name, then by desceding Version_No, then by descending Timestamp thenyour first record in each group will be the one you wish to keep and you could set your sort stage to not allow duplicates and get your desired result.
sathyak098
Participant
Posts: 23
Joined: Tue May 14, 2013 6:34 am

Post by sathyak098 »

My job design is
Table -----Sort-----File

While extracting, in a query I added name in ascending, version and timestamp in descending.

in sort stage, I applied hash partition and sort on 'Name'. And gave key as 'Name'.

For a few records I got '1' for first record (which are max of version & timestamp). For a few, i got '0', which are max of ver and timestamp.
sathyak098
Participant
Posts: 23
Joined: Tue May 14, 2013 6:34 am

Post by sathyak098 »

It is taking combination all 3 fields as a group. So for all the records I'm getting create key value '1'
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you go back to your sample data, can you have multiple records with identical "Name" and "Version" columns but with different "Timstamp" values?

If you use the sort as you've defined, with the "Name" column used to partition the data then add a remove duplicates stage using "Name" and "Version" as the key and keep the first record. If a given "Version" contains more than one timestamp, then the prior sort descending ensures that only the one with the highest value is kept.
sathyak098
Participant
Posts: 23
Joined: Tue May 14, 2013 6:34 am

Post by sathyak098 »

No. I will get duplicates based on 'Name'. I will not get duplication combination of 2 fields(name and version (or) name and timestamp)
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Sorry, just duplicate on the single key "Name". If the data is sorted correctly you will get one record per "Name" containing the most recent, highest numbered "Version".
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Seems your partitioning/sort order is messed up during transfer from table to sort stage.

Can you try sorting and then remove duplicate keeping same partitioning between sort and remove duplicate. Keep first or last based on your sort order?

Or a fork join will also work. These will have some overhead but just to try if it relates to data or partitioning/sorting.

By the way which DataStage version you have there.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
sathyak098
Participant
Posts: 23
Joined: Tue May 14, 2013 6:34 am

Post by sathyak098 »

Hi,
Im using 8.5.
and, I do need to send older version of records to rejections which will be used by other process, hence I can not apply remove duplicates.

So, in this situation, do I need to use double sort stages?
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Try with two sort stage or try running on one node to check if it works fine.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
Post Reply