Get records with MAXIMUM value in a timestamp column

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

Get records with MAXIMUM value in a timestamp column

Post by kumarjit »

Hi there ,
I am having a set of data with three columns :
1. MEMBER (varchar)
2. PHONE (varchar)
3. UPDATE_TIME (timestamp)
The data looks something like this below :
M1 P1 T1
M1 P1 T7
M1 P1 T1
M1 P1 T1
M2 P2 T8
M2 P2 T2
M2 P2 T6
M2 P2 T2
where M's stand for MEMBER, P's stand for PHONE and T's stand for UPDATE_TIME and T1>T2>T6>T7>T8
What is expected is to create a Datastage job(no other tool like PL/SQL etc. will be accepted by business) that will implement the following logic:
1. For EACH MEMBER-PHONE combination, it will get ALL Records which has UPDATE_TIME=MAX of UPDATE_TIME for that MEMBER-PHONE combination, and insert these records in one table
2. Any other records thereafter, will be considered as DUPLICATE's , and for each of these records there will be another column called DUPCHECK which will be set to "DUPLICATE"

The outcome for the above mentioned data will be something like

Good records:
M1 P1 T1
M1 P1 T1
M1 P1 T1
M2 P2 T2
M2 P2 T2

Duplicate records:
M1 P1 T7 DUPLICATE
M2 P2 T6 DUPLICATE
M2 P2 T8 DUPLICATE

Please note, we needs to capture all records, none of them could be dropped/rejected, and I have already used the Remove Duplicate stage with Join stage to implement this logic,
but it is taking a hell lot of time.
1.Cant this logic be acheived using SORT stage only
2.There will be millions of data , so a very optimized approach will be highly appreciated.


Thanks
Kumarjit.
Pain is the best teacher, but very few attend his class..
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

Do a hash partition using MEMBER & PHONE and perform a descending sort using the time-stamp column.In the sort stage, create the KEY CHANGE COLUMN. Your records should be grouped and sorted and will look something like this.

Code: Select all

M1 P1 T1 1 
M1 P1 T1 0
M1 P1 T1 0
M1 P1 T7 1
M2 P2 T2 1
M2 P2 T2 0
M2 P2 T6 1
M2 P2 T8 1
Code your transformer logic to filter out all the records in a group (except the first record with value 1) which have KEY CHANGE value as 1.

The output should be as expected
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

Post by kumarjit »

Thanks for the reply Rajan.

I understood the first part, but things start to mess up from the point where you have mentioned "Code your transformer logic to filter out all the records in a group....".
Could you please elaborate ?

Thanks in advance,
Kumarjit.
Pain is the best teacher, but very few attend his class..
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

Introduce a stage variable in the transformer that will hold a sum. Initialize the value of this stage variable(svCounter) to 0. For a given group (combination of MEMBER & PHONE), increment svCounter by 1 when there's a record with KEY_CHANGE=1. When svCounter=2, then increment svCounter for any value of KEY_CHANGE in that group.

Code: Select all

M   P   T  Key_Chg  svCounter
M1 P1 T1 1                    1
M1 P1 T1 0                    1
M1 P1 T1 0                    1
M1 P1 T7 1                    2
M2 P2 T2 1                    1
M2 P2 T2 0                    1
M2 P2 T6 1                    2
M2 P2 T8 1                    3
M2 P2 T8 0                    4
Add a constraint which filters out all records which have svCounter>1.
Hope it helps
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

Post by kumarjit »

Thanks again. But, the things which I didn't understand are :

a) How could you increment the value of the stage variable svCounter for EACH GROUP (MEMBER-PHONE) ? The reason fo asking so is , how could a Transformer stage identify each unique group of MEMBER-PHONE ?
b) Why should I increment the value of svCounter to any value of KEY_CHANGE col for that group when svCounter=2 ? What if I just go on incrementing svCounter with the value of KEY_CHANGE col even after svCounter=2 ?

Thanks for your patience,
Kumarjit.
Pain is the best teacher, but very few attend his class..
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

The answer to your first question is elementary. You'd do better to figure out the answer yourself (Hint : Key Break or better still 'Looping' if you use 8.5 or above). If you can't, search the forum. It would be redundant if I repeat something as basic as that here.

To answer your second question, the idea is to filter out the data. You can increment the counter by 1, the key_change value or by the speed of light!
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:!: Kumarjit - please start using the Reply to topic 'button' rather than Reply with quote and quoting everything every time you post. Quote when you need to, just not all the time by default.

Thanks.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

Post by kumarjit »

Sorry , my bad ...... :( ..... Probably because I'm a first time user ..... But, I dont really see the Reply to topic button in my login ........

Anyways , thanks for the info.....
Pain is the best teacher, but very few attend his class..
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No worries, I see you've found it now. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply