Historization in Datastage Parallel job.

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Piyush Singh Chauhan
Participant
Posts: 5
Joined: Wed Apr 03, 2013 5:34 am

Historization in Datastage Parallel job.

Post by Piyush Singh Chauhan »

I Have a situation in which I have to implement the below logic.

Existing record in the DB:

Key COL_A COL_B COL_C END_DT LTST_FLG
55 123 abc def 31/12/9999 Y

3 updates on the key 55
Key COL_A COL_B COL_C time
55 456 3/07/2015 8:00
55 ghi 3/07/2015 9:00
55 jkl 3/07/2015 10:00


Result should be like this:-

Key COL_A COL_B COL_C END_DT LTST_FLG
55 123 abc def 3/07/2015 8:00 N
55 456 abc def 3/07/2015 9:00 N
55 456 ghi def 3/07/2015 10:00 N
55 456 ghi jkl 31/12/9999 Y


I have used 2 stage variables to implement this and its working fine for one value of key but that is not working for other key columns.

Could anyone help me out on this ?

Many thanks in advance.

Piyush Singh
Last edited by Piyush Singh Chauhan on Mon Jul 20, 2015 5:51 am, edited 1 time in total.
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

So I am assuming here that your update data should actually look like this:

Code: Select all

3 updates on the key 55 
Key   COL_A     COL_B     COL_C    time 
55    456                          3/07/2015 8:00 
55              ghi                3/07/2015 9:00 
55                        jkl      3/07/2015 10:00 
Personally - I interpret is as follows:
What you are really doing is not an update but a delete/update then insert.
Also assuming here that the key is actually Key+END_DT
With this in mind I would do the following:

1. Read in the any existing records via lookup
2. From that reference record use a transformer to build up your records for insert. You would need to sort your inputs (including the existing record) by key value and end_dt. Then use a couple of stage variables to determine the output values for cols A,B and C, as well as determining which record is your lastest for the LTST_FLG value.
3. Output to one delete/update and one insert link.

EDIT: I have changed the above to say delete/update rather than just delete. Same logic applies.

You might also consider looking at the SCD stage (this looks like a Type2)
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ShaneMuir wrote:You might also consider looking at the SCD stage (this looks like a Type2)
Yup.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Piyush Singh Chauhan
Participant
Posts: 5
Joined: Wed Apr 03, 2013 5:34 am

Post by Piyush Singh Chauhan »

Thanks for your replies. Apologies for wrong explanation.
Initially we were having type 2, but now we have to implement logic like below example.

I have tried using stage variable and storing the values in it then using the same stage varibale to pass the value if the value is null,
but this is not working as i have not considered the key for the loop.

Code: Select all

Key     COL_A      COL_B         COL_C          END_DT       LTST_FLG 
 55     123         abc           def           31/12/9999     Y 


3 updates on the key 55 
 Key   COL_A     COL_B     COL_C               time 
 55    456        null      null               3/07/2015 8:00 
 55    null        ghi      null               3/07/2015 9:00 
 55    null       null      jkl                3/07/2015 10:00 


Result after updates on the key 55 should be like this 
 Key   COL_A        COL_B             COL_C         time 
 55    456           abc               def         3/07/2015 8:00 
 55    456           ghi               def         3/07/2015 9:00 
 55    456           ghi               jkl         3/07/2015 10:00
Piyush
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So the NULL in your incoming record indicates "no change" for that field?

And it's still a Type 2. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Piyush Singh Chauhan
Participant
Posts: 5
Joined: Wed Apr 03, 2013 5:34 am

Post by Piyush Singh Chauhan »

But now instead of null we have to populate the previous entry for the same key field in the columns
Piyush
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Piyush Singh Chauhan wrote:I have tried using stage variable and storing the values in it then using the same stage varibale to pass the value if the value is null,
but this is not working as i have not considered the key for the loop.
No need for a loop
Just partition your data on the "Key" field and sort your data, on the key field and the date, prior to your transform.
In the transform, use a set of stage variables to build your outputs.
ie
svCol = If keyChange then Orig.col else if IsNull(New.Col) then svCol else if svCol <> New.Col then New.Col
Piyush Singh Chauhan
Participant
Posts: 5
Joined: Wed Apr 03, 2013 5:34 am

Post by Piyush Singh Chauhan »

Worst part is we have to do operations one the same key column
Piyush
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

So?

You are still outputting the same number rows that you input. So technically you are only ever performing one update per column per input row. (ie in your example 3 potential column updates per input row). You just apply each change in a stage variable. If you data is sorted and partitioned correctly it will apply the changes row by row, and output correctly.

This by the way is exactly what the SCD stage would do for you - you really should investigate its use.
Post Reply