Historization in Datastage Parallel job.

Piyush Singh Chauhan · Mon Jul 20, 2015 3:54 am

I Have a situation in which I have to implement the below logic.

Existing record in the DB:

Key COL_A COL_B COL_C END_DT LTST_FLG
55 123 abc def 31/12/9999 Y

3 updates on the key 55
Key COL_A COL_B COL_C time
55 456 3/07/2015 8:00
55 ghi 3/07/2015 9:00
55 jkl 3/07/2015 10:00

Result should be like this:-

Key COL_A COL_B COL_C END_DT LTST_FLG
55 123 abc def 3/07/2015 8:00 N
55 456 abc def 3/07/2015 9:00 N
55 456 ghi def 3/07/2015 10:00 N
55 456 ghi jkl 31/12/9999 Y

I have used 2 stage variables to implement this and its working fine for one value of key but that is not working for other key columns.

Could anyone help me out on this ?

Many thanks in advance.

Piyush Singh

ShaneMuir · Post by **ShaneMuir** » Mon Jul 20, 2015 5:25 am

So I am assuming here that your update data should actually look like this:

Code: Select all

3 updates on the key 55 
Key   COL_A     COL_B     COL_C    time 
55    456                          3/07/2015 8:00 
55              ghi                3/07/2015 9:00 
55                        jkl      3/07/2015 10:00

Personally - I interpret is as follows:
What you are really doing is not an update but a delete/update then insert.
Also assuming here that the key is actually Key+END_DT
With this in mind I would do the following:

1. Read in the any existing records via lookup
2. From that reference record use a transformer to build up your records for insert. You would need to sort your inputs (including the existing record) by key value and end_dt. Then use a couple of stage variables to determine the output values for cols A,B and C, as well as determining which record is your lastest for the LTST_FLG value.
3. Output to one delete/update and one insert link.

EDIT: I have changed the above to say delete/update rather than just delete. Same logic applies.

You might also consider looking at the SCD stage (this looks like a Type2)

chulett · Post by **chulett** » Mon Jul 20, 2015 6:25 am

ShaneMuir wrote:You might also consider looking at the SCD stage (this looks like a Type2)

Yup.

Piyush Singh Chauhan · Mon Jul 20, 2015 9:06 am

Thanks for your replies. Apologies for wrong explanation.
Initially we were having type 2, but now we have to implement logic like below example.

I have tried using stage variable and storing the values in it then using the same stage varibale to pass the value if the value is null,
but this is not working as i have not considered the key for the loop.

Code: Select all

Key     COL_A      COL_B         COL_C          END_DT       LTST_FLG 
 55     123         abc           def           31/12/9999     Y 


3 updates on the key 55 
 Key   COL_A     COL_B     COL_C               time 
 55    456        null      null               3/07/2015 8:00 
 55    null        ghi      null               3/07/2015 9:00 
 55    null       null      jkl                3/07/2015 10:00 


Result after updates on the key 55 should be like this 
 Key   COL_A        COL_B             COL_C         time 
 55    456           abc               def         3/07/2015 8:00 
 55    456           ghi               def         3/07/2015 9:00 
 55    456           ghi               jkl         3/07/2015 10:00

chulett · Post by **chulett** » Mon Jul 20, 2015 9:34 am

So the NULL in your incoming record indicates "no change" for that field?

And it's still a Type 2.

Piyush Singh Chauhan · Mon Jul 20, 2015 9:37 am

But now instead of null we have to populate the previous entry for the same key field in the columns

ShaneMuir · Post by **ShaneMuir** » Mon Jul 20, 2015 9:41 am

Piyush Singh Chauhan wrote:I have tried using stage variable and storing the values in it then using the same stage varibale to pass the value if the value is null,
but this is not working as i have not considered the key for the loop.

No need for a loop
Just partition your data on the "Key" field and sort your data, on the key field and the date, prior to your transform.
In the transform, use a set of stage variables to build your outputs.
ie
svCol = If keyChange then Orig.col else if IsNull(New.Col) then svCol else if svCol <> New.Col then New.Col

Piyush Singh Chauhan · Mon Jul 20, 2015 9:58 am

Worst part is we have to do operations one the same key column

ShaneMuir · Post by **ShaneMuir** » Mon Jul 20, 2015 10:11 am

So?

You are still outputting the same number rows that you input. So technically you are only ever performing one update per column per input row. (ie in your example 3 potential column updates per input row). You just apply each change in a stage variable. If you data is sorted and partitioned correctly it will apply the changes row by row, and output correctly.

This by the way is exactly what the SCD stage would do for you - you really should investigate its use.