SCD In Parallel Jobs

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
shivakumar
Participant
Posts: 31
Joined: Wed Mar 17, 2004 3:33 am

SCD In Parallel Jobs

Post by shivakumar »

HI,

I am getting the records from the Source table and after doing the Look up with target I have to Insert the New records and also the Updated records in the Target table.

If i Use the Change Data capture Stage Or Difference stage then it will gives the Updated records onlyin the out put,but I have to Mainatain the History in the Target table ie I want the Existing record and also the Updated record in the Target Table.I dont have Flag Columns in the Target Tables.

Is this Possible in Parallel jobs Without Using the Transformer Logic ?

Regards
Siva
Last edited by shivakumar on Tue Aug 01, 2006 1:29 am, edited 1 time in total.
opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Post by opdas »

In CDC Stage you will get the updated, copy, insert and deleted records based on the logical keys you have defined.
Om Prakash


"There are things that are known, and there are things that are unknown, and in between there are doors"
iamnagus
Participant
Posts: 48
Joined: Wed Sep 29, 2004 1:16 am

Post by iamnagus »

This topic already discussed in the earlier days.

Refer this topic:
viewtopic.php?t=102403&highlight=scd

Regards,
iamnagus.
shivakumar
Participant
Posts: 31
Joined: Wed Mar 17, 2004 3:33 am

Re: SCD In Parallel Jobs

Post by shivakumar »

[quote="shivakumar"]HI,

I am getting the records from the Source table and after doing the Look up with target I have to Insert the New records and also the Updated records in the Target table.

If i Use the Change Data capture Stage Or Difference stage then it will gives the Updated records onlyin the out put,but I have to Mainatain the History in the Target table ie I want the Existing record and also the Updated record in the Target Table.I dont have Flag Columns in the Target Tables.

Is this Possible in Parallel jobs Without Using the Transformer Logic ?

Regards
Siva[/I dont have the Date fields in the Target Table,Here I am asking that How do I get the Updated and Existing Records without a Transformation Logic].Is there any Stages available to get the Updateed records and also the Previous records in the Output.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use a Compare stage; as well as the result you get each source row as a single column subrecord (you can promote these later with Promote Subrecord stages).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

There are two ways to get and apply your change capture. You start with a Before set of data and an After set of data. If you use the change capture stage alone it gives you new/changed/deleted/unchanged records. There is a flag to turn each one of these on or off and an extra field is written out that indicates what type of change it is.

You can now either apply this to a target database table using insert/update/delete/load stages and transformers and filters OR you can merge it with your Before set of data using the Change Apply stage.

The Change Apply will give you the new changes and the old history.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The difficulty in SCD in parallel is the complicated nature of a type 1, 2, and hybrid 1-2. You need to transform a new row and compare it to the current row. You may compare just a subset of all columns. This means that if you have 40 columns, maybe only 35 pertain to looking for a difference. You may even use CRC32 to facilitate this comparison but computing CRC32 values for the column groups in current and new and then do an equivalency test.

In addition, the hybrd 1-2 type means a smaller subset of columns cause an update, and another subset of columns cause an insert of the next variant. In addition, in type 1 handling you potentially update the current row with an effective end date and optionally remove a current indicator flag.

If there is any chance that you may have time ordered data and need to detect a repeating natural key, then you would potentially have a whole family of SCD rows to generate for the same natural key in the correct order. This would require some device to stage/store the rows as you assemble them. PX has no storage mechanism for holding rows and updating them. .ds datasets are write once, you cannot reference a row from it and modify it. About the only method is a custom buildop in this case.

The combination of stages required to handle SCD type logic, as well as deal with all of the various potential inserts and update streams of rows make SCD in a parallel job a daunting task. If you have to deal with repeating natural keys of time ordered data, you're now outside the tool and writing a buildop. For this reason, unless performance requires it, you should consider using a Server job and hashed file stages. You can easily store the current row in a hashed file, transform the new row, reference and compare to the current row, and optionally overwrite the current row with a newer row or insert a new row as well as retire/update the current row. Then, extract from your staging hashed file your inserts and updates and then load to your target table.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vnspn
Participant
Posts: 165
Joined: Mon Feb 12, 2007 11:42 am

Post by vnspn »

Hi shivakumar,

Can you please let us know if you were able to achive this logic. We have a similar situation.

Thanks.
keshav0307
Premium Member
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

for updated records, which column you are going to update in the target table?
then that is your history maintaining column.
you can ofcourse used CDC stage here
DSRajesh
Premium Member
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Post by DSRajesh »

You can use CDC with switch to do different types of update actions.
RD
Post Reply