See post 'SCD In Parallel Jobs' from shivakumar on Mon Jul 31, 2006 11:58 pm
Anyone want to give another opinion?
Here's what Ken had to say then (Hi Ken!):
The difficulty with SCD in parallel is the complicated nature of type 1, type 2, and hybrid 1-2 processing. You need to transform a new row and compare it to the current row, and you may compare only a subset of all columns: if you have 40 columns, perhaps only 35 pertain to detecting a difference. You may even use CRC32 to facilitate this comparison, by computing CRC32 values for the column groups in the current and new rows and then doing an equivalency test.
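A minimal sketch of that CRC32 equivalency test, with hypothetical column names (only the tracked subset participates in the comparison, standing in for the "35 of 40" columns):

```python
import zlib

# Hypothetical column split: tracked columns drive change detection,
# audit columns are ignored.
TRACKED_COLS = ["name", "city", "status"]

def row_crc(row: dict, cols=TRACKED_COLS) -> int:
    """CRC32 over the tracked columns only, in a fixed order."""
    payload = "|".join(str(row.get(c, "")) for c in cols)
    return zlib.crc32(payload.encode("utf-8"))

def changed(current: dict, incoming: dict) -> bool:
    """One integer compare instead of 35 column-by-column compares."""
    return row_crc(current) != row_crc(incoming)

cur = {"name": "Ann", "city": "Oslo", "status": "A", "load_ts": "2006-07-01"}
new = {"name": "Ann", "city": "Bergen", "status": "A", "load_ts": "2006-08-01"}
print(changed(cur, new))  # city differs, so True
```

Note the usual caveat: CRC32 collisions are possible, so a matching checksum technically means "almost certainly unchanged" rather than a guarantee.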
In addition, the hybrid 1-2 type means one subset of columns causes an update of the current row in place, and another subset causes an insert of the next variant. And when inserting the next variant, you potentially update the prior current row with an effective end date and optionally clear its current-indicator flag.
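The hybrid routing logic can be sketched as follows. The column names and the split between the two subsets are illustrative assumptions, not from the post:

```python
# Hypothetical split: changes to TYPE1_COLS overwrite the current row in
# place; changes to TYPE2_COLS retire the current row and insert a variant.
TYPE1_COLS = {"phone"}              # correction-style attributes
TYPE2_COLS = {"city", "status"}     # history-tracked attributes

def classify_change(current: dict, incoming: dict) -> str:
    """Return 'type2' if any history-tracked column changed,
    'type1' if only overwrite columns changed, else 'none'."""
    if any(current.get(c) != incoming.get(c) for c in TYPE2_COLS):
        return "type2"
    if any(current.get(c) != incoming.get(c) for c in TYPE1_COLS):
        return "type1"
    return "none"
```

A type-2 result fans out into two writes (retire the old row, insert the new variant), which is exactly the multiplication of output streams the post describes.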
If there is any chance that you may have time-ordered data and need to detect a repeating natural key, then you would potentially have a whole family of SCD rows to generate for the same natural key in the correct order. This would require some device to stage/store the rows as you assemble them. PX has no storage mechanism for holding rows and updating them: .ds datasets are write-once, so you cannot reference a row from them and modify it. About the only method in this case is a custom buildop.
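To illustrate what "a whole family of SCD rows for the same natural key" means, here is a sketch (with hypothetical field names) that groups a batch by natural key, orders each group by effective date, and chains end dates so each variant is retired by its successor:

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical batch: natural key "C1" arrives twice in one load,
# so both variants must be generated in the correct order.
rows = [
    {"nk": "C1", "eff_date": "2006-01-01", "city": "Oslo"},
    {"nk": "C1", "eff_date": "2006-03-01", "city": "Bergen"},
    {"nk": "C2", "eff_date": "2006-02-01", "city": "Tromso"},
]

def build_scd_family(batch):
    """Group by natural key, sort by effective date, chain end dates."""
    by_key = defaultdict(list)
    for r in batch:
        by_key[r["nk"]].append(r)
    out = []
    for nk, group in by_key.items():
        group.sort(key=itemgetter("eff_date"))
        for i, r in enumerate(group):
            variant = dict(r)
            nxt = group[i + 1]["eff_date"] if i + 1 < len(group) else None
            variant["end_date"] = nxt        # None marks the current row
            variant["current"] = nxt is None
            out.append(variant)
    return out
```

The point of the post is that this grouping requires mutable per-key state while assembling the rows, which PX stages do not offer natively.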
The combination of stages required to handle SCD-type logic, as well as dealing with all of the various potential insert and update streams of rows, makes SCD in a parallel job a daunting task. If you have to deal with repeating natural keys in time-ordered data, you're now outside the tool and writing a buildop. For this reason, unless performance requires otherwise, you should consider using a Server job and hashed file stages. You can easily store the current row in a hashed file, transform the new row, reference and compare it to the current row, and optionally overwrite the current row with a newer row or insert a new row as well as retire/update the current row. Then extract your inserts and updates from the staging hashed file and load them to your target table.
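The Server-job pattern can be sketched with an in-memory dict playing the role of the hashed file keyed on natural key; the names and the single tracked column are assumptions for illustration:

```python
# "Hashed file": natural key -> current row, with collected output streams.
staged = {}
inserts, updates = [], []

def process(incoming: dict):
    """Look up the current row, compare, and either retire+insert or skip."""
    nk = incoming["nk"]
    current = staged.get(nk)
    if current is None:
        staged[nk] = incoming                   # brand-new key
        inserts.append(incoming)
    elif current["city"] != incoming["city"]:   # tracked-column compare
        updates.append(dict(current, current=False))  # retire old variant
        staged[nk] = incoming                   # overwrite the hashed file
        inserts.append(incoming)
    # else: no change, nothing to write

process({"nk": "C1", "city": "Oslo", "current": True})
process({"nk": "C1", "city": "Bergen", "current": True})
```

After the batch, `inserts` and `updates` are the two streams you would extract from the staging hashed file and load to the target table.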
_________________
Kenneth Bland
Mike