Implementing SCD in server job

allavivek
Premium Member
Posts: 211
Joined: Sat May 01, 2010 5:07 pm


Post by allavivek »

Hi All,

I am trying to implement SCD2 in a server job:

Code:

ODBC (Dim table)------->Transformer-------->SEQ (capturing updates)
                             |
                             |
                        hashed file
                             |
                             |
          ODBC (source / reference)
If any record gets updated from the reference, a new record with the same key is created with the updated values, which is basically SCD2.
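
For illustration, here is a minimal plain-Python sketch (not DataStage code, and with made-up column names such as cust_id and current_flag) of what a Type 2 change does to the dimension table: the active row for the key is expired and a new version of the same key is appended.

Code:

from datetime import date

# The dimension table, reduced to a list of dicts for the sketch.
dim_rows = [
    {"cust_id": 101, "city": "Denver", "eff_from": date(2010, 1, 1),
     "eff_to": None, "current_flag": "Y"},
]

def apply_scd2_change(rows, cust_id, new_city, change_date):
    """Expire the active row for cust_id and append a new version."""
    for row in rows:
        if row["cust_id"] == cust_id and row["current_flag"] == "Y":
            row["eff_to"] = change_date       # close the old version
            row["current_flag"] = "N"
    rows.append({"cust_id": cust_id, "city": new_city,
                 "eff_from": change_date, "eff_to": None,
                 "current_flag": "Y"})        # new version, same key

apply_scd2_change(dim_rows, 101, "Sydney", date(2010, 7, 13))
# dim_rows now holds two versions of key 101: the expired Denver row
# and the current Sydney row.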

My problem is with the hashed file: my duplicate-key records are getting overwritten, so during the lookup I am getting wrong results.

Q. Is there any option in the Hashed File stage to disable overwriting of records, or is there another way?

Q. Is there any other way to implement SCD2 in server jobs?

Thanks All...
Last edited by allavivek on Tue Jul 13, 2010 1:32 am, edited 3 times in total.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Wrap your design in Code tags so we can see what connects to what.

Without that there's not much sense in trying to essay an answer.

The usual solution for SCD2 in server jobs involves two Hashed File stages referring to the same hashed file.
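
As a rough illustration of that read/write-back pattern (plain Python, with a dict standing in for the hashed file; nothing here is DataStage syntax), the point is that whatever one stage writes for a key is exactly what the other stage returns for the next row with that key:

Code:

hashed_file = {}   # key -> latest known dimension values

def process_row(key, values):
    """Look up the current version for the key, then write the new one back."""
    current = hashed_file.get(key)        # reference lookup (reading stage)
    if current == values:
        return None                       # nothing changed: do nothing
    hashed_file[key] = values             # write back (writing stage)
    return ("insert" if current is None else "update", key, values)

# Two changes to the same key in one run: the second call sees the
# values written by the first, which is why caching must be disabled.
print(process_row("101", {"city": "Denver"}))   # ('insert', '101', ...)
print(process_row("101", {"city": "Sydney"}))   # ('update', '101', ...)
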
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
allavivek
Premium Member
Posts: 211
Joined: Sat May 01, 2010 5:07 pm

Post by allavivek »

Hi Ray,

Thanks for the reply.

I have edited my post to show the job design; please give me a suggestion for solving this.

thank you
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Make sure that caching is disabled in both stages, and that the writing Hashed File stage is set to "lock for update".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
allavivek
Premium Member
Posts: 211
Joined: Sat May 01, 2010 5:07 pm

Post by allavivek »

Hi Ray,

I tried what you suggested, but I am not getting the expected result.

My understanding of "pre-load file to memory" is that, if it is set, the lookup is done from memory, and if the record is not found there, the lookup is done against the file.

With that setting, if I get a record that has already been updated twice before, I will not get the expected result; I will get the old values instead of the new ones.

Correct me if I am wrong.

thank you
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Do not pre-load to memory. You want to be able to write to the hashed file as the job runs. Likewise, do not enable write cache: you want to ensure that a record, once written, is immediately available when the next row is processed.
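
A tiny Python sketch (purely illustrative, nothing DataStage-specific) of why a pre-loaded or write-cached lookup goes stale: the cache is a snapshot taken at job start, so rows written back during the run are only visible if the lookup goes against the live file.

Code:

live = {"101": "v1"}     # hashed file contents at job start
cache = dict(live)       # "pre-load file to memory" takes a snapshot

live["101"] = "v2"       # a row is written back during the run

print(cache["101"])      # v1 -- stale: the old values, as described above
print(live["101"])       # v2 -- what an uncached lookup returns
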
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

ray.wurlod wrote:Make sure that caching is disabled in both stages, and that the writing Hashed File stage is set to "lock for update".
That is a good suggestion. I also think the problem is with the data, or with how it is being fetched. He seems to have multiple rows with the same hashing key, which will get overwritten in the hashed file.

OP - Why can't you do a direct lookup against the database?
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

They could, as long as they are willing to commit every record as it is written to the target database and take the performance hit on the lookup as well. Much better to stick with the hashed file lookup and set it up properly; it really is simple - especially after you've done it once. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

chulett wrote: They could, as long as they are willing to commit every record as it is written to the target database and take the performance hit on the lookup as well. Much better to stick with the hashed file lookup and set it up properly; it really is simple - especially after you've done it once. :wink:
True. Hashed file lookups are the way to go.

However, the OP's job design shows that he is capturing the updates in a Sequential file. That means he is not too worried about the updates happening immediately. Hence my suggestion to go directly to the database source. :)

Also, another factor behind my suggestion is the problem he is having: because he is placing the rows in a hashed file, rows with the same key overwrite one another, and as far as I know there is no way to override that behaviour in a hashed file.
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It's hard to tell exactly what is going on with the OP's job, as what was posted doesn't make much sense, at least to me. However, in general it is quite straightforward to do Type 2 SCDs in a Server job, and the proper way to do it is what Ray posted: the detail records first get populated into a reference hashed file - not just the keys, but all appropriate elements. You either don't cache the lookup or, if you do, you make sure you have it 'Locked for updates'. If you get a hashed file hit and determine (in the Transformer via stage variables) that nothing has changed, you do nothing. Hashed file miss? Insert. Hashed file hit and a change? The same Insert plus an Update. In either case you write the entire set of new data back to the reference hashed file so it is available for the next input row.

Yes, there is no way to stop a hashed file write from doing a destructive overwrite for a given set of key fields; that's just how they work. However, in this case we want it to do that, so the hashed file always holds the most current active record in the target.
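
Putting the above into a short plain-Python sketch (illustrative only; in the real job this logic sits in Transformer stage variables and the links to the target database, and the column names here are invented):

Code:

reference = {}             # stands in for the reference hashed file
                           # (full rows, not just the keys)
inserts, updates = [], []  # rows bound for the target dimension table

def process_source_row(key, attrs, eff_date):
    existing = reference.get(key)        # uncached hashed file lookup
    if existing is not None and existing == attrs:
        return                           # hit and nothing changed: do nothing
    if existing is not None:
        # hit with a change: expire the current target row ...
        updates.append({"key": key, "eff_to": eff_date, "current_flag": "N"})
    # ... and on both a miss and a changed hit, insert a new version
    inserts.append({"key": key, **attrs, "eff_from": eff_date,
                    "current_flag": "Y"})
    # write the entire new row back so the next input row sees it
    reference[key] = attrs

process_source_row(101, {"city": "Denver"}, "2010-07-01")  # miss -> insert
process_source_row(101, {"city": "Denver"}, "2010-07-05")  # hit, no change
process_source_row(101, {"city": "Sydney"}, "2010-07-13")  # change -> update + insert
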
-craig

"You can never have too many knives" -- Logan Nine Fingers
vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

chulett wrote: Yes, there is no way to stop a hashed file write from doing a destructive overwrite for a given set of key fields; that's just how they work. However, in this case we want it to do that, so the hashed file always holds the most current active record in the target.
@OP - Craig has given you a very detailed explanation of how to do this within a Server job; it doesn't get any better than that. You could accomplish the whole thing within one job. It is not very clear why you are capturing only the updates in a Sequential File stage while saying that you are implementing SCD2 in this job. If it is something else you are trying to implement, post more detail so that we can help accordingly. :)
Vivek Gadwal

Experience is what you get when you didn't get what you wanted