Multiple Jobs updating the same Hash File

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

jkumar5444
Premium Member
Posts: 7
Joined: Wed Feb 14, 2007 12:57 pm

Multiple Jobs updating the same Hash File

Post by jkumar5444 »

I am trying to clean up some existing jobs. There are 2 feeds coming in from different systems. Each of them has CUSTID as the primary key. There is a hash file (CUST_ID.hsh) that contains the CUSTID along with a generated Surrogate Key (DWH_CUST_ID). The jobs are using SDKSequences to maintain the current surrogate key value.
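
Roughly, the key-assignment derivation in each stream follows this pattern (the reference link name lkpCust and the sequence name "DWH_CUST_ID" are placeholders, not the real names):

    If lkpCust.NOTFOUND Then KeyMgtGetNextValue("DWH_CUST_ID") Else lkpCust.DWH_CUST_ID

In other words, each stream does a reference lookup against CUST_ID.hsh on CUSTID and only draws a new value from SDKSequences when the lookup fails.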

Each job stream updates the CUST_ID.hsh hash file with new CUSTID and DWH_CUST_ID values. There is a situation where each feed may have a new CUSTID and may try to insert the same DWH_CUST_ID value for two different CUSTIDs. Obviously, the CUST_ID.hsh file will only take one of the rows and reject the other. I want to avoid that situation.

What is the most robust solution to this issue besides scheduling the jobs at different intervals? Thanks!

Jay.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard.

The rule for the Hashed File stage is "destructive overwrite" - last one in wins.

Therefore your "obviously" statement is incorrect.

But your job logic is dangerous - why not use two separate hashed files?
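
What the stage is doing amounts to a keyed write, which you can see with a few lines of BASIC (the path, keys and values below are invented for the illustration):

    * CUST_ID.hsh opened by path here purely for the example
    OPENPATH "/path/to/CUST_ID.hsh" TO CustFile ELSE STOP "Cannot open hashed file"

    WRITE "1001" ON CustFile, "CUST_A"   ;* CUST_A gets surrogate 1001
    WRITE "1001" ON CustFile, "CUST_B"   ;* different key, same surrogate - both records are kept
    WRITE "2002" ON CustFile, "CUST_A"   ;* same key again - destructive overwrite, 1001 is gone

    READ Rec FROM CustFile, "CUST_A" THEN CRT "CUST_A -> " : Rec
    READ Rec FROM CustFile, "CUST_B" THEN CRT "CUST_B -> " : Rec

Nothing gets rejected: a repeated key is silently replaced, and a repeated surrogate value under two different keys is silently kept.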
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jkumar5444
Premium Member
Posts: 7
Joined: Wed Feb 14, 2007 12:57 pm

Post by jkumar5444 »

Hi Ray. Thanks for your response and for the information on how the Hashed File stage works. I agree that the logic is dangerous, which is why I was looking for a cleaner alternative. We can't use two separate hashed files because the hashed file is used as a reference in other job streams.

I thought about using database tables instead of hashed files for the lookups. I think that would maintain better integrity, but I'm not convinced it's 100% effective. As I mentioned before, the only alternative I can see is to stagger these job streams so that they don't run at the same time. Please let me know what you think. Thanks.

Jay.
mikegohl
Premium Member
Posts: 97
Joined: Fri Jun 13, 2003 12:50 pm
Location: Chicago
Contact:

Post by mikegohl »

What about letting both jobs use a different range of keys?
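For example (the sequence names here are invented), give each stream its own SDK sequence and interleave the results so they can never collide:

    Feed A derivation (odd surrogates):  KeyMgtGetNextValue("DWH_CUST_ID_A") * 2 - 1
    Feed B derivation (even surrogates): KeyMgtGetNextValue("DWH_CUST_ID_B") * 2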
Michael Gohl
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Multiple Jobs updating the same Hash File

Post by chulett »

jkumar5444 wrote: Each job stream updates the CUST_ID.hsh hash file with new CUSTID and DWH_CUST_ID values. There is a situation where each feed may have a new CUSTID and may try to insert the same DWH_CUST_ID value for two different CUSTIDs.
How is this situation where 'the same DWH_CUST_ID could be assigned two different CUSTIDs' happening? Will staggering the job streams actually prevent this, or is it just that the destructive overwrite into the hashed file happens in the 'right order' when you do that? :?

Is there not a way to correct the surrogate handling so that this 'situation' does not occur?
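
One thing that might be worth testing (the link and sequence names below are just examples): the SDK also ships a KeyMgtGetNextValueConcurrent routine which, as I understand it, locks the SDKSequences record on each call rather than caching values per job, so two streams running in parallel shouldn't be handed the same next value:

    If lkpCust.NOTFOUND Then KeyMgtGetNextValueConcurrent("DWH_CUST_ID") Else lkpCust.DWH_CUST_ID

It won't stop the same brand-new CUSTID arriving in both feeds at the same moment, but it should close the duplicate-surrogate window you describe.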
-craig

"You can never have too many knives" -- Logan Nine Fingers
jkumar5444
Premium Member
Posts: 7
Joined: Wed Feb 14, 2007 12:57 pm

Post by jkumar5444 »

If the jobs are staggered, this situation will not arise because the logic looks up existing keys before assigning new ones. The first stream inserts new surrogate keys; the second stream then finds those existing keys and only assigns new ones for CUSTIDs the first stream didn't load. This works and I have verified the data.

Using a different range of keys will not help if both streams are running at the same time. It would actually increase the likelihood of multiple DWH_CUST_ID values for the same CUSTID.

I appreciate the comments and questions. Please keep them coming. I'm sure I'll find a decent solution from all the great feedback. Thanks.

Jay.
thurmy34
Premium Member
Posts: 198
Joined: Fri Mar 31, 2006 8:27 am
Location: Paris

Post by thurmy34 »

Hi
What do you do with the hashed file after the two jobs have executed?
What do you think about using a parameter for the name of the hashed file?
In that case you'd have to execute the job(s) that read it twice.
Hope This Helps
Regards
jkumar5444
Premium Member
Posts: 7
Joined: Wed Feb 14, 2007 12:57 pm

Post by jkumar5444 »

Thanks to everyone for their generous input. I have decided to stagger the job run times. I am marking this topic as resolved. Thank you once again!

Jay.