Lookup problem with a multi-instance job - Strange!
Moderators: chulett, rschirm, roy
kduke wrote: I do not get it. It looks like it is working perfectly then. What is it that we do not understand?

I explained it in my initial post:
I am having a strange problem with one particular security. There are 8 records for this security in my main input file. After the split, all 8 records fall into the same split file, the lookup does not find the record in the hash file, and the row is sent as an insert instead of an update. However, when I trim my main input file down to only this security, the 8 records are scattered across 8 different files, the lookup finds the record in the hash file, and the row is processed as an update.
Scenario One -
Before split: initial file = 1.6M rows.
After split: 8 files of 200K rows each.
All 8 records belonging to security 101 are placed in ONE split file.
Each split file is processed by a separate job thread.
In this case, all 8 records for security 101 are processed by the SAME job thread.
The lookup DOES NOT find the record with Security = 101 + Date = 1/1/2001 00:00:00 + Type = Maturity.
Scenario Two -
Before split: initial file = 8 rows (only the rows belonging to security 101).
After split: 8 files of 1 record each.
All 8 records belonging to security 101 are placed in DIFFERENT split files.
Each split file is processed by a separate job thread.
In this case, all 8 records for security 101 are processed by DIFFERENT job threads.
The lookup DOES find the record with Security = 101 + Date = 1/1/2001 00:00:00 + Type = Maturity.
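The split behavior described in the two scenarios can be illustrated with a small sketch. This is not DataStage code, and the two split strategies below are assumptions for illustration only: a contiguous chunk split keeps equal keys together when the input happens to be grouped by security, while splitting 8 rows into 8 files of 1 record each effectively scatters them round-robin.

```python
def split_contiguous(rows, n):
    """Cut the file into n contiguous chunks of roughly equal size."""
    size = max(1, len(rows) // n)
    return [rows[i * size:(i + 1) * size] for i in range(n)]

def split_round_robin(rows, n):
    """Deal rows out like cards: row i goes to file i % n."""
    files = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        files[i % n].append(row)
    return files

# Large-file case (hypothetical data): input grouped by security,
# 8 securities x 8 rows each. A contiguous split into 8 chunks puts
# all 8 records for security "101" into ONE file.
rows = [(str(100 + s), k) for s in range(8) for k in range(8)]
chunks = split_contiguous(rows, 8)
files_with_101 = sum(1 for c in chunks if any(sec == "101" for sec, _ in c))
print(files_with_101)  # -> 1: one split file holds every security-101 row

# Small-file case: only the 8 rows for security "101", split into
# 8 files of 1 record each -- the rows land in 8 DIFFERENT files.
scattered = split_round_robin([("101", k) for k in range(8)], 8)
print([len(f) for f in scattered])  # -> [1, 1, 1, 1, 1, 1, 1, 1]
```

Under these assumptions, whether all 8 records share one job thread or are spread across 8 threads depends only on how the splitter assigns rows to files, which matches the difference between the two scenarios.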
kduke wrote: There is your problem. You need to reflect in the hash file all the keys in the target, otherwise your insert needs to be an update.

Kim - I don't need to update the hash file. My objective is to determine the action (insert or update) for each incoming record based on the records already existing in the target.
For example, if there is already a record with security number = 101, Date = 1/1/2001 00:00:00, and Type = MATURITY, I need not insert it; I would update the existing record in the database.
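The insert-versus-update decision described here can be sketched as follows. This is a hedged illustration, not the actual job logic: the field names, the date format, and the set used as a stand-in for the hash file are all assumptions; only the composite key (security + date + type) comes from the post.

```python
def decide_action(row, target_keys):
    """Return 'update' if the row's composite key (security + date + type)
    already exists in the target, otherwise 'insert'."""
    key = (row["security"], row["date"], row["type"])
    return "update" if key in target_keys else "insert"

# Stand-in for the hash file: the set of composite keys loaded from the
# target database at the start of the batch.
target_keys = {("101", "1/1/2001 00:00:00", "MATURITY")}

existing = {"security": "101", "date": "1/1/2001 00:00:00", "type": "MATURITY"}
new_row = {"security": "102", "date": "1/1/2001 00:00:00", "type": "MATURITY"}
print(decide_action(existing, target_keys))  # -> update (key already in target)
print(decide_action(new_row, target_keys))   # -> insert (key not found)
```

Note that if the reference key on one side differed even slightly (for example, a trailing space or a different date format), the lookup would miss and the row would be misclassified as an insert, which is exactly the symptom described in this thread.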
kduke wrote: How can you update anything then if you only have one of these?

An update happens when there is a change to an existing record (a record that was written during a previous day's batch).
The hash file is created by a job that reads from the database and populates the hash file. This job runs at the beginning of the batch, before any of the other jobs start.
Verify the path to the hash file. If you're using the project as the default, do a sanity check and View data. Sometimes the hash file being written to is not the same one being referenced. Also double-check the filename is still the same. DataStage Server has this "feature" of updating the hash file name sometimes when you rename the link connected to the stage. Could you be running the job differently under the multi-instance mode (intelligent job control), maybe using a different path to the hash file?
You've made me a believer that you know what you're doing, it's just something pesky somewhere. It seems you've constructed things correctly.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kcbland wrote: Verify the path to the hash file. If you're using the project as the default, do a sanity check and View data. Sometimes the hash file being written to is not the same one being referenced. Also double-check the filename is still the same. DataStage Server has this "feature" of updating the hash file name sometimes when you rename the link connected to the stage. Could you be running the job differently under the multi-instance mode (intelligent job control), maybe using a different path to the hash file?

I verified the path and the data. It all looks correct. Maybe I am missing something here. Seems like I will have to take it as one of those DataStage 'don't-know-why-it-happens' "features"...
Anjan Roy wrote: Seems like I will have to take it as one of those DataStage 'don't-know-why-it-happens' "features"...
NO.
I have trained, taught, and used this product since 1998. I am certified in deploying this product (I have the paper to prove it) and served over 4 years with Ascential as a consultant. I have never encountered your error, and after deploying thousands of jobs over 7+ years in mission-critical environments, I should have.
You need to methodically trace this hash file, from the beginning, through how it is created and referenced. Consider moving to a separate project, with a reduced dataset and a separate Unix work area. You will find it, and you will be angry when you do. I seriously doubt this is a bug.
Please let us know when you do find it, as it will bother some of us (me, obviously) if you give up and attribute it to "one of those things". If you need help, export the jobs involved and send them to me and I'll take a quick look: Ken@KennethBland.com
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle