Lookup problem with a multi-instance job - Strange!
Moderators: chulett, rschirm, roy
kduke wrote: I do not get it. It looks like it is working perfectly then. What is it that we do not understand?

I explained it in my initial post:
I am having a strange problem with one particular security. There are 8 records for this security in my main input file. After the split, all 8 records fall into the same split file, the lookup does not find the record in the hash file, and the row is sent as an insert instead of an update. However, when I trim my main input file down to only this security, the 8 records are scattered across 8 different files, the lookup finds the record in the hash file, and the row is processed as an update.
Scenario One -
Before split: initial file = 1.6M rows.
After split: 8 files of 200K rows each.
All 8 records belonging to security 101 are placed in ONE split file.
Each split file is processed by a separate job thread.
In this case, all 8 records for security 101 are processed by the SAME job thread.
The lookup DOES NOT find the record with Security = 101 + Date = 1/1/2001 00:00:00 + Type = Maturity.
Scenario Two -
Before split: initial file = 8 rows (only the rows belonging to security 101).
After split: 8 files of 1 record each.
All 8 records belonging to security 101 are placed in DIFFERENT split files.
Each split file is processed by a separate job thread.
In this case, all 8 records for security 101 are processed by DIFFERENT job threads.
The lookup DOES find the record with Security = 101 + Date = 1/1/2001 00:00:00 + Type = Maturity.
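The split behavior described in the two scenarios can be illustrated with a small sketch. This is not DataStage code, and the two split strategies below are assumptions for illustration only: a contiguous chunk split keeps equal keys together when the input happens to be grouped by security, while splitting 8 rows into 8 files of 1 record each effectively scatters them round-robin.

```python
def split_contiguous(rows, n):
    """Cut the file into n contiguous chunks of roughly equal size."""
    size = max(1, len(rows) // n)
    return [rows[i * size:(i + 1) * size] for i in range(n)]

def split_round_robin(rows, n):
    """Deal rows out like cards: row i goes to file i % n."""
    files = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        files[i % n].append(row)
    return files

# Large-file case (hypothetical data): input grouped by security,
# 8 securities x 8 rows each. A contiguous split into 8 chunks puts
# all 8 records for security "101" into ONE file.
rows = [(str(100 + s), k) for s in range(8) for k in range(8)]
chunks = split_contiguous(rows, 8)
files_with_101 = sum(1 for c in chunks if any(sec == "101" for sec, _ in c))
print(files_with_101)  # -> 1: one split file holds every security-101 row

# Small-file case: only the 8 rows for security "101", split into
# 8 files of 1 record each -- the rows land in 8 DIFFERENT files.
scattered = split_round_robin([("101", k) for k in range(8)], 8)
print([len(f) for f in scattered])  # -> [1, 1, 1, 1, 1, 1, 1, 1]
```

Under these assumptions, whether all 8 records share one job thread or are spread across 8 threads depends only on how the splitter assigns rows to files, which matches the difference between the two scenarios.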
kduke wrote: There is your problem. You need to reflect in the hash file all the keys in the target, otherwise your insert needs to be an update.

Kim - I don't need to update the hash file. My objective is to determine the action (insert or update) for each incoming record based on the records already existing in the target.
For example, if there is already a record with security number = 101, Date = 1/1/2001 00:00:00, and Type = MATURITY, I need not insert it; I would update the existing record in the database.
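The insert-versus-update decision described here can be sketched as follows. This is a hedged illustration, not the actual job logic: the field names, the date format, and the set used as a stand-in for the hash file are all assumptions; only the composite key (security + date + type) comes from the post.

```python
def decide_action(row, target_keys):
    """Return 'update' if the row's composite key (security + date + type)
    already exists in the target, otherwise 'insert'."""
    key = (row["security"], row["date"], row["type"])
    return "update" if key in target_keys else "insert"

# Stand-in for the hash file: the set of composite keys loaded from the
# target database at the start of the batch.
target_keys = {("101", "1/1/2001 00:00:00", "MATURITY")}

existing = {"security": "101", "date": "1/1/2001 00:00:00", "type": "MATURITY"}
new_row = {"security": "102", "date": "1/1/2001 00:00:00", "type": "MATURITY"}
print(decide_action(existing, target_keys))  # -> update (key already in target)
print(decide_action(new_row, target_keys))   # -> insert (key not found)
```

Note that if the reference key on one side differed even slightly (for example, a trailing space or a different date format), the lookup would miss and the row would be misclassified as an insert, which is exactly the symptom described in this thread.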
kduke wrote: How can you update anything then if you only have one of these?

An update happens when there is a change to an existing record (a record that was written during a previous day's batch).
The hash file is created by a job that reads from the database and populates the hash file. This job runs at the beginning of the batch, before any of the other jobs start.
Verify the path to the hash file. If you're using the project as the default, do a sanity check and View data. Sometimes the hash file being written to is not the same one being referenced. Also double-check the filename is still the same. DataStage Server has this "feature" of updating the hash file name sometimes when you rename the link connected to the stage. Could you be running the job differently under the multi-instance mode (intelligent job control), maybe using a different path to the hash file?
You've made me a believer that you know what you're doing, it's just something pesky somewhere. It seems you've constructed things correctly.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kcbland wrote: Verify the path to the hash file. If you're using the project as the default, do a sanity check and View data. Sometimes the hash file being written to is not the same one being referenced. Also double-check the filename is still the same. DataStage Server has this "feature" of updating the hash file name sometimes when you rename the link connected to the stage. Could you be running the job differently under the multi-instance mode (intelligent job control), maybe using a different path to the hash file?

I verified the path and the data. It all looks correct. Maybe I am missing something here. Seems like I will have to take it as one of those DataStage 'don't-know-why-it-happens' "features"...
Anjan Roy wrote: Seems like I will have to take it as one of those DataStage 'don't-know-why-it-happens' "features"...
NO.
I have trained, taught, and used this product since 1998. I am certified in deploying this product (I have the paper to prove it) and served over 4 years with Ascential as a consultant. I have never encountered your error, and after deploying thousands of jobs over 7+ years in mission-critical environments, I should have.
You need to methodically trace this hash file, from the beginning, through how it is created and referenced. Consider moving to a separate project, with a reduced dataset and a separate Unix work area. You will find it, and you will be angry when you do. I seriously doubt this is a bug.
Please let us know when you do find it, as it will bother some of us (me, obviously) if you give up and attribute it to "one of those things". If you need help, export the jobs involved and send them to me and I'll take a quick look: Ken@KennethBland.com
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle