All,
I have the following job design:
DB2 Stage ----------->IPC-------->SeqFile
SeqFile---------------->Transfomer-------->HashedFile
The DB2 stage contains a user-defined query that joins two tables. 10 million records are extracted and later written to the hashed file.
While trying to do incremental loads into the hashed file, the problem encountered is that all the records still have to be fetched using the DB2 stage and looked up against the previously generated hashed file (say ABC) so that the hashed file (ABC) can be refreshed.
Although both tables have an update timestamp field defined, this is a problem because an update on one table doesn't necessitate an update on the other, and an insert into one table doesn't mean that a new record is also inserted into the other table.
Any suggestions please.
Thanks
Aditya.
Updating records into hashed file
Moderators: chulett, rschirm, roy
From what I understand, you are having difficulty retrieving the last-updated timestamp for incremental loads because you are looking at two date fields from two different tables? Am I correct? If that is the case, you might want to calculate the maximum of the two dates from the two tables and then use the greater date for incremental loading. If that is not the case, please explain it again, as I can't make out anything more than this. Also, what I didn't understand was:
aditya wrote: While trying to do incremental loads into the hashed file the problems encountered are that all the records have to be continued to be fetched using DB2 stage while doing a lookup against the previous generated Hashed File (say ABC) so that the hashed file (ABC) could be refreshed.
Why would you refresh the ABC hashed file? What exactly do you have in that file?
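The max-of-the-two-dates idea can be sketched in plain Python (outside DataStage; the function name is mine, not a DataStage routine): for each joined row, take the later non-null of the two tables' update timestamps as the row's effective last-updated value.

```python
from datetime import datetime

def effective_last_updated(ts_a, ts_b):
    """Return the later of the two per-row update timestamps.

    ts_a / ts_b come from the two joined tables; either may be None
    when a change touched only one of the tables (e.g. after a full
    outer join).
    """
    candidates = [t for t in (ts_a, ts_b) if t is not None]
    return max(candidates) if candidates else None

# A row updated in table A on 1 Feb but untouched in table B since
# 1 Jan is treated as changed on 1 Feb.
print(effective_last_updated(datetime(2004, 2, 1), datetime(2004, 1, 1)))
```

This is only the per-row calculation; in the job itself the same MAX would typically be computed inside the user-defined query or in a Transformer derivation.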
Kris
Where's the "Any" key?-Homer Simpson
You didn't answer the first question. Assuming you agree with what I asked, the best way I have found to do incremental loading is: create a separate job to get the last-updated timestamp from your target table or hashed file and store it in a separate hashed file. Then you can pass this date into the user-defined SQL in your DB2 stage as a parameter from within a job sequence, using the UtilityHashLookup routine. There are quite a number of posts describing this process and how to use the UtilityHashLookup routine; a search should find them. To get the maximum date from your source data, you need to do what I explained in the earlier post. I have done this successfully in the past.
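The checkpoint pattern described above (store the last-run timestamp, pass it in as a parameter, select only newer rows, then save a new checkpoint) can be sketched in plain Python. The function name and the (key, ts_a, ts_b) row shape are illustrative only, not DataStage APIs:

```python
from datetime import datetime

def incremental_extract(rows, last_run):
    """Return the keys changed since last_run plus the new checkpoint.

    rows:     iterable of (key, ts_a, ts_b) tuples - the key and the
              update timestamps from the two joined tables (either
              timestamp may be None).
    last_run: the checkpoint saved by the previous run (the value kept
              in the separate hashed file in the approach above).
    """
    changed = []
    high_water = last_run
    for key, ts_a, ts_b in rows:
        # Effective change time: the later non-null of the two timestamps.
        ts = max(t for t in (ts_a, ts_b) if t is not None)
        if ts > last_run:                    # newer than the checkpoint
            changed.append(key)
            high_water = max(high_water, ts)
    return changed, high_water

rows = [
    (1, datetime(2004, 1, 5), None),                  # changed before the checkpoint
    (2, datetime(2004, 2, 1), datetime(2004, 1, 1)),  # updated in table A only
    (3, None, datetime(2004, 3, 1)),                  # inserted into table B only
]
changed, checkpoint = incremental_extract(rows, datetime(2004, 1, 10))
print(changed)     # → [2, 3]
print(checkpoint)  # → 2004-03-01 00:00:00, stored for the next run
```

In the real job the filtering happens in the DB2 query itself (WHERE the greater timestamp > the parameter), so only the delta is ever fetched; this sketch just shows the bookkeeping.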
IHTH
Kris
Where's the "Any" key?-Homer Simpson
Can you share with us the structure of the hashed file (especially the key structure, and from which DB table the keys are coming)?
aditya wrote: When I try to bring records which have been either updated or inserted in both those tables and then try to do a full outer join on those records, some of the records might have empty values for the fields.
-- Aditya.
The problem is very similar to what is popularly known as insert/update anomalies, in this case with your hashed file design.
How do you plan to handle it? Creating a separate hashed file may be one option.
We would need more information to help you on this.
Success consists of getting up just one more time than you fall.
Look at increasing the transaction isolation level in DB2 or find some other means for guaranteeing a consistent snapshot. That done, you can maintain the hashed file in "real time" using a pair of hashed file stages plus "disabled, lock for update" as the read cache setting.
The job design in the original post - is this two separate jobs, or one job containing two independent streams? If the latter, how do you synchronize reading from and writing to the text file?
Can you specify (in English, ignoring the fact that DataStage might be involved) what the desired result is to be? That is, can you create some form of target-from-source mapping document, and post the gist of it here?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.