Updating records in a hashed file

aditya
Charter Member
Posts: 41
Joined: Sat May 28, 2005 7:32 am

Updating records in a hashed file

Post by aditya »

All,

I have the following job design:

DB2 Stage -----------> IPC --------> SeqFile

SeqFile --------------> Transformer --------> HashedFile

The DB2 stage contains a user-defined query that joins two tables. About 10 million records are extracted and then loaded into the hashed file.

When I try to do incremental loads into the hashed file, the problem is that all the records still have to be fetched through the DB2 stage and looked up against the previously generated hashed file (say ABC) so that ABC can be refreshed.

Both tables do have an update timestamp field, but that is a problem in itself: an update on one table doesn't necessitate an update on the other, and an insert into one table doesn't mean that a new record is also inserted into the other table.
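The extract query is roughly along these lines (the table and column names here are only illustrative, not the real ones):

    SELECT A.CUST_ID, A.COL1, B.COL2, A.UPD_TS, B.UPD_TS
    FROM   TABLE_A A
    JOIN   TABLE_B B ON B.CUST_ID = A.CUST_ID

so a change captured by A.UPD_TS alone says nothing about whether B changed, and vice versa.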

Any suggestions please.

Thanks
Aditya.
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

From what I understand, you are having difficulty retrieving the last-updated timestamp for incremental loads because you are looking at two date fields from two different tables. Am I correct? If that is the case, you might want to calculate the maximum of the two dates from the two tables and use the greater one for the incremental load (there is a rough sketch of this at the end of this post). If that is not the case, please explain it again, as I can't make out anything more than this. Also, what I didn't understand was this:
aditya wrote: When I try to do incremental loads into the hashed file, all the records still have to be fetched through the DB2 stage and looked up against the previously generated hashed file (say ABC) so that ABC can be refreshed.
Why would you refresh the ABC hashed file? What exactly do you have in that file?
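Something along these lines in your DB2 query would give you a single last-changed date per row (CUST_ID and the table names are just placeholders for whatever you actually have):

    SELECT A.CUST_ID,
           CASE WHEN A.UPD_TS >= B.UPD_TS THEN A.UPD_TS
                ELSE B.UPD_TS END AS LAST_UPD_TS
    FROM   TABLE_A A
    JOIN   TABLE_B B ON B.CUST_ID = A.CUST_ID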
Kris

Where's the "Any" key? - Homer Simpson
aditya
Charter Member
Charter Member
Posts: 41
Joined: Sat May 28, 2005 7:32 am

Post by aditya »

Kris,

Thanks for the reply:

The hashed file ABC that is used for the lookup is the same hashed file that has to be fed with the incremental loads.

Thanks
Aditya.
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

You didn't answer the first question, so I'll assume my understanding is correct. The best way I have found to do incremental loading is to create a separate job that gets the last-updated timestamp from your target table (or hashed file) and stores it in a separate hashed file. You can then pass that date into the user-defined SQL in your DB2 stage as a parameter from within a job sequence, using the UtilityHashLookup routine. There are quite a number of posts describing this process and how to use UtilityHashLookup; a search should find them. To get the maximum date from your source data, do what I explained in the earlier post. I have done this successfully in the past.
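As a rough sketch (the job parameter, table, and column names here are just examples, not necessarily what you have), the small job that captures the watermark would run something like

    SELECT MAX(UPD_TS) AS LAST_UPD_TS
    FROM   TARGET_TABLE

and write the result to a one-row hashed file. The sequence then reads that value with UtilityHashLookup and passes it to the main job as a parameter, say LastUpdTS, which the user-defined SQL in the DB2 stage references like

    SELECT A.CUST_ID, A.COL1, B.COL2
    FROM   TABLE_A A
    JOIN   TABLE_B B ON B.CUST_ID = A.CUST_ID
    WHERE  A.UPD_TS > '#LastUpdTS#'
       OR  B.UPD_TS > '#LastUpdTS#'

so only the rows changed since the last run come through on each execution.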

IHTH
Kris

Where's the "Any" key? - Homer Simpson
aditya
Charter Member
Posts: 41
Joined: Sat May 28, 2005 7:32 am

Post by aditya »

Kris,

The two tables being joined each have an update timestamp field.

When I try to bring in the records that have been either updated or inserted in each of those tables and then do a full outer join on them, some of the resulting records have empty values in the fields from the side that didn't change.
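Roughly, the incremental piece looks like this (column and parameter names are only illustrative); rows where only one of the two tables changed come back with empty values on the other side:

    SELECT COALESCE(A.CUST_ID, B.CUST_ID) AS CUST_ID,
           A.COL1, B.COL2
    FROM   (SELECT * FROM TABLE_A
            WHERE  UPD_TS > '#LastUpdTS#') A
    FULL OUTER JOIN
           (SELECT * FROM TABLE_B
            WHERE  UPD_TS > '#LastUpdTS#') B
    ON     B.CUST_ID = A.CUST_ID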

-- Aditya.
loveojha2
Participant
Posts: 362
Joined: Thu May 26, 2005 12:59 am

Post by loveojha2 »

aditya wrote: When I try to bring in the records that have been either updated or inserted in each of those tables and then do a full outer join on them, some of the resulting records have empty values in the fields from the side that didn't change.

-- Aditya.
Can you share with us the structure of the hashed file (especially the key structure and which DB table the keys come from)?

The problem looks very similar to what is popularly known as insert/update anomalies, arising from your hashed file design.

How do you plan to handle it? By creating separate hashed files, maybe?

We would need more information to help you on this.
Success consists of getting up just one more time than you fall.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Look at increasing the transaction isolation level in DB2 or find some other means for guaranteeing a consistent snapshot. That done, you can maintain the hashed file in "real time" using a pair of hashed file stages plus "disabled, lock for update" as the read cache setting.

The job design in the original post - is this two separate jobs, or one job containing two independent streams? If the latter, how do you synchronize reading from and writing to the text file?

Can you specify (in English, ignoring the fact that DataStage might be involved) what the desired result is to be? That is, can you create some form of target-from-source mapping document, and post the gist of it here?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.