Suggestions needed to improve performance (hashed file)

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


raoraghunandan
Charter Member
Posts: 19
Joined: Sun Jul 20, 2003 4:29 am

Post by raoraghunandan »

kris007 wrote:
Kris, that's basically what Ken explained in his post.
Hey Craig, I am doing exactly what Ken has suggested, but Raghu seems to have come up with a different idea. I was just trying to find out what he is suggesting, clearly, so that it would help. It wouldn't hurt to know if there is another way... right? :wink:

Kris
Here is what I was thinking:

1) You have 80 million records in the target, with a timestamp.

2) You load these 80 million keys into a hashed file (this is the problem process and takes a lot of time, according to you).

3) You get 1 million incremental records. You look up the hashed file and either update or insert based on whether the key is already there. This is not a problem process according to you. So now you have (say) 80.6 million records in the target, assuming 0.4 million records were updates.

4) Now insert the 1 million records into your hashed file based on your timestamp, i.e. do not clear the hashed file before you insert. Your hashed file will now have 80.6 million records too. This time the job will be quick because the number of records is small.

Repeat step 3) and step 4) for your incremental loads (a sketch of the timestamp-based extract is shown below).

Wouldn't this work? Have I missed something?

However, this design is not complete and still has other problems, such as: what happens when you want to do a full load? You might have to handle that through other jobs.
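To make step 4) concrete, here is a rough sketch of the timestamp-driven extract that would feed the incremental hashed file load. The table and column names (TGT_TABLE, NAT_KEY, LOAD_TS) and the :last_run_ts parameter are made up for illustration - substitute whatever your target actually uses.

    -- Pull only the rows touched since the previous run; these are the keys
    -- that get appended to the (uncleared) hashed file in step 4).
    SELECT NAT_KEY, LOAD_TS
    FROM   TGT_TABLE
    WHERE  LOAD_TS > :last_run_ts;

Because a hashed file is keyed, re-inserting a key that is already present simply overwrites that entry rather than duplicating it, which is why the record count stays at 80.6 million.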

Thanks,
Raghu
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Kris isn't processing 80 million records; that's what's in the target.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I understand that, Craig. But he needs to build that hashed file from the target for the lookup, right? Or am I on the wrong track...?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The key here is to not load all 80 million target records into the hashed file used for the lookup - there's no need for that. That's what Ken explained - a method to load only the natural keys from the 80 million target records that match up with the staged incremental data. That way your lookup has the minimum amount of information needed for the current run.
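For illustration only - the table and column names below (TGT_TABLE, STG_DELTA, NAT_KEY, LOAD_TS) are hypothetical - the shape of that extraction query is something like this:

    -- Stage the incremental feed first, then pull only the target keys that
    -- the current run can actually hit; this is what populates the hashed file.
    SELECT t.NAT_KEY, t.LOAD_TS
    FROM   TGT_TABLE t
    INNER JOIN STG_DELTA s
            ON s.NAT_KEY = t.NAT_KEY;

With 1 million staged rows, the hashed file ends up with at most 1 million entries instead of 80 million.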
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Even in that case, when he is building the hashed file for the first time, if the natural key count is in the millions, then with multiple CPUs at our service he will definitely see a performance boost. And for the daily runs he can maintain a separate job for the incremental load, as raoraghunandan suggested.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
raoraghunandan
Charter Member
Posts: 19
Joined: Sun Jul 20, 2003 4:29 am

Post by raoraghunandan »

chulett wrote: The key here is to not load all 80 million target records into the hashed file used for the lookup - there's no need for that. That's what Ken explained - a method to load only the natural keys from the 80 million target records that match up with the staged incremental data. That way your lookup has the minimum amount of information needed for the current run.
I am only suggesting processing the hashed file load incrementally. While this may be a storage issue, my question regarding Ken's suggestion is: wouldn't joining an 80 million row table with another big table be slow in itself?

-Raghu
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

I am quite sure that we are never going to do a full load now or in the near future, so it's all a matter of doing the daily incremental loading. Also, when I load the hashed file with more than 80 million keys, it has to be a 64-bit hashed file (since it's going to hit the 2.2 GB limit), which again I think is going to affect my performance. At this time, given my situation, I feel that Ken's solution is the most approachable one, since it just involves an extra step of staging the delta data, which is also going to be much faster than my previously designed job.
DSguru2B wrote: And for the daily runs he can maintain a separate job for the incremental load, as raoraghunandan suggested.
I am under the impression that a daily load and incremental loading are the same if we are doing it with respect to the timestamp in the target table. Please correct me if I am wrong.

Kris
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

That's exactly what I meant, Kris. I am more used to the term 'daily' over here at my client, as our incremental loads are of daily frequency.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
raoraghunandan
Charter Member
Posts: 19
Joined: Sun Jul 20, 2003 4:29 am

Post by raoraghunandan »

kris007 wrote: I am under the impression that a daily load and incremental loading are the same if we are doing it with respect to the timestamp in the target table. Please correct me if I am wrong.
It depends on whether your timestamp goes down to hours and minutes. All the best, and let us know your results.

-Raghu
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

raoraghunandan wrote: my question regarding Ken's suggestion is: wouldn't joining an 80 million row table with another big table be slow in itself?
Not if it's done properly. :wink:

We do this all the time; it's a matter of ensuring the query is as optimal as possible. If it takes hints or an additional index, so be it. The net savings over loading everything into a gi-huge-ic hashed file can be substantial.
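As a sketch only - the names are hypothetical and the right plan depends on your data and your database:

    -- An index on the natural key lets the optimizer probe the 80 million row
    -- target rather than scan it.
    CREATE INDEX IX_TGT_NATKEY ON TGT_TABLE (NAT_KEY);

    -- The staged delta drives the join; check the explain plan and only add a
    -- hint if the optimizer picks a bad one.
    SELECT t.NAT_KEY, t.LOAD_TS
    FROM   STG_DELTA s
    JOIN   TGT_TABLE t
      ON   t.NAT_KEY = s.NAT_KEY;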
-craig

"You can never have too many knives" -- Logan Nine Fingers
thurmy34
Premium Member
Posts: 198
Joined: Fri Mar 31, 2006 8:27 am
Location: Paris

Post by thurmy34 »

Hi all
This topic is great because we all have this kind of problem.
What about the 'Replace existing rows completely' update action? (I have this option in the DB2 plug-in.)
You don't have to do a lookup, but I don't know the impact on the database.
Hope This Helps
Regards
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That 'update action' first deletes any record and then inserts a new one in its place. If you are willing to burn two DML statements for every record when one would do, I suppose it could be used in that manner. :?
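In generic SQL terms, with hypothetical table and column names, per incoming row it works out to this:

    -- What a delete-then-insert 'update action' issues, whether or not the
    -- row already exists:
    DELETE FROM TGT_TABLE WHERE NAT_KEY = :key;
    INSERT INTO TGT_TABLE (NAT_KEY, SOME_COL, LOAD_TS)
    VALUES (:key, :some_col, :load_ts);

    -- Versus the single statement you could issue once the lookup tells you
    -- the row is already there:
    UPDATE TGT_TABLE
    SET    SOME_COL = :some_col,
           LOAD_TS  = :load_ts
    WHERE  NAT_KEY  = :key;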
-craig

"You can never have too many knives" -- Logan Nine Fingers