hash file issue

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
xli
Charter Member
Charter Member
Posts: 74
Joined: Fri May 09, 2003 12:31 am

hash file issue

Post by xli »

Hi, Folks

I have hash file issue need your advice :

I have a daily running job populate table t1 and t2 with the same 4 millions records. All data will be kept for 3 month, hence the estimated number of records for these tables will grow up to about 240 millions.

t2 is used as lookup to check the source date duplicate. I am really worried about the performance issue, I had never handle such a big table as lookup. I wonder if I am supposed to use oracle table as lookup or, create a hash table (what a big hash table!). or, is there any other way may help improve the lookup performance ?

Thanks in advance

Regards

xiong
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

xiong

Don't do it. Do the lookup an Oracle plugin. The trade off is in the time it takes to build the hash file versus how much time you save doing the lookup. Do it both ways and let us know.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
jseclen
Participant
Posts: 133
Joined: Wed Mar 05, 2003 4:19 pm
Location: Lima - Peru. Sudamerica
Contact:

Post by jseclen »

Hi Xiong,

You must considerate the follow items:

1) The lookup table must to have a primary key (or index) with the statistics updated to improve performance lookup.
2) Don't use hashed files, is very hard to build a big file .. for the numbers of records will take many hours to create...
3) use the fields that belong of the index of the lookup table.

I hope this help.

Miguel Seclen
Lima - Peru
JDionne
Participant
Posts: 342
Joined: Wed Aug 27, 2003 1:06 pm

Post by JDionne »

I may be out of my leage here but...why are you using a dupication of the info as a look up. why not just create a lookup of just the keys. it will be much faster to look up than looking up against the full record. give that a thought
Jim


quote:Originally posted by jseclen
[br]Hi Xiong,

You must considerate the follow items:

1) The lookup table must to have a primary key (or index) with the statistics updated to improve performance lookup.
2) Don't use hashed files, is very hard to build a big file .. for the numbers of records will take many hours to create...
3) use the fields that belong of the index of the lookup table.

I hope this help.

Miguel Seclen
Lima - Peru
Post Reply