Hi forum,
I have a job that extracts about 7 million rows, performs a lookup using a hashed file, does some transformations, and loads into two different tables. At the current rate it takes around 3 hours to complete.
Can anyone advise me on how to improve the performance?
Is the lookup an overhead?
Thanks in advance...
Performance tuning advice?......
Sharad,
that question is impossible to answer with the information given. It is like me asking you, "I need to make my car go 120 mph. What should I do?"
7 million rows is not a lot if each row is 10 bytes. If each row is 100 KB, that is a different matter. Running this on a single 386SX machine is different from running it on an IBM pSeries with 16 CPUs. Is your database an ODBC connection to Excel, or a Teradata system? The list of questions goes on and on.
You need to measure your job and discover where the bottleneck is. Your average speed is 648 rows per second. If you remove your database output and replace it with flat files, what is your speed? If it remains the same, you have discovered that the output DB writes are not the bottleneck...
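For reference, that 648 rows/second figure is just the row count from the question divided by the run time. A quick sketch of the arithmetic (Python used purely for illustration):

```python
# Back-of-the-envelope throughput check, using the numbers from the post.
rows = 7_000_000           # rows the job processes
hours = 3                  # observed wall-clock run time
seconds = hours * 3600

rows_per_sec = rows / seconds
print(f"current: {rows_per_sec:.0f} rows/sec")        # ~648 rows/sec

# How fast would the job need to run to finish in, say, one hour?
target_rps = rows / 3600
print(f"1-hour target: {target_rps:.0f} rows/sec")    # ~1944 rows/sec
```

Re-running this calculation after each experiment (flat-file output instead of DB, lookup removed, etc.) tells you how much each stage contributes to the total.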
ArndW wrote:
7 million rows is not a lot if each row is 10 bytes. If each row is 100 KB then that is a different matter.

ArndW is right. Keep in mind that hashed files have a limit of about 2.2 GB (if I remember correctly), and once that limit is exceeded, data flows into the overflow bucket and performance is impaired! Maybe looking into the size of the hashed file could help in fixing the performance. As was mentioned already, using flat files instead of DB stages should reduce the processing time immensely!
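The overflow effect described above can be seen in miniature with a toy hash table using chained overflow (plain Python, not DataStage internals; the key and bucket counts are made up for illustration). When a fixed number of groups has to absorb too many records, each lookup must scan a longer chain:

```python
# Toy fixed-bucket hash table with chained overflow, illustrating why a
# hashed file that outgrows its groups slows down. Illustrative only --
# this is not how DataStage stores hashed files internally.

def build(keys, num_buckets):
    """Distribute keys into a fixed number of buckets (chains)."""
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[hash(k) % num_buckets].append(k)
    return buckets

def probes_to_find(buckets, key):
    """Entries scanned in the chain before the key is found."""
    chain = buckets[hash(key) % len(buckets)]
    return chain.index(key) + 1

keys = [f"key{i}" for i in range(10_000)]

for num_buckets in (10_000, 100):       # well-sized vs. badly overflowing
    b = build(keys, num_buckets)
    avg = sum(probes_to_find(b, k) for k in keys) / len(keys)
    print(f"{num_buckets:>6} buckets -> avg {avg:.1f} probes per lookup")
```

With 100× fewer buckets the average chain scan grows dramatically, which is the same shape of degradation you see once a hashed file starts living in overflow.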
Vivek Gadwal
Experience is what you get when you didn't get what you wanted
The hashed file limit is due to internal pointer sizes and can be avoided on large files by declaring them as 64BIT; this 2 GB limitation has nothing to do with the internal buckets and overflow groups, only with the pointer sizes.
chulett wrote:
Right. Go over the dreaded '2GB Barrier' and performance is dramatically affected because the hashed file will corrupt internally. Then, as noted, it's time to switch to 64bit hashed files.

Close, but no cigar. The right time to switch is before the hashed file gets to 2GB. This means that you monitor them, at least the large/growing ones.
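One hypothetical way to do that monitoring from outside DataStage (a Python sketch; it assumes dynamic, type-30 hashed files, which live on disk as a directory containing DATA.30 and OVER.30 files — the path and the 80% warning threshold are arbitrary choices, adjust for your environment):

```python
# Hypothetical monitoring sketch: warn when a dynamic (type 30) hashed
# file approaches the 2 GB 32-bit pointer limit. Paths and the warning
# threshold are illustrative assumptions, not DataStage conventions.
import os

LIMIT = 2 * 1024**3        # 2 GB limit for 32-bit hashed files
WARN_AT = 0.8 * LIMIT      # arbitrary 80% threshold -- tune to taste

def hashed_file_size(path):
    """Total bytes of the data and overflow portions of a dynamic hashed file."""
    total = 0
    for part in ("DATA.30", "OVER.30"):
        f = os.path.join(path, part)
        if os.path.exists(f):
            total += os.path.getsize(f)
    return total

def check(path):
    """Print a warning if the hashed file is nearing the 32-bit limit."""
    size = hashed_file_size(path)
    if size >= WARN_AT:
        print(f"WARNING: {path} is {size / 1024**3:.2f} GB -- "
              "consider resizing to 64BIT before it reaches 2 GB")
    return size
```

Run something like this from cron against your large or fast-growing hashed files, and you get the early warning Ray describes instead of discovering the problem after the file corrupts.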
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.