Perfomance tuning advice?......

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
SHARAD123
Premium Member
Premium Member
Posts: 54
Joined: Wed Jan 09, 2008 12:05 am

Perfomance tuning advice?......

Post by SHARAD123 »

Hi forum,
I have a job which extracts about 7 million data, performs a look up using a hashed file, does some transformations and loads into two different tables. At the current rate, it takes around 3 hours for completion.
Can anyone advice me as to how to improve the performance.....
Is the look up a over head.....?

Thanks in advance...
222102
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Sharad,

that question is impossible to answer with the information given. This is like me asking you "I need to make my car go 120MpH. What should I do?"

7 million rows is not a lot if each row is 10 bytes. If each row is 100Kb then that is a different matter. If you run this on a single 386SX processor machine it is different from an IBM P-Series with 16 CPUs. Is your database an ODBC connection to Excel or a Teradata system? The list of questions can go on and on.

You need to measure your job and discover where the bottleneck is. You have 648 rows per second average speed. If you remove your database output and replace it with flat files, what is your speed? If it remains the same, you have discovered that the output DB writes are not the bottleneck...
nivas
Participant
Posts: 117
Joined: Sun Mar 21, 2004 4:40 pm

Post by nivas »

performance will be effected by many reasons.

you may try by developing one job for each table load instead of one job for multiple tables.
You may try to use efficient plugin for data load, for example multiload if the data base is teradata, bulk load if it is oracle.

thanks
nivas
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

ArndW wrote:Sharad,

7 million rows is not a lot if each row is 10 bytes. If each row is 100Kb then that is a different matter.
ArndW is right. Keep in mind, Hash files have a limit of 2.2GB (if I remember it correct) and if that limit is exceeded, then we have data flowing into the overflow bucket and performance is impaired! May be, looking into the size of the Hash file could help in fixing the performance. As was mentioned already, using flat files instead of DB stages should reduce the processing time immensely!
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The Hashed File limit is due to internal pointers and can be avoided on large files by declaring them as 64BIT; this 2Gb limitation has nothing to do with the internal buckets and overflow groups, just pointer sizes.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Right. Go over the dreaded '2GB Barrier' and performance is dramatically affect because the hashed file will corrupt internally. Then, as noted, it's time to switch to 64bit hashed files. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
SHARAD123
Premium Member
Premium Member
Posts: 54
Joined: Wed Jan 09, 2008 12:05 am

Post by SHARAD123 »

Thank you guys.
222102
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

chulett wrote:Right. Go over the dreaded '2GB Barrier' and performance is dramatically affect because the hashed file will corrupt internally. Then, as noted, it's time to switch to 64bit hashed files. :wink:
Close, but no cigar. The right time to switch is before the hashed file gets to 2GB. This means that you monitor them, at least the large/growing ones.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply