Hi forum,
I have a job that extracts about 7 million rows, performs a lookup using a hashed file, does some transformations, and loads into two different tables. At the current rate it takes around 3 hours to complete.
Can anyone advise me on how to improve the performance?
Is the lookup an overhead?
Thanks in advance...
Performance tuning advice?......
Sharad,
that question is impossible to answer with the information given. It is like me asking you, "I need to make my car go 120 mph. What should I do?"
7 million rows is not a lot if each row is 10 bytes. If each row is 100 KB, that is a different matter. Running this on a single 386SX machine is different from running it on an IBM pSeries with 16 CPUs. Is your database an ODBC connection to Excel, or a Teradata system? The list of questions goes on and on.
You need to measure your job and discover where the bottleneck is. Your average speed is 648 rows per second. If you remove your database output and replace it with flat files, what is your speed? If it remains the same, you have discovered that the output DB writes are not the bottleneck...
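For reference, that 648 rows/second figure is just the row count from the question divided by the run time. A quick sketch of the arithmetic (Python used purely for illustration):

```python
# Back-of-the-envelope throughput check, using the numbers from the post.
rows = 7_000_000           # rows the job processes
hours = 3                  # observed wall-clock run time
seconds = hours * 3600

rows_per_sec = rows / seconds
print(f"current: {rows_per_sec:.0f} rows/sec")        # ~648 rows/sec

# How fast would the job need to run to finish in, say, one hour?
target_rps = rows / 3600
print(f"1-hour target: {target_rps:.0f} rows/sec")    # ~1944 rows/sec
```

Re-running this calculation after each experiment (flat-file output instead of DB, lookup removed, etc.) tells you how much each stage contributes to the total.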
ArndW wrote:
7 million rows is not a lot if each row is 10 bytes. If each row is 100 KB then that is a different matter.

ArndW is right. Keep in mind that hashed files have a limit of about 2.2 GB (if I remember correctly), and once that limit is exceeded, data flows into the overflow bucket and performance is impaired! Maybe looking into the size of the hashed file could help in fixing the performance. As was mentioned already, using flat files instead of DB stages should reduce the processing time immensely!
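The overflow effect described above can be seen in miniature with a toy hash table using chained overflow (plain Python, not DataStage internals; the key and bucket counts are made up for illustration). When a fixed number of groups has to absorb too many records, each lookup must scan a longer chain:

```python
# Toy fixed-bucket hash table with chained overflow, illustrating why a
# hashed file that outgrows its groups slows down. Illustrative only --
# this is not how DataStage stores hashed files internally.

def build(keys, num_buckets):
    """Distribute keys into a fixed number of buckets (chains)."""
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[hash(k) % num_buckets].append(k)
    return buckets

def probes_to_find(buckets, key):
    """Entries scanned in the chain before the key is found."""
    chain = buckets[hash(key) % len(buckets)]
    return chain.index(key) + 1

keys = [f"key{i}" for i in range(10_000)]

for num_buckets in (10_000, 100):       # well-sized vs. badly overflowing
    b = build(keys, num_buckets)
    avg = sum(probes_to_find(b, k) for k in keys) / len(keys)
    print(f"{num_buckets:>6} buckets -> avg {avg:.1f} probes per lookup")
```

With 100× fewer buckets the average chain scan grows dramatically, which is the same shape of degradation you see once a hashed file starts living in overflow.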
Vivek Gadwal
Experience is what you get when you didn't get what you wanted
The hashed file limit is due to internal pointer sizes and can be avoided on large files by declaring them as 64BIT; this 2 GB limitation has nothing to do with the internal buckets and overflow groups, only with the pointer sizes.
chulett wrote:
Right. Go over the dreaded '2GB Barrier' and performance is dramatically affected because the hashed file will corrupt internally. Then, as noted, it's time to switch to 64bit hashed files.

Close, but no cigar. The right time to switch is before the hashed file gets to 2GB. This means that you monitor them, at least the large/growing ones.
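One hypothetical way to do that monitoring from outside DataStage (a Python sketch; it assumes dynamic, type-30 hashed files, which live on disk as a directory containing DATA.30 and OVER.30 files — the path and the 80% warning threshold are arbitrary choices, adjust for your environment):

```python
# Hypothetical monitoring sketch: warn when a dynamic (type 30) hashed
# file approaches the 2 GB 32-bit pointer limit. Paths and the warning
# threshold are illustrative assumptions, not DataStage conventions.
import os

LIMIT = 2 * 1024**3        # 2 GB limit for 32-bit hashed files
WARN_AT = 0.8 * LIMIT      # arbitrary 80% threshold -- tune to taste

def hashed_file_size(path):
    """Total bytes of the data and overflow portions of a dynamic hashed file."""
    total = 0
    for part in ("DATA.30", "OVER.30"):
        f = os.path.join(path, part)
        if os.path.exists(f):
            total += os.path.getsize(f)
    return total

def check(path):
    """Print a warning if the hashed file is nearing the 32-bit limit."""
    size = hashed_file_size(path)
    if size >= WARN_AT:
        print(f"WARNING: {path} is {size / 1024**3:.2f} GB -- "
              "consider resizing to 64BIT before it reaches 2 GB")
    return size
```

Run something like this from cron against your large or fast-growing hashed files, and you get the early warning Ray describes instead of discovering the problem after the file corrupts.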
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.