slow performance while merging 2 files in merge stage
Moderators: chulett, rschirm, roy
slow performance while merging 2 files in merge stage
I have 2 jobs with exactly the same design:
merge ---> xfm ---> sequential file
One job does about 20k rows/sec but another is only doing 100 rows/sec
I check the log with trace level 1 could not get anything out of it.
Would appreciate if anyone can advise.
merge ---> xfm ---> sequential file
One job does about 20k rows/sec but another is only doing 100 rows/sec
I check the log with trace level 1 could not get anything out of it.
Would appreciate if anyone can advise.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
What's different about the jobs?
Trace level 1 means that every piece of data, as well as being processed, has to be logged to the stage trace file. That can be an immense overhead.
Trace level 1 means that every piece of data, as well as being processed, has to be logged to the stage trace file. That can be an immense overhead.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 197
- Joined: Sun Jul 15, 2007 11:45 pm
- Location: Prague
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Trace level 1 is invoked only under two circumstances.
- (a) stage tracing specifies "capture data"
(b) running in the Debugger
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: slow performance while merging 2 files in merge stage
deswhk wrote:I have 2 jobs with exactly the same design:
merge ---> xfm ---> sequential file
One job does about 20k rows/sec but another is only doing 100 rows/sec
I check the log with trace level 1 could not get anything out of it.
Would appreciate if anyone can advise.
To add to the above:-
job 1 merges file A and file B
job 2 merges file C and sequential file from job 1 (output of job 1)
The diffence between the 2 jobs are the mapping keys
File A.keycol - varchar2(144) display 0
File B.keycol - varchar2(15) display 17
left outer join on File A
sequential file from job 1.keycol - varchar2(144) display 0
File C.keycol - varchar2(18) display 0
left outer join on sequential file from job 1
I turned on the trace level 1 only for debugging purposes.
-
- Premium Member
- Posts: 457
- Joined: Tue Sep 25, 2007 4:05 pm
Re: slow performance while merging 2 files in merge stage
How big is the file created as an output of Job1? If it is really big, then there might be a significant time consumption to do a left outer join on sequential file's key columns.deswhk wrote:job 1 merges file A and file Bdeswhk wrote:One job does about 20k rows/sec but another is only doing 100 rows/sec
job 2 merges file C and sequential file from job 1 (output of job 1)
The diffence between the 2 jobs are the mapping keys
File A.keycol - varchar2(144) display 0
File B.keycol - varchar2(15) display 17
left outer join on File A
sequential file from job 1.keycol - varchar2(144) display 0
File C.keycol - varchar2(18) display 0
left outer join on sequential file from job 1
Vivek Gadwal
Experience is what you get when you didn't get what you wanted
Experience is what you get when you didn't get what you wanted
Re: slow performance while merging 2 files in merge stage
both files are about the same size ~2.4million. Would pure inner join be much much faster? thks.
Re: slow performance while merging 2 files in merge stage
both files are about the same size ~2.4million. Would pure inner join be much much faster? thks.
Re: slow performance while merging 2 files in merge stage
job 1:-
file A ~2.4million
file B ~1.6million
job 2:-
output from job1 ~2.4 million
file C ~2.1million
from the number of records, they are almost the same. Job 1 flies but job 2 crawls...
file A ~2.4million
file B ~1.6million
job 2:-
output from job1 ~2.4 million
file C ~2.1million
from the number of records, they are almost the same. Job 1 flies but job 2 crawls...
-
- Premium Member
- Posts: 457
- Joined: Tue Sep 25, 2007 4:05 pm
Re: slow performance while merging 2 files in merge stage
Tell me this...is there some other processing that is being performed in Job 2...like sorting etc.? If not, I do not know why one job with merge logic is running faster than its twin (at least with same logic).deswhk wrote:both files are about the same size ~2.4million. Would pure inner join be much much faster? thks.
I have found merge stages to cause some performance issues. So, I had to find some work arounds in those scenarios. Well, if you have to use merge stage and perform a 'left outer join' on the seq. files, please follow through all the required stuff that is given in the manual and do not use the Trace level '1'.
If it really doesn't matter for you what kind of join you are using, and you can use a pure inner join, then do not use the merge stage for Job 2. Instead, what I would propose is, move the output of Job 1 into a Hash file and in Job 2, join on the req. key columns and output them into a seq. file. Something like this:
Code: Select all
Hash_File_from_Job1
|
| Lookup
V
File D --------> XFM----> Output_Seq_File
Vivek Gadwal
Experience is what you get when you didn't get what you wanted
Experience is what you get when you didn't get what you wanted
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I turned on the trace only for debugging purposes. Else, there is no reason why I should do so as it will take up more resources. Anyway, even without the trace turned on, it is still as slow if not slower.
The job doesn't do any sorting. Anyway, there is no sort option in the merge stage. The transformer stage doesn't do any complex logic either, merely selecting the required columns into the output.
Even with pure inner join, the merge stage is still slow.
One reason why I didn't put it into a hashed file for lookup is the huge number of records at about 2.4 million. Am not sure if this is a good idea of doing a lookup from a hashed file instead of the merge stage.
I will give it a try and see if the performance improves.
The job doesn't do any sorting. Anyway, there is no sort option in the merge stage. The transformer stage doesn't do any complex logic either, merely selecting the required columns into the output.
Even with pure inner join, the merge stage is still slow.
One reason why I didn't put it into a hashed file for lookup is the huge number of records at about 2.4 million. Am not sure if this is a good idea of doing a lookup from a hashed file instead of the merge stage.
I will give it a try and see if the performance improves.
-
- Premium Member
- Posts: 457
- Joined: Tue Sep 25, 2007 4:05 pm
Well...as long is the size of the Hash file doesn't exceed 2.2 GB limit, you are fine using it. After you load the Hash file do an analysis (ANALYZE.FILE) on it and see how big the file is!deswhk wrote: Even with pure inner join, the merge stage is still slow.
One reason why I didn't put it into a hashed file for lookup is the huge number of records at about 2.4 million. Am not sure if this is a good idea of doing a lookup from a hashed file instead of the merge stage.
Vivek Gadwal
Experience is what you get when you didn't get what you wanted
Experience is what you get when you didn't get what you wanted