Page 1 of 1

slow performance while merging 2 files in merge stage

Posted: Wed May 21, 2008 11:22 pm
by deswhk
I have 2 jobs with exactly the same design:
merge ---> xfm ---> sequential file

One job does about 20k rows/sec but another is only doing 100 rows/sec
I check the log with trace level 1 could not get anything out of it.
Would appreciate if anyone can advise.

Posted: Wed May 21, 2008 11:24 pm
by ray.wurlod
What's different about the jobs?

Trace level 1 means that every piece of data, as well as being processed, has to be logged to the stage trace file. That can be an immense overhead.

Posted: Wed May 21, 2008 11:27 pm
by DS_FocusGroup
Why does DS do a trace 1 level ? i mean in what circumstances? Can we avoid this ?

thanks

Posted: Wed May 21, 2008 11:36 pm
by ray.wurlod
Trace level 1 is invoked only under two circumstances.
  • (a) stage tracing specifies "capture data"

    (b) running in the Debugger
Avoid it by using neither of these. The default should be trace level 0.

Re: slow performance while merging 2 files in merge stage

Posted: Thu May 22, 2008 2:04 am
by deswhk
deswhk wrote:I have 2 jobs with exactly the same design:
merge ---> xfm ---> sequential file

One job does about 20k rows/sec but another is only doing 100 rows/sec
I check the log with trace level 1 could not get anything out of it.
Would appreciate if anyone can advise.

To add to the above:-
job 1 merges file A and file B
job 2 merges file C and sequential file from job 1 (output of job 1)

The diffence between the 2 jobs are the mapping keys
File A.keycol - varchar2(144) display 0
File B.keycol - varchar2(15) display 17
left outer join on File A

sequential file from job 1.keycol - varchar2(144) display 0
File C.keycol - varchar2(18) display 0
left outer join on sequential file from job 1

I turned on the trace level 1 only for debugging purposes.

Re: slow performance while merging 2 files in merge stage

Posted: Thu May 22, 2008 1:59 pm
by vivekgadwal
deswhk wrote:
deswhk wrote:One job does about 20k rows/sec but another is only doing 100 rows/sec
job 1 merges file A and file B
job 2 merges file C and sequential file from job 1 (output of job 1)

The diffence between the 2 jobs are the mapping keys
File A.keycol - varchar2(144) display 0
File B.keycol - varchar2(15) display 17
left outer join on File A

sequential file from job 1.keycol - varchar2(144) display 0
File C.keycol - varchar2(18) display 0
left outer join on sequential file from job 1
How big is the file created as an output of Job1? If it is really big, then there might be a significant time consumption to do a left outer join on sequential file's key columns.

Re: slow performance while merging 2 files in merge stage

Posted: Fri May 23, 2008 5:06 am
by deswhk
both files are about the same size ~2.4million. Would pure inner join be much much faster? thks.

Re: slow performance while merging 2 files in merge stage

Posted: Fri May 23, 2008 5:08 am
by deswhk
both files are about the same size ~2.4million. Would pure inner join be much much faster? thks.

Re: slow performance while merging 2 files in merge stage

Posted: Fri May 23, 2008 5:13 am
by deswhk
job 1:-
file A ~2.4million
file B ~1.6million

job 2:-
output from job1 ~2.4 million
file C ~2.1million

from the number of records, they are almost the same. Job 1 flies but job 2 crawls...

Re: slow performance while merging 2 files in merge stage

Posted: Fri May 23, 2008 8:49 am
by vivekgadwal
deswhk wrote:both files are about the same size ~2.4million. Would pure inner join be much much faster? thks.
Tell me this...is there some other processing that is being performed in Job 2...like sorting etc.? If not, I do not know why one job with merge logic is running faster than its twin (at least with same logic).

I have found merge stages to cause some performance issues. So, I had to find some work arounds in those scenarios. Well, if you have to use merge stage and perform a 'left outer join' on the seq. files, please follow through all the required stuff that is given in the manual and do not use the Trace level '1'.

If it really doesn't matter for you what kind of join you are using, and you can use a pure inner join, then do not use the merge stage for Job 2. Instead, what I would propose is, move the output of Job 1 into a Hash file and in Job 2, join on the req. key columns and output them into a seq. file. Something like this:

Code: Select all

       Hash_File_from_Job1
                  |
                  | Lookup
                  V
File D --------> XFM----> Output_Seq_File

Posted: Fri May 23, 2008 1:49 pm
by DSguru2B
....or force load them into temp tables with proper indexes on the joining keys. Then you can perform any join with results in decent amount of time.

Posted: Fri May 23, 2008 3:37 pm
by ray.wurlod
Are you tracing a stage or debugging in the slow job and not in the fast job?

Posted: Sun May 25, 2008 11:56 pm
by deswhk
I turned on the trace only for debugging purposes. Else, there is no reason why I should do so as it will take up more resources. Anyway, even without the trace turned on, it is still as slow if not slower.

The job doesn't do any sorting. Anyway, there is no sort option in the merge stage. The transformer stage doesn't do any complex logic either, merely selecting the required columns into the output.
Even with pure inner join, the merge stage is still slow.
One reason why I didn't put it into a hashed file for lookup is the huge number of records at about 2.4 million. Am not sure if this is a good idea of doing a lookup from a hashed file instead of the merge stage.
I will give it a try and see if the performance improves.

Posted: Mon May 26, 2008 5:23 pm
by vivekgadwal
deswhk wrote: Even with pure inner join, the merge stage is still slow.
One reason why I didn't put it into a hashed file for lookup is the huge number of records at about 2.4 million. Am not sure if this is a good idea of doing a lookup from a hashed file instead of the merge stage.
Well...as long is the size of the Hash file doesn't exceed 2.2 GB limit, you are fine using it. After you load the Hash file do an analysis (ANALYZE.FILE) on it and see how big the file is!