slow performance while merging 2 files in merge stage

deswhk · Post by **deswhk** » Wed May 21, 2008 11:22 pm

I have 2 jobs with exactly the same design:
merge ---> xfm ---> sequential file

One job does about 20k rows/sec but another is only doing 100 rows/sec
I check the log with trace level 1 could not get anything out of it.
Would appreciate if anyone can advise.

ray.wurlod · Post by **ray.wurlod** » Wed May 21, 2008 11:24 pm

What's different about the jobs?

Trace level 1 means that every piece of data, as well as being processed, has to be logged to the stage trace file. That can be an immense overhead.

DS_FocusGroup · Post by **DS_FocusGroup** » Wed May 21, 2008 11:27 pm

Why does DS do a trace 1 level ? i mean in what circumstances? Can we avoid this ?

thanks

ray.wurlod · Post by **ray.wurlod** » Wed May 21, 2008 11:36 pm

Trace level 1 is invoked only under two circumstances.

(a) stage tracing specifies "capture data"

(b) running in the Debugger

Avoid it by using neither of these. The default should be trace level 0.

deswhk · Post by **deswhk** » Thu May 22, 2008 2:04 am

deswhk wrote:I have 2 jobs with exactly the same design:
merge ---> xfm ---> sequential file

One job does about 20k rows/sec but another is only doing 100 rows/sec
I check the log with trace level 1 could not get anything out of it.
Would appreciate if anyone can advise.

To add to the above:-
job 1 merges file A and file B
job 2 merges file C and sequential file from job 1 (output of job 1)

The diffence between the 2 jobs are the mapping keys
File A.keycol - varchar2(144) display 0
File B.keycol - varchar2(15) display 17
left outer join on File A

sequential file from job 1.keycol - varchar2(144) display 0
File C.keycol - varchar2(18) display 0
left outer join on sequential file from job 1

I turned on the trace level 1 only for debugging purposes.

vivekgadwal · Post by **vivekgadwal** » Thu May 22, 2008 1:59 pm

deswhk wrote:
deswhk wrote:One job does about 20k rows/sec but another is only doing 100 rows/sec
job 1 merges file A and file B
job 2 merges file C and sequential file from job 1 (output of job 1)

The diffence between the 2 jobs are the mapping keys
File A.keycol - varchar2(144) display 0
File B.keycol - varchar2(15) display 17
left outer join on File A

sequential file from job 1.keycol - varchar2(144) display 0
File C.keycol - varchar2(18) display 0
left outer join on sequential file from job 1

How big is the file created as an output of Job1? If it is really big, then there might be a significant time consumption to do a left outer join on sequential file's key columns.

deswhk · Post by **deswhk** » Fri May 23, 2008 5:06 am

both files are about the same size ~2.4million. Would pure inner join be much much faster? thks.

deswhk · Post by **deswhk** » Fri May 23, 2008 5:08 am

both files are about the same size ~2.4million. Would pure inner join be much much faster? thks.

deswhk · Post by **deswhk** » Fri May 23, 2008 5:13 am

job 1:-
file A ~2.4million
file B ~1.6million

job 2:-
output from job1 ~2.4 million
file C ~2.1million

from the number of records, they are almost the same. Job 1 flies but job 2 crawls...

vivekgadwal · Post by **vivekgadwal** » Fri May 23, 2008 8:49 am

deswhk wrote:both files are about the same size ~2.4million. Would pure inner join be much much faster? thks.

Tell me this...is there some other processing that is being performed in Job 2...like sorting etc.? If not, I do not know why one job with merge logic is running faster than its twin (at least with same logic).

I have found merge stages to cause some performance issues. So, I had to find some work arounds in those scenarios. Well, if you have to use merge stage and perform a 'left outer join' on the seq. files, please follow through all the required stuff that is given in the manual and do not use the Trace level '1'.

If it really doesn't matter for you what kind of join you are using, and you can use a pure inner join, then do not use the merge stage for Job 2. Instead, what I would propose is, move the output of Job 1 into a Hash file and in Job 2, join on the req. key columns and output them into a seq. file. Something like this:

Code: Select all

       Hash_File_from_Job1
                  |
                  | Lookup
                  V
File D --------> XFM----> Output_Seq_File

DSguru2B · Post by **DSguru2B** » Fri May 23, 2008 1:49 pm

....or force load them into temp tables with proper indexes on the joining keys. Then you can perform any join with results in decent amount of time.

ray.wurlod · Post by **ray.wurlod** » Fri May 23, 2008 3:37 pm

Are you tracing a stage or debugging in the slow job and not in the fast job?

deswhk · Post by **deswhk** » Sun May 25, 2008 11:56 pm

I turned on the trace only for debugging purposes. Else, there is no reason why I should do so as it will take up more resources. Anyway, even without the trace turned on, it is still as slow if not slower.

The job doesn't do any sorting. Anyway, there is no sort option in the merge stage. The transformer stage doesn't do any complex logic either, merely selecting the required columns into the output.
Even with pure inner join, the merge stage is still slow.
One reason why I didn't put it into a hashed file for lookup is the huge number of records at about 2.4 million. Am not sure if this is a good idea of doing a lookup from a hashed file instead of the merge stage.
I will give it a try and see if the performance improves.

vivekgadwal · Post by **vivekgadwal** » Mon May 26, 2008 5:23 pm

deswhk wrote: Even with pure inner join, the merge stage is still slow.
One reason why I didn't put it into a hashed file for lookup is the huge number of records at about 2.4 million. Am not sure if this is a good idea of doing a lookup from a hashed file instead of the merge stage.

Well...as long is the size of the Hash file doesn't exceed 2.2 GB limit, you are fine using it. After you load the Hash file do an analysis (ANALYZE.FILE) on it and see how big the file is!

DSXchange

slow performance while merging 2 files in merge stage

slow performance while merging 2 files in merge stage

Re: slow performance while merging 2 files in merge stage

Re: slow performance while merging 2 files in merge stage

Re: slow performance while merging 2 files in merge stage

Re: slow performance while merging 2 files in merge stage

Re: slow performance while merging 2 files in merge stage

Re: slow performance while merging 2 files in merge stage