slow performance while merging 2 files in merge stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

slow performance while merging 2 files in merge stage

Post by deswhk »

I have 2 jobs with exactly the same design:
merge ---> xfm ---> sequential file

One job does about 20k rows/sec but another is only doing 100 rows/sec
I check the log with trace level 1 could not get anything out of it.
Would appreciate if anyone can advise.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What's different about the jobs?

Trace level 1 means that every piece of data, as well as being processed, has to be logged to the stage trace file. That can be an immense overhead.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DS_FocusGroup
Premium Member
Premium Member
Posts: 197
Joined: Sun Jul 15, 2007 11:45 pm
Location: Prague

Post by DS_FocusGroup »

Why does DS do a trace 1 level ? i mean in what circumstances? Can we avoid this ?

thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Trace level 1 is invoked only under two circumstances.
  • (a) stage tracing specifies "capture data"

    (b) running in the Debugger
Avoid it by using neither of these. The default should be trace level 0.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

Re: slow performance while merging 2 files in merge stage

Post by deswhk »

deswhk wrote:I have 2 jobs with exactly the same design:
merge ---> xfm ---> sequential file

One job does about 20k rows/sec but another is only doing 100 rows/sec
I check the log with trace level 1 could not get anything out of it.
Would appreciate if anyone can advise.

To add to the above:-
job 1 merges file A and file B
job 2 merges file C and sequential file from job 1 (output of job 1)

The diffence between the 2 jobs are the mapping keys
File A.keycol - varchar2(144) display 0
File B.keycol - varchar2(15) display 17
left outer join on File A

sequential file from job 1.keycol - varchar2(144) display 0
File C.keycol - varchar2(18) display 0
left outer join on sequential file from job 1

I turned on the trace level 1 only for debugging purposes.
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Re: slow performance while merging 2 files in merge stage

Post by vivekgadwal »

deswhk wrote:
deswhk wrote:One job does about 20k rows/sec but another is only doing 100 rows/sec
job 1 merges file A and file B
job 2 merges file C and sequential file from job 1 (output of job 1)

The diffence between the 2 jobs are the mapping keys
File A.keycol - varchar2(144) display 0
File B.keycol - varchar2(15) display 17
left outer join on File A

sequential file from job 1.keycol - varchar2(144) display 0
File C.keycol - varchar2(18) display 0
left outer join on sequential file from job 1
How big is the file created as an output of Job1? If it is really big, then there might be a significant time consumption to do a left outer join on sequential file's key columns.
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

Re: slow performance while merging 2 files in merge stage

Post by deswhk »

both files are about the same size ~2.4million. Would pure inner join be much much faster? thks.
deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

Re: slow performance while merging 2 files in merge stage

Post by deswhk »

both files are about the same size ~2.4million. Would pure inner join be much much faster? thks.
deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

Re: slow performance while merging 2 files in merge stage

Post by deswhk »

job 1:-
file A ~2.4million
file B ~1.6million

job 2:-
output from job1 ~2.4 million
file C ~2.1million

from the number of records, they are almost the same. Job 1 flies but job 2 crawls...
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Re: slow performance while merging 2 files in merge stage

Post by vivekgadwal »

deswhk wrote:both files are about the same size ~2.4million. Would pure inner join be much much faster? thks.
Tell me this...is there some other processing that is being performed in Job 2...like sorting etc.? If not, I do not know why one job with merge logic is running faster than its twin (at least with same logic).

I have found merge stages to cause some performance issues. So, I had to find some work arounds in those scenarios. Well, if you have to use merge stage and perform a 'left outer join' on the seq. files, please follow through all the required stuff that is given in the manual and do not use the Trace level '1'.

If it really doesn't matter for you what kind of join you are using, and you can use a pure inner join, then do not use the merge stage for Job 2. Instead, what I would propose is, move the output of Job 1 into a Hash file and in Job 2, join on the req. key columns and output them into a seq. file. Something like this:

Code: Select all

       Hash_File_from_Job1
                  |
                  | Lookup
                  V
File D --------> XFM----> Output_Seq_File
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

....or force load them into temp tables with proper indexes on the joining keys. Then you can perform any join with results in decent amount of time.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are you tracing a stage or debugging in the slow job and not in the fast job?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

Post by deswhk »

I turned on the trace only for debugging purposes. Else, there is no reason why I should do so as it will take up more resources. Anyway, even without the trace turned on, it is still as slow if not slower.

The job doesn't do any sorting. Anyway, there is no sort option in the merge stage. The transformer stage doesn't do any complex logic either, merely selecting the required columns into the output.
Even with pure inner join, the merge stage is still slow.
One reason why I didn't put it into a hashed file for lookup is the huge number of records at about 2.4 million. Am not sure if this is a good idea of doing a lookup from a hashed file instead of the merge stage.
I will give it a try and see if the performance improves.
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

deswhk wrote: Even with pure inner join, the merge stage is still slow.
One reason why I didn't put it into a hashed file for lookup is the huge number of records at about 2.4 million. Am not sure if this is a good idea of doing a lookup from a hashed file instead of the merge stage.
Well...as long is the size of the Hash file doesn't exceed 2.2 GB limit, you are fine using it. After you load the Hash file do an analysis (ANALYZE.FILE) on it and see how big the file is!
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
Post Reply