Improve performance with multiple transformers

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

myukassign
Premium Member
Posts: 238
Joined: Fri Jul 25, 2008 8:55 am


Post by myukassign »

I would like to improve the performance of my job, and any ideas would be great. Please suggest whether a change in design or a change of stage could give me better performance.

My job is like this.

Job1. Dataset (XMLs) --> XML Input stage --> moves the data of 40 different complex elements into 50 datasets.

Before writing to the datasets, I do an inline sort and hash-partition the data, then store it in the datasets.
(I am happy with this job; it finishes in less than 1 minute.)

Job2. I read these 50 datasets --> Transformer (simple transformation rules) --> push to dataset. Any record that meets my rejection criteria in the Transformer goes to a shared container: the rejects from all 50 Transformers are collected by a Funnel and pushed to the shared container. Since I partitioned the data and stored it that way in the previous job, I am using Same partitioning.
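To make the shape of Job2 explicit, here is a rough Python analogy of its data flow: each input dataset goes through its own transform-and-reject step, and every reject lands in one shared funnel. The names `transform_stream`, `run_job2`, `rule` and `reject_if` are hypothetical. This models the logic only, not how DataStage executes it; in a parallel job each of those 50 flows is a separate set of processes.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]

def transform_stream(
    records: List[Record],
    rule: Callable[[Record], Record],
    reject_if: Callable[[Record], bool],
) -> Tuple[List[Record], List[Record]]:
    """One DS --> Transformer --> DS flow: apply the rule, divert rejects."""
    good: List[Record] = []
    rejects: List[Record] = []
    for rec in records:
        if reject_if(rec):
            rejects.append(rec)      # reject link
        else:
            good.append(rule(rec))   # main output link
    return good, rejects

def run_job2(
    streams: Dict[str, List[Record]],
    rule: Callable[[Record], Record],
    reject_if: Callable[[Record], bool],
) -> Tuple[Dict[str, List[Record]], List[Record]]:
    """Run every flow and collect all rejects in one list (the Funnel)."""
    outputs: Dict[str, List[Record]] = {}
    funnel: List[Record] = []
    for name, records in streams.items():
        good, rejects = transform_stream(records, rule, reject_if)
        outputs[name] = good
        funnel.extend(rejects)
    return outputs, funnel
```

Written this way, the 50 flows are just 50 calls to the same function, which hints at why the per-flow overhead (not the data) can dominate when each flow is its own set of stages.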

Job2 takes almost 15 minutes to complete.


Is there any way to improve the performance? Even if I run the job without a single XML, it still takes the same time, which tells me the problem is not the data volume. It's the design that is creating the problem.
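The "same time with no input" observation can be turned into simple arithmetic. A minimal sketch, assuming elapsed time follows elapsed = startup + rows × per_row: a run with empty input measures the fixed startup cost alone, and a second run with data gives the per-row cost. The timings below are hypothetical, chosen only to resemble "almost 15 minutes" in both cases.

```python
def split_elapsed(t_empty: float, t_full: float, rows: int) -> tuple:
    """Split elapsed time into fixed startup cost and per-row cost.

    Assumed model: elapsed = startup + rows * per_row, so a run with
    no input rows measures the startup cost on its own.
    """
    startup = t_empty
    per_row = (t_full - t_empty) / rows if rows else 0.0
    startup_share = startup / t_full if t_full else 0.0
    return startup, per_row, startup_share

# Hypothetical timings in seconds: 14 min with no input, 15 min with 1M rows.
startup, per_row, share = split_elapsed(840.0, 900.0, 1_000_000)
print(f"startup={startup:.0f}s per_row={per_row * 1e6:.1f}us share={share:.0%}")
# prints: startup=840s per_row=60.0us share=93%
```

When the startup share is this high, tuning the per-row logic in the Transformers cannot help much; the fixed cost of starting the job is what needs attention.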


Any help?
ETLJOB
Participant
Posts: 87
Joined: Thu May 01, 2008 1:15 pm
Location: INDIA

Re: Improve performance with multiple transformers

Post by ETLJOB »

myukassign wrote: Job2. I read this 50 datasets ----> Transformer(simple transformation rules)---> Push to dataset.

I guess the problem is with the read. How are you reading these datasets?
Is there anything available for datasets similar to the "file pattern" option in the Sequential File stage? If so, are you using it?

Also, can you look in the job monitor to find out which stage is taking the most time?
myukassign
Premium Member
Posts: 238
Joined: Fri Jul 25, 2008 8:55 am

Re: Improve performance with multiple transformers

Post by myukassign »

No, there is no file pattern option for datasets, so I did not use one.

It's a normal read.
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Anything else you can tell us about the job and the source file? How many rows end up taking the 15 minutes?

Also, what happens if you just have "Source File ---> Transformer ---> (some target)" with a constraint in the Transformer that can never be true (such as 1=0)? Or you could just send the source file into a dummy Copy stage. How fast does that read the source data?
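The impossible-constraint trick isolates the cost of reading from the cost of everything downstream: every row is read, but nothing passes the constraint. A small Python analogy of that measurement, assuming a hypothetical `read_rows` callable that stands in for the source stage and a default constraint that always returns False (the 1=0 case). It is not DataStage code, only the shape of the experiment.

```python
import time
from typing import Callable, Iterator, Tuple

def timed_read(
    read_rows: Callable[[], Iterator[object]],
    constraint: Callable[[object], bool] = lambda row: False,
) -> Tuple[int, float]:
    """Read every row, pass none downstream (constraint 1=0), time it."""
    start = time.perf_counter()
    passed = 0
    for row in read_rows():
        if constraint(row):   # default is always False, like 1=0
            passed += 1
    return passed, time.perf_counter() - start

# Hypothetical source: 100,000 dummy rows.
rows_out, seconds = timed_read(lambda: iter(range(100_000)))
print(rows_out, f"{seconds:.3f}s")
```

If this "read only" run already takes most of the total time, the read (or job startup) is the bottleneck, not the Transformer logic.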

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
myukassign
Premium Member
Posts: 238
Joined: Fri Jul 25, 2008 8:55 am

Post by myukassign »

eostic wrote: Anything else you can tell us about the job and the source file? How many rows end up taking the 15 minutes?

Also, what happens if you just have "Source File ---> Transformer ---> (some target)" with a constraint in the Transformer that can never be true (such as 1=0)? Or you could just send the source file into a dummy Copy stage. How fast does that read the source data?

Ernie
In my first post I mentioned that even if I run the job without a single record, it takes almost the same time, so it has nothing to do with the data. I think these 50 transformers are killing my peace...
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Now it's 50 transformers rather than "datasets"? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Is operator combination enabled or disabled?

Use the Monitor or the Performance Analyzer to report on CPU consumption. This will show which sets of stages could benefit from disabling operator combination.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
myukassign
Premium Member
Posts: 238
Joined: Fri Jul 25, 2008 8:55 am

Post by myukassign »

ray.wurlod wrote: Is operator combination enabled or disabled?

Use the Monitor or the Performance Analyzer to report on CPU consumption. This will show which sets of stages could benefit from disabling operator combination.
When I start the Performance Analyzer, it gives me a warning window: "No performance data available".


How do I use this?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's all in the manual. You have to capture the performance data when you run the job; the Performance Analyzer then reports on the captured statistics.
myukassign
Premium Member
Posts: 238
Joined: Fri Jul 25, 2008 8:55 am

Post by myukassign »

ray.wurlod wrote: It's all in the manual. You have to capture the performance data when you run the job; the Performance Analyzer then reports on the captured statistics.
I tried different combinations of enabling and disabling operator combination, but there was not much improvement in performance.

1. As I said before, the layout of my job is like this:

DS -----------> Transformer --------------> DS
DS -----------> Transformer --------------> DS
DS -----------> Transformer --------------> DS

Sometimes a single job has 50 such DS --> TRNS --> DS flows.

When I reduce the number of Transformers by breaking the flows into multiple jobs, I get somewhat better performance.

What would be the best approach? Is there anything else I should try other than enabling/disabling operator combination?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

myukassign wrote: 1. As I said before, the layout of my job is like this:

DS -----------> Transformer --------------> DS
DS -----------> Transformer --------------> DS
DS -----------> Transformer --------------> DS

Sometimes a single job has 50 such DS --> TRNS --> DS flows.
:shock:

No, this is the first time you've fully described your job layout in such a manner as to make it clear to us.
myukassign also wrote: When I reduce the number of Transformers by breaking the flows into multiple jobs, I get somewhat better performance.
I would certainly hope so. What an... interesting... approach. :?
-craig

eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

My bet is that this is simply an initialization issue. You are saying that it takes that long even if the source file is completely empty? Sorry, I missed that point earlier.

There are a ton of processes being started here. Hopefully you are using a single-node configuration file, but even if not, you seem to be adding your own level of parallelism on top of the pipeline parallelism that is inherent in the platform. Look at the OS: when this thing is finally running, you probably have a LOT of osh processes.
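A back-of-the-envelope count shows why startup can dominate. This sketch assumes the commonly described parallel-engine process model: one conductor, one section leader per node, and one player per operator per node, where operators merged by operator combination share a single player. The operator counts below are hypothetical (3 per DS --> TRNS --> DS flow, plus 2 for the funnel and reject handling); check the actual score of the job for real numbers.

```python
def estimate_processes(flows: int, operators_per_flow: int,
                       extra_operators: int, nodes: int) -> int:
    """Rough count of processes a parallel job starts.

    Assumed model: 1 conductor, 1 section leader per node, and 1 player
    per operator per node. operators_per_flow should be the count AFTER
    operator combination (combined operators share one player).
    """
    operators = flows * operators_per_flow + extra_operators
    players = operators * nodes
    section_leaders = nodes
    conductor = 1
    return conductor + section_leaders + players

# Hypothetical: 50 flows, 3 operators each, plus 2 extra operators.
print(estimate_processes(50, 3, 2, nodes=1))  # 154
print(estimate_processes(50, 3, 2, nodes=4))  # 613
```

Under this model the count grows with flows × nodes, which is consistent with the observation that splitting the flows across several jobs, or running on a single-node configuration, starts faster.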

I'd like to know more about why you need the multiple levels of "designer-based" parallelism. What are you trying to accomplish?

...and then, further, consider using a Server job for your solution. Even if you still end up with the "designer-based" parallelism, you will have vastly fewer processes, and it will probably start up much sooner.

Ernie
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Thanks Ernie. Wanted to convey that when I posted, but wasn't quite sure how to articulate it properly. Too dang early in the morning. :wink:
-craig

myukassign
Premium Member
Posts: 238
Joined: Fri Jul 25, 2008 8:55 am

Post by myukassign »

OK, let me explain what I am trying to do.

1. I have an input XML that has almost 100 complex elements. The idea is that the source system sends me all the related table information, one table in each complex element of that XML.

2. I need to move each complex element into its own dataset (100 datasets), applying some small transformation rules such as null checks and adding timestamps and load dates, and finally load them into tables.
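The rules listed in point 2 are the same for every element, so they can be expressed once and parameterized by the element name rather than copied 100 times. A minimal Python sketch under assumptions: null values are replaced with an empty string (the real default is not stated in the thread), and the `element`, `load_date` and `load_ts` column names are hypothetical.

```python
from datetime import date, datetime
from typing import Dict, Optional

def apply_rules(record: Dict[str, Optional[object]], element: str,
                load_date: date, now: datetime) -> Dict[str, object]:
    """Apply the same small rules to a record from any complex element."""
    # Null check: replace missing values with "" (an assumed default).
    out: Dict[str, object] = {k: ("" if v is None else v)
                              for k, v in record.items()}
    out["element"] = element                            # hypothetical column
    out["load_date"] = load_date.isoformat()            # hypothetical column
    out["load_ts"] = now.isoformat(timespec="seconds")  # hypothetical column
    return out

row = apply_rules({"id": 7, "name": None}, "CUSTOMER",
                  date(2010, 6, 1), datetime(2010, 6, 1, 8, 30, 0))
print(row)
```

The element name is the only thing that varies per call, which is the property a single generic flow (rather than 100 hand-built copies) would rely on.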

That is why my design looks like this.

If I don't use designer-based parallelism, then I would need to create 100 different jobs, one for each complex element.

I hope you now understand why the design is implemented this way.

Please point out any design flaw you see here. Or is the Server job approach better in this case?

I hope that with this answer I will be able to close this thread.

Thanks a lot for your valuable suggestions.
agpt
Participant
Posts: 151
Joined: Sun May 16, 2010 12:53 am

Post by agpt »

Are you doing the same kind of transformation processing on all 100 elements?

If so, you might want to use one Transformer job to do this processing on all the elements in one go, and then use a Switch stage to send the output to different datasets based on the value of the complex element.
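A minimal Python sketch of that suggestion: one shared transform pass followed by routing on a key, which is the role the Switch stage would play. It assumes every record carries the name of the complex element it came from in a hypothetical `element` field, and that the elements can share one record layout for the common rules. It models the routing logic only; the real benefit in DataStage would be far fewer stages, and therefore far fewer processes to start.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, object]

def transform_then_switch(records: List[Record],
                          transform: Callable[[Record], Record],
                          key: str = "element") -> Dict[str, List[Record]]:
    """One shared transform pass, then route each row by its key value."""
    routed: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        out = transform(rec)
        routed[str(out[key])].append(out)   # Switch: choose output by key
    return dict(routed)

rows = [{"element": "ORDER", "id": 1}, {"element": "CUSTOMER", "id": 2},
        {"element": "ORDER", "id": 3}]
routed = transform_then_switch(rows, lambda r: {**r, "checked": True})
print({k: len(v) for k, v in routed.items()})  # {'ORDER': 2, 'CUSTOMER': 1}
```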