Splitting data using Transformer vs. Filter stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

girija
Participant
Posts: 89
Joined: Fri Mar 24, 2006 1:51 pm
Location: Hartford

Splitting data using Transformer vs. Filter stage

Post by girija »

Hi All,

In my job I am splitting the data on some criteria, using a Filter stage.

I received the following comment from the reviewer:

"It is actually more efficient to use a transformer because the transformer is compiled and the filter is interpreted and adds more overhead than the transformer".

I did not know this before.

Waiting for your comments regarding this statement.

Thanks
Girija S
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Create a job with a row generator and the column you are using to filter, run it through the filter stage and into a copy stage that has no output. Use enough rows of data so your job runs at least 10 minutes.

Then change the job to use a Transformer stage in place of the Filter stage and re-run it with the same amount of data.

What results do you get?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The statement is true in version 7.5 and later.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
girija
Participant
Posts: 89
Joined: Fri Mar 24, 2006 1:51 pm
Location: Hartford

Post by girija »

Thanks, Ray. I was just waiting for this statement.
You mentioned earlier that they made a lot of changes to the Transformer in 7.5 and onwards.

Thanks again for your comment.

Girija S
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

girija - I just spent 5 minutes testing this on Version 8.

A simple job: Row Generator -> Transformer or Filter -> Copy stage.

The Row Generator creates one seeded random integer column with values from 1 to 100. The Filter and the Transformer each pass on only rows with a value > 50, so roughly 50% of the rows are dropped by the filter/constraint. The test used 10 million rows on a 1-node configuration.

Both tests were repeated several times on an otherwise unused system:
~7-8 seconds with the Filter stage.
~10-11 seconds with the Transformer stage.
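
For anyone wanting to sanity-check the shape of the test data outside DataStage, here is a minimal Python sketch. It only mimics the data described above (seeded random integers from 1 to 100, keep values > 50) and says nothing about how fast either stage runs. The function name `pass_rate` and the seed value are made up for illustration.

```python
import random


def pass_rate(n_rows: int, threshold: int = 50, seed: int = 42) -> float:
    """Fraction of seeded random integers in 1..100 that exceed `threshold`.

    Hypothetical stand-in for the Row Generator column; not DataStage code.
    """
    rng = random.Random(seed)  # seeded, so repeated runs see the same rows
    passed = sum(1 for _ in range(n_rows) if rng.randint(1, 100) > threshold)
    return passed / n_rows


if __name__ == "__main__":
    # With a threshold of 50, about half of the rows should pass.
    print(f"pass rate: {pass_rate(1_000_000):.3f}")
```

Because the generator is seeded, both variants of the job see identical input, which is what makes the two timings comparable.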
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

We had an IBM trainer tell us the same thing, that a buildop or transformer would do the filter more efficiently than the filter stage. I am surprised at the results you are seeing, ArndW. That is very interesting...

Did you put a sleep command in your transformer to skew your results? ;) j/k

Brad
It is not that I am addicted to coffee, it's just that I need it to survive.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Would you like to repeat the tests with slightly more complex conditions, maybe an OR conjunction or a leading-wildcard LIKE condition?
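
To make the two suggested condition types concrete, here is a rough Python sketch of what each one tests. It is not Filter stage or Transformer syntax; the function names, thresholds, and sample strings are all hypothetical, and the LIKE analogue only covers a literal suffix with no further wildcards.

```python
def or_condition(value: int) -> bool:
    # Two comparisons joined by OR; the thresholds are illustrative only.
    return value > 90 or value < 10


def leading_wildcard_like(text: str, suffix: str) -> bool:
    # Rough analogue of  text LIKE '%<suffix>'  for a literal suffix:
    # a leading wildcard means only the end of the string has to match.
    return text.endswith(suffix)


if __name__ == "__main__":
    print(or_condition(95), or_condition(50))
    print(leading_wildcard_like("Hartford", "ford"))
```

Conditions like these do more work per row than a single comparison, so they might change the relative timings.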