Splitting data using transformer vs filter stage

Posted: Fri Oct 10, 2008 9:21 am
by girija
Hi All,

In my job I am splitting the data based on some criteria, using a Filter stage.

I received this comment from the reviewer:

"It is actually more efficient to use a transformer because the transformer is compiled and the filter is interpreted and adds more overhead than the transformer".

I didn't know this before.

I look forward to your comments on this statement.

Thanks
Girija S

Posted: Fri Oct 10, 2008 9:25 am
by ArndW
Create a job with a row generator and the column you are using to filter, run it through the filter stage and into a copy stage that has no output. Use enough rows of data so your job runs at least 10 minutes.

Change the job to use a Transformer stage instead and re-run it with the same amount of data.

What results do you get?

Posted: Fri Oct 10, 2008 1:21 pm
by ray.wurlod
The statement is true in version 7.5 and later.

Posted: Fri Oct 10, 2008 1:30 pm
by girija
Thanks Ray. I was just waiting for this statement.
You mentioned earlier that they made a lot of changes to the Transformer in 7.5 and onwards.

Thanks again for your comment.

Girija S

Posted: Sat Oct 11, 2008 5:32 am
by ArndW
girija - I just spent 5 minutes testing this at Version 8

Simple RowGen -> Trans/Filter -> Copy Stage.

The Row Generator creates one seeded random integer column with values from 1 to 100. The Filter and the Transformer each pass only rows with a value > 50, i.e. about 50% of the rows get through the filter/constraint. The test used 10 million rows on a 1-node configuration.

Both tests were repeated several times on an otherwise unused system:
Filter stage: ~7-8 seconds
Transformer stage: ~10-11 seconds
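
For a rough sense of scale, the figures above can be turned into a pass rate and a throughput estimate. The Python sketch below is only illustrative arithmetic, not DataStage code. It assumes the generated values are uniformly distributed over 1 to 100 and that the timings are elapsed job times, which may include startup overhead. The helper names are hypothetical.

```python
# Illustrative arithmetic only; helper names are hypothetical.
ROWS = 10_000_000  # rows generated in the test described above


def pass_fraction(low, high, threshold):
    """Fraction of integers in [low, high] that are strictly > threshold.

    Assumes every integer in the range is equally likely.
    """
    total = high - low + 1
    passing = max(0, high - max(threshold, low - 1))
    return passing / total


def rows_per_second(rows, seconds):
    """Average throughput over the whole run, including any startup time."""
    return rows / seconds


# Reported elapsed-time ranges in seconds: (fastest, slowest).
timings = {"Filter": (7, 8), "Transformer": (10, 11)}

if __name__ == "__main__":
    print("expected pass fraction:", pass_fraction(1, 100, 50))
    for stage, (fastest, slowest) in timings.items():
        low_rate = rows_per_second(ROWS, slowest)
        high_rate = rows_per_second(ROWS, fastest)
        print(f"{stage}: {low_rate:,.0f} to {high_rate:,.0f} rows/s")
```

Under those assumptions the Filter stage handled roughly 1.25 to 1.43 million rows per second and the Transformer roughly 0.91 to 1.0 million, with half of the rows passing the condition in both cases.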

Posted: Mon Oct 13, 2008 11:52 am
by bcarlson
We had an IBM trainer tell us the same thing, that a buildop or transformer would do the filter more efficiently than the filter stage. I am surprised at the results you are seeing, ArndW. That is very interesting...

Did you put a sleep command in your transformer to skew your results? :wink: j/k

Brad

Posted: Mon Oct 13, 2008 3:49 pm
by ray.wurlod
Would you like to repeat the tests with slightly more complex conditions, maybe an OR conjunction or a leading-wildcard LIKE condition?