Splitting data using Transformer vs Filter stage

girija · Post by **girija** » Fri Oct 10, 2008 9:21 am

Hi All,

In my job I am splitting the data based on some criteria, using a Filter stage.

I received this comment from the reviewer:

"It is actually more efficient to use a transformer because the transformer is compiled and the filter is interpreted and adds more overhead than the transformer".

I didn't know this before.

Waiting for your comments regarding this statement.

Thanks
Girija S

ArndW · Post by **ArndW** » Fri Oct 10, 2008 9:25 am

Create a job with a row generator and the column you are using to filter, run it through the filter stage and into a copy stage that has no output. Use enough rows of data so your job runs at least 10 minutes.

Then change the job to use a Transformer stage instead of the Filter stage and re-run it with the same amount of data.

What results do you get?

ray.wurlod · Post by **ray.wurlod** » Fri Oct 10, 2008 1:21 pm

The statement is true in version 7.5 and later.

girija · Post by **girija** » Fri Oct 10, 2008 1:30 pm

Thanks Ray. I was just waiting for this statement.
You mentioned earlier that they made a lot of changes to the Transformer in 7.5 and onwards.

Thanks again for your comment.

Girija S

ArndW · Post by **ArndW** » Sat Oct 11, 2008 5:32 am

girija - I just spent 5 minutes testing this at Version 8:

Simple RowGen -> Trans/Filter -> Copy Stage.

The Row Generator creates one seeded random integer column with values from 1 to 100. The Filter and Transformer pass on only rows with a value > 50, i.e. roughly 50% of the rows are filtered out, for 10 million rows on a 1-node configuration.

Both tests were repeated several times on an otherwise unused system:
~7-8 seconds with the Filter stage.
~10-11 seconds with the Transformer stage.
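
For anyone who wants to sanity-check the shape of the test data outside DataStage, here is a minimal Python sketch of the same idea: seeded random integers from 1 to 100, keeping only values > 50. It says nothing about Filter or Transformer performance; it only confirms that this condition passes about half of the rows. The function name `pass_rate`, the seed and the row count are my own assumptions, not anything taken from the job.

```python
import random


def pass_rate(n_rows: int, seed: int = 42, threshold: int = 50) -> float:
    """Return the fraction of generated rows that satisfy value > threshold.

    Hypothetical stand-in for the Row Generator plus the filter condition;
    it does not model DataStage itself.
    """
    rng = random.Random(seed)  # seeded, so repeated runs see identical data
    passed = sum(1 for _ in range(n_rows) if rng.randint(1, 100) > threshold)
    return passed / n_rows


if __name__ == "__main__":
    # A smaller row count than the 10 million used in the job, to keep it quick.
    print(f"pass rate: {pass_rate(1_000_000):.3f}")
```

Because the generator is seeded, both runs of a comparison see the same rows, which is what makes the Filter and Transformer timings comparable.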

bcarlson · Post by **bcarlson** » Mon Oct 13, 2008 11:52 am

We had an IBM trainer tell us the same thing, that a buildop or transformer would do the filter more efficiently than the filter stage. I am surprised at the results you are seeing, ArndW. That is very interesting...

Did you put a sleep command in your transformer to skew your results?

j/k

Brad

ray.wurlod · Post by **ray.wurlod** » Mon Oct 13, 2008 3:49 pm

Would you like to repeat the tests with slightly more complex conditions, maybe an OR conjunction or a leading-wildcard LIKE condition?
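
To make the two suggested condition types concrete, here is a rough Python sketch of what each one tests. It is only an illustration of the predicates, not DataStage syntax, and the function names and bounds are made up for the example. The second function assumes a LIKE pattern of the form '%suffix' with no other wildcards, which matches any string ending in that suffix.

```python
def or_condition(value: int, low: int = 10, high: int = 90) -> bool:
    """Two comparisons joined by OR (hypothetical bounds)."""
    return value < low or value > high


def like_leading_wildcard(text: str, suffix: str) -> bool:
    """Rough equivalent of  text LIKE '%suffix'  with no other wildcards.

    A leading wildcard means the match cannot be anchored at the start
    of the string, so the end of the value has to be compared instead.
    """
    return text.endswith(suffix)


if __name__ == "__main__":
    print(or_condition(5), or_condition(50))
    print(like_leading_wildcard("datastage", "stage"))
```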