Partitioning to be used in Transformer stage

Grace J. · Post by **Grace J.** » Wed Nov 12, 2008 7:49 am

Hi,

I need to use a Transformer stage after a Sequential File stage, which is the source. I have no idea which partitioning method to use on the Transformer stage that follows the Sequential File stage. Can anyone help me with this?
Thanks in advance....

Regards,
Grace J.

ray.wurlod · Post by **ray.wurlod** » Wed Nov 12, 2008 7:53 am

By default the Transformer stage will execute in all nodes of the default node pool specified in the current configuration file (named in $APT_CONFIG_FILE). You can choose to override this if you wish, but it's not necessary in the job design you specified unless you are doing some task that requires execution on only one node.

kandyshandy · Post by **kandyshandy** » Wed Nov 12, 2008 11:59 am

What are you trying to do in the transformer?

Grace J. · Post by **Grace J.** » Wed Nov 12, 2008 10:41 pm

Just applying a trim function and filtering.

kandyshandy · Post by **kandyshandy** » Thu Nov 13, 2008 9:20 am

Then you don't have to worry about the partitioning type on the Transformer stage. Trimming and filtering work on one row at a time, so leave the partitioning method set to Auto and it will choose a suitable method for you.
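
To illustrate why the partitioning method does not matter for row-by-row work, here is a minimal Python sketch (not DataStage code). It assumes round-robin distribution across two nodes; the names `round_robin` and `trim_and_filter` and the sample rows are made up for the example. The point is only that trimming and filtering each partition separately yields the same set of rows as processing everything on one node.

```python
def round_robin(rows, num_nodes):
    """Deal rows out to partitions in turn (hypothetical stand-in for a partitioner)."""
    partitions = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        partitions[i % num_nodes].append(row)
    return partitions


def trim_and_filter(rows):
    """Trim whitespace from each row and drop rows that are empty after trimming."""
    trimmed = (row.strip() for row in rows)
    return [row for row in trimmed if row]


# Made-up sample data: some padded values and one blank row.
rows = ["  alpha ", "beta", "   ", " gamma", "delta  "]

# Parallel path: partition first, then transform each partition independently.
partitions = round_robin(rows, num_nodes=2)
parallel_result = sorted(r for part in partitions for r in trim_and_filter(part))

# Sequential path: transform all rows in one place.
sequential_result = sorted(trim_and_filter(rows))

print(parallel_result == sequential_result)
```

Both paths produce the same rows; only the row order across partitions can differ, which is why the results are sorted before comparing.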

Nagaraj · Post by **Nagaraj** » Tue Nov 18, 2008 6:47 pm

Since you are reading from a sequential file, the name itself says it will run in sequential mode, so I believe the next stage will also become sequential.
There is not much you can do here apart from changing the properties of the Sequential File stage.

ray.wurlod · Post by **ray.wurlod** » Tue Nov 18, 2008 8:43 pm

False.

There will be a partitioner between the Sequential File stage and any downstream stage that executes in parallel.

Further, for sufficiently large sequential files, you can assign multiple readers. If you assign N readers, each processes 1/N of the lines in the file.
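
As a rough illustration of the 1/N arithmetic only, here is a small Python sketch. It is not how the Sequential File stage is implemented; the function name `reader_ranges` and the idea of splitting by line index are assumptions made for the example.

```python
def reader_ranges(total_lines, num_readers):
    """Split line indices 0..total_lines-1 into num_readers contiguous ranges.

    Returns a list of (start, end) pairs, end exclusive. Any remainder is
    spread over the first few readers so that sizes differ by at most one.
    """
    base, extra = divmod(total_lines, num_readers)
    ranges = []
    start = 0
    for reader in range(num_readers):
        size = base + (1 if reader < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Hypothetical example: 10 lines shared among 4 readers.
for reader, (start, end) in enumerate(reader_ranges(10, 4)):
    print(f"reader {reader}: lines {start}..{end - 1} ({end - start} lines)")
```

With 10 lines and 4 readers the shares are 3, 3, 2 and 2 lines, so each reader handles roughly a quarter of the file.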

Nagaraj · Post by **Nagaraj** » Tue Nov 18, 2008 9:02 pm

If we increase the number of readers per node to, say, 5, and we have some 10 million records in a single large file, then each reader will take 2 million records and the data will be processed in parallel.

If the number of readers is set to one, will the whole downstream not run sequentially? It won't introduce any partitioning.

Can you please explain how this works without changing the number of readers per node?

Thanks

ray.wurlod · Post by **ray.wurlod** » Wed Nov 19, 2008 12:48 am

With one reader per node, the Sequential File stage will execute sequentially. But you will note that there is a "fan out" icon on the link between this and the next stage, indicating that the next stage will execute in parallel (which you can verify by inspecting its Advanced tab).

DSXchange