Partitioning to be used in Transformer stage

Posted: Wed Nov 12, 2008 7:49 am
by Grace J.
Hi,

I need to use a Transformer stage after a Sequential File stage, which is the source. I don't know which partitioning method to use in the Transformer stage that follows the Sequential File stage. Can anyone help me with this?
Thanks in advance.

Regards,
Grace J.

Posted: Wed Nov 12, 2008 7:53 am
by ray.wurlod
By default the Transformer stage will execute in all nodes of the default node pool specified in the current configuration file (named in $APT_CONFIG_FILE). You can choose to override this if you wish, but it's not necessary in the job design you specified unless you are doing some task that requires execution on only one node.

Posted: Wed Nov 12, 2008 11:59 am
by kandyshandy
What are you trying to do in the transformer?

Just applying trimming function and filtering

Posted: Wed Nov 12, 2008 10:41 pm
by Grace J.
Just applying a trimming function and filtering.

Re: Just applying trimming function and filtering

Posted: Thu Nov 13, 2008 9:20 am
by kandyshandy
Then you don't have to worry about the partitioning type in the Transformer stage. With the Auto partitioning method, DataStage will choose a suitable partitioning method for you.
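
To illustrate why the partitioning type does not matter for this kind of job, here is a minimal Python sketch (not DataStage code). It assumes the trimming and filtering look at one row at a time. The names `transform`, `run`, and `run_partitioned` are hypothetical, and the filter (drop rows that are empty after trimming) is only an example rule.

```python
import random


def transform(row):
    """Trim whitespace; return None for rows that are empty after trimming.

    The "drop empty rows" rule is a made-up filter for illustration.
    """
    trimmed = row.strip()
    return trimmed if trimmed else None


def run(rows):
    """Apply the row-by-row logic to a single stream of rows."""
    return [out for out in map(transform, rows) if out is not None]


def run_partitioned(rows, partitions, seed=0):
    """Scatter rows across partitions arbitrarily, then process each one."""
    rng = random.Random(seed)
    buckets = [[] for _ in range(partitions)]
    for row in rows:
        buckets[rng.randrange(partitions)].append(row)
    results = []
    for bucket in buckets:
        results.extend(run(bucket))
    return results


if __name__ == "__main__":
    rows = ["  alpha ", "beta", "   ", " gamma"]
    # Same set of output rows whichever partition each row lands in.
    print(sorted(run(rows)) == sorted(run_partitioned(rows, 3)))
```

Because each row is handled independently, only the order of the output can differ between partitionings, not its content. Logic that depends on groups of rows (for example, key-based aggregation) would not have this property.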

Posted: Tue Nov 18, 2008 6:47 pm
by Nagaraj
Since you are reading from a Sequential File stage, which as the name says runs in sequential mode, I believe the next stage will also become sequential.
There is not much you can do here apart from changing the properties of the Sequential File stage.

Posted: Tue Nov 18, 2008 8:43 pm
by ray.wurlod
False.

There will be a partitioner between the Sequential File stage and any downstream stage that executes in parallel.

Further, for sufficiently large sequential files, you can assign multiple readers. If you assign N readers, each processes 1/N of the lines in the file.
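
As a rough illustration of the 1/N split, the Python sketch below divides a row count into N contiguous ranges. It only shows the arithmetic; `reader_ranges` is a hypothetical helper, and the way DataStage actually locates each reader's starting point in the file is not covered here.

```python
def reader_ranges(total_rows, readers):
    """Return half-open (start, end) row ranges, one per reader.

    When total_rows is not divisible by readers, the first few readers
    get one extra row each.
    """
    if readers < 1:
        raise ValueError("readers must be at least 1")
    base, extra = divmod(total_rows, readers)
    ranges = []
    start = 0
    for i in range(readers):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    for i, (start, end) in enumerate(reader_ranges(1_000_000, 4)):
        print(f"reader {i}: {end - start} rows")
```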

Posted: Tue Nov 18, 2008 9:02 pm
by Nagaraj
If we increase the number of readers per node to, say, 5, and we have some 10 million records in a single large file, then each reader will take about 2 million records and the data will be processed in parallel.

If the number of readers is set to one, will the whole downstream not run sequentially? It won't introduce any partitioning.

Can you please explain how this works without changing the number of readers per node?

Thanks

Posted: Wed Nov 19, 2008 12:48 am
by ray.wurlod
With one reader per node, the Sequential File stage will execute sequentially. But you will note that there is a "fan out" icon on the link between this and the next stage, indicating that the next stage will execute in parallel (which you can verify by inspecting its Advanced tab).
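
The "fan out" can be pictured with a small Python sketch (not DataStage code): one sequential stream of rows is dealt out across the nodes so the next stage can work on each partition in parallel. Round robin is assumed here as the partitioning method, and `fan_out` is a hypothetical name.

```python
from itertools import cycle


def fan_out(rows, node_count):
    """Deal rows from one sequential stream across node_count partitions."""
    partitions = [[] for _ in range(node_count)]
    # cycle() yields 0, 1, ..., node_count - 1, 0, 1, ... (round robin).
    for row, target in zip(rows, cycle(range(node_count))):
        partitions[target].append(row)
    return partitions


if __name__ == "__main__":
    # One sequential reader produces rows 0..6; three nodes receive them.
    for node, part in enumerate(fan_out(range(7), 3)):
        print(f"node {node}: {part}")
```

The reader itself stays sequential in this picture; only the stage after the partitioner sees multiple partitions.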