
Same partitioning after a Sequential File stage?

Posted: Wed Apr 27, 2011 12:05 am
by srinivas.nettalam
This is not an interview question and is just for my understanding.

A Sequential File stage is executed in parallel (File Pattern read method), and a link goes from it to a Copy stage and then to a Data Set. The Copy stage uses Same partitioning by default. By definition, Same partitioning keeps the preceding stage's partitioning, but which partitioning applies to the Sequential File stage in this scenario?

Posted: Wed Apr 27, 2011 12:09 am
by ray.wurlod
Partitioning only ever occurs on an input link. How the data are partitioned within the Sequential File stage will depend on a number of factors, but will typically be one file per partition if you have the same number of files as partitions. There are other properties such as "treat as File Set" that can affect the way that this works.
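To make the "one file per partition" case concrete, here is a toy Python model (not DataStage internals). It assumes the number of files equals the number of partitions and that files are assigned to partitions in sorted name order; both the function name `read_file_pattern` and the ordering are illustrative assumptions. Unequal counts and properties such as "treat as File Set" are deliberately not modeled.

```python
def read_file_pattern(file_rows, num_partitions):
    """Toy model of a parallel File Pattern read (hypothetical, not DataStage code).

    file_rows: dict mapping a file name to its list of rows.
    Returns a list of partitions; partition i holds every row of the i-th file.
    Assumption: files are assigned to partitions in sorted name order.
    """
    names = sorted(file_rows)
    if len(names) != num_partitions:
        # Other file/partition ratios depend on stage properties not modeled here.
        raise ValueError("sketch only models one file per partition")
    # All rows of a file stay together in that file's partition.
    return [list(file_rows[name]) for name in names]
```

For example, `read_file_pattern({"a.txt": ["r1", "r2"], "b.txt": ["r3"]}, 2)` returns `[["r1", "r2"], ["r3"]]`: in this model the rows of a file are not spread round robin across partitions.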

Posted: Wed Apr 27, 2011 12:26 am
by srinivas.nettalam
Thanks for your reply, Ray. Partitioning occurs only on an input link, but in this scenario, what partitioning would Same partitioning invoke?
Since I am not a premium member, from what I can see of your reply I understand that there is one file per partition at the Sequential File stage and that the same distribution carries into the Copy stage. But what about the rows in each file? Would they be distributed round robin?

Posted: Wed Apr 27, 2011 4:31 am
by zulfi123786
When you have "Same" partitioning set on the Copy stage input link, the input link of the Copy stage does not attempt to repartition the data. Whatever records are placed in whatever partitions remain in their respective partitions, unless you have node map or node pool constraints set on the Copy stage.
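A minimal Python sketch of what "no repartitioning" means, assuming partitions are modeled as lists of records. `same_partitioning` and `locate` are hypothetical helper names, and node map / node pool constraints are not modeled.

```python
def same_partitioning(upstream_partitions):
    """Hypothetical model of Same partitioning: no records move.

    Downstream partition i receives exactly the records of upstream partition i.
    """
    return [list(part) for part in upstream_partitions]


def locate(partitions):
    """Map each record to the index of the partition that holds it."""
    return {rec: i for i, part in enumerate(partitions) for rec in part}
```

With `upstream = [["r1", "r2"], ["r3"]]`, `locate(same_partitioning(upstream))` equals `locate(upstream)`, that is `{"r1": 0, "r2": 0, "r3": 1}`: every record stays in the partition it arrived in.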

Regarding the partitioning at the Sequential File stage: as Ray said, it should be one file per partition; otherwise it is round robin, which is the default in most cases.
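For the round robin fallback mentioned above, here is a small illustrative Python sketch of round-robin distribution: record n goes to partition n mod N. The function name `round_robin` is hypothetical; this shows the general technique, not DataStage's implementation.

```python
def round_robin(records, num_partitions):
    """Deal records to partitions in turn: record n goes to partition n % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for n, rec in enumerate(records):
        partitions[n % num_partitions].append(rec)
    return partitions
```

For example, `round_robin(list(range(7)), 3)` returns `[[0, 3, 6], [1, 4], [2, 5]]`, so partition sizes differ by at most one record.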