This is not an interview question and is just for my understanding.
A Sequential File stage is executed in parallel (reading with a file pattern), and there is a data link to a Copy stage and then to a data set. The Copy stage uses Same partitioning by default. By definition, Same partitioning applies the preceding stage's partitioning, but which partitioning is applied at the Sequential File stage in this scenario?
same partition after a sequential file stage??
- Participant
- Posts: 134
- Joined: Tue Jun 15, 2010 2:10 am
- Location: Bangalore
N.Srinivas
India.
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Partitioning only ever occurs on an input link. How the data are partitioned within the Sequential File stage depends on a number of factors, but it will typically be one file per partition if you have the same number of files as partitions. Other properties, such as "treat as File Set", can also affect how this works.
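As a rough illustration of the one-file-per-partition case described above, here is a toy Python model. It is not a DataStage API: the function name `assign_files_to_partitions` and the file names are hypothetical, and it assumes files are handed to partitions in the order the pattern lists them, which the engine does not necessarily guarantee.

```python
# Toy model (hypothetical, not a DataStage API) of a Sequential File stage
# reading a file pattern in parallel when the number of matched files
# equals the number of partitions.

def assign_files_to_partitions(files, num_partitions):
    """Map partition number -> list of files that partition reads.

    Only models the typical one-file-per-partition case; the order-based
    assignment (file i -> partition i) is an assumption for illustration.
    """
    if len(files) != num_partitions:
        raise ValueError("sketch only covers the one-file-per-partition case")
    return {p: [files[p]] for p in range(num_partitions)}


files = ["cust_0.txt", "cust_1.txt", "cust_2.txt"]
print(assign_files_to_partitions(files, 3))
# {0: ['cust_0.txt'], 1: ['cust_1.txt'], 2: ['cust_2.txt']}
```

When the file count and partition count differ, other factors (and properties such as "treat as File Set") come into play, which this sketch deliberately does not model.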
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
- Participant
- Posts: 134
- Joined: Tue Jun 15, 2010 2:10 am
- Location: Bangalore
Thanks, Ray, for your reply. Partitioning occurs only on an input link, but in this scenario, what partitioning would Same invoke?
Since I am not a premium member, what I could gather from your reply is that there is one file per partition at the Sequential File stage and that the same partitioning is then applied at the Copy stage. But what about the rows within each file? Would they be distributed round robin?
N.Srinivas
India.
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
When you have Same partitioning set on the Copy stage input link, it means the input link should not attempt any repartitioning. Whatever records are in whatever partitions will remain in their respective partitions, unless you have node map or node pool constraints set on the Copy stage.
Regarding partitioning at the Sequential File stage: as Ray said, it should be one file per partition; otherwise it is round robin, which is the default in most cases.
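To make the two behaviours in this reply concrete, here is a toy Python model: rows dealt out round robin, followed by Same, which leaves every row in the partition it arrived in. The names `round_robin` and `same` are hypothetical and this is not how the DataStage engine is implemented; it also ignores node map and node pool constraints.

```python
# Toy model (hypothetical, not DataStage internals). A partition is a list
# of rows; a data set is a list of partitions.

def round_robin(rows, num_partitions):
    """Deal rows to partitions in turn: row 0 -> p0, row 1 -> p1, ..."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def same(partitions):
    """'Same' partitioning: no repartitioning, each row stays where it is.

    Returns a copy so the caller's lists are not shared downstream.
    """
    return [list(p) for p in partitions]


rows = ["r1", "r2", "r3", "r4", "r5"]
upstream = round_robin(rows, 2)
print(upstream)        # [['r1', 'r3', 'r5'], ['r2', 'r4']]
print(same(upstream))  # [['r1', 'r3', 'r5'], ['r2', 'r4']]
```

The point of the sketch is that Same adds no distribution logic of its own: whatever layout the upstream stage produced (one file per partition, or round robin) is exactly what the Copy stage sees.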