What partitioning is best for the Transformer stage, and what is the default?
"Auto" partitioning is the default, and in my opinion it is the best choice unless you have a specific data scenario that warrants another partitioning method.
When I sort the data in the database stage and then use a Transformer for some conversion functions before the JOIN stage, would the Transformer unsort the data by default?
The "Auto" partitioning method tries to retain the partitioning of the previous stage, though this is not guaranteed. So we can explicitly specify the "SAME" partitioning method in the Transformer stage to make sure.
But the tricky question is: if we use the SAME partitioning method, each record will stay in the same partition, but will the sort order within the partition be preserved?
Whenever I design my jobs, I "hash-key partition and sort" the data initially and carry it through the different stages using the "SAME" partitioning method. This is useful when the key columns stay the same across the various processing stages. My observation is that the sorted records remain in order when "SAME" partitioning is used.
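To make the idea concrete, here is a small Python simulation of the pattern described above. It is only a sketch of the concept, not DataStage code: the function names (hash_partition, sort_partitions, transform_same, is_sorted) and the sample rows are made up for illustration, and it assumes that a row-by-row stage running with SAME partitioning reads each partition in order and moves no rows between partitions. It cannot prove how the engine behaves; it only shows why the observation above is plausible under that assumption.

```python
from zlib import crc32


def hash_partition(rows, key, n_parts):
    """Send each row to a partition chosen by a stable hash of its key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[crc32(str(row[key]).encode()) % n_parts].append(row)
    return parts


def sort_partitions(parts, key):
    """Sort every partition independently on the key."""
    return [sorted(p, key=lambda r: r[key]) for p in parts]


def transform_same(parts, fn):
    """Model of SAME: no rows move, each partition is processed in order."""
    return [[fn(row) for row in p] for p in parts]


def is_sorted(part, key):
    """True if the partition is in non-decreasing key order."""
    return all(a[key] <= b[key] for a, b in zip(part, part[1:]))


# Hypothetical sample data: "id" is the key column, "name" gets converted.
rows = [{"id": k, "name": f"cust{k}"} for k in (5, 3, 9, 1, 7, 3, 5, 2)]

# Hash-partition and sort once, up front.
parts = sort_partitions(hash_partition(rows, "id", 3), "id")

# A conversion that does not touch the key, applied with SAME partitioning.
out = transform_same(parts, lambda r: {**r, "name": r["name"].upper()})

print(all(is_sorted(p, "id") for p in out))
```

In this model every partition is still sorted after the conversion, because the key column is untouched and no rows change partition. If the conversion modified the key, or if a stage repartitioned the data, the order would have to be re-established before the JOIN.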
The question raised by Kris is rather interesting, and I am eager to hear more comments from the seniors.