Hi All,
I have a requirement to join two datasets created in two other jobs. One job runs on a single node and the other runs on multiple nodes.
In both jobs, the data is sorted on the key and hash-partitioned on the key before the datasets are created.
In a third job, running on multiple nodes, I am joining these two datasets on the same key with Same partitioning before the join.
Will this give correct join results?
My confusion is that I am using a dataset created on a single node inside a multi-node job with Same partitioning.
Any ideas please?
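The concern above can be sketched outside DataStage. Below is a minimal Python model (an illustration only, not DataStage internals) of a partition-wise join, assuming each side is hash-partitioned on the join key and partitions are paired up positionally, the way same-numbered partitions sit on the same node. With equal partition counts the join is complete; pairing a 1-partition ("single node") side with a 4-partition side as-is loses matches.

```python
from collections import defaultdict

def hash_partition(rows, key, n_parts):
    # Distribute rows into n_parts buckets by hashing the join key,
    # mimicking hash partitioning across n_parts nodes.
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts

def partitioned_join(left_parts, right_parts, key):
    # Join partition-by-partition; this is only correct when matching
    # keys land in the same partition number on both sides.
    out = []
    for lp, rp in zip(left_parts, right_parts):
        index = defaultdict(list)
        for r in rp:
            index[r[key]].append(r)
        for l in lp:
            for r in index[l[key]]:
                out.append({**l, **r})
    return out

left = [{"k": i, "l": i * 10} for i in range(6)]
right = [{"k": i, "r": i * 100} for i in range(6)]

# Both sides hashed into the same number of partitions: all 6 keys match.
ok = partitioned_join(hash_partition(left, "k", 4),
                      hash_partition(right, "k", 4), "k")

# Left kept on 1 partition, right on 4, joined as-is ("Same"):
# only partition 0 of the right side ever meets the left side.
bad = partitioned_join(hash_partition(left, "k", 1),
                       hash_partition(right, "k", 4), "k")
```

The point of the sketch is that correctness hinges on both inputs having the same partition count and the same hash-on-key placement, not on how either dataset was originally written.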
Partitioning in Datasets
--
Swathi Ch
Thanks Chulett,
I am testing this job and it seems I am getting the correct results, but I want to confirm with the experts here.
If I create a hashed dataset on a single node, all the records go into one partition on that node, while in the other dataset the records are scattered across multiple nodes. How will DataStage take care of this join?
--
Swathi Ch
A dataset created by a single-node job will not be partitioned at all, as all the keys will be on the same node. By default, DataStage inserts a sort and hash partitioning (for a Join) unless you set the environment variable that forces it not to.
In this case, just to be on the safe side, hash-partition the data to match the dataset created on multiple nodes, IMHO.
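The advice above (explicitly hash-partition the single-node data to match the multi-node side) can be sketched in the same toy model. This is a hypothetical illustration, not DataStage code: flattening the one partition and re-hashing it into the multi-node layout puts every key in the same partition number on both sides, which is exactly the condition a partition-wise join needs.

```python
def hash_partition(rows, key, n_parts):
    # Hash-partition rows on the join key into n_parts buckets.
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts

rows = [{"k": i} for i in range(8)]

single = hash_partition(rows, "k", 1)   # one node: everything in partition 0
multi = hash_partition(rows, "k", 4)    # four nodes

# Repartition the single-node data to the multi-node layout:
realigned = hash_partition(single[0], "k", 4)

# Each key now lives in the same partition number on both sides.
aligned = all({r["k"] for r in a} == {r["k"] for r in b}
              for a, b in zip(realigned, multi))
```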
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
Thanks Chulett, Priyadarshi,
So even though the dataset is created on a single node, when that dataset is read in a multi-node job, DataStage automatically inserts the tsort and hash operators internally and repartitions the single-node data to meet the multi-node requirements.
Thanks, good point.
--
Swathi Ch