Hash partition key's datatype changes partition behavior?
Posted: Wed Feb 02, 2011 3:32 pm
Hello,
I'm running DataStage 8.1 FP1 on Windows Server 2003.
I'm seeing behavior that the documentation does not mention and that differs from what I expect.
When using the Hash partitioning method in a parallel Transformer stage, the distribution of records across the nodes appears to depend on the datatype of the selected hash key field.
I created a simple job that strips out all other logic to isolate the behavior. The test uses a 4-node config file and the Hash partition method, with ColA as the selected partition key. My input data set has 16 records in total, with the following values:
ColA
"1"
"1"
"1"
"1"
"2"
"2"
"2"
"2"
"3"
"3"
"3"
"3"
"4"
"4"
"4"
"4"
The job monitor in Director shows the following distributions.
When ColA's datatype is set to Varchar (works as intended):
node1=4 records
node2=4 records
node3=4 records
node4=4 records
When ColA's datatype is set to Integer (not sure why this occurs):
node1=0 records
node2=4 records
node3=12 records
node4=0 records
When ColA's datatype is set to Decimal (not sure why this occurs):
node1=16 records
node2=0 records
node3=0 records
node4=0 records
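To make the setup concrete, here is a minimal Python sketch of how I understand hash partitioning in general: hash the key's bytes, then take the result modulo the node count. This is not DataStage's actual algorithm, which I don't know. The CRC32 hash, the byte encodings chosen for each datatype, and the names `partition`, `distribution`, and `ENCODINGS` are all assumptions for illustration only.

```python
import struct
import zlib
from collections import Counter

NODES = 4                  # matches the 4-node config file in my test
VALUES = [1, 2, 3, 4] * 4  # 16 records, four copies of each key value


def partition(key_bytes, nodes=NODES):
    """Pick a node: hash the key's bytes, then take the remainder.

    CRC32 is only a stand-in hash, not what DataStage uses.
    """
    return zlib.crc32(key_bytes) % nodes


def distribution(encode, values=VALUES, nodes=NODES):
    """Count how many records land on each node for a given key encoding."""
    counts = Counter(partition(encode(v), nodes) for v in values)
    return [counts.get(n, 0) for n in range(nodes)]


# Hypothetical byte representations of the same logical key
# under two different datatypes.
ENCODINGS = {
    "varchar": lambda v: str(v).encode("ascii"),
    "integer": lambda v: struct.pack("<i", v),  # 32-bit little-endian
}

for name, encode in ENCODINGS.items():
    print(name, distribution(encode))
```

In this sketch, identical keys always land on the same node, so every per-node count is a multiple of 4, and the only thing that changes between datatypes is the bytes fed to the hash. Whether that is what happens inside DataStage is exactly what I am asking.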
Is there an explanation for this behavior?
Thanks,
David