Hi,
I need to join on columns that hold MD5 hash values (generated with the Checksum stage).
The source and target contain the same data, so I expected every record to match, but the join is not working properly. I hash-partition both inputs before the join. When I analysed the output of the hash partitioning, the record count in each partition differed between source and target.
It seems the partitioning is not being done the same way for source and target.
Please help if you know the reason and how I can resolve it, as I have to join on the column holding the MD5 hash values.
Thanks.
HASH Partition not working for Checksum values
-
- Participant
- Posts: 41
- Joined: Wed Oct 08, 2008 9:19 am
Right, was thinking much the same thing.
If that turns out not to be the issue, I for one would need more clarification about certain aspects of this. For example, is the join "not happening properly" because the partitioning is wrong (i.e. matching join keys don't go to the same partition), or do you mean something else? And it seems to me the simplest test to see if your core logic is sound is to run it on a single node. Is that something you've tried?
-craig
"You can never have too many knives" -- Logan Nine Fingers
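The point about matching join keys needing to land in the same partition can be illustrated with a small sketch. This is not DataStage's actual hash function: `zlib.crc32` is only a stand-in, and `NUM_PARTITIONS`, `partition_of` and `normalize` are hypothetical names. It assumes the two inputs carry the same MD5 values in different representations (upper case and trailing padding on one side), which is just one possible cause and not something confirmed in this thread.

```python
import hashlib
import zlib

NUM_PARTITIONS = 4  # hypothetical node count


def md5_hex(record: str) -> str:
    """MD5 of a record as a lower-case hex string."""
    return hashlib.md5(record.encode("utf-8")).hexdigest()


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stand-in hash partitioner (crc32 modulo partition count).

    Any hash partitioner works on the exact bytes of the key, so two
    keys that merely look alike can be routed to different partitions.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def normalize(key: str) -> str:
    """Bring both sides to one representation before hashing."""
    return key.strip().lower()


rows = ["alice|10", "bob|20", "carol|30"]
source_keys = [md5_hex(r) for r in rows]
# Same values, different representation: upper case plus trailing padding.
target_keys = [md5_hex(r).upper() + "  " for r in rows]

# Raw keys: co-location is not guaranteed because the bytes differ.
same_partition_raw = [
    partition_of(s) == partition_of(t)
    for s, t in zip(source_keys, target_keys)
]

# Normalized keys: identical bytes, so identical partition every time.
same_partition_normalized = [
    partition_of(normalize(s)) == partition_of(normalize(t))
    for s, t in zip(source_keys, target_keys)
]

print(same_partition_raw)
print(same_partition_normalized)
```

The second list is always all `True`. The first may contain `False` entries, and those are the rows a partitioned join would miss even though a single-node run matches them.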
-
- Participant
- Posts: 41
- Joined: Wed Oct 08, 2008 9:19 am
Well... in continuing to ponder this, it seems we can infer an answer. So the join itself is in fact working, assuming that "in sequence" means "sequentially", a.k.a. either on a single node or with the stage being constrained. Which means we're back to the question of what exactly you are partitioning on. Please detail that information for us (words, screenshot).
-craig
"You can never have too many knives" -- Logan Nine Fingers
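One way to answer the "what are you partitioning on" question with data rather than words is to dump (partition number, key) pairs from both links just before the join and compare them. The sketch below assumes such dumps are available as lists of tuples, for example captured by hand from each link; `partition_map`, `misrouted_keys` and the sample values are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[int, str]  # (partition number, join key)


def partition_map(rows: Iterable[Row]) -> Dict[str, int]:
    """Map each join key to the partition it was observed in."""
    return {key: part for part, key in rows}


def misrouted_keys(
    source_rows: Iterable[Row], target_rows: Iterable[Row]
) -> Dict[str, Tuple[int, int]]:
    """Keys present on both sides but sitting in different partitions."""
    src = partition_map(source_rows)
    tgt = partition_map(target_rows)
    return {
        key: (src[key], tgt[key])
        for key in src.keys() & tgt.keys()
        if src[key] != tgt[key]
    }


# Made-up sample dumps, not data from the original job.
source_dump = [(0, "k1"), (1, "k2"), (1, "k3")]
target_dump = [(0, "k1"), (0, "k2"), (1, "k3")]

print(misrouted_keys(source_dump, target_dump))
```

If this reports nothing but the counts per partition still differ, the keys themselves probably do not compare equal (different type, length, case or padding), so they never show up as common keys at all.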