unique key calculation

ArunaDas_Maharana · Post by **ArunaDas_Maharana** » Wed Aug 01, 2012 1:10 pm

hi,

i have a parallel job, (@PARTITIONNUM+1): @ INROWNUM function is used to gererate a unique key for further processing (inserting in database).

the job is on 8.5, running on 8 node. records are failing while inserting into datase using (@PARTITIONNUM+1): @ INROWNUM as unique key.

volume is 5000, observation is record number 464 onwards on partition number 4 are duplicated.

well i tried to think of all the combinations but can't come up with duplicate using (@PARTITIONNUM+1): @ INROWNUM.

could you please help me.

jwiles · Post by **jwiles** » Wed Aug 01, 2012 2:19 pm

This sounds suspiciously like the same issue as stated in this thread. Perhaps you can answer the same questions which were asked of Chandra234 in that thread?

Or maybe this (the thought just occurred to me): Is this an interview question?

Regards,

Kryt0n · Post by **Kryt0n** » Wed Aug 01, 2012 4:48 pm

jwiles wrote:This sounds suspiciously like the same issue as stated in this thread. Perhaps you can answer the same questions which were asked of DSRajesh in that thread?

If it's not the same then there is something spooky going on... two people with the exact same issue down to the row number... [cue]twilight zone[/cue]

ArunaDas_Maharana · Post by **ArunaDas_Maharana** » Thu Aug 09, 2012 1:09 pm

thanks for sharing the link, yes guys it does looks spooky to me ...

well i would say a coincident, or IBM no more supporting 7.x from 2013 forcing everyone to migrate to 8.x enviornment replicating similar issues.

well you guys sure have good memory and imagination

jwiles , reply makes sense to me for more nodes, it can explain the duplicates.
Code:
Partition 0 Record 1464: 0+1 : 1464 = 1 : 1464 = 11464
Partition 10 Record 464: 10+1 : 464 = 11 : 464 = 11464

but in 8 node first digit always should be unique. doesn't explain the duplicates yet.

I will post if i find something logical.

Thanks guys for replying.

jwiles · Post by **jwiles** » Thu Aug 09, 2012 1:27 pm

It was suggested in the other thread to do the following:

- peek the partition number, row number and your "unique" key value for the duplicate rows
- When you have duplicates, what was the generated key and what do you believe it SHOULD have been?
- Did duplicates occur within the SAME partition (partition numbers are equal), different partitions or a combination of both?

We never received answers to those questions and therefore were unable to assist further in that thread.

Also, what is the data type of the generated key column (both stage variables and output link columns as appropriate)? PaulVL showed a good example of how a duplicate could occur when using a non-string key column (some sort of numeric data type).

Gathering this data helps in identifying a potential pattern or condition that leads to the duplicates.

Regards,

ArunaDas_Maharana · Post by **ArunaDas_Maharana** » Thu Aug 09, 2012 4:54 pm

please find below my response....

peek the partition number, row number and your "unique" key value for the duplicate rows

i tried to simulate in other test enviornment , but to be honest can't replicate the same. this is production job which is showing duplicates intermittengly.
yes, if i could get this data it would be helpful. i am trying to move a change in prod but not sure whether i will get the approvals. let you know.

- When you have duplicates, what was the generated key and what do you believe it SHOULD have been?
the pattern is with 464 row onwards duplicates are getting generated until 520 row. and the partition is 4 always.

so 5464, 5464,5465,5465,5466,5466......5520,5521 it should have been

5464,5465,5466.....

- Did duplicates occur within the SAME partition (partition numbers are equal), different partitions or a combination of both?

yes in same partition mostly 4 , combination of both. both row number and partition number is duplicated

Also, what is the data type of the generated key column (both stage variables and output link columns as appropriate)?

i think it might be culprit in this behaviour , the data type is smallint.

ArunaDas_Maharana · Post by **ArunaDas_Maharana** » Thu Aug 09, 2012 4:56 pm

guys i found the spooky connection also.

the person is same org, he was trying to help the team until i came to know about the issue.

Out of curiousity i reached to you guys..

okay so we did resolve one mystery :D

jwiles · Post by **jwiles** » Fri Aug 10, 2012 9:02 am

Ah...smallint (unsigned), which is 2 bytes long, has a maximum value of 65535. This is possibly the primary cause of the duplicates, but I'm uncertain of how this would not have appeared in DS 7x unless the row quantities were not sufficient.

I would recommend either an unsigned Integer or unsigned BigInt for storing the generated key, regardless of the logic used to create it.

Regards,

ArunaDas_Maharana · Post by **ArunaDas_Maharana** » Fri Aug 10, 2012 12:40 pm

yeah i would agree with you, i was also not sure with the data type smallint when i saw the code, not at all my favourite for storing any calculation.