Page 1 of 1

unique key calculation

Posted: Wed Aug 01, 2012 1:10 pm
by ArunaDas_Maharana
hi,

i have a parallel job, (@PARTITIONNUM+1): @ INROWNUM function is used to gererate a unique key for further processing (inserting in database).

the job is on 8.5, running on 8 node. records are failing while inserting into datase using (@PARTITIONNUM+1): @ INROWNUM as unique key.

volume is 5000, observation is record number 464 onwards on partition number 4 are duplicated.

well i tried to think of all the combinations but can't come up with duplicate using (@PARTITIONNUM+1): @ INROWNUM.

could you please help me.

Posted: Wed Aug 01, 2012 2:19 pm
by jwiles
This sounds suspiciously like the same issue as stated in this thread. Perhaps you can answer the same questions which were asked of Chandra234 in that thread?

Or maybe this (the thought just occurred to me): Is this an interview question?

Regards,

Posted: Wed Aug 01, 2012 4:48 pm
by Kryt0n
jwiles wrote:This sounds suspiciously like the same issue as stated in this thread. Perhaps you can answer the same questions which were asked of DSRajesh in that thread?
If it's not the same then there is something spooky going on... two people with the exact same issue down to the row number... [cue]twilight zone[/cue]

Posted: Thu Aug 09, 2012 1:09 pm
by ArunaDas_Maharana
thanks for sharing the link, yes guys it does looks spooky to me ...

well i would say a coincident, or IBM no more supporting 7.x from 2013 forcing everyone to migrate to 8.x enviornment replicating similar issues.

well you guys sure have good memory and imagination :wink:

jwiles , reply makes sense to me for more nodes, it can explain the duplicates.
Code:
Partition 0 Record 1464: 0+1 : 1464 = 1 : 1464 = 11464
Partition 10 Record 464: 10+1 : 464 = 11 : 464 = 11464

but in 8 node first digit always should be unique. doesn't explain the duplicates yet.

I will post if i find something logical.

Thanks guys for replying.

Posted: Thu Aug 09, 2012 1:27 pm
by jwiles
It was suggested in the other thread to do the following:

- peek the partition number, row number and your "unique" key value for the duplicate rows
- When you have duplicates, what was the generated key and what do you believe it SHOULD have been?
- Did duplicates occur within the SAME partition (partition numbers are equal), different partitions or a combination of both?

We never received answers to those questions and therefore were unable to assist further in that thread.

Also, what is the data type of the generated key column (both stage variables and output link columns as appropriate)? PaulVL showed a good example of how a duplicate could occur when using a non-string key column (some sort of numeric data type).

Gathering this data helps in identifying a potential pattern or condition that leads to the duplicates.

Regards,

Posted: Thu Aug 09, 2012 4:54 pm
by ArunaDas_Maharana
please find below my response....

peek the partition number, row number and your "unique" key value for the duplicate rows

i tried to simulate in other test enviornment , but to be honest can't replicate the same. this is production job which is showing duplicates intermittengly.
yes, if i could get this data it would be helpful. i am trying to move a change in prod but not sure whether i will get the approvals. let you know.

- When you have duplicates, what was the generated key and what do you believe it SHOULD have been?
the pattern is with 464 row onwards duplicates are getting generated until 520 row. and the partition is 4 always.

so 5464, 5464,5465,5465,5466,5466......5520,5521 it should have been

5464,5465,5466.....

- Did duplicates occur within the SAME partition (partition numbers are equal), different partitions or a combination of both?

yes in same partition mostly 4 , combination of both. both row number and partition number is duplicated

Also, what is the data type of the generated key column (both stage variables and output link columns as appropriate)?

i think it might be culprit in this behaviour , the data type is smallint.

Posted: Thu Aug 09, 2012 4:56 pm
by ArunaDas_Maharana
guys i found the spooky connection also.

the person is same org, he was trying to help the team until i came to know about the issue.

Out of curiousity i reached to you guys..

okay so we did resolve one mystery :D

Posted: Fri Aug 10, 2012 9:02 am
by jwiles
Ah...smallint (unsigned), which is 2 bytes long, has a maximum value of 65535. This is possibly the primary cause of the duplicates, but I'm uncertain of how this would not have appeared in DS 7x unless the row quantities were not sufficient.

I would recommend either an unsigned Integer or unsigned BigInt for storing the generated key, regardless of the logic used to create it.

Regards,

Posted: Fri Aug 10, 2012 12:40 pm
by ArunaDas_Maharana
yeah i would agree with you, i was also not sure with the data type smallint when i saw the code, not at all my favourite for storing any calculation.