unique key calculation

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ArunaDas_Maharana
Participant
Posts: 42
Joined: Thu Dec 11, 2008 11:07 am

unique key calculation

Post by ArunaDas_Maharana »

hi,

i have a parallel job, (@PARTITIONNUM+1): @ INROWNUM function is used to gererate a unique key for further processing (inserting in database).

the job is on 8.5, running on 8 node. records are failing while inserting into datase using (@PARTITIONNUM+1): @ INROWNUM as unique key.

volume is 5000, observation is record number 464 onwards on partition number 4 are duplicated.

well i tried to think of all the combinations but can't come up with duplicate using (@PARTITIONNUM+1): @ INROWNUM.

could you please help me.
Thanks,
Aruna
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

This sounds suspiciously like the same issue as stated in this thread. Perhaps you can answer the same questions which were asked of Chandra234 in that thread?

Or maybe this (the thought just occurred to me): Is this an interview question?

Regards,
Last edited by jwiles on Wed Aug 01, 2012 10:30 pm, edited 1 time in total.
- james wiles


All generalizations are false, including this one - Mark Twain.
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

jwiles wrote:This sounds suspiciously like the same issue as stated in this thread. Perhaps you can answer the same questions which were asked of DSRajesh in that thread?
If it's not the same then there is something spooky going on... two people with the exact same issue down to the row number... [cue]twilight zone[/cue]
ArunaDas_Maharana
Participant
Posts: 42
Joined: Thu Dec 11, 2008 11:07 am

Post by ArunaDas_Maharana »

thanks for sharing the link, yes guys it does looks spooky to me ...

well i would say a coincident, or IBM no more supporting 7.x from 2013 forcing everyone to migrate to 8.x enviornment replicating similar issues.

well you guys sure have good memory and imagination :wink:

jwiles , reply makes sense to me for more nodes, it can explain the duplicates.
Code:
Partition 0 Record 1464: 0+1 : 1464 = 1 : 1464 = 11464
Partition 10 Record 464: 10+1 : 464 = 11 : 464 = 11464

but in 8 node first digit always should be unique. doesn't explain the duplicates yet.

I will post if i find something logical.

Thanks guys for replying.
Thanks,
Aruna
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

It was suggested in the other thread to do the following:

- peek the partition number, row number and your "unique" key value for the duplicate rows
- When you have duplicates, what was the generated key and what do you believe it SHOULD have been?
- Did duplicates occur within the SAME partition (partition numbers are equal), different partitions or a combination of both?

We never received answers to those questions and therefore were unable to assist further in that thread.

Also, what is the data type of the generated key column (both stage variables and output link columns as appropriate)? PaulVL showed a good example of how a duplicate could occur when using a non-string key column (some sort of numeric data type).

Gathering this data helps in identifying a potential pattern or condition that leads to the duplicates.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ArunaDas_Maharana
Participant
Posts: 42
Joined: Thu Dec 11, 2008 11:07 am

Post by ArunaDas_Maharana »

please find below my response....

peek the partition number, row number and your "unique" key value for the duplicate rows

i tried to simulate in other test enviornment , but to be honest can't replicate the same. this is production job which is showing duplicates intermittengly.
yes, if i could get this data it would be helpful. i am trying to move a change in prod but not sure whether i will get the approvals. let you know.

- When you have duplicates, what was the generated key and what do you believe it SHOULD have been?
the pattern is with 464 row onwards duplicates are getting generated until 520 row. and the partition is 4 always.

so 5464, 5464,5465,5465,5466,5466......5520,5521 it should have been

5464,5465,5466.....

- Did duplicates occur within the SAME partition (partition numbers are equal), different partitions or a combination of both?

yes in same partition mostly 4 , combination of both. both row number and partition number is duplicated

Also, what is the data type of the generated key column (both stage variables and output link columns as appropriate)?

i think it might be culprit in this behaviour , the data type is smallint.
Thanks,
Aruna
ArunaDas_Maharana
Participant
Posts: 42
Joined: Thu Dec 11, 2008 11:07 am

Post by ArunaDas_Maharana »

guys i found the spooky connection also.

the person is same org, he was trying to help the team until i came to know about the issue.

Out of curiousity i reached to you guys..

okay so we did resolve one mystery :D
Thanks,
Aruna
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Ah...smallint (unsigned), which is 2 bytes long, has a maximum value of 65535. This is possibly the primary cause of the duplicates, but I'm uncertain of how this would not have appeared in DS 7x unless the row quantities were not sufficient.

I would recommend either an unsigned Integer or unsigned BigInt for storing the generated key, regardless of the logic used to create it.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ArunaDas_Maharana
Participant
Posts: 42
Joined: Thu Dec 11, 2008 11:07 am

Post by ArunaDas_Maharana »

yeah i would agree with you, i was also not sure with the data type smallint when i saw the code, not at all my favourite for storing any calculation.
Thanks,
Aruna
Post Reply