Duplicate identifier created by state file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

rohanf
Participant
Posts: 30
Joined: Fri Oct 07, 2005 5:39 am
Location: pune india

Duplicate identifier created by state file

Post by rohanf »

Hi All,

I am using a Surrogate Key Generator stage to create a primary key identifier. However, there are 254 load rejects out of 4,312,066 rows, with SQLCODE 1403 (ORA-01403: no data found).

I have searched the forum for the 1403 issue but was unable to find a conclusive resolution.

However, when I loaded all the data into a dummy table with no primary key on the generated column, I could see 254 records with duplicate identifiers. I am not sure how, or in what case, the Surrogate Key stage can create a duplicate identifier.

Please let me know if anybody has encountered this issue before.

Thanks & Regards
Rohan
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Can you post your complete error message(s)? "No data found" <> "duplicate key". It would also help to detail your job design, number of nodes, etc.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rohanf
Participant
Posts: 30
Joined: Fri Oct 07, 2005 5:39 am
Location: pune india

Post by rohanf »

Hi,

There are no errors in the log; there is just SQLCODE 1403 in the reject file.

Job design is as below:
Transformation:
Join -> Transformer -> Lookup -> Filter -> Modify -> Surrogate -> Tgt

Load:
Tgt -> Copy -> Oracle Enterprise** - - - - - -> Reject file

** The options are:
Write method: Upsert
Upsert order: Insert then update
Upsert mode: User-defined Update & Insert
The update query is disabled using WHERE (1=2).
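The mechanics behind that setup can be sketched in plain Python (no real Oracle involved; the names are made up for illustration): with "insert then update", a row whose insert fails on a duplicate key is retried as an update, and because the update is crippled with WHERE (1=2) it matches zero rows, so Oracle reports 1403 and the row lands on the reject link.

```python
# Simulation of the "insert then update" upsert path with a
# disabled update query (WHERE 1=2). Hypothetical sketch, not
# DataStage internals.

def upsert_insert_then_update(rows, update_enabled=False):
    table = {}      # stands in for the target table, keyed on the surrogate key
    rejects = []    # stands in for the reject link
    for key, payload in rows:
        if key not in table:
            table[key] = payload            # INSERT succeeds
        elif update_enabled:
            table[key] = payload            # UPDATE would fire here
        else:
            # The update predicate (1=2) matches no rows, so the
            # database reports SQLCODE 1403 (no data found) and the
            # duplicate row goes to the reject file.
            rejects.append((key, payload, 1403))
    return table, rejects

table, rejects = upsert_insert_then_update(
    [(1, "a"), (2, "b"), (2, "b-dup"), (3, "c")])
# rejects now holds the one duplicate-key row, tagged 1403
```

This is consistent with the symptom above: 254 duplicate surrogate keys would produce exactly 254 rejects with SQLCODE 1403 while the job log itself stays clean.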

The job log has no warnings or fatal errors.

Please let me know in case more info is needed.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You're getting those errors when your fake update SQL fires on the duplicate attempts. How exactly do you have the Surrogate stage set up? Does this happen if you run on a single node?
-craig

"You can never have too many knives" -- Logan Nine Fingers
rohanf
Participant
Posts: 30
Joined: Fri Oct 07, 2005 5:39 am
Location: pune india

Post by rohanf »

The Surrogate stage has the following properties:
1: Source Type: Flat File
2: Generate Key from Last Highest Value = No

Partitioning is Auto for all the stages after the Surrogate stage, i.e. Tgt and the load job's stages.

I haven't tried running on one node. I am not even sure whether this error can be re-created.

Currently we are analysing this to ensure that this issue does not arise during the production run.

Also, interestingly, the duplicated identifiers are the last 254 identifiers created by the Surrogate stage.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

I can't remember the surrogate key stage too well, but I seem to recall there being a 'block' setting. This value requires some tinkering, but from what I recall, each node starts generating at the start value plus its node number times the block size. So if the block size was 1000 and the start value was 1, the first node would start generating surrogate values at 1, incrementing by 1, and the second node would start counting at 1001. This meant that if any one node processed more than 1000 records (in this example), its values would spill into the next node's range and you would get overlaps in the generated values.

So if you are using multiple nodes, divide the number of records by the number of nodes, round up a little, and use that as your block size.
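That block scheme can be simulated in a few lines of Python (the node counts and block size below are made-up numbers, and this is only a sketch of the behaviour as described, not the stage's actual code): each node hands out keys from start + node_index * block_size, so a node that produces more rows than the block size collides with the next node's range.

```python
# Hypothetical simulation of per-node block key allocation.
# Overlap appears when one node's row count exceeds the block size.

def generate_keys(rows_per_node, block_size, start=1):
    keys = []
    for node, nrows in enumerate(rows_per_node):
        first = start + node * block_size   # this node's block begins here
        keys.extend(range(first, first + nrows))
    return keys

# Two nodes, block of 1000, but node 0 produces 1200 rows:
keys = generate_keys([1200, 800], block_size=1000)
duplicates = len(keys) - len(set(keys))   # keys 1001-1200 are issued twice
```

Sizing the block at or above ceil(rows / nodes), as suggested above, keeps each node inside its own range and the duplicates disappear.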

By the way, it's quite possible I am remembering this all wrong.
crazysaint
Participant
Posts: 2
Joined: Thu Aug 24, 2006 9:08 am
Location: USA

Similar Issue

Post by crazysaint »

Hi Rohan,

Were you able to identify the root cause of this issue? I am facing a similar issue where the surrogate key generator is generating keys that it had already created in the past and loaded into the target table. One thing I noticed about the generated keys: the minimum value was on the order of 10k and the maximum on the order of 2 million, yet the job processes only around 2k records. I am really surprised at this big gap between the min and max key values.

Here's how my stage is set up:
Source Type: Flat File
File Block Size = System Picked
File Initial Value = 1

It would be great if someone could provide some input on this.

Thanks for your time.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Do you look up against your existing target to check for existing key values? This may cause the update to fire.

Also check whether there are duplicates in the incoming stream.