Duplicate identifier created by state file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

rohanf
Participant
Posts: 30
Joined: Fri Oct 07, 2005 5:39 am
Location: pune india

Duplicate identifier created by state file

Post by rohanf »

Hi All,

I am using a Surrogate Key Generator stage to create a primary key identifier. However, there are 254 load rejects out of 4,312,066 rows, with SQLCODE 1403 (ORA-01403: no data found).

I have searched the forum for the 1403 issue but was unable to find a conclusive resolution.

However, when I loaded all the data into a dummy table with no primary key on the generated column, I could see 254 records with duplicate identifiers. I am not sure how, or in what case, the Surrogate Key stage can create a duplicate identifier.

Please let me know if anybody has encountered this issue before.

Thanks & Regards
Rohan
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Can you post your complete error message(s)? "No data found" <> "duplicate key". It would also help to detail your job design, number of nodes, etc.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rohanf
Participant
Posts: 30
Joined: Fri Oct 07, 2005 5:39 am
Location: pune india

Post by rohanf »

Hi,

There are no errors in the log; there is just SQLCODE 1403 in the reject file.

Job design is as below:
Transformation:
Join -> Transformer -> Lookup -> Filter -> Modify -> Surrogate -> Tgt

Load:
Tgt -> Copy -> Oracle Enterprise** - - - - - -> Reject file

** The options are:
Write method: Upsert
Upsert order: Insert then update
Upsert mode: User-defined Update & Insert
The update query is disabled using WHERE (1=2).
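The mechanics behind that setup can be sketched in plain Python (no real Oracle involved; the names are made up for illustration): with "insert then update", a row whose insert fails on a duplicate key is retried as an update, and because the update is crippled with WHERE (1=2) it matches zero rows, so Oracle reports 1403 and the row lands on the reject link.

```python
# Simulation of the "insert then update" upsert path with a
# disabled update query (WHERE 1=2). Hypothetical sketch, not
# DataStage internals.

def upsert_insert_then_update(rows, update_enabled=False):
    table = {}      # stands in for the target table, keyed on the surrogate key
    rejects = []    # stands in for the reject link
    for key, payload in rows:
        if key not in table:
            table[key] = payload            # INSERT succeeds
        elif update_enabled:
            table[key] = payload            # UPDATE would fire here
        else:
            # The update predicate (1=2) matches no rows, so the
            # database reports SQLCODE 1403 (no data found) and the
            # duplicate row goes to the reject file.
            rejects.append((key, payload, 1403))
    return table, rejects

table, rejects = upsert_insert_then_update(
    [(1, "a"), (2, "b"), (2, "b-dup"), (3, "c")])
# rejects now holds the one duplicate-key row, tagged 1403
```

This is consistent with the symptom above: 254 duplicate surrogate keys would produce exactly 254 rejects with SQLCODE 1403 while the job log itself stays clean.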

The job log has no warnings or fatal errors.

Please let me know in case more info is needed.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You're getting those errors when your fake update SQL fires on the duplicate attempts. How exactly do you have the Surrogate stage set up? Does this happen if you run on a single node?
-craig

"You can never have too many knives" -- Logan Nine Fingers
rohanf
Participant
Posts: 30
Joined: Fri Oct 07, 2005 5:39 am
Location: pune india

Post by rohanf »

The Surrogate stage has the following properties:
1: Source Type: Flat File
2: Generate Key from Last Highest Value = No

Partitioning is Auto for all the stages after the Surrogate stage, i.e. Tgt and the load job's stages.

I haven't tried running on one node. I am not even sure whether this error can be re-created.

Currently we are analysing this to ensure that this issue does not arise during the production run.

Also, interestingly, the duplicated identifiers are the last 254 identifiers created by the Surrogate stage.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

I can't remember the surrogate key stage too well, but I seem to recall there being a 'block' setting. This value requires some tinkering, but from what I recall, each node starts generating at the start value plus its node number times the block size. So if the block size was 1000 and the start value was 1, the first node would start generating surrogate values at 1, incrementing by 1, and the second node would start counting at 1001. This meant that if any one node processed more than 1000 records (in this example), its values would spill into the next node's range and you would get overlaps in the generated values.

So if you are using multiple nodes, divide the number of records by the number of nodes, round up a little, and use that as your block size.
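That block scheme can be simulated in a few lines of Python (the node counts and block size below are made-up numbers, and this is only a sketch of the behaviour as described, not the stage's actual code): each node hands out keys from start + node_index * block_size, so a node that produces more rows than the block size collides with the next node's range.

```python
# Hypothetical simulation of per-node block key allocation.
# Overlap appears when one node's row count exceeds the block size.

def generate_keys(rows_per_node, block_size, start=1):
    keys = []
    for node, nrows in enumerate(rows_per_node):
        first = start + node * block_size   # this node's block begins here
        keys.extend(range(first, first + nrows))
    return keys

# Two nodes, block of 1000, but node 0 produces 1200 rows:
keys = generate_keys([1200, 800], block_size=1000)
duplicates = len(keys) - len(set(keys))   # keys 1001-1200 are issued twice
```

Sizing the block at or above ceil(rows / nodes), as suggested above, keeps each node inside its own range and the duplicates disappear.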

By the way, it's quite possible I am remembering this all wrong.
crazysaint
Participant
Posts: 2
Joined: Thu Aug 24, 2006 9:08 am
Location: USA

Similar Issue

Post by crazysaint »

Hi Rohan,

Were you able to identify the root cause of this issue? I am facing a similar issue where the surrogate key generator is generating keys that it had already created in the past and loaded into the target table. One thing I noticed about the generated keys: the minimum value was on the order of 10k and the maximum on the order of 2 million, yet the job processes only around 2k records. I am really surprised at this big gap between the min and max key values.

Here's how my stage is set up:
Source Type: Flat File
File Block Size = System Picked
File Initial Value = 1

It would be great if someone could provide some input on this.

Thanks for your time.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Do you look up against your existing target to check for existing key values? This may cause the update to fire.

Also check whether there are duplicates in the incoming stream.