Generation of Surrogate Key

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

For Kenneth:

Good post. Again I'm not trying to debate on the best approach to take the seed. Undoubtedly, it is from the database. Approach I posted is on a complete tool based one.

Thanks to you Kenneth! I have included your point and updated the blog post.
This case you need to pick the max value from the target table and use the same strategy specified above. Pass it thru a transformer and update the environment variable before the start of main job using the same conditions.
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kcbland wrote:For Craig:

If you create a table with a primary key index in Oracle, Sybase, etc, you can't drop the index without dropping the table. If you create the table without a primary key index, but add a unique index afterwards, the database will use that one just the same as a primary key index.

Now why would one want to drop a primary key index? DUPLICATES!! AHAHHHAHHAHA :twisted: I got 'em!

If you DIRECT PATH bulk load your table in Oracle, the duplicates aren't found until the load has completed and the deferred index updates are attempted. AAHAHAHHAHAHAH Unusable indexes due to duplicates! Now try to find them!!! AHAHAHHAAH You can't query the table and you can't drop the index. :shock:

If you created a secondary unique index, you have to find duplicates without the benefit of an index. :shock: That's why it's recommended that you use non-unique secondary indexes instead of unique or primary key indexes in data warehouses.

So, go to AskTom.Oracle.com and read the wisdom therein, better yet, buy his books. He speaks specifically to the differences between OLTP and OLAP architectures. The gem is the one about non-unique secondary indexes - that one is powerful in Oracle DIRECT path loaded surrogate key managed environments. 8)
Ah... I see that I misread you. I thought you were simply saying that having a surrogate key as the primary key was the bad idea. I'm aware of the indexing issues you've noted, however you've articulated it much mo' better than I probably could have - thanks for that. Heck, I've just been the victim of this exact issue - bulk load with duplicate primary keys that leaves indexes 'unusable' without any hint of a problem from sqlldr.

By the way, I'm pretty sure you can query the table as long as you specifically avoid usage of the unusable index. For Oracle, that can mean a FULL hint will allow you to full table scan the table and find the duplicates. Or at least it just worked for the situation I had found myself in.

I've read his site and have at least one of Tom's books around here somewhere. However, I rarely get the chance to leverage some of those gems as (so far) the database has either been already in place or under the creative control of other folks. :evil:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

For Ken: did you mean to add five exclamation points after each of those maniacal laughs?

(See Maskerade by Terry Pratchett)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

chulett wrote: By the way, I'm pretty sure you can query the table as long as you specifically avoid usage of the unusable index. For Oracle, that can mean a FULL hint will allow you to full table scan the table and find the duplicates. Or at least it just worked for the situation I had found myself in.
The DBA's have to get involved, disabling indexes and such, but the problem is the delete. You need to delete the loaded rows and update the index, kind of hard to do if it's disabled. If it's enabled, it's corrupt, and won't allow deletes. To find the duplicates your query has to look at the surrogate keys and group by skey having count(1) > 1 and will use the index so won't work. Can't drop a primary key index, so even disabled it doesn't help you. At least if a secondary index it can be dropped, but a non-unique index will still be usable and allow you to do an efficient query and delete.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

ray.wurlod wrote:For Ken: did you mean to add five exclamation points after each of those maniacal laughs?

(See Maskerade by Terry Pratchett)
I was in a weird mood yesterday, hence my Opera Ghost laughter. :lol:
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Great... The Phantom of the Exchange lives! :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply