Surrogate Key maintenance

Archive of postings to DataStageUsers@Oliver.com. This forum intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

Locked
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Surrogate Key maintenance

Post by admin »

This is a topic for an orphaned message.
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

The BCI is a set of BASIC functions that mimic the ODBC 2.0 API. So its ODBC protocols.

You are right that you need to perform a reference lookup against the dimension table to determine existence/change. However, this can be done extremely efficiently by pre-loading a hashed file. Something like SELECT A.CODE, A.DESCR FROM DIMTABLE A WHERE A.SURRKEY = (SELECT
MAX(B.SURRKEY) FROM DIMTABLE B WHERE A.CODE = B.CODE);

Then the lookups can be done against CODE (the "natural key" in your terms) as the key in the hashed file (an equi-join, so its fast). Existence is an ISNULL test on the returned key, and testing for difference needs only to compare the auditable columns.

If there are likely to be duplicates in the input stream you must ensure that the hashed file is NOT pre-loaded to memory, and that it is also updated when a changed value occurs. If there can not be duplicates (SELECT DISTINCT for example), then this check can be omitted.

> ----------
> From: David Barham[SMTP:djbarham@powerup.com.au]
> Reply To: informix-datastage@oliver.com
> Sent: Friday, 13 October 2000 11:17
> To: informix-datastage@oliver.com
> Subject: RE: Surrogate Key maintenance (was ORACLE OCI)
>
> I can see Im going to be a little more careful about how I word
> things in future. Thanks Ray, this is a perfectly reasonable option.
> Your approach has the advantage that it can all be done in the one
> transform stage as the transform function will only be executed when
> one actually inserts. (As opposed to looking up a sequence which needs
> to be in a separate transform stage to avoid incrementing it when one
> is not inserting).
>
> Is the BCI using ODBC (in which case I know what you are talking
> about) or using OCI (something I havent tried in basic code)?
>
> I still dont think you can avoid doing the lookup first though to see
> if the dimension record already exists.
>
> *sigh* I guess thats what I get for missing the " Programming with
> DataStage BASIC" class.
>
> (Dont suppose an electronic copy of the course notes might
> accidentally, anonymously arrive in my mail box, amongst all the other
> spam I get?)
>
> David Barham
> Information Technology Consultant
> InformAtect Pty Ltd
> Brisbane, Australia
>
> -----Original Message-----
> From: Ray Wurlod [mailto:ray.wurlod@informix.com]
> Sent: Friday, 13 October 2000 17:44
> To: informix-datastage@oliver.com
> Cc: Anthony Corrente; Kylie Jones
> Subject: Surrogate Key maintenance (was ORACLE OCI)
>
> More good news is that you CAN do it (and I know Phil is a gun BASIC
> programmer).
>
> Construct a routine using BCI (BASIC SQL Client Interface) functions
> to SELECT MAX(surrogate_key) FROM dimension_table, load this into
> COMMON, then construct a transform function to increment it when
> needed to generate the next row. This technique is taught on the
> "Programming with DataStage BASIC" class.
>
> > ----------
> > From: David Barham[SMTP:David.Barham@anglocoal.com.au]
> > Reply To: informix-datastage@oliver.com
> > Sent: Friday, 13 October 2000 04:55
> > To: informix-datastage@oliver.com
> > Subject: RE: ORACLE OCI
> >
> > Do you want the good news or the bad news?
> >
> > The bad news is that you cant do what you are trying to do.
> >
> > The good news is that you have figured out the correct/only way to
> > do
> it.
> >
> > We extensively use surrogate keys for our dimensions and manage them
> with
> > sequences.
> >
> > The approach we use is in one transform, look up the dimension table
> with
> > the "natural" key to get the surrogate key (we call it a UID). If
> > it exists, the same transform updates the dimension table. If it
> > does not exist, pass the data to another transform which looks up
> > the values from the sequence and inserts the new row to the
> > dimension.
> >
> > Alternatively, instead of a lookup to get the next value from
> > sequence, you could use a user defined query for the insert to
> > directly reference the sequence. For that matter, this would remove
> > the need for the second transform.
> >
> > For my money, the extra lookup is easier and makes it clearer in the
> > job as to what is happening.
> >
> > Having done a lookup to the dimension, you may as well test to see
> > if anything has changed and avoid the update if you dont need it.
> >
> > Have fun.
> >
> > -----Original Message-----
> > From: Walker, Phil (Forests Manukau) [SMTP:Phil.Walker@chh.co.nz]
> > Sent: Friday, October 13, 2000 10:54 AM
> > To: informix-datastage@oliver.com
> > Subject: ORACLE OCI
> >
> > Hi,
> >
> > We are trying to construct a job where we update/insert into
> > dimension table in oracle using the OCI plugin. The dimension table
> > is to have a
> surrogate
> > key, and we were wishing to set this within the oracle environment
> rather
> > than using the DataStage 4.0 surrogate key generation capability.
> > Therefore rather than having to do a lookup to see if the dimension
> > record already exists we would use the update/insert capability of
> > the plugin using an alternative unique index based on the source
> > system keys. We would not include the SERIAL surrogate key in the
> > columns to update/insert. This would work under SQL server
> > 7.0/informix with SERIAL data types, but the only thing I can find
> > in ORACLE is sequences, which from what I
> understand
> > have no direct relationship with a table, and therefore do not
> > behave
> like
> > a
> > SERIAL data type.
> >
> > I know we could do a lookup using the alternate unique index to
> > return
> the
> > surrogate key value, and therefore determine if an insert or update
> > was about to occur depending on whether a row was returned, but I
> > was trying to eliminate having to do this.
> >
> > Any suggestions?
> >
> > Phil.
> >
> >
> >
> **********************************************************************
> ***
> > This e-mail and any files transmitted with it may be confidential
> > and are intended solely for the use of the individual or entity to
> > whom they are addressed. If you have received this e-mail in error,
> > please notify the sender by return e-mail, and delete this e-mail
> > from your in-box. Do not copy it to anybody else
> >
> >
> **********************************************************************
> ***
> >
>
Locked