INSERTing maximum value

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

RakshaPai
Participant
Posts: 35
Joined: Fri Jun 15, 2007 11:21 pm

INSERTing maximum value

Post by RakshaPai »

Hello all,

Please help me with this problem.

I have a source table with 17 fields and a destination table with 18 fields. I have to insert all the rows into the destination, reading from the source, and I am using a Transformer to map all the fields between source and destination. My problem is that the 18th field in my destination table, which is not mapped to any field in the source, is the primary key of the destination table.
As expected, when I try to insert into the destination it gives me an error saying it cannot insert a null value into the primary key.

Is there any function/way in which I can specify a one-up / MAX + 1 value for this field in DS? :(
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Get it via a Lookup stage whose reference link is fed by the query SELECT MAX(keyvalue) + 1 FROM targettable and which is partitioned Entire. Downstream of that, use a Column Generator to add a sequence column beginning at the partition number and incrementing by the partition count, and use either a Transformer stage or user-defined SQL to add the two.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
RakshaPai
Participant
Posts: 35
Joined: Fri Jun 15, 2007 11:21 pm

Post by RakshaPai »

Thanks ray!

So is this going to look like

Lookup ----> Column Generator ----> Transformer

Source ----> Transformer...

Can I have two inputs to the Transformer? If not, how else do I map the column?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There is a link from Source to Lookup (probably through a Column Generator unless you select the constant key from source). There is no link from Source to Transformer.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
RakshaPai
Participant
Posts: 35
Joined: Fri Jun 15, 2007 11:21 pm

Post by RakshaPai »

OK... I am a little confused here. I think I didn't quite get the picture... :(

If I cannot have it go to the Transformer, then how can I insert into the destination? All my columns except the primary key are mapped in the Transformer. I need to insert all the rows from the Transformer plus the primary key into the destination all together, to avoid the primary key constraint violation. How can I do this if the input from the Column Generator does not go to the Transformer? :?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Code: Select all

                               TargetA
                                  |
                                  V
Source  ----->  ColGen  ---->  Lookup  ---->  Transformer  ---->  TargetB
The Column Generator stage generates a column containing 'X' in every row. The query on TargetA has the form SELECT 'X', MAX(keycol) + 1 FROM targettable and is Entire partitioned.

In the Transformer stage, keycol is derived by adding the generated sequence (partitionnumber + partitioncount * (rownumber - 1)) to the constant value returned by the lookup (MAX(keycol) + 1).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
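As a side note, Ray's per-partition derivation above can be simulated outside DataStage. This is a hedged Python sketch (the function name and the sample values are illustrative, not DataStage code): each partition starts at the looked-up base (MAX(keycol) + 1) plus its partition number and steps by the partition count, so the partitions can run in parallel without ever colliding.

```python
# Sketch of the per-partition surrogate key scheme:
# key = base + partition_number + partition_count * (row_index)
# where base is the MAX(keycol) + 1 value from the lookup.

def partition_keys(base, partition_num, partition_count, row_count):
    """Keys that one partition would assign to its rows."""
    return [base + partition_num + partition_count * row
            for row in range(row_count)]

# Example: base = 101 (existing MAX was 100), 4 partitions, 3 rows each.
all_keys = []
for p in range(4):
    all_keys.extend(partition_keys(101, p, 4, 3))

print(sorted(all_keys))  # every key from 101 to 112, each exactly once
```

Note the design choice this illustrates: uniqueness comes purely from the interleaving (partition p owns the residue class p mod partitioncount), so no coordination between partitions is needed at run time.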
Zhang Bo
Participant
Posts: 15
Joined: Sun Jul 15, 2007 9:22 pm

Re: INSERTing maximum value

Post by Zhang Bo »

I think the easiest way is to define the target table with an identity column, e.g. CREATE TABLE tbname (key_column INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1)). Then your ETL job will just be a full load of the 17 columns from the source table; the 18th column will be generated automatically, and of course you won't need a Transformer stage.
MaDFroG
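For reference, the identity-column DDL Zhang Bo describes, tidied into DB2-style SQL (the table and column names are placeholders from the post, and identity syntax varies by database, so check your own database's documentation before relying on this):

```sql
-- DB2-style identity column: the database assigns key_column itself,
-- so the ETL job loads only the 17 mapped source columns.
CREATE TABLE tbname (
    key_column INTEGER NOT NULL
        GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1)
    -- ... the 17 mapped columns go here ...
);
```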
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Perhaps, but only if your database supports the concept of an 'identity' column.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Easiest it may be, but there is necessarily a bottleneck, because the process of generating the next key is single-threaded. Any approach using an identity column or sequence suffers from this dilemma.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
RakshaPai
Participant
Posts: 35
Joined: Fri Jun 15, 2007 11:21 pm

Post by RakshaPai »

Thanks all,

I just saw the "Surrogate Key Generator" in the parallel job designer guide... :) Can I somehow make use of this? When I read through it, it quite looked like I could if I had no partitions.
The only thing I can't figure out is: the start value of the key has to be supplied, and in my case I cannot code it as a job parameter, because then each time I would need to look into the table to get the MAX + 1 value. Is there any way through which I can dynamically supply this start value?
One option I was thinking of was to first read this MAX + 1 value into a file and then use that as the start value; the only thing is I don't know if this can be done and whether it will work. :(


By the way, we had thought of making the column IDENTITY, but there would be too many applications impacted by this change, and hence we can't have it. :roll:

@Ray

If I have a Column Generator downstream of the source, I will need to know the number of records to generate the sequence, right? I tried this option, but now the problem is that the number of records keeps changing, so the number I specify in the Column Generator for the sequence ID is not always correct and I end up with holes on each run of the job. Is there any alternative to make the number of sequence IDs the same as the number of records?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Column Generator in my design simply generates the constant 'X' that you use to execute the Lookup.

Technical Rant
"Holes" DO NOT MATTER in surrogate keys. Surrogate keys are about uniqueness - NOTHING ELSE.

With Round Robin partitioning you are unlikely to get any holes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
RakshaPai
Participant
Posts: 35
Joined: Fri Jun 15, 2007 11:21 pm

Post by RakshaPai »

My Column Generator is generating just two identical values... :oops:
Am I missing something?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes, clearly. It should be generating only ONE "same value".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
RakshaPai
Participant
Posts: 35
Joined: Fri Jun 15, 2007 11:21 pm

Post by RakshaPai »

Thanks a lot, guys... :)

Finally I am able to generate the key values.

Just one more question...

I have used the stages as suggested:

Lookup --> Column Generator --> Transformer

Can I directly write the output of the Transformer to the database, or should I first have it in a file and write another job to insert into the database from this file?
Which option would be better?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Neither is "better". Which is best suited to your requirements? Is the target database available whenever you need it, or should you store the data so they can be loaded when the database is available? In that case, a Data Set is probably preferable to a file, since it preserves the parallel nature of your processing.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.