INSERTing maximum value

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

RakshaPai
Participant
Posts: 35
Joined: Fri Jun 15, 2007 11:21 pm

INSERTing maximum value

Post by RakshaPai »

Hello all,

Please help me with this problem.

I have a source table with 17 fields and a destination table with 18 fields. I have to insert all the rows into the destination, reading from the source, and I am using a Transformer to map all the fields between source and destination. My problem is that the 18th field in my destination table, which is not mapped to any field in the source, is the primary key of the destination table.
As expected, when I try to insert into the destination it gives me an error saying it cannot insert a null value into the primary key.

Is there any function/way in which I can specify a one-up / MAX + 1 value for this field in DS? :(
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Get it via a Lookup stage whose reference link is fed by the query SELECT MAX(keyvalue) + 1 FROM targettable and which is partitioned Entire. Downstream of that, use a Column Generator to add a sequence column beginning at the partition number and incrementing by the partition count, and use either a Transformer stage or user-defined SQL to add the two.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
RakshaPai
Participant
Posts: 35
Joined: Fri Jun 15, 2007 11:21 pm

Post by RakshaPai »

Thanks ray!

So is this going to look like

Lookup ----> Column Generator ----> Transformer

Source ----> Transformer...

Can I have two inputs to the Transformer? If not, how else do I map the column?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There is a link from Source to Lookup (probably through a Column Generator unless you select the constant key from source). There is no link from Source to Transformer.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
RakshaPai
Participant
Posts: 35
Joined: Fri Jun 15, 2007 11:21 pm

Post by RakshaPai »

OK... I am a little confused here. I think I didn't quite get the picture... :(

If I cannot have it go to the Transformer, then how can I insert into the destination? All my columns except the primary key are mapped in the Transformer. I need to insert all the rows from the Transformer plus the primary key into the destination all together, to avoid the primary key constraint violation. How can I do this if the input from the Column Generator does not go to the Transformer? :?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Code: Select all

                               TargetA
                                  |
                                  V
Source  ----->  ColGen  ---->  Lookup  ---->  Transformer  ---->  TargetB
The Column Generator stage generates a column containing 'X' in every row. The query on TargetA has the form SELECT 'X', MAX(keycol) + 1 FROM targettable and is Entire partitioned.

In the Transformer stage, keycol is derived by adding the generated sequence (partitionnumber + partitioncount * (rownumber - 1)) to the constant value returned by the lookup (MAX(keycol) + 1).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
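As a side note, Ray's per-partition derivation above can be simulated outside DataStage. This is a hedged Python sketch (the function name and the sample values are illustrative, not DataStage code): each partition starts at the looked-up base (MAX(keycol) + 1) plus its partition number and steps by the partition count, so the partitions can run in parallel without ever colliding.

```python
# Sketch of the per-partition surrogate key scheme:
# key = base + partition_number + partition_count * (row_index)
# where base is the MAX(keycol) + 1 value from the lookup.

def partition_keys(base, partition_num, partition_count, row_count):
    """Keys that one partition would assign to its rows."""
    return [base + partition_num + partition_count * row
            for row in range(row_count)]

# Example: base = 101 (existing MAX was 100), 4 partitions, 3 rows each.
all_keys = []
for p in range(4):
    all_keys.extend(partition_keys(101, p, 4, 3))

print(sorted(all_keys))  # every key from 101 to 112, each exactly once
```

Note the design choice this illustrates: uniqueness comes purely from the interleaving (partition p owns the residue class p mod partitioncount), so no coordination between partitions is needed at run time.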
Zhang Bo
Participant
Posts: 15
Joined: Sun Jul 15, 2007 9:22 pm

Re: INSERTing maximum value

Post by Zhang Bo »

I think the easiest way is to define the target table with an identity column, e.g. CREATE TABLE tbname (key_column INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1)). Then your ETL job will just be a full load of the 17 columns from the source table; the 18th column will be generated automatically, and of course you won't need a Transformer stage.
MaDFroG
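For reference, the identity-column DDL Zhang Bo describes, tidied into DB2-style SQL (the table and column names are placeholders from the post, and identity syntax varies by database, so check your own database's documentation before relying on this):

```sql
-- DB2-style identity column: the database assigns key_column itself,
-- so the ETL job loads only the 17 mapped source columns.
CREATE TABLE tbname (
    key_column INTEGER NOT NULL
        GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1)
    -- ... the 17 mapped columns go here ...
);
```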
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Perhaps, but only if your database supports the concept of an 'identity' column.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Easiest it may be, but there is necessarily a bottleneck, because the process of generating the next key is single-threaded. Any approach using an identity column or sequence suffers from this dilemma.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
RakshaPai
Participant
Posts: 35
Joined: Fri Jun 15, 2007 11:21 pm

Post by RakshaPai »

Thanks all,

I just saw the "Surrogate Key Generator" in the parallel job designer guide... :) Can I somehow make use of this? When I read through it, it quite looked like I could if I had no partitions.
The only thing I can't figure out is: the start value of the key has to be supplied, and in my case I cannot code it as a job parameter, because then each time I would need to look into the table to get the MAX + 1 value. Is there any way through which I can dynamically supply this start value?
One option I was thinking of was to first read this MAX + 1 value into a file and then use that as the start value; the only thing is I don't know if this can be done and whether it will work. :(


By the way, we had thought of making the column IDENTITY, but there would be too many applications impacted by this change, and hence we can't have it. :roll:

@Ray

If I have a Column Generator downstream of the source, I will need to know the number of records to generate the sequence, right? I tried this option, but now the problem is that the number of records keeps changing, so the number I specify in the Column Generator for the sequence ID is not always correct and I end up with holes on each run of the job. Is there any alternative to make the number of sequence IDs the same as the number of records?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Column Generator in my design simply generates the constant 'X' that you use to execute the Lookup.

Technical Rant
"Holes" DO NOT MATTER in surrogate keys. Surrogate keys are about uniqueness - NOTHING ELSE.

With Round Robin partitioning you are unlikely to get any holes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
RakshaPai
Participant
Posts: 35
Joined: Fri Jun 15, 2007 11:21 pm

Post by RakshaPai »

My Column Generator is generating just two identical values... :oops:
Am I missing something?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes, clearly. It should be generating only ONE "same value".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
RakshaPai
Participant
Posts: 35
Joined: Fri Jun 15, 2007 11:21 pm

Post by RakshaPai »

Thanks a lot, guys... :)

Finally I am able to generate the key values.

Just one more question...

I have used the stages as suggested:

Lookup --> Column Generator --> Transformer

Can I directly write the output of the Transformer to the database, or should I first have it in a file and write another job to insert into the database from this file?
Which option would be better?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Neither is "better". Which is best suited to your requirements? Is the target database available whenever you need it, or should you store the data so they can be loaded when the database is available? In that case, a Data Set is probably preferable to a file, since it preserves the parallel nature of your processing.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.