issue when using 'Generate key from last highest value=NO'

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Suma Reddy
Participant
Posts: 7
Joined: Mon Oct 30, 2006 12:19 am

issue when using 'Generate key from last highest value=NO'

Post by Suma Reddy »

Hi,

I am facing some issue when using same surrogate key state file by separate jobs in parallel.

1.Currently I am using the stage in parallel mode with 'Generate key from last highest value=YES'.
Beacuse of this as expected value generated will not be continuous. But this approach is desired because if accidently the state file is deleted we can populate the state file with maximum value retrieved from the table(which was populated using this state file).

Before the file was deleted the content of state file
8002 8002

After the state file is created and updated with the max value from table i can view the content as:
1 8001
8002 8002
Since i have used 'Generate key from last highest value=YES', the job which uses this surrogate key will have the value starting from 8001. So no issue here but there is gap in sequence generated.

2. If i use surrogate stage in parallel mode(don't want to use sequential) and 'Generate key from last highest value=NO'. There is no problem until the state file is intact. Even though the data may not be continuous but each value will be used. There will not be any gap in the sequence. The problem comes if the state file is deleted.

Before the file was deleted the content of state file
8002 8002

After the state file is created and updated with the max value from table i can view the content as:
1 8001
8002 8002

In this case when I run job which makes use of above state file to populate the table, the value starts with 1. This data is already present in the table, hence the job aborts with unique key constraint problem.

Can you suggest any way [using surrogate stage in parallel mode(don't want to use sequential) and 'Generate key from last highest value=NO'] that after the state file is created and updated with maximum value from the table it has just 8002 8002 (8001 is the maximum value generated by table). I don't want it to have value '1 8001'

Thanks,
Suma
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: issue when using 'Generate key from last highest value=N

Post by chulett »

Let's get this out of the way first:
Suma Reddy wrote:Beacuse of this as expected value generated will not be continuous.
Suma Reddy also wrote:]So no issue here but there is gap in sequence generated.
Why is this in any kind of concern? Having "gaps" in surrogate key sequences is normal, expected and nothing to lose any sleep over. Do you have any particular reason for actually needing them to be contiguous and "gap free"?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Suma Reddy
Participant
Posts: 7
Joined: Mon Oct 30, 2006 12:19 am

Post by Suma Reddy »

No although contigous value is preferred it is not mandatory. Can you please let me know the maximum value generated by surrogate key generator using state file?

In my project I am expecting huge volume of data.And this table will grow with each passing day.

I am just wondering if I continue to lose value with this gap in sequence i shouldn't reach to limit of value generated by surrogate key generator.
zulfi123786
Premium Member
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Suma Reddy wrote:I am just wondering if I continue to lose value with this gap in sequence i shouldn't reach to limit of value generated by surrogate key generator.
The holes in the key values do exist if the block size is greater than one but the holes keep filling up when new keys are generated untill the block is completely utilised and then new blocks are reserved so you don't have to worry about the gaps to the extent of going beyond unsigned bigint range
- Zulfi
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Hi Suma,

IMO, the surrogate key state file has been a little "flaky" since the beginning.

Once you understand what is going on, you can deal with it.

When you create a new state file having a value of 8002 using the "Create and Update" action, you see the results that you have seen:
1 8001
8002 8002

When you update an existing state file with a value of 8002 using the "Update" action you see:
1 1
8002 8002

With the update action, you still have 1 key that will be "reused".

I have built myself a generic utility job sequence to sync a surrogate key state file:
1) remove an existing state file
2) create a state file
3) update the state file with the max value selected from database table
4) generate 1 surrogate key (to burn the "1 1" entry)

The final result is a state file with a single record:
8002 8002

The format of the state file is quite simple. It is simply two unsigned bigint fields stored in binary format with no field or record delimiters (16 bytes per record). You can use the od command to view it. I suspect they are stored with native endianness.

Mike
zulfi123786
Premium Member
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Mike wrote:You can use the od command to view it.
I have used the od command to have a look at the contents but at times od command did not correctly read the values. any such experience ?
- Zulfi
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

No such experience. I can't imagine how od "did not correctly read the values".

I'm more fluent in hex than octal, so I use od to dump it as hex.

Another quick way to view is to set up a sequential file stage. Define the record type as implicit, set delimiter and quote to none, define two fields as unsigned bigint, add the property to indicate that each field is binary, then use view data.

Mike
Suma Reddy
Participant
Posts: 7
Joined: Mon Oct 30, 2006 12:19 am

Post by Suma Reddy »

Thanks everyone for helping. Special thanks to Mike for explaining the state file in such a clear and simple manner.
Post Reply