issue when using 'Generate key from last highest value=NO'

Suma Reddy · Post by **Suma Reddy** » Wed Dec 07, 2011 2:58 am

Hi,

I am facing some issue when using same surrogate key state file by separate jobs in parallel.

1.Currently I am using the stage in parallel mode with 'Generate key from last highest value=YES'.
Beacuse of this as expected value generated will not be continuous. But this approach is desired because if accidently the state file is deleted we can populate the state file with maximum value retrieved from the table(which was populated using this state file).

Before the file was deleted the content of state file
8002 8002

After the state file is created and updated with the max value from table i can view the content as:
1 8001
8002 8002
Since i have used 'Generate key from last highest value=YES', the job which uses this surrogate key will have the value starting from 8001. So no issue here but there is gap in sequence generated.

2. If i use surrogate stage in parallel mode(don't want to use sequential) and 'Generate key from last highest value=NO'. There is no problem until the state file is intact. Even though the data may not be continuous but each value will be used. There will not be any gap in the sequence. The problem comes if the state file is deleted.

Before the file was deleted the content of state file
8002 8002

After the state file is created and updated with the max value from table i can view the content as:
1 8001
8002 8002

In this case when I run job which makes use of above state file to populate the table, the value starts with 1. This data is already present in the table, hence the job aborts with unique key constraint problem.

Can you suggest any way [using surrogate stage in parallel mode(don't want to use sequential) and 'Generate key from last highest value=NO'] that after the state file is created and updated with maximum value from the table it has just 8002 8002 (8001 is the maximum value generated by table). I don't want it to have value '1 8001'

Thanks,
Suma

chulett · Post by **chulett** » Wed Dec 07, 2011 6:48 am

Let's get this out of the way first:

Suma Reddy wrote:Beacuse of this as expected value generated will not be continuous.

Suma Reddy also wrote:]So no issue here but there is gap in sequence generated.

Why is this in any kind of concern? Having "gaps" in surrogate key sequences is normal, expected and nothing to lose any sleep over. Do you have any particular reason for actually needing them to be contiguous and "gap free"?

Suma Reddy · Post by **Suma Reddy** » Wed Dec 07, 2011 8:08 am

No although contigous value is preferred it is not mandatory. Can you please let me know the maximum value generated by surrogate key generator using state file?

In my project I am expecting huge volume of data.And this table will grow with each passing day.

I am just wondering if I continue to lose value with this gap in sequence i shouldn't reach to limit of value generated by surrogate key generator.

zulfi123786 · Post by **zulfi123786** » Wed Dec 07, 2011 9:48 am

Suma Reddy wrote:I am just wondering if I continue to lose value with this gap in sequence i shouldn't reach to limit of value generated by surrogate key generator.

The holes in the key values do exist if the block size is greater than one but the holes keep filling up when new keys are generated untill the block is completely utilised and then new blocks are reserved so you don't have to worry about the gaps to the extent of going beyond unsigned bigint range

Mike · Post by **Mike** » Wed Dec 07, 2011 12:04 pm

Hi Suma,

IMO, the surrogate key state file has been a little "flaky" since the beginning.

Once you understand what is going on, you can deal with it.

When you create a new state file having a value of 8002 using the "Create and Update" action, you see the results that you have seen:
1 8001
8002 8002

When you update an existing state file with a value of 8002 using the "Update" action you see:
1 1
8002 8002

With the update action, you still have 1 key that will be "reused".

I have built myself a generic utility job sequence to sync a surrogate key state file:
1) remove an existing state file
2) create a state file
3) update the state file with the max value selected from database table
4) generate 1 surrogate key (to burn the "1 1" entry)

The final result is a state file with a single record:
8002 8002

The format of the state file is quite simple. It is simply two unsigned bigint fields stored in binary format with no field or record delimiters (16 bytes per record). You can use the od command to view it. I suspect they are stored with native endianness.

Mike

zulfi123786 · Post by **zulfi123786** » Wed Dec 07, 2011 12:07 pm

Mike wrote:You can use the od command to view it.

I have used the od command to have a look at the contents but at times od command did not correctly read the values. any such experience ?

Mike · Post by **Mike** » Wed Dec 07, 2011 3:26 pm

No such experience. I can't imagine how od "did not correctly read the values".

I'm more fluent in hex than octal, so I use od to dump it as hex.

Another quick way to view is to set up a sequential file stage. Define the record type as implicit, set delimiter and quote to none, define two fields as unsigned bigint, add the property to indicate that each field is binary, then use view data.

Mike

Suma Reddy · Post by **Suma Reddy** » Thu Dec 08, 2011 7:23 am

Thanks everyone for helping. Special thanks to Mike for explaining the state file in such a clear and simple manner.

DSXchange

issue when using 'Generate key from last highest value=NO'

issue when using 'Generate key from last highest value=NO'

Re: issue when using 'Generate key from last highest value=N