state files in surrogate key generator

sheema · Post by **sheema** » Tue May 12, 2009 2:23 pm

Hi,

I have a job, Job1, in which I use the Surrogate Key Generator (SGK) stage to generate a unique sequence, with a state file as the key source. These are the steps I followed to get this working:

1. Created JobA with just an SGK stage, with no input or output links, to create the state file. (Key Source Action=Create, Source Type=Flat File, Source Name=<path and name of the state file>)

2. Ran Job1, in which the SGK stage has input and output links. (Options set in the SGK stage: File Initial Value=100, Source Type=Flat File, Source Name=<path and name of the state file>, Generated Output Column Name=<column name>)

The problem I am facing is that the sequence is not incremented by 1 throughout. Is there a setting I need to change to increment the value by one?
Is there anything in the process that I am missing for proper sequence generation?

Thanks

chulett · Post by **chulett** » Tue May 12, 2009 2:32 pm

What makes you think the increment isn't 1? What are you seeing?

sheema · Post by **sheema** » Tue May 12, 2009 2:40 pm

For example, I have 500 rows for which I would like to generate a sequence from 1 to 500. The SGK stage generates 1 to 104, incrementing by 1, but after 104 it jumps to 1001, increments by 1 until 1102, and then jumps again to 2001.

chulett · Post by **chulett** » Tue May 12, 2009 2:48 pm

How many nodes is your job running on?

sheema · Post by **sheema** » Tue May 12, 2009 2:50 pm

4 nodes.

ray.wurlod · Post by **ray.wurlod** » Tue May 12, 2009 3:42 pm

Think about what that means.

chulett · Post by **chulett** » Tue May 12, 2009 4:03 pm

A little crazy, I know, but this is explained in the Parallel Job Developer's Guide pdf manual.

deepticr · Post by **deepticr** » Thu May 14, 2009 4:23 am

Hi,

I'm facing a similar issue with the SKG stage. I looked at the DataStage v8 documentation and it explains nothing. I found some explanation in the "Parallel Job Developer Guide" v7.5.1. It says that the numbers in each partition are incremented by the number of partitions defined. For instance, if my start value = 0 and I have 2 partitions, the numbers generated are as follows:

Partition 1
-------------
0 2 4 6 8

Partition 2
-------------
1 3 5 7 9
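The interleaved numbering described in the 7.5.1 guide can be sketched in a few lines of Python. This only illustrates the arithmetic and is not DataStage code; the function name `interleaved_keys` and its parameters are made up for the example.

```python
def interleaved_keys(start, num_partitions, partition_index, count):
    """Keys one partition would produce if each partition begins at
    start + partition_index and steps by the number of partitions."""
    first = start + partition_index
    return [first + i * num_partitions for i in range(count)]


# Two partitions, start value 0, five rows in each partition.
partition_1 = interleaved_keys(0, 2, 0, 5)
partition_2 = interleaved_keys(0, 2, 1, 5)
print(partition_1)  # [0, 2, 4, 6, 8]
print(partition_2)  # [1, 3, 5, 7, 9]
```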

But this is not the way in which the surrogate keys are getting generated.

Here is what I have done:

1. Job A has the SKG stage in Create mode with no input or output links. I assume this generates the state file. If that is the case, ideally we should be able to set the initial value and the increment, but there are no properties available to set these values. However, the job ran successfully with this log message:
"Surrogate_Key_Generator_0,0: State file F:\IBM\InformationServer\Server\Projects\EE2SM_SDN_MSTR_DATA_DEV\Files\WorkingDir\BasicRetail\test_surrkey.txt is empty."
Does this mean that this job is only used for marking an already created file as the state file and the sequence number gets generated only after the first invocation of the state file through an SKG stage in another job?

2. Job B has a SKG stage followed by a Dataset. In the SKG stage I set the following properties:
Number of records =10
Generate Key From Last Highest Value=Y

The number of rows in the dataset is 20. Why is this happening when I have specified the number of records as 10? Does this mean 10 records per partition? The output I get is:
1 2 3 4 5 6 7 8 9 10
1001 1002 1003 ... 1010.
As per the explanation provided by the documentation, the keys generated (on a 2-node configuration) ought to be 1 3 5 7 9 in one partition and 2 4 6 8 10 in the other. Why is there a gap of 1000 in the sequence?

Please help me in understanding this.

-Deepti

chulett · Post by **chulett** » Thu May 14, 2009 5:34 am

From what I understand, the stage has 'improvements' in the 8.x version over the 7.x version so perhaps nuances of how the stage works have changed as well. Pity about the lack of documentation, I'll have to check with our 8.x person and see what, if anything, I can find out.

priyadarshikunal · Post by **priyadarshikunal** » Thu May 14, 2009 7:18 am

1.

Job A has the SKG stage in Create mode with no i/p or o/p links. I assume this generates the state file.

Yes, an empty surrogate key file. You can do the same with a touch command in a before-job subroutine.

If this is case ideally, we must be able to set the initial value and the increment. But there are no properties available to set these values.

Only an empty file is generated by that job. The initial value is set in the Surrogate Key Generator stage that has an output link; that is where you will find the initial value property.

However, the job run successfully with this log message:
"Surrogate_Key_Generator_0,0: State file F:\IBM\InformationServer\Server\Projects\EE2SM_SDN_MSTR_DATA_DEV\Files\WorkingDir\BasicRetail\test_surrkey.txt is empty."

Because the file is empty.

Does this mean that this job is only used for marking an already created file as the state file and the sequence number gets generated only after the first invocation of the state file through an SKG stage in another job?

Yes.

2. Job B has a SKG stage followed by a Dataset. In the SKG stage I set the following properties:
Number of records =10
Generate Key From Last Highest Value=Y

The number of rows in the dataset is 20. Why is this happening when I have specified the number of records as 10? Does this mean 10 records per partition?

10 records per node. If you want only 10 records, restrict the stage's execution by defining a node map constraint.

The output I get is:
1 2 3 4 5 6 7 8 9 10
1001 1002 1003 ... 1010.
As per the explanation provided by the documentation the key generated (on 2 node configuration) ought to be 1 3 5 7 9 in one partition and 2 4 6 8 10 in another partition. Why is there a gap in the sequence of 1000?

In the Transformer the default block size is 1, so it generates 1, 3, 5, ... and 2, 4, 6, ...

In the Surrogate Key Generator stage it generates 1, 2, 3, ... and 1001, 1002, 1003, ...

Even on one node, if the same file is used twice at the same time, the result may be like what you are getting, because the first 1000 keys are reserved by the first instance and the next 1000 by the next instance to optimize the process.
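The block reservation described above can be mimicked with a small Python simulation. It is a simplified model built on the behaviour observed in this thread, not the stage's actual implementation: `StateFile`, `reserve` and `generate_keys` are hypothetical names, and the partitions are processed one after another, whereas real partitions run concurrently.

```python
class StateFile:
    """Hypothetical stand-in for the state file: hands out key ranges."""

    def __init__(self, initial_value=1):
        self.next_free = initial_value

    def reserve(self, block_size):
        """Reserve the next block of keys and return it as a range."""
        block = range(self.next_free, self.next_free + block_size)
        self.next_free += block_size
        return block


def generate_keys(rows_per_partition, block_size, initial_value=1):
    """Return one list of keys per partition."""
    state = StateFile(initial_value)
    result = []
    for rows in rows_per_partition:
        keys = []
        while len(keys) < rows:
            # Each partition takes a whole block, even if it uses only part of it.
            block = state.reserve(block_size)
            keys.extend(block[: rows - len(keys)])
        result.append(keys)
    return result


# Two nodes, 10 rows each, block size 1000 (the default mentioned in this thread).
first, second = generate_keys([10, 10], 1000)
print(first[0], first[-1])    # 1 10
print(second[0], second[-1])  # 1001 1010
```

In this model, `generate_keys([10, 10], 1)` returns 1 to 10 and 11 to 20, which is why a block size of 1 closes the gaps.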

This is only my observation, based on tests I performed. I never read it in the documentation; possibly it's not there.

However, you can try changing the default block size by setting the File Block Size option to 1.

Test it and let us know, and don't forget to watch the performance. You may see an increase in run time when processing a large number of records.

nagarjuna · Post by **nagarjuna** » Thu May 14, 2009 6:41 pm

That's a good explanation. I also tested this and observed that the way keys are generated depends on the block size. So, if you want them to be sequential, generate them with block size = 1 or use the DB sequence option.

srinivas.nettalam · Post by **srinivas.nettalam** » Wed Jul 10, 2013 1:02 am

The default block size is 1000, and you can always set it to 1 if you care about the sequence and not just the uniqueness of the values.

jwiles · Post by **jwiles** » Wed Jul 10, 2013 10:52 pm

Use a block size of 1 if you require it; otherwise consider the default or even larger block sizes if you are assigning a lot of surrogate keys. Smaller block sizes add overhead to the job because the state file must be accessed more often. I've seen a block size of 1 slow a job to a crawl when it was used unnecessarily in a high-volume situation.
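To get a rough feel for that overhead, the number of state file accesses can be estimated as one reservation per block. This is back-of-the-envelope arithmetic under the assumption of a single partition and one access per reserved block; `state_file_accesses` is an illustrative name, not a DataStage function.

```python
def state_file_accesses(total_keys, block_size):
    """Blocks needed to cover total_keys, assuming one access per block."""
    return (total_keys + block_size - 1) // block_size  # integer ceiling


# Ten million keys with three different block sizes.
for block_size in (1, 1000, 100_000):
    print(block_size, state_file_accesses(10_000_000, block_size))
# 1 10000000
# 1000 10000
# 100000 100
```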

Regards,

prasson_ibm · Post by **prasson_ibm** » Thu Jul 11, 2013 2:03 am

Just wanted to add my observation: even if your block size is 1, if the partitioning is not round robin you will end up with gaps in your sequence numbers.

DSXchange
