state files in surrogate key generator
Hi,
I have a job, Job1, in which I use the Surrogate Key Generator (SKG) stage to generate a unique sequence, with a state file as the key source. These are the steps I followed to get this working:
1. Created JobA containing only an SKG stage, with no input or output links, to create the state file (Key Source Action=Create, Source Type=Flat File, Source Name=<path and name of the state file>).
2. Ran Job1, whose SKG stage has input and output links, with these options set in the stage: File Initial Value=100, Source Type=Flat File, Source Name=<path and name of state file>, Generated Output Column Name=<column name>.
The problem is that the sequence values are not incremented by 1. Is there a setting I need to change so that the value increments by one?
Is there anything else in the process that I am missing for proper sequence generation?
Thanks
Facing a similar issue with surrogate key generator stage
Hi,
I'm facing a similar issue with the SKG stage. I looked at the DataStage v8 documentation and it does not explain this. I found some explanation in the "Parallel Job Developer Guide" v7.5.1, which says that the numbers in each partition are incremented by the number of partitions defined. For instance, if my start value is 0 and I have 2 partitions, the numbers generated are as follows:
Partition 1
-------------
0 2 4 6 8
Partition 2
-------------
1 3 5 7 9
But this is not how the surrogate keys are actually being generated.
Here is what I have done:
1. Job A has the SKG stage in Create mode with no input or output links. I assume this generates the state file. If so, ideally we should be able to set the initial value and the increment, but there are no properties available to set these values. However, the job ran successfully with this log message:
"Surrogate_Key_Generator_0,0: State file F:\IBM\InformationServer\Server\Projects\EE2SM_SDN_MSTR_DATA_DEV\Files\WorkingDir\BasicRetail\test_surrkey.txt is empty."
Does this mean that this job only marks an already created file as the state file, and that sequence numbers are generated only after the state file is first used by an SKG stage in another job?
2. Job B has an SKG stage followed by a Dataset. In the SKG stage I set the following properties:
Number of records =10
Generate Key From Last Highest Value=Y
The number of rows in the dataset is 20. Why does this happen when I have specified the number of records as 10? Does this mean 10 records per partition? The output I get is:
1 2 3 4 5 6 7 8 9 10
1001 1002 1003 ... 1010.
According to the documentation, the keys generated (on a 2-node configuration) ought to be 1 3 5 7 9 in one partition and 2 4 6 8 10 in the other. Why is there a gap of 1000 in the sequence?
Please help me understand this.
-Deepti
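The interleaved numbering that the 7.5.1 guide describes (each partition steps by the number of partitions) can be sketched roughly as follows. This is only an illustration of the arithmetic in the post above, not DataStage code; the function name `interleaved_keys` and its parameters are made up for the example.

```python
def interleaved_keys(start, num_partitions, partition_index, count):
    """Keys for one partition under the interleaved scheme.

    Hypothetical helper: each partition begins at start + partition_index
    and then steps by the total number of partitions.
    """
    first = start + partition_index
    return [first + i * num_partitions for i in range(count)]


# Start value 0 on a 2-partition configuration, 5 rows per partition.
print(interleaved_keys(start=0, num_partitions=2, partition_index=0, count=5))
# [0, 2, 4, 6, 8]
print(interleaved_keys(start=0, num_partitions=2, partition_index=1, count=5))
# [1, 3, 5, 7, 9]
```

With a start value of 1 the same arithmetic gives 1 3 5 7 9 and 2 4 6 8 10, which is the output the post expected to see.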
From what I understand, the stage has 'improvements' in the 8.x version over the 7.x version so perhaps nuances of how the stage works have changed as well. Pity about the lack of documentation, I'll have to check with our 8.x person and see what, if anything, I can find out.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Deepti wrote:
> Job A has the SKG stage in Create mode with no i/p or o/p links. I assume this generates the state file.
Yes, an empty surrogate key file. You can use a touch command in a before-job subroutine to do the same.
Deepti wrote:
> If this is case ideally, we must be able to set the initial value and the increment. But there are no properties available to set these values.
Only an empty file is generated by that job. The initial value is set by the Surrogate Key Generator stage that has an o/p link; that is where you will find the initial value property.
Deepti wrote:
> However, the job run successfully with this log message:
> "Surrogate_Key_Generator_0,0: State file F:\IBM\InformationServer\Server\Projects\EE2SM_SDN_MSTR_DATA_DEV\Files\WorkingDir\BasicRetail\test_surrkey.txt is empty."
Because the file is empty.
Deepti wrote:
> Does this mean that this job is only used for marking an already created file as the state file and the sequence number gets generated only after the first invocation of the state file through an SKG stage in another job?
Yes.
Deepti wrote:
> 2. Job B has a SKG stage followed by a Dataset. In the SKG stage I set the following properties:
> Number of records =10
> Generate Key From Last Highest Value=Y
> The number of rows in the dataset is 20. Why is this happening when I have specified the number of records as 10? Does this mean 10 records per partition?
10 records per node. If you want only 10 records, restrict the stage's execution by defining a node map constraint.
Deepti wrote:
> The output I get is:
> 1 2 3 4 5 6 7 8 9 10
> 1001 1002 1003 ... 1010.
> As per the explanation provided by the documentation the key generated (on 2 node configuration) ought to be 1 3 5 7 9 in one partition and 2 4 6 8 10 in another partition. Why is there a gap in the sequence of 1000?
In a Transformer the default block size is 1, so it generates 1,3,5,... and 2,4,6,...
The Surrogate Key Generator stage instead generates 1,2,3,... and 1001,1002,1003,...
Even on one node, if the same file is used twice at the same time, the result may be like the one you are getting, because the first 1000 keys are reserved by the first instance and the next 1000 by the next instance to optimize the process.
This is only my observation, based on tests I performed. I have not read it in the documentation; possibly it is not there.
However, you can try changing the default block size by setting the file block size option to 1.
Test it and let us know, and don't forget to watch the performance. You may see an increase in run time when processing a large number of records.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
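The block reservation described in the reply above can be modelled with a small simulation. This is a toy sketch and not how DataStage is implemented: the `StateFile` class and `generate` function are hypothetical, and the block size of 1000 is taken from the poster's own observation rather than from documentation.

```python
class StateFile:
    """Toy stand-in for the state file: hands out contiguous blocks of keys."""

    def __init__(self, initial_value=1):
        self.next_free = initial_value
        self.reservations = 0  # how many times the file was accessed

    def reserve(self, block_size):
        start = self.next_free
        self.next_free += block_size
        self.reservations += 1
        return range(start, start + block_size)


def generate(state, records_per_partition, num_partitions, block_size):
    """Each partition reserves a block, then numbers its own records from it."""
    # Assumption: every partition grabs its first block up front.
    blocks = [iter(state.reserve(block_size)) for _ in range(num_partitions)]
    output = [[] for _ in range(num_partitions)]
    for _ in range(records_per_partition):
        for p in range(num_partitions):
            try:
                key = next(blocks[p])
            except StopIteration:
                # Block used up: go back to the state file for another one.
                blocks[p] = iter(state.reserve(block_size))
                key = next(blocks[p])
            output[p].append(key)
    return output


keys = generate(StateFile(initial_value=1), records_per_partition=10,
                num_partitions=2, block_size=1000)
print(keys[0])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(keys[1])  # [1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010]
```

Under this model, two partitions of 10 records each produce 20 rows, and the second partition starts at 1001 because the first partition reserved keys 1 to 1000 and used only ten of them. Calling `generate` with `block_size=1` in the same model yields the keys 1 to 20 with no gap, at the cost of one state file access per key.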
Use a block size of 1 if you require it; otherwise you should consider the default or even larger block sizes if you are assigning a lot of surrogate keys. Smaller block sizes add overhead to the job because the state file has to be accessed more often. I've seen a block size of 1 bring a job to a literal crawl when it was used unnecessarily in a high-volume situation.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
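To get a feel for why a block size of 1 hurts at high volume, here is a back-of-the-envelope count of state file accesses. It assumes, purely for illustration, one access per reserved block per partition; the function name and the row and partition counts are invented for the example and are not measurements.

```python
import math


def state_file_accesses(records_per_partition, num_partitions, block_size):
    """Estimated state file accesses, assuming one access per reserved block."""
    return num_partitions * math.ceil(records_per_partition / block_size)


# Hypothetical load: 4 partitions with 5 million rows each.
for block_size in (1, 1000, 100_000):
    print(block_size, state_file_accesses(5_000_000, 4, block_size))
# 1 20000000
# 1000 20000
# 100000 200
```

The trade-off is the one discussed earlier in the thread: larger blocks mean fewer accesses, but also larger potential gaps when a reserved block is only partly used.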
I just wanted to add my observation: even if your block size is 1, if the partitioning is not round robin you will end up with gaps in your sequence numbers.
Thanks
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/