Surrogate Key Generator questions

pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

prem84 wrote:I created a state file using UNIX touch command and in the surrogate key generator stage
......And the state file is also updated
What's the purpose of creating the state file?
Last edited by pandeesh on Fri Jul 01, 2011 3:04 am, edited 1 time in total.
pandeeswaran
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

ray.wurlod wrote:Keys are allocated in blocks. If not all keys in a block are used (due to slight variations in the number of rows per partition) they are discarded.
Does that mean the keys can never be contiguous?

Thanks
pandeeswaran
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Since these are your questions, I split them out into your own thread rather than clutter up someone else's issue. And that way you can decide when your doubts have been resolved.

For the record, split from this post if you want to see the original context.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Of course it's possible to be contiguous. You could run just a single partition.

Beyond that you cannot guarantee contiguity unless you can also guarantee that there are precisely the same number of rows in every partition.

And even then, the Surrogate Key Generator stage is not really the preferred way to do it (you could set the block size to 1, but that is rather slow). You should instead use a Column Generator or Transformer stage to generate the unique values.
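To see why block allocation leaves gaps, here is a toy model in Python. It is only an illustration, not the stage's actual algorithm: the round-robin row order and the names `allocate_keys` and `gaps` are assumptions made up for this sketch. Each partition takes a block of keys from a shared counter, and whatever it has not used when the rows run out is discarded.

```python
def allocate_keys(rows_per_partition, block_size, start=1):
    """Toy model: each partition draws keys from its own block and
    fetches a fresh block from a shared counter when it runs out."""
    next_block_start = start
    blocks = [[] for _ in rows_per_partition]   # unused keys held per partition
    remaining = list(rows_per_partition)        # rows still to be keyed
    used = []
    while any(remaining):
        for p in range(len(remaining)):
            if remaining[p] == 0:
                continue
            if not blocks[p]:
                # Block exhausted (or never fetched): reserve the next one.
                blocks[p] = list(range(next_block_start,
                                       next_block_start + block_size))
                next_block_start += block_size
            used.append(blocks[p].pop(0))
            remaining[p] -= 1
    # Keys still sitting in `blocks` are never handed out.
    return sorted(used)


def gaps(keys):
    """Return the values missing between the smallest and largest key.
    Expects `keys` sorted ascending, as allocate_keys returns them."""
    if not keys:
        return []
    return sorted(set(range(keys[0], keys[-1] + 1)) - set(keys))


if __name__ == "__main__":
    print(gaps(allocate_keys([4, 2], block_size=3)))  # uneven partitions
    print(gaps(allocate_keys([6], block_size=3)))     # single partition
    print(gaps(allocate_keys([4, 2], block_size=1)))  # block size of 1
```

In this model the uneven two-partition case leaves a hole in the sequence, while the single-partition and block-size-1 cases do not.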
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

So, the state file concept is not applicable in DataStage 7.5.x?

I could not find any option for it in the SKG stage.
pandeeswaran
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Correct, the state file and/or support of a sequence object were added in the 8.x release from what I recall.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

So how can I achieve the same in 7.5.x?
pandeeswaran
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What "same" would that be? The stage works the way it works in your version, not much you can do about that.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

chulett wrote:What "same" would that be? The stage works the way it works in your version, not much you can do about that.
My understanding is as follows:

In DS 7.5.x, the SKG stage only suits key generation for a one-time load.

For example, in the first run 100 records are loaded, so the SKs are generated from 1 to 100.

When the same job runs the next day, it again generates keys from 1 to 100, not from 101.

In DS 7.5.x it's not possible to continue from the last key using the SKG stage.

Correct me if my understanding is wrong.
pandeeswaran
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

You are essentially correct. However, you can get close to what you want by making the SKG stage's starting number a job parameter and passing the value in at run time.
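As a rough sketch of the run-time side, the start value can be supplied on the `dsjob` command line with `-param`. The Python helper below only builds the argument list. The parameter name `SK_START` and the project and job names in the usage note are hypothetical, so substitute whatever your job actually defines.

```python
def build_dsjob_command(project, job, start_value, param_name="SK_START"):
    """Build the argument list for a dsjob run that sets one job parameter.

    `param_name` is a made-up name; it must match the job parameter
    that the SKG stage's start value is bound to.
    """
    if start_value < 1:
        raise ValueError("start value must be a positive integer")
    return [
        "dsjob", "-run",
        "-param", f"{param_name}={start_value}",
        "-jobstatus",   # wait for the job and return its status
        project, job,
    ]
```

For example, `build_dsjob_command("MyProject", "LoadCustomers", 101)` gives a list that could be handed to `subprocess.run` on the engine host.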

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

If we run the job manually, we can do as you said.

But if it's a scheduled job, how do we pass the last SK loaded in the previous run to the current run as a parameter?

How can this hand-off between runs be achieved?

Thanks
pandeeswaran
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

You may have to create a job sequence which retrieves the max() value, writes it to a seq file and then binds this value to a job parm in the sequence. This is one possible way to do it; I am sure there are others.
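Outside DataStage, the same bookkeeping might look like the Python sketch below. It assumes the previous run's output is a delimited file with a header row, and that a one-line text file holds the next start value. The function names, the `SK` column and both file layouts are assumptions for illustration only.

```python
import csv
from pathlib import Path


def max_key_plus_one(target_file, key_column, default=1):
    """Scan the previous run's output and return the next free key."""
    path = Path(target_file)
    if not path.exists():
        return default
    with path.open(newline="") as fh:
        keys = [int(row[key_column]) for row in csv.DictReader(fh)]
    return max(keys) + 1 if keys else default


def save_start_value(state_file, value):
    """Overwrite the one-line file that holds the next start value."""
    Path(state_file).write_text(f"{value}\n")


def load_start_value(state_file, default=1):
    """Read the next start value; fall back to `default` on the first run."""
    path = Path(state_file)
    text = path.read_text().strip() if path.exists() else ""
    return int(text) if text else default
```

The value returned by `load_start_value` is what would be bound to the job parameter before the load runs.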
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

mhester wrote:You may have to create a job sequence which retrieves the max() value, writes it to a seq file and then bind this value to a job parm in the sequence. This is one possible way to do it, I am sure there are others.
Yes! I have achieved it like this:

1) Initially I created a dummy hashed file containing the value 1.

2) Then I created the parallel job with a source sequential file, an SKG stage and a target sequential file. In the SKG stage I passed a parameter for the start value.

3) I created one more job, a server job, which uses the target file of the previous job as its source. I used an Aggregator stage to find max() of the SK field. Next, in the Transformer stage, I calculated Max()+1 and overwrote the value in the hashed file.

4) In the main job sequence I first placed a User Variables Activity stage, which gets the value from the hashed file using GetHashValueByKey("","") and passes it to the first job as the parameter for the SK start value.

In this way I achieved it with a parallel job and a server job in a sequence.
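The four steps can be walked through with a small Python simulation. This is only a sketch of the logic, not DataStage code: a dict stands in for the hashed file, and `run_load`, `run_sequence` and the key name `SK_NEXT` are invented for the example.

```python
def run_load(row_count, start_value):
    """Stand-in for the parallel job: key `row_count` rows from `start_value`."""
    return list(range(start_value, start_value + row_count))


def run_sequence(store, row_count, store_key="SK_NEXT"):
    """One scheduled run of the sequence, using `store` as the hashed file."""
    start_value = store.get(store_key, 1)     # step 4: look up the start value
    keys = run_load(row_count, start_value)   # step 2: load with that start
    if keys:
        store[store_key] = max(keys) + 1      # step 3: overwrite with max()+1
    return keys


if __name__ == "__main__":
    store = {"SK_NEXT": 1}                    # step 1: seed with the value 1
    day1 = run_sequence(store, 100)
    day2 = run_sequence(store, 100)
    print(day1[0], day1[-1], day2[0], day2[-1], store["SK_NEXT"])
```

With 100 rows per run, the second run picks up at 101 instead of restarting at 1, and a run that loads no rows leaves the stored value untouched.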

Any other ways or ideas are welcome!
Thanks
pandeeswaran
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Well, if you're using hashed files, you may as well use the Key Management routines in the SDK for working with them.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

ray.wurlod wrote:Well, if you're using hashed files, you may as well use the Key Management routines in the SDK for working with them.
Is it possible to retrieve the value from a sequential file and pass it in through the User Variables Activity stage? (I have done this with a hashed file. In that design, can the hashed file be replaced by a sequential file?)

Thanks
pandeeswaran