Use key management routines in parallel jobs??
Hi,
We have been asked to use the DataStage key management routines (KeyMgtGetNextValue, KeyMgtGetNextValueConcurrent) for surrogate key generation instead of Oracle sequences, and we are using parallel jobs. Is the BASIC Transformer the only way to call these routines? How much overhead would a BASIC Transformer add to a parallel job?
Thanks,
ArndW wrote: Yes, you can put both types of transformers into a shared container. But you can only call BASIC subroutines from PX BASIC transformers, so it doesn't affect the previous answer about using the server key management calls. Nevertheless, PX has a Surrogate Key Generator stage that you can use.

Arnd, I am curious as to how you use the surrogate key generator. I create mostly star schema data marts, and the surrogate keys are used for relationships between the dimension and fact tables. I have found the surrogate key generator too simple to really be worth using: to use it, I would need to keep track of the last surrogate key I used and pass it into my job as a parameter. Instead, I use the BASIC transformer to create new surrogate keys. The job would look something like this:
Code: Select all

                     Dimension Database Stage
                         |
Source Database Stage -- Lookup Stage -- Funnel -- Upsert Database Stage
                         |                 |
                      CopyStage  --  Basic Xfm Stage
Select the data from the source. Perform a lookup against the existing data in the dimension, with a reject link from the Lookup stage. The Copy stage exists only because the Basic Xfm cannot accept a reject link as its input. Funnel the existing data and the new data together, then update/insert on the dimension based on the non-surrogate key.
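The lookup/reject/funnel pattern described above can be sketched outside DataStage as plain Python. This is only an illustration of the logic, not DataStage code; `assign_surrogate_keys`, `existing_dim`, and the field names are all hypothetical:

```python
# Sketch of the lookup -> reject -> key-assignment -> funnel pattern.
# All names are illustrative; this is not DataStage syntax.

def assign_surrogate_keys(source_rows, existing_dim, next_key):
    """existing_dim maps natural key -> surrogate key.
    next_key is the first unused surrogate value."""
    matched, rejects = [], []
    for row in source_rows:                       # Lookup Stage
        sk = existing_dim.get(row["natural_key"])
        if sk is not None:
            matched.append({**row, "sk": sk})
        else:
            rejects.append(row)                   # reject link -> Copy stage
    for row in rejects:                           # Basic Xfm equivalent
        matched.append({**row, "sk": next_key})
        next_key += 1                             # Funnel merges both streams
    return matched, next_key

rows, nk = assign_surrogate_keys(
    [{"natural_key": "A"}, {"natural_key": "B"}],
    {"A": 100}, 200)
```

Rows already in the dimension keep their key; rejects get new values starting at `next_key`, which is returned so it can be persisted for the next run.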
I've also seen developers use an Oracle sequence in a sparse lookup to create the key.
I would find the surrogate key generator more useful if it could track the surrogate keys itself, like the KeyMgtGetNextValue and KeyMgtGetNextValueConcurrent routines do in the BASIC transformer.
Keith Williams
keith@peacefieldinc.com
I just did a test with the Surrogate Key Generator stage in PX and can see the limitations: since the values aren't persistent across runs, you are quite correct in asserting that the stage is of limited value! I think I'd prefer to use a buildop or a PX Transformer stage with logic rather than resort to the BASIC functions, and I'd most likely use the surrogate key generation stage in only a very limited number of cases.
Hi,
I don't know if I understood your problem well, but here is something I often use on my projects:
It seems you're able to select the max key from your dimension.
It seems you're able to use a Surrogate Key Generator to generate a unique sequence (1 -> X) over your source rows.
Perform the lookup to get the max, and use a transformer to add the key you generated (via the SKG) to the max coming from your dimension.
You'll then get a unique key...
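Ganive's MAX-plus-sequence technique can be sketched like this (an illustration only; `key_with_offset` and the field names are hypothetical, and in the real job `max_existing` would come from a lookup against the dimension):

```python
# Sketch of the "MAX(key) + generated sequence" technique described above.
# max_existing is the current max surrogate key in the dimension.

def key_with_offset(source_rows, max_existing):
    """Surrogate Key Generator yields 1..X; the transformer adds the max."""
    out = []
    for seq, row in enumerate(source_rows, start=1):   # SKG: 1 -> X
        out.append({**row, "sk": max_existing + seq})  # transformer: add max
    return out

new_rows = key_with_offset([{"nk": "C"}, {"nk": "D"}], max_existing=500)
```

Each run restarts the generated sequence at 1, but adding the current dimension maximum keeps the resulting keys unique across runs.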
--------
GaNoU
Ganive,
I appreciate your response. I really didn't have a problem. Arnd had stated that the surrogate key generator could be used and I was pointing out what I thought were limitations of the surrogate key generator.
I would prefer not to have to select the max of my surrogate key; the beauty of the KeyMgmt functionality in the BASIC transformer is that it handles that for me. Another option would be to always store the last key used by the surrogate key generator in a sequential file, use a routine to read that file, and pass the value as a parameter to the job. The surrogate key generator then uses this parameter + 1 as the starting point for generating new keys.
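That sequential-file workaround can be sketched as follows (a minimal illustration, not DataStage code; the file name and helper functions are hypothetical, and a real job would also need locking against concurrent runs):

```python
import os

STATE_FILE = "last_key.txt"  # hypothetical sequential file holding the last used key

def read_last_key(path=STATE_FILE):
    """Return the last key used, or 0 if no state exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return int(f.read().strip())

def write_last_key(value, path=STATE_FILE):
    with open(path, "w") as f:
        f.write(str(value))

# Before the job: read the last key and pass it in as a parameter;
# the generator starts at parameter + 1.
start = read_last_key() + 1
keys = [start + i for i in range(3)]   # the job generates 3 new keys
write_last_key(keys[-1])               # after the job: persist the last key used
```

The point of the persisted file is exactly what the SDK routines do with their hashed file: the next run picks up where the previous one left off.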
It's not that I don't understand how to use it; I just don't like having to do so much work for something that I believe a surrogate key generator should do out of the box. After reading Arnd's post, I think I have to agree that writing a custom routine would be the best way to go.
Thanks,
Keith Williams
keith@peacefieldinc.com
The server (SDK) key management routines use persistent storage (a hashed file called SDKSequences) to store the next value. These routines can be thwarted by any non-DataStage process adding new surrogate key values to the target table without updating SDKSequences. If you were to implement something similar in parallel jobs you would still need persistent storage that could guarantee a stream of unique key values irrespective of the number of processing nodes (you want to preserve scalability, right?). There are two @PARTITION... system variables that will assist in generating the key values - read the initial value from your persistent storage (a sequential file will do) and save the final value back into the persistent storage. This solution mimics the server solution, while remaining vulnerable to non-DataStage updates of the target table.
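A partition-safe scheme along the lines described here can be sketched as follows. The interleaving formula `start + partition + row * partitions` is one common choice (not necessarily what the SDK routines use), and the function name is illustrative; the partition number and count would come from the job's system variables (presumably the `@PARTITION...` pair mentioned above):

```python
def partition_keys(start, partition_num, num_partitions, row_count):
    """Generate surrogate keys on one node such that no two nodes collide.
    Node n emits start+n, start+n+P, start+n+2P, ... for P partitions,
    so every node produces a disjoint stream of values."""
    return [start + partition_num + i * num_partitions
            for i in range(row_count)]

# Two nodes, resuming after a persisted last key of 1000:
node0 = partition_keys(1001, 0, 2, 3)
node1 = partition_keys(1001, 1, 2, 3)
```

Each node reads the same starting value from the persistent store, generates its own interleaved stream, and the highest value emitted across all nodes is written back afterward, which preserves scalability while mimicking the server solution.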
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.