Use key management routines in parallel jobs??

ds_user78
Participant
Posts: 23
Joined: Thu Nov 11, 2004 5:39 pm

Use key management routines in parallel jobs??

Post by ds_user78 »

Hi,
We have been asked to use the DataStage key management routines (KeyMgtGetNextValue, KeyMgtGetNextValueConcurrent) for surrogate key generation instead of Oracle sequences, and we are using parallel jobs. Is the BASIC transformer the only way to call these routines? How much of an overhead would a BASIC transformer add to a parallel job?
Thanks,
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

Hi

Use of the Basic Transformer will hurt your performance and should be avoided. It will also not work on MPP systems.

--Balaji S.R
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Hi,

I am not sure, but I think we can put the normal transformer (not the BASIC transformer) in a shared container and use it in PX.

Regards
Sreeni
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Yes, you can put both types of transformers into a shared container. But you can only call BASIC subroutines from PX Basic transformers, so it doesn't affect the previous answer about using the server key management calls.

Nevertheless, PX has a surrogate key generator stage that you can use.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

ArndW wrote:Yes, you can put both types of transformers into a shared container. But you can only call BASIC subroutines from PX Basic transformers, so it doesn't affect the previous answer about using the server key management calls.

Nevertheless, PX has a surrogate key generator stage that you can use.
Arnd, I am curious how you use the surrogate key generator. I create mostly star schema data marts, and the surrogate keys are used for relationships between the dimension and fact tables. I have found the surrogate key generator to be too simple to really be worth using: to use it, I would need to keep track of the last surrogate key I used and pass it into my job as a parameter, whereas the basic transformer creates new surrogate keys for me. The job would look something like this:



Code:

                                  Dimension Database Stage
                                                |
Source Database Stage  --  Lookup Stage  --  Funnel  --  Upsert Database Stage
                                |              |
                           CopyStage   -   Basic Xfm Stage

Select the data from the source. Perform a lookup against the existing data in the dimension, with a reject link from the lookup stage. The copy stage exists only because the basic transformer cannot accept a reject link as its input. Funnel the existing data and the new data together, then update/insert on the dimension based on the non-surrogate key.

I've also seen developers use an Oracle sequence in a sparse lookup to create the key.

I would find the surrogate key generator more useful if it could track the surrogate keys itself, like the KeyMgtGetNextValue and KeyMgtGetNextValueConcurrent routines do in the basic transformer.
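
For illustration only, here is the shape of that lookup/reject/new-key pattern as a small Python sketch, outside DataStage entirely; every name and value in it (the dimension contents, the last-key parameter) is invented for the example.

Code:

# Lookup existing surrogate keys; rejects get new keys counted up from a
# last-used value that would arrive as a job parameter. Illustrative only.

existing = {"CUST001": 1, "CUST002": 2}    # natural key -> surrogate key
source_rows = ["CUST002", "CUST003", "CUST004"]
last_key_used = 2                          # the job parameter

matched, rejects = [], []
for natural_key in source_rows:
    if natural_key in existing:
        matched.append((natural_key, existing[natural_key]))
    else:
        rejects.append(natural_key)        # reject link: no key yet

# The "Basic Xfm" step: assign new keys to the rejects.
for offset, natural_key in enumerate(rejects, start=1):
    matched.append((natural_key, last_key_used + offset))

# The funnel + upsert step would write 'matched' back to the dimension.
print(matched)   # [('CUST002', 2), ('CUST003', 3), ('CUST004', 4)]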
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I just did a test with the surrogate key generator stage in PX and can see the limitations - since the values aren't persistent across runs, you are quite correct in asserting that the stage is of limited value! I think I'd prefer to use a buildop or a PX transform stage with logic instead of resorting to the BASIC functions; but I'd most likely use the surrogate key generation stage only in a very limited number of cases.
ganive
Participant
Posts: 18
Joined: Wed Sep 28, 2005 7:06 am

Post by ganive »

Hi,

I don't know if I understood your problem well, but here is something I often use on my projects:
It seems you're able to select the max key from your dimension, and you're able to use a Surrogate Key Generator to generate a unique sequence (1 -> X) for each of your source rows.

Perform the lookup to get the max, then use a transformer to add the key you generated (via the SKG) to the max coming from your dimension.
You'll then get a unique key... :roll:
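
As a sketch of that arithmetic in Python (the max key and the generated sequence values are invented for the example):

Code:

# ganive's technique: new key = max existing key + SKG-generated sequence value.
max_key_in_dimension = 1042          # result of the "select max(key)" lookup
skg_sequence = [1, 2, 3, 4]          # unique 1..X values, one per source row

new_keys = [max_key_in_dimension + n for n in skg_sequence]
print(new_keys)                      # [1043, 1044, 1045, 1046]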

kwwilliams wrote: [...] I would find the surrogate key generator to be more useful if it could track the surrogate keys itself like the KeyMgtGetNextValue and KeyMgtGetNextValueConcurrent routines do in the basic transformer.
--------
GaNoU
--------
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Ganive,

I appreciate your response. I didn't really have a problem; Arnd had stated that the surrogate key generator could be used, and I was pointing out what I thought were its limitations.

I would prefer not to have to select the max of my surrogate key. That was the beauty of the KeyMgt functionality in the basic transformer: it handles that for me. Another option would be to always store the last key used by the surrogate key generator in a sequential file, use a routine to read that file and pass the value as a parameter to your job, and have the surrogate key generator use this parameter + 1 as the starting point for generating new keys.
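
A minimal Python sketch of that "sequential file as a persistent counter" idea; the file name and one-value-per-file format are assumptions made for the example:

Code:

# Persist the last surrogate key used in a plain text file: read it before
# the job runs, write the new high-water mark back afterwards.
from pathlib import Path

COUNTER_FILE = Path("last_surrogate_key.txt")   # illustrative name

def read_last_key() -> int:
    # Return the last key used, or 0 if no run has happened yet.
    if COUNTER_FILE.exists():
        return int(COUNTER_FILE.read_text().strip())
    return 0

def write_last_key(value: int) -> None:
    COUNTER_FILE.write_text(str(value))

# Before the job: the starting point passed in as the job parameter.
start_key = read_last_key() + 1

# ... the job would generate keys start_key .. start_key + N - 1 ...
rows_loaded = 500
write_last_key(start_key + rows_loaded - 1)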

It's not that I don't understand how to use it; I just don't like having to do so much work for something that I believe a surrogate key generator should do out of the box. After reading Arnd's post, I think I would have to agree that writing a custom routine would be the best way to go.

Thanks,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The server (SDK) key management routines use persistent storage (a hashed file called SDKSequences) to store the next value. These routines can be thwarted by any non-DataStage process adding new surrogate key values to the target table without updating SDKSequences.

If you were to implement something similar in parallel jobs, you would still need persistent storage that could guarantee a stream of unique key values irrespective of the number of processing nodes (you want to preserve scalability, right?). There are two @PARTITION... system variables that will assist in generating the key values - read the initial value from your persistent storage (a sequential file will do) and save the final value back into the persistent storage.

This solution mimics the server solution, while remaining vulnerable to non-DataStage updates of the target table.
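
To make the partition arithmetic concrete, here is a Python sketch of how one stored starting value can yield unique keys across N partitions with no coordination. The interleaving formula is a common pattern rather than anything quoted from DataStage; the partition number and partition count are what the @PARTITION... system variables Ray mentions would supply inside a PX transformer.

Code:

# Each partition p of num_partitions hands out
#     start + p + row_index * num_partitions
# so no two partitions can ever produce the same key.
# 'start' plays the role of the value read from persistent storage.

def partition_keys(start: int, partition: int, num_partitions: int, rows: int):
    return [start + partition + i * num_partitions for i in range(rows)]

start = 1001            # read from the sequential file before the run
num_partitions = 4

for p in range(num_partitions):      # each node computes only its own slice
    print(p, partition_keys(start, p, num_partitions, rows=3))
# 0 [1001, 1005, 1009]
# 1 [1002, 1006, 1010]
# 2 [1003, 1007, 1011]
# 3 [1004, 1008, 1012]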
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.