Use key management routines in parallel jobs??

ds_user78
Participant
Posts: 23
Joined: Thu Nov 11, 2004 5:39 pm

Use key management routines in parallel jobs??

Post by ds_user78 »

Hi,
We have been asked to use the DataStage key management routines (KeyMgtGetNextValue, KeyMgtGetNextValueConcurrent) for surrogate key generation instead of Oracle sequences, and we are using parallel jobs. Is the BASIC transformer the only way to call these routines? How much of an overhead would a BASIC transformer add to a parallel job?
Thanks,
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

Hi

Use of the Basic Transformer will hurt your performance and should be avoided. It will also not work on MPP systems.

--Balaji S.R
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Hi,

I am not sure, but I think we can put the normal transformer (not the BASIC transformer) in a shared container and use it in PX.

Regards
Sreeni
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Yes, you can put both types of transformers into a shared container. But you can only call BASIC subroutines from PX Basic transformers, so it doesn't affect the previous answer about using the server key management calls.

Nevertheless, PX has a surrogate key generator stage that you can use.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

ArndW wrote:Yes, you can put both types of transformers into a shared container. But you can only call BASIC subroutines from PX Basic transformers, so it doesn't affect the previous answer about using the server key management calls.

Nevertheless, PX has a surrogate key generator stage that you can use.
Arnd, I am curious how you use the surrogate key generator. I create mostly star schema data marts, and the surrogate keys are used for relationships between the dimension and fact tables. I have found the surrogate key generator to be too simple to really be worth using: to use it, I would need to keep track of the last surrogate key I used and pass it into my job as a parameter, whereas the basic transformer creates new surrogate keys for me. The job would look something like this:



Code:

                                  Dimension Database Stage
                                                |
Source Database Stage  --  Lookup Stage  --  Funnel  --  Upsert Database Stage
                                |              |
                           CopyStage   -   Basic Xfm Stage

Select the data from the source. Perform a lookup against the existing data in the dimension, with a reject link from the lookup stage. The copy stage exists only because the basic transformer cannot accept a reject link as its input. Funnel the existing data and the new data together, then update/insert on the dimension based on the non-surrogate key.

I've also seen developers use an Oracle sequence in a sparse lookup to create the key.

I would find the surrogate key generator more useful if it could track the surrogate keys itself, like the KeyMgtGetNextValue and KeyMgtGetNextValueConcurrent routines do in the basic transformer.
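
For illustration only, here is the shape of that lookup/reject/new-key pattern as a small Python sketch, outside DataStage entirely; every name and value in it (the dimension contents, the last-key parameter) is invented for the example.

Code:

# Lookup existing surrogate keys; rejects get new keys counted up from a
# last-used value that would arrive as a job parameter. Illustrative only.

existing = {"CUST001": 1, "CUST002": 2}    # natural key -> surrogate key
source_rows = ["CUST002", "CUST003", "CUST004"]
last_key_used = 2                          # the job parameter

matched, rejects = [], []
for natural_key in source_rows:
    if natural_key in existing:
        matched.append((natural_key, existing[natural_key]))
    else:
        rejects.append(natural_key)        # reject link: no key yet

# The "Basic Xfm" step: assign new keys to the rejects.
for offset, natural_key in enumerate(rejects, start=1):
    matched.append((natural_key, last_key_used + offset))

# The funnel + upsert step would write 'matched' back to the dimension.
print(matched)   # [('CUST002', 2), ('CUST003', 3), ('CUST004', 4)]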
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I just did a test with the surrogate key generator stage in PX and can see the limitations - since the values aren't persistent across runs, you are quite correct in asserting that the stage is of limited value! I think I'd prefer to use a buildop or a PX transform stage with logic instead of resorting to the BASIC functions; but I'd most likely use the surrogate key generation stage only in a very limited number of cases.
ganive
Participant
Posts: 18
Joined: Wed Sep 28, 2005 7:06 am

Post by ganive »

Hi,

I don't know if I understood your problem well, but here is something I often use on my projects:
It seems you're able to select the max key from your dimension, and you're able to use a Surrogate Key Generator to generate a unique sequence (1 -> X) for each of your source rows.

Perform the lookup to get the max, then use a transformer to add the key you generated (via the SKG) to the max coming from your dimension.
You'll then get a unique key... :roll:
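
As a sketch of that arithmetic in Python (the max key and the generated sequence values are invented for the example):

Code:

# ganive's technique: new key = max existing key + SKG-generated sequence value.
max_key_in_dimension = 1042          # result of the "select max(key)" lookup
skg_sequence = [1, 2, 3, 4]          # unique 1..X values, one per source row

new_keys = [max_key_in_dimension + n for n in skg_sequence]
print(new_keys)                      # [1043, 1044, 1045, 1046]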

kwwilliams wrote: [...] I would find the surrogate key generator to be more useful if it could track the surrogate keys itself like the KeyMgtGetNextValue and KeyMgtGetNextValueConcurrent routines do in the basic transformer.
--------
GaNoU
--------
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Ganive,

I appreciate your response. I didn't really have a problem; Arnd had stated that the surrogate key generator could be used, and I was pointing out what I thought were its limitations.

I would prefer not to have to select the max of my surrogate key. That was the beauty of the KeyMgt functionality in the basic transformer: it handles that for me. Another option would be to always store the last key used by the surrogate key generator in a sequential file, use a routine to read that file and pass the value as a parameter to your job, and have the surrogate key generator use this parameter + 1 as the starting point for generating new keys.
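
A minimal Python sketch of that "sequential file as a persistent counter" idea; the file name and one-value-per-file format are assumptions made for the example:

Code:

# Persist the last surrogate key used in a plain text file: read it before
# the job runs, write the new high-water mark back afterwards.
from pathlib import Path

COUNTER_FILE = Path("last_surrogate_key.txt")   # illustrative name

def read_last_key() -> int:
    # Return the last key used, or 0 if no run has happened yet.
    if COUNTER_FILE.exists():
        return int(COUNTER_FILE.read_text().strip())
    return 0

def write_last_key(value: int) -> None:
    COUNTER_FILE.write_text(str(value))

# Before the job: the starting point passed in as the job parameter.
start_key = read_last_key() + 1

# ... the job would generate keys start_key .. start_key + N - 1 ...
rows_loaded = 500
write_last_key(start_key + rows_loaded - 1)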

It's not that I don't understand how to use it; I just don't like having to do so much work for something that I believe a surrogate key generator should do out of the box. After reading Arnd's post, I think I would have to agree that writing a custom routine would be the best way to go.

Thanks,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The server (SDK) key management routines use persistent storage (a hashed file called SDKSequences) to store the next value. These routines can be thwarted by any non-DataStage process adding new surrogate key values to the target table without updating SDKSequences.

If you were to implement something similar in parallel jobs, you would still need persistent storage that could guarantee a stream of unique key values irrespective of the number of processing nodes (you want to preserve scalability, right?). There are two @PARTITION... system variables that will assist in generating the key values - read the initial value from your persistent storage (a sequential file will do) and save the final value back into the persistent storage.

This solution mimics the server solution, while remaining vulnerable to non-DataStage updates of the target table.
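
To make the partition arithmetic concrete, here is a Python sketch of how one stored starting value can yield unique keys across N partitions with no coordination. The interleaving formula is a common pattern rather than anything quoted from DataStage; the partition number and partition count are what the @PARTITION... system variables Ray mentions would supply inside a PX transformer.

Code:

# Each partition p of num_partitions hands out
#     start + p + row_index * num_partitions
# so no two partitions can ever produce the same key.
# 'start' plays the role of the value read from persistent storage.

def partition_keys(start: int, partition: int, num_partitions: int, rows: int):
    return [start + partition + i * num_partitions for i in range(rows)]

start = 1001            # read from the sequential file before the run
num_partitions = 4

for p in range(num_partitions):      # each node computes only its own slice
    print(p, partition_keys(start, p, num_partitions, rows=3))
# 0 [1001, 1005, 1009]
# 1 [1002, 1006, 1010]
# 2 [1003, 1007, 1011]
# 3 [1004, 1008, 1012]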
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.