Generating Surrogate keys under PX

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
MAT
Participant
Posts: 65
Joined: Wed Mar 05, 2003 8:44 am
Location: Montréal, Canada

Generating Surrogate keys under PX

Post by MAT »

Hi,

The project I am working on will integrate data coming from various sources not necessarily compatible with each other. For that reason, we need to associate every record we handle with a unique ID number. Since we are using PX, we can't access sdk functions like keymgt inside a job transformer. Therefore, I send my data through a column generator which adds a number to every row starting from 1 to x. I then pass the data inside a transformer adding to the ID column the current ID number (CurID) defined inside am evironment variable. The resulting ID's are something like 1+CurID to x+CurID. Finally, I add x to the CurID and move on the next incoming data file. I am encountering troubles using some functions provided with PX so I was wondering if any of you ever had to design something similar and if yes what was your design, I am afraid mine may be a little complicated. I need to use PX because I am handling millions of rows and can't wait ten times more for a server job.

Thanks for your thought.

MAT
bigpoppa
Participant
Posts: 190
Joined: Fri Feb 28, 2003 11:39 am

Post by bigpoppa »

MAT,

You are doing three things:

1. Using column generator (sequentially? or in parallel? Parallel will not work) to generate a unique row id.

2. Using a transformer to add the ID column the current ID number

3. Adding x to the CurID and move on the next incoming data file.

Which part specifically is not working?

Thanks,
BP
MAT
Participant
Posts: 65
Joined: Wed Mar 05, 2003 8:44 am
Location: Montréal, Canada

Post by MAT »

Hi BP,

Part 2 is not working [:(]. I think because of the type conversion function StringToDecimal that we discussed yesterday. I could not find the rtype options in my doc and it appears it was not included. Ascential sent me the function documentation which should be included in a future release of the PX developper's guide. Even using the right codes, my transformer does not compile. I opened a ticket with ascential about my problem and they should come back to me shortly.

My original question was in a more general sense tough. I was wondering if there was another way to associate records with a unique ID number under PX. I am not convinced that my design is well thought. If you (or anyone else...let's not discuss alone[:)]) have any new ideas about it, I'll be glad to hear them.

Thanks

MAT
bigpoppa
Participant
Posts: 190
Joined: Fri Feb 28, 2003 11:39 am

Post by bigpoppa »

Well, I guess it WOULD be helpful if ASC provided the full documentation for PX. [;)] Maybe you could post the rtypes for those that do not have the upcoming manual.

If I were doing what you have to do, I would use buildop. Buildop would allow you to keep your data in parallel. You want to avoid unnecessary sequential bottlenecks in PX jobs.

For part 1, see my posting ("did you know? You can create a unique index in parallel") on dsx->forums->PX. Although the posting gives you a way to create a surrogate key in transformer, you can use a similar technique in buildop.

For part 2, you have several options. You can store the CurId in a file and read/write it without having to convert it. Or you can write your own StrToDec function for the time being. Include it in a header file used by the buildop and use it to convert your CurId. (BTW, why is CurId a string to begin with?)

For part 3, pass back the updated CurID value to an env var(?) or store it back to the CurId file.

You can reuse a buildops within the same job.

- BP
MAT
Participant
Posts: 65
Joined: Wed Mar 05, 2003 8:44 am
Location: Montréal, Canada

Post by MAT »

Hi BP,

Thanks for the infos, I'll try what you suggest about using buildop. I will also post the rtypes in my topic related to the StringToDecimal function.

Also, to answer your question "why is CurId a string to begin with?". It is because it is an environment variables therefore string is the only possible type.

Regards

MAT
Post Reply