Page 1 of 1

Generating Surrogate keys under PX

Posted: Tue Apr 22, 2003 7:44 am
by MAT
Hi,

The project I am working on will integrate data coming from various sources not necessarily compatible with each other. For that reason, we need to associate every record we handle with a unique ID number. Since we are using PX, we can't access sdk functions like keymgt inside a job transformer. Therefore, I send my data through a column generator which adds a number to every row starting from 1 to x. I then pass the data inside a transformer adding to the ID column the current ID number (CurID) defined inside am evironment variable. The resulting ID's are something like 1+CurID to x+CurID. Finally, I add x to the CurID and move on the next incoming data file. I am encountering troubles using some functions provided with PX so I was wondering if any of you ever had to design something similar and if yes what was your design, I am afraid mine may be a little complicated. I need to use PX because I am handling millions of rows and can't wait ten times more for a server job.

Thanks for your thought.

MAT

Posted: Tue Apr 22, 2003 10:16 am
by bigpoppa
MAT,

You are doing three things:

1. Using column generator (sequentially? or in parallel? Parallel will not work) to generate a unique row id.

2. Using a transformer to add the ID column the current ID number

3. Adding x to the CurID and move on the next incoming data file.

Which part specifically is not working?

Thanks,
BP

Posted: Tue Apr 22, 2003 10:25 am
by MAT
Hi BP,

Part 2 is not working [:(]. I think because of the type conversion function StringToDecimal that we discussed yesterday. I could not find the rtype options in my doc and it appears it was not included. Ascential sent me the function documentation which should be included in a future release of the PX developper's guide. Even using the right codes, my transformer does not compile. I opened a ticket with ascential about my problem and they should come back to me shortly.

My original question was in a more general sense tough. I was wondering if there was another way to associate records with a unique ID number under PX. I am not convinced that my design is well thought. If you (or anyone else...let's not discuss alone[:)]) have any new ideas about it, I'll be glad to hear them.

Thanks

MAT

Posted: Tue Apr 22, 2003 11:19 am
by bigpoppa
Well, I guess it WOULD be helpful if ASC provided the full documentation for PX. [;)] Maybe you could post the rtypes for those that do not have the upcoming manual.

If I were doing what you have to do, I would use buildop. Buildop would allow you to keep your data in parallel. You want to avoid unnecessary sequential bottlenecks in PX jobs.

For part 1, see my posting ("did you know? You can create a unique index in parallel") on dsx->forums->PX. Although the posting gives you a way to create a surrogate key in transformer, you can use a similar technique in buildop.

For part 2, you have several options. You can store the CurId in a file and read/write it without having to convert it. Or you can write your own StrToDec function for the time being. Include it in a header file used by the buildop and use it to convert your CurId. (BTW, why is CurId a string to begin with?)

For part 3, pass back the updated CurID value to an env var(?) or store it back to the CurId file.

You can reuse a buildops within the same job.

- BP

Posted: Tue Apr 22, 2003 11:23 am
by MAT
Hi BP,

Thanks for the infos, I'll try what you suggest about using buildop. I will also post the rtypes in my topic related to the StringToDecimal function.

Also, to answer your question "why is CurId a string to begin with?". It is because it is an environment variables therefore string is the only possible type.

Regards

MAT