Sequence Generator

tunde · Post by **tunde** » Wed Jun 18, 2003 7:55 am

Can someone help me out on this. i want to assign surrogate keys ( unique sequence keys ) to my table. How do i go about this. Ds does not seem to have a sequence generator within a parallel Job.

Again, what is the difference btw a parallel job and a server job and in what scenerios can they be used?

Tunde

tonystark622 · Post by **tonystark622** » Wed Jun 18, 2003 9:05 am

tunde,

There is a transform under TransformsSDKKeyMgt called KeyMgtGetNextValueConcurrent

I can't see why this wouldn't work on a parallel job.

Good Luck,
Tony

ray.wurlod · Post by **ray.wurlod** » Wed Jun 18, 2003 6:00 pm

Sometime soon (as soon as we get legal clearance from ASCL) a White Paper on the differences between server jobs and parallel jobs will be published on the DataStageXChange (www.datastagexchange.com).
The main difference is the execution environment. Server jobs execute typically as single streams in single sets of processes on the server. Parallel jobs take advantage of Parallel Extender (the name ASCL gave to the Orchestrate technology they acquired with Torrent Systems last year) to allow simultaneous execution on multiple "processing nodes" - multiple CPUs in an SMP environment, or multiple notes in an MPP/cluser environment. Absolutely huge volumes of data can be processed in this way (see ASCL web site for benchmark information).

Not sure about what you mean in your assertion that "Ds does not seem to have a sequence generator within a parallel job".
You can use the Row Generator stage to generate rows, you still have access to system variables @INROWNUM and @OUTROWNUM for column derivation and output constraint expressions.
If you mean a job sequence generator, this is neither part of a server job nor a parallel job; it's a third kind of job (you can use File > New > Job Sequencer. A job sequencer can start either server or mainframe jobs, or even a mix of the two.

Transforms are not available in parallel jobs, since Transforms encapuslate BASIC expressions, and parallel jobs do not execute in a BASIC environment. Routines similarly are unavailable in parallel jobs, for the same reason. A different set of functions than those used in server jobs is provided for use in parallel jobs.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518

nelc · Post by **nelc** » Thu Jun 19, 2003 12:02 am

Another add-on query relating to this thread:
@INROWNUM and @OUTROWNUM is with respect to the total number of the source record. Is there any way to group the sequence number such that it restarts with every new ID?

Example:
ID SEQ
1 1
1 2
1 3
2 1
2 2
2 3

Any help is much appreciated! [:)]

ray.wurlod · Post by **ray.wurlod** » Thu Jun 19, 2003 5:43 am

No, it's a strict sequence number counting the rows coming in on the input link or going out on an output link.
Why do you want to re-start the count? This is not the strict definintion of surrogate key (which only provides uniqueness, and has no inherent meaning).
You can do it, but instead of using system variables you use stage variables in a Transformer stage. One variable for the counter (initialize to zero), one to detect the change in whatever it is you're using for "ID", perhaps a third that is true if the ID value has changed, false otherwise. This assumes that the input data to the Transformer stage are sorted, which is easy to do in parallel jobs.
Presumably your data are partitioned on the same "ID" column, so that uniqueness is guaranteed across multiple data sets being processed.
Another white paper, this time on stage variables, is ready for publication on DSX; it, too, awaits legal clearance from Ascential.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518

MAT · Post by **MAT** » Thu Jun 19, 2003 7:13 am

Hi Tunde,

PX has a stage called Column Generator, we used it for generating our surrogate keys. You can even make it assign keys in parallel. Since you can't use a routine, this was the best we found.

MAT

ray.wurlod · Post by **ray.wurlod** » Thu Jun 19, 2003 6:14 pm

Ah, learned something today!

(I'm still in the process of learning all the things that parallel jobs can do.)

nelc · Post by **nelc** » Tue Jun 24, 2003 1:17 am

Thanks Ray for the advice on the group seq generator.

I tried what you suggested by implementing the following stage variables:

Var Derivation
CheckID RowProcCompareWithPreviousValue(input.ID)
Seq If (CheckID) then (Seq+1) else 0

However I still get errors with unique key constraints as this seq no and ID is a composite key of the table.
Is there problem with my derivation column? Is the above implementation as what you had explained?

Thanks again! [:)]

ray.wurlod · Post by **ray.wurlod** » Tue Jun 24, 2003 5:35 am

Your Seq is generating lots of zeroes.
Should its derivation have been If CheckID Then (Seq + 1) Else Seq ?