Surrogate Key Generator, prefixing Surrogate Key

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
fmou
Participant
Posts: 124
Joined: Sat May 28, 2011 9:48 pm

Surrogate Key Generator, prefixing Surrogate Key

Post by fmou »

Hi,

How can I prefix the Surrogate Key to an existing column instead of adding another column in Surrogate Key Generator? I.e., I want the output field to be

Code: Select all

SK : "," : RecIn 
but seems to me that the output field is not editable in Surrogate Key Generator.

Reason behind this request,

I want to define columns, data types, and other format options in the *parallel* Column Import stage, not the Sequential stage, which can improve performance because it separates parsing process from sequential read process.

However, I need to assign a row number in parallel using the Surrogate Key Generator to the record read sequentially.

Please help.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Not possible in the way you describe. But you can interpolate a Column Generator stage in the flow to generate a constant (a Cycle with only one Value).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
fmou
Participant
Posts: 124
Joined: Sat May 28, 2011 9:48 pm

Re: Surrogate Key Generator, prefixing Surrogate Key

Post by fmou »

fmou wrote:I want to define columns, data types, and other format options in the *parallel* Column Import stage, not the Sequential stage, which can improve performance because it separates parsing process from sequential read process.

However, I need to assign a row number in parallel using the Surrogate Key Generator to the record read sequentially.
Actually, the reason that I'm asking was,

from the book that I am reading,

IBM WebSphere DataStage Advanced Enterprise Edition (DX436), Draft - 5/19/2005, on page 280

It says,
Column Import Examples
...
Using a Sequential File stage (running sequentially):
If the incoming file does not have a way of uniquely identifying a
given row, it may be necessary to read the file sequentially, and
assign a row number in parallel using the Surrogate Key
Generator
So I guess that the above saying is wrong then.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

You can always use a transformer to concatenate the surrogate key with whichever column. However, if your intent is to just parse it back out with Column Import, don't even bother with the concatenation. Just let the surrogate key column pass through to the output of Column Import....no need to waste time and CPU cycles concatenating and parsing the same generated column.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
fmou
Participant
Posts: 124
Joined: Sat May 28, 2011 9:48 pm

Post by fmou »

jwiles wrote:. . . if your intent is to just parse it back out with Column Import, don't even bother with the concatenation. Just let the surrogate key column pass through to the output of Column Import....
Thanks for the reply,

Actually that's the first method that I tried, however, due to my limited knowledge to DS, I wasn't able to -- didn't know how to have Column Import handle two fields. Could you elaborate a bit please?

Thanks
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

simply map the surrogate key column from the input link to the output link (as you would in most other stages) on the Output link Mapping tab.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That is a very old reference!

The way some things work has changed over the years.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
fmou
Participant
Posts: 124
Joined: Sat May 28, 2011 9:48 pm

Post by fmou »

Could you elaborate a bit more, Ray, please?

I guess the principle to define columns, data types, and other format options in the *parallel* Column Import stage, not the Sequential stage still hold, just the practice how to add the surrogate key has changed, right?

thanks
fmou
Participant
Posts: 124
Joined: Sat May 28, 2011 9:48 pm

Post by fmou »

jwiles wrote:simply map the surrogate key column from the input link to the output link (as you would in most other stages) on the Output link Mapping tab.
Oh, just figured out why you think that way. No that doesn't work for my case because I'm using schema file as the Column Method of the Column Import stage. I.e., my Output link Mapping was empty. Guess the only way to make it work is to switch to Explicit Column Method then.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Why not just have the Sequential File stage generate the row number? That would serve as the key you seek.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
fmou
Participant
Posts: 124
Joined: Sat May 28, 2011 9:48 pm

Post by fmou »

ray.wurlod wrote:Why not just have the Sequential File stage generate the row number? That would serve as the key you seek. ...
Ah, good idea, didn't know that.
Type: Output Column
Adds an unsigned BigInt column with the specified name to the output that contains the row number.
thanks
fmou
Participant
Posts: 124
Joined: Sat May 28, 2011 9:48 pm

Post by fmou »

ray.wurlod wrote:Why not just have the Sequential File stage generate the row number? That would serve as the key you seek. ...
Ops, spoke too soon. Actually, the limitation is at the Column Import stage.

I'm using the schema file as the Column Method of the Column Import stage. I.e., my Output link Mapping was empty. Any columns other than the importing one can not be passed along with the imported ones, correct?

thanks
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

You've asked your sequential file stage to create a record number column. Is it defined in the file stage's output metadata grid (in the stage GUI)?

Is that column defined in your input link metadata grid? If so, you should be able to drag and drop it in the output link mapping tab.

If not, it won't show up as a mappable input column in the stage GUI. Do you have runtime column propagation enabled? If so, you can manually add that row count column to the metadata grid so that you can access it through the GUI.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
Post Reply