Parallel Transformer & Surrogate Key stages

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

srimitta
Premium Member
Posts: 187
Joined: Sun Apr 04, 2004 7:50 pm

Parallel Transformer & Surrogate Key stages

Post by srimitta »

Hi All,
When the Transformer stage or the Surrogate Key Generator stage is used to generate sequence numbers, the first run generates the sequence and stores the last value (in an internal format) in a file if the source type is Flat File. On the second run, third run and so forth, the stage reads the stored value from the file, keeps incrementing the sequence from there, and writes the new last value back to the file. This cycle repeats every time you run the job.
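Just to make sure I'm describing it right, here is a toy model of that read/increment/write-back cycle. This is not DataStage code, only an illustration; the file name and the plain-text format are made up for the example.

```python
import os

STATE_FILE = "skg_state.txt"  # hypothetical path; DataStage keeps its own internal format

def next_keys(row_count, start_value=1):
    """Yield row_count sequence numbers, resuming from the stored last value."""
    last = start_value - 1
    if os.path.exists(STATE_FILE):                 # second run onwards
        with open(STATE_FILE) as f:
            last = int(f.read().strip())           # resume from the stored last value
    keys = list(range(last + 1, last + 1 + row_count))
    with open(STATE_FILE, "w") as f:
        f.write(str(keys[-1]))                     # write the new last value back
    return keys

print(next_keys(5))   # first run:  [1, 2, 3, 4, 5]
print(next_keys(5))   # second run: [6, 7, 8, 9, 10]
```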

1) What happens if you let the Transformer stage or the Surrogate Key Generator stage generate the values and the job aborts after processing some records: does DataStage store the last value generated before the abort, or does it roll back to the last value stored by the last successful run?

2) Can we let the Transformer stage or the Surrogate Key Generator stage generate the sequence numbers from the second run onwards, i.e. let the stage manage the numbers itself and never refresh the file by any other means? Is that the right approach? If not, what would be the right approach for passing the max value from the warehouse table, other than reading the value in a sequencer job and passing it as a parameter to the Transformer stage or Surrogate Key Generator stage as the initial value?

Thanks
srimitta
Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives.
By William A.Foster
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

1) Nothing is stored automatically.

2) The surrogates will start with whatever value you tell it to; the default starting key value is zero. Pass it in as a job parameter.
srimitta wrote: what would be the right approach for passing the max value from the warehouse table, other than reading the value in a sequencer job and passing it as a parameter to the Transformer stage or Surrogate Key Generator stage as the initial value?
Why 'other than'? :?
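Something along these lines would do it. A rough sketch only: the DSN, table, column, project and job names are placeholders, and it assumes dsjob is on the PATH on the engine tier with the usual environment and credentials already in place.

```python
import subprocess
import pyodbc  # any DB-API driver for your warehouse would do

# Hypothetical connection, table and column names.
conn = pyodbc.connect("DSN=WAREHOUSE")
max_key = conn.execute("SELECT MAX(cust_key) FROM dim_customer").fetchone()[0] or 0

# Hand the value to the job as a parameter; project and job names are placeholders.
subprocess.run([
    "dsjob", "-run",
    "-param", f"START_KEY={max_key + 1}",
    "MyProject", "LoadDimCustomer",
], check=True)
```

Then START_KEY simply becomes the stage's initial value.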
-craig

"You can never have too many knives" -- Logan Nine Fingers
srimitta
Premium Member
Posts: 187
Joined: Sun Apr 04, 2004 7:50 pm

Post by srimitta »

Thanks Craig,
chulett wrote: 1) Nothing is stored automatically.
I didn't get you. :?
Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives.
By William A.Foster
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You asked "does DataStage stores last value generated from the current abort" and the short answer is "no". Meaning nothing about your surrogate keys are stored by DataStage. Every time the job runs, you need to tell it where to start numbering regardless of the outcome of the previous run.
-craig

"You can never have too many knives" -- Logan Nine Fingers
srimitta
Premium Member
Posts: 187
Joined: Sun Apr 04, 2004 7:50 pm

Post by srimitta »

chulett wrote: Meaning nothing about your surrogate keys is stored by DataStage. Every time the job runs, you need to tell it where to start numbering
1. Not really; you don't need to supply an initial value to start from.
2. Leave it blank and run the first time: it starts the values from 1 and stores the last value in a file (you have to supply the path and file name to the stage if you choose the Flat File option) in an unreadable format.
3. The second time you run the job (without supplying an initial value), the Surrogate Key Generator or Transformer stage starts incrementing from the last max value.
4. If you refresh the file and run the job, the stage starts incrementing from value 1 again.

We ran & tested job several times and observed above pattern.

What I noticed is:
1. Let the Surrogate Key Generator or Transformer stage manage generating the sequence numbers, and don't mess with (refresh) the file,
or
2. Pass the initial value as a parameter to the stage from a sequencer job to start incrementing from wherever you want.

Still, my question remains the same: what happens if you let the stage take care of generating the sequence numbers and the job fails or aborts after processing some records to the output?

Thanks
srimitta
Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives.
By William A.Foster
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That behaviour must be new with the 8.x release, at least for the Surrogate Key stage. I don't see how you can claim the same behaviour with the Transformer, but whatever. :?

Why not answer the question yourself? Arrange a test job with the appropriate stage to abort part way through and observe the behaviour. Let us know.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

The functionality of the Surrogate Key Stage is duplicated in a new "Surrogate Key" tab in the Parallel Transformer. It does nothing extra or special; it just permits you to have one less Stage on the design pane.

My understanding of the Surrogate Key generator is that it allocates irreversible blocks of values to each calling stage; when that block runs out, it gets another range. Values are never re-used or rolled back after a failure.

Be careful using the defaults for the size of the block to allocate. I found that when using the allocator built into the Parallel Transformer, it was allocating in blocks of 1, which means lots of I/O!
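To make that concrete, here is a toy model of that kind of block allocation. It is nothing DataStage-specific; the class names, the in-memory counter and the block sizes are invented for the illustration.

```python
class BlockAllocator:
    """Hands out non-overlapping ranges of keys from a single counter."""
    def __init__(self, start=1, block_size=1000):
        self._next = start
        self.block_size = block_size
        self.trips = 0                     # how often a consumer came back for more

    def new_block(self):
        self.trips += 1                    # with block_size=1 this happens per row: lots of I/O
        block = range(self._next, self._next + self.block_size)
        self._next += self.block_size      # the range is spent for good: no reuse, no rollback
        return iter(block)

class KeyConsumer:
    """A calling stage/partition that draws keys block by block."""
    def __init__(self, allocator):
        self.allocator = allocator
        self._block = iter(())

    def next_key(self):
        try:
            return next(self._block)
        except StopIteration:
            self._block = self.allocator.new_block()
            return next(self._block)

alloc = BlockAllocator(block_size=1000)
partitions = [KeyConsumer(alloc) for _ in range(4)]
keys = [p.next_key() for p in partitions for _ in range(3)]
assert len(keys) == len(set(keys))         # blocks never overlap across consumers
print(sorted(keys)[:6], "... trips back to the allocator:", alloc.trips)
```

With a block size of 1, every single key costs a trip back to the allocator, which is the I/O pattern I was seeing.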

I also found them to be kind of flaky. I was getting a lot of unexplained job crashes until I stopped using SK Generators.
Ross Leishman
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

rleishman wrote: The functionality of the Surrogate Key Stage is duplicated in a new "Surrogate Key" tab in the Parallel Transformer. It does nothing extra or special; it just permits you to have one less Stage on the design pane.
Interesting, thanks for clarifying that.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vdr123
Participant
Posts: 65
Joined: Fri Nov 14, 2003 9:23 am

Post by vdr123 »

It allocates 1000 SKs for each partition.

Will there ever be a case where they overlap, or will it keep track of the blocks (of 1000 SKs)?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I can feel an FAQ coming on.

In version 8.0 the last value can be stored in a beast called a "state file". This is used in conjunction with a new variation on the Surrogate Key Generator stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.