
Parallel Transformer & Surrogate Key stages

Posted: Sun Dec 09, 2007 7:31 am
by srimitta
Hi All,
When a Transformer stage or Surrogate Key Generator stage is used to generate sequence numbers, the first run generates the sequence numbers and stores the last value (in an internal format) in a file if the source type is Flat File. When you run the DataStage job a second time, a third time and so forth, the Transformer stage or Surrogate Key Generator stage reads the stored value from the file, keeps incrementing the sequence numbers from there, and writes the new last value back to the file. This cycle repeats every time you run your job.
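
Conceptually it behaves like a file-backed counter. Here is a toy sketch of that behaviour (not DataStage internals, just an illustration; the file name is a placeholder):

Code:
import os

STATE_FILE = "sk_state.dat"  # placeholder path for the stage's value file

def next_keys(count, initial_value=1):
    """Return `count` sequence numbers, resuming from the state file if present."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            start = int(f.read()) + 1   # resume after the stored last value
    else:
        start = initial_value           # first run: start from the initial value
    keys = list(range(start, start + count))
    with open(STATE_FILE, "w") as f:
        f.write(str(keys[-1]))          # write the new last value back
    return keys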

1) What happens if you let the Transformer stage or Surrogate Key Generator stage generate the values and the job aborts after processing some records? Does DataStage store the last value generated before the abort, or does it roll back to the last value stored from the last successful run?

2) Is it the right approach to let the Transformer stage or Surrogate Key Generator stage generate the sequence numbers from the second run onwards and never refresh the file by any other means? If this is not the right approach, what would be the right approach to pass the max value from the warehouse table, other than reading the value in the file from a sequencer job and passing it as a parameter as the initial value to the Transformer stage or Surrogate Key Generator stage?

Thanks
srimitta

Posted: Sun Dec 09, 2007 7:48 am
by chulett
1) Nothing is stored automatically.

2) The surrogates will start with whatever value you tell the stage to use; the default starting key value is zero. Pass it in as a job parameter.
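
For example (the DSN, table, column, parameter, project and job names below are all made-up placeholders), you could fetch the current max from the warehouse and feed it straight to the job with the dsjob command line:

Code:
import subprocess
import pyodbc   # any database access method will do; pyodbc is just one example

# Fetch the current maximum key from the warehouse dimension table.
conn = pyodbc.connect("DSN=warehouse")
row = conn.cursor().execute("SELECT MAX(cust_key) FROM dim_customer").fetchone()
max_key = row[0] or 0   # empty table: start from zero

# dsjob -run -param NAME=VALUE <project> <job>
subprocess.run(
    ["dsjob", "-run", "-param", f"StartKey={max_key + 1}", "MyProject", "LoadDim"],
    check=True,
)
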
srimitta wrote: what would be the right approach to pass the max value from the warehouse table, other than reading the value in the file from a sequencer job and passing it as a parameter as the initial value to the Transformer stage or Surrogate Key Generator stage?
Why 'other than'? :?

Posted: Sun Dec 09, 2007 7:56 am
by srimitta
Thanks Craig,
chulett wrote: 1) Nothing is stored automatically.
I didn't get you. :?

Posted: Sun Dec 09, 2007 8:33 am
by chulett
You asked "does DataStage store the last value generated from the current abort" and the short answer is "no". Meaning nothing about your surrogate keys is stored by DataStage. Every time the job runs, you need to tell it where to start numbering, regardless of the outcome of the previous run.

Posted: Sun Dec 09, 2007 7:14 pm
by srimitta
chulett wrote: Meaning nothing about your surrogate keys is stored by DataStage. Every time the job runs, you need to tell it where to start numbering
1. Not really; you don't have to supply an initial value to start from.
2. If you leave it blank and run the job for the first time, it starts the sequence from 1 and stores the last value in a file (you have to supply the path and file name to the stage if you choose the Flat File option) in an unreadable format.
3. The second time you run the job (without supplying an initial value), the Surrogate Key Generator or Transformer stage starts incrementing from the last maximum value.
4. If you refresh the file and run the job, the stage starts incrementing from 1 again.

We ran and tested the job several times and observed the above pattern, illustrated with the toy sketch below.
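
Reusing the toy next_keys sketch from the first post (purely as an illustration of the pattern, with the placeholder file name):

Code:
import os

print(next_keys(3))        # first run, no state file: [1, 2, 3]
print(next_keys(3))        # second run resumes:       [4, 5, 6]
os.remove("sk_state.dat")  # "refreshing" the file
print(next_keys(3))        # starts over from 1:       [1, 2, 3]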

What I noticed is:
1. Let the Surrogate Key Generator or Transformer stage manage generating the sequence numbers and don't mess with (refresh) the file,
or
2. Pass the initial value as a parameter to the stage from a sequencer job to start incrementing from the value you want.

My question still remains the same: what happens if you let the stage take care of generating the sequence numbers and the job fails or aborts after processing some records to the output?

Thanks
srimitta

Posted: Sun Dec 09, 2007 10:52 pm
by chulett
That behaviour must be new with the 8.x release, at least for the Surrogate Key stage. I don't see how you can claim the same behaviour with the Transformer, but whatever. :?

Why not answer the question yourself? Arrange a test job with the appropriate stage to abort part way through and observe the behaviour. Let us know.

Posted: Mon Dec 10, 2007 6:04 am
by rleishman
The functionality of the Surrogate Key Stage is duplicated in a new "Surrogate Key" tab in the Parallel Transformer. It does nothing extra or special; it just permits you to have one less stage on the design pane.

My understanding of the Surrogate Key generator is that it allocates irreversible blocks of values to each calling stage; when that block runs out, it gets another range. Values are never re-used or rolled back after a failure.

Be careful using the defaults for the size of block to allocate. I found that when using the allocator built into the Parallel Transformer, it was allocating in blocks of 1 - lots of I/O!
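
Roughly, the block idea works like the toy sketch below (an illustration only, not the real implementation): each consumer grabs a block from the shared state and serves keys locally, so a block size of 1 means one trip to the shared state for every single key:

Code:
class SharedCounter:
    """Stands in for the shared state; every allocation is one 'I/O'."""
    def __init__(self):
        self.next_value = 1
        self.io_count = 0

    def allocate(self, block_size):
        self.io_count += 1
        start = self.next_value
        self.next_value += block_size   # the block is handed out irreversibly
        return start

class KeyConsumer:
    """A calling stage (or partition) serving keys from its current block."""
    def __init__(self, counter, block_size):
        self.counter = counter
        self.block_size = block_size
        self.current = self.end = 0     # empty block forces an allocation

    def next_key(self):
        if self.current >= self.end:
            self.current = self.counter.allocate(self.block_size)
            self.end = self.current + self.block_size
        key = self.current
        self.current += 1
        return key

counter = SharedCounter()
consumer = KeyConsumer(counter, block_size=1000)
keys = [consumer.next_key() for _ in range(5000)]
print(counter.io_count)   # 5 allocations; with block_size=1 it would be 5000

Because every block comes from the one serialized counter, two consumers can never hand out the same key; you just get gaps when a block is abandoned.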

I also found them to be kind of flaky. I was getting a lot of unexplained job crashes until I stopped using SK Generators.

Posted: Mon Dec 10, 2007 8:24 am
by chulett
rleishman wrote: The functionality of the Surrogate Key Stage is duplicated in a new "Surrogate Key" tab in the Parallel Transformer. It does nothing extra or special; it just permits you to have one less stage on the design pane.
Interesting, thanks for clarifying that.

Posted: Tue Apr 29, 2008 6:30 pm
by vdr123
It allocates 1000 SKs for each partition.

Will there ever be a case where they overlap, or will it keep track of the blocks (1000 SKs)?

Posted: Tue Apr 29, 2008 8:10 pm
by ray.wurlod
I can feel an FAQ coming on.

In version 8.0 the last value can be stored in a beast called a "state file". This is used in conjunction with a new variation on the Surrogate Key Generator stage.