Hash file creation & update clarification

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Hash file creation & update clarification

Post by PhilHibbs »

I'd just like to confirm a couple of assumptions about hash files.

If I have a single hash file stage with both an input link and an output link, and the same hash file name, the job first loads the hash file from the input link and only then allows the reference link to be invoked.

If I have two hash file stages with the same hash file name, where one is an output from a transformer and the other is a reference link to the same or a different transformer, then access to the reference link will not be delayed, and both instances of the hash file will be read and updated simultaneously.

I can't find any documentation of this behaviour, but is it true to say "If a single hash file stage is used as both an input and an output, then the output(s) will be disabled until the input(s) finish loading"?
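To make the two layouts concrete, something like this (stage names are illustrative):

Code:

(1) One hashed file stage, loaded and then referenced:

    Xfm1 ---load---> HashStage ---ref---> Xfm2

(2) Two hashed file stages naming the same file:

    Xfm1 ---load---> HashStage_A      HashStage_B ---ref---> Xfm2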

Supplemental question: how do you create an empty hash file? My colleague has got round this by dangling an output link to the hash file from a previous Transformer with a constraint of 0, but he complains that three copies of the hash file stage are excessive, and that it needs a Transformer that may serve no other purpose. Is there a way to omit this extra hash stage in a DataStage job?
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Phil,

The hashed file stage does work as you described. Regarding the creation of an empty hashed file in this job, why don't you just set the create-file option in the hashed file stage's attributes and have it create the file for you?

Although a before-job routine or the job control tab could be used to execute the CREATE.FILE command, the better way would be to use the method your colleague has already implemented.
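For reference, a minimal sketch of such a before-job routine, assuming a type 30 (dynamic) hashed file in the project account; the routine name and the "UV" shell type passed to DSExecute are illustrative, not a tested implementation:

Code:

* Illustrative before-job subroutine: create the hashed file named
* in InputArg as a type 30 dynamic file in the current account.
Subroutine CreateHashedFile(InputArg, ErrorCode)
   ErrorCode = 0                                 ;* 0 lets the job continue
   Command = "CREATE.FILE ":Trim(InputArg):" 30"
   Call DSExecute("UV", Command, Output, SystemReturnCode)
   * If the file already exists CREATE.FILE simply complains, which is
   * harmless for this purpose; log the command output for reference.
   Call DSLogInfo(Output, "CreateHashedFile")
Return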
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Since the order of processing is basically "read a row from the input, do all the references, then write all the outputs", all references must exist before any output is written.

When a job starts, if the optimizer sees that some active stages (call them the second set) have to wait for passive reference files to be populated by other active stages (the first set), then the targets of those second-set active stages are not created or opened until their active stage actually starts. So if a first-set active stage references the target of a second-set active stage, that target may not exist yet: the CREATE TABLE or create-hashed-file step won't happen until the input link starts.

So you dangle an output link with an @FALSE constraint upstream of the referencing active stage, and opening that link does the create.
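Schematically, with made-up stage names, the dangling link looks something like:

Code:

Source ---> Xfm1 -----------------> Target
             |
             +--[@FALSE]--> HashedFile   (created when Xfm1 opens the
                                          link; no rows ever flow down it)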

It seems like a mess, but that's the way it works. A cleaner approach would be to create a single-purpose job that runs ahead of this one just to create the hash file, but there's no easier graphical solution. The before/after-job create-hash-file command is a worthy option.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Bala R
Participant
Posts: 66
Joined: Mon May 30, 2005 9:52 pm

Post by Bala R »

I think you cannot have a single hash stage with the same hash file for read and write; it would not compile.
We have a couple of jobs that exist just to create an empty hash file:

Code:

SourceDB ---> xfm ---> HashStage

where the output from the SourceDB is set to return no rows, e.g. WHERE 1 = 2.
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

ArndW wrote:Regarding the creation of an empty hashed file in this job, why don't you just click on the attributes in the hashed file stage to have it create the file for you?
Because the job fails: the hash file does not exist when the job tries to use the reference link, before the first row has been output.
Bala R wrote:Think you could not have a single hashstage with the same hash file for read and write. It would not compile.
You can have the same hash stage with the same hash file for read and write. What you can't have, as I understand it, is the links coming from and going to the same data stream.
Phil Hibbs | Capgemini
Technical Consultant
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you validate the job, this will bring the hashed file into existence. You may need to temporarily change the hashed file name in the stage that reads from it.

Or you can execute the CREATE.FILE or CREATE TABLE or mkdbfile command in a before-job or before-stage subroutine.
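For example, with an illustrative file name (and, for mkdbfile, the commonly quoted default parameters, which you should verify for your release):

Code:

CREATE.FILE MyHashedFile 30
    (TCL: account-based dynamic hashed file)

$DSHOME/bin/mkdbfile /path/to/MyHashedFile 30 1 4 20 50 80 1628
    (shell: pathed hashed file)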
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

I have discovered a better way to solve this situation. Add a dummy transformer that has a link going into the Hashed File stage that is being used as the lookup. Set this dummy transformer up as a single-row generator: @OUTROWNUM = 1 as the constraint, plus a Stage Variable that is not used but exists only so that a null program gets generated. This causes the hashed file to be created before the main flow of the job executes, all in one place, with no untidy initialiser job.

Code:

     X---#
         :
 ------->X---------->
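(In the sketch, the top X is the dummy no-input transformer, # is the hashed file stage it writes to, and the dotted link is the reference lookup into the transformer on the main flow.)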
I'll probably post an image tomorrow.
Phil Hibbs | Capgemini
Technical Consultant
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

We've documented this technique before here. And all you need for the constraint is @FALSE, as no rows need to be processed simply to create or clear the existing hashed file; the link just needs to be 'opened'. 8)
-craig

"You can never have too many knives" -- Logan Nine Fingers
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

Ah, of course, we want no rows generated in this context, not one row! Also, I hadn't seen this no-input transformer used to create a hashed file in the same job that reads and updates it.
Phil Hibbs | Capgemini
Technical Consultant
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Stop by my house some time. :wink:

I think I found the earliest mention of it that I recall making here, when discussing generating test data. It was then extended to solve this situation where you read and update a hashed file in the same job... and yeah, it's quite cool when the light bulb goes on in your head and you make it work for the first time.
-craig

"You can never have too many knives" -- Logan Nine Fingers