Hash file creation & update clarification
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 1044
- Joined: Wed Sep 29, 2004 3:30 am
- Location: Nottingham, UK
- Contact:
I'd just like to confirm a couple of assumptions about hash files.
If I have a single hash stage with an input and an output and the same hash name, the job first loads the hash file from the input link and then allows the reference link to be invoked.
If I have two hashed file stages with the same hash file name, where one is an output from a Transformer and the other is a reference link to the same or a different Transformer, then access via the reference link will not be delayed, and both instances of the hash file will be opened and updated concurrently.
I can't find any documentation of this behaviour but is it true to say "If a single hash file stage is used as both an input and an output, then the output(s) will be disabled until the input(s) finish loading"?
Supplemental question: How do you create an empty hash file? My colleague has got round this by dangling an output link to a hash on a previous Transformer with a constraint of 0, but he's complaining that three copies of the hash file stage is excessive, and that it needs a Transformer that may serve no other purpose. Is there a way to omit this extra hash stage in a DataStage job?
Phil Hibbs | Capgemini
Technical Consultant
Phil,
the hashed file stage does work as you described. Regarding the creation of an empty hashed file in this job, why don't you just click on the attributes in the hashed file stage to have it create the file for you?
Although a before-job routine or the job control tab could be used to execute the CREATE.FILE command, the better way would be to actually use the method your colleague has already implemented.
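As a rough sketch of that before-job approach (the hashed file name and the type/modulo "30" below are illustrative, not from this thread), a server routine can shell out to TCL with DSExecute:

Code: Select all

* Hypothetical before-job subroutine body: create the hashed file via TCL.
* "MyLookupHash" and file type 30 (dynamic) are illustrative values.
Command = "CREATE.FILE MyLookupHash 30"
Call DSExecute("TCL", Command, Output, SystemReturnCode)
If SystemReturnCode <> 0 Then
   * CREATE.FILE also fails if the file already exists, so a warning
   * rather than an abort may be appropriate here.
   Call DSLogWarn("CREATE.FILE returned: " : Output, "CreateHashBefore")
End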
Since the order of processing is basically: read a row from the input, do all reference lookups, then write all outputs, every reference must exist before any output is written.
When a job starts, if the optimizer sees that some active stages (the second set) must wait for passive reference files to be populated by other active stages (the first set), the targets of those second-set active stages are not created or opened until their active stage starts. Therefore, if a first-set active stage references the target of a second-set active stage, that target may not yet exist (the CREATE TABLE / hashed file creation won't happen until the input link starts).
So, you dangle an output link with an @FALSE constraint upstream of the referencing active stage, and that stage does the create.
Seems like a mess, but that's the way it works. A cleaner approach would be a single-purpose job that runs ahead of this one just to create the hash file, but there's no easier graphical solution. The before/after create-hash-file command is a worthy option.
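In the thread's own diagram style, the dangled @FALSE link described above might look like this (stage names are illustrative):

Code: Select all

Src1 ---> Xfm1 ---(@FALSE)---> HashFile
                                  :
Src2 ---> Xfm2 <..................:    (reference lookup)
            |
            +----------------> HashFile (output/update)

Xfm1's @FALSE link writes no rows, but it forces the hashed file to be created before Xfm2 opens it for reference.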
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
I think you cannot have a single hash stage with the same hash file for read and write. It would not compile.
We have a couple of jobs that exist just to create an empty hash file, where the output from the source DB is set to return no rows, e.g. WHERE 1 = 2.
Code: Select all
SourceDB ---> Xfm ---> HashStage
ArndW wrote: "Regarding the creation of an empty hashed file in this job, why don't you just click on the attributes in the hashed file stage to have it create the file for you?"
Because the job fails: the hash does not exist when the job tries to reference it before the first row is output.
Bala R wrote: "Think you could not have a single hashstage with the same hash file for read and write. It would not compile."
You can have the same hash stage with the same hash file for read and write. You can't have the links coming from and going to the same data stream, though, as I understand it.
Phil Hibbs | Capgemini
Technical Consultant
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
If you validate the job, this will bring the hashed file into existence. You may need to temporarily change the hashed file name in the stage that reads from it.
Or you can execute the CREATE.FILE or CREATE TABLE or mkdbfile command in a before-job or before-stage subroutine.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I have discovered a better way to solve this situation. Add a dummy Transformer with a link going into the Hashed File stage that is being used as the lookup. Set this dummy Transformer up as a single-row generator: @OUTROWNUM=1 as the constraint, plus a Stage Variable that is not used but exists only so that a null program gets generated. This causes the hashed file to be created before the main flow of the job executes, all in one place, with no untidy initialiser job.
I'll probably post an image tomorrow.
Code: Select all
X ---> #
       :
-----> X ---------->

(X = Transformer, # = Hashed File stage used as the lookup, : = reference link)
Phil Hibbs | Capgemini
Technical Consultant
Stop by my house some time.
I think I found the earliest mention that I recall making here when discussing generating test data. It was then extended to solve this situation where you read and update a hashed file in the same job... and yah it's quite cool when the light-bulb goes off in your head and you make it work for the first time.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers