Ensuring hashed file is created when the job executes

Post questions here relating to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Ensuring hashed file is created when the job executes

Post by sbass1 »

Hi,

I've reviewed viewtopic.php?t=92114&highlight=CREATE+TABLE but still have this question...

I have a job like:

Code: Select all

DRS --> xfm --> DRS --> hashed_file (2)
         ^
         |
   hashed_file (1)
where hashed_file (1) and hashed_file (2) are the same physical file.

(sorry for the badly formatted ASCII art; hashed_file (1) is a lookup for the xfm transformer)

hashed_file (1) is pre-loaded into memory
hashed_file (2) has create file / clear file ticked

This is all to support surrogate key generation, with code like:

If IsNull(Lookup_SKs.FooSK) Then KeyMgtGetNextValueConcurrent("source.foo") Else Lookup_SKs.FooSK

although I intend to change this to:

If (Lookup_SKs.NOTFOUND) Then ...
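That is, the full derivation would presumably become something like:

Code: Select all

If Lookup_SKs.NOTFOUND Then KeyMgtGetNextValueConcurrent("source.foo") Else Lookup_SKs.FooSK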

My issue is, after physical file cleanup, or migration to a new environment, hashed_file (1) does not exist, and the job aborts.

What is the best practice and syntax to implement the following pseudocode:

If the file #DSStageName# does not exist in directory #$HashFileLocation#, create an empty hashed file at #$HashFileLocation#/#DSStageName#.

This would then allow the job to run, generating new SKs for each input row, then updating the hashed file.

The code should not raise an error if the hashed file already exists. Ideally it would support the DataStage automatic variables noted above (I doubt this will be the case - it would require a before-job routine to "peek into" the structure of the job), and it should optionally let me write a warning to the log when the file does not exist.

I tried this command from the operating system, with the noted results:

/tmp > /opt2/ds/Ascential/DataStage/DSEngine/bin/create.file DATA TEST DYNAMIC
Unable to open VOC file.
*** Processing cannot continue. ***

Various permutations of this command resulted in the same error.

Thanks,
Scott
Last edited by sbass1 on Thu Mar 12, 2009 1:39 am, edited 1 time in total.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Put your ASCII art inside code tags.

Best practice to ensure that a hashed file exists is to create a separate job to create it. This needs only a transformer stage and a hashed file stage.
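As for your create.file attempt: the "Unable to open VOC file" error is because create.file must be run from within the project directory, where the project's VOC file lives - running it from /tmp will always fail. If you do want a before-job subroutine, something along these lines would do for an account-based hashed file (a sketch only - the routine name, warning text and choice of DYNAMIC file type are assumptions, not tested code):

Code: Select all

* Before-job subroutine sketch: create the hashed file if it is missing.
* InputArg is expected to carry the hashed file name (in the project account).
Subroutine EnsureHashedFile(InputArg, ErrorCode)
   ErrorCode = 0
   HashName = Trim(InputArg)
   Open HashName To F.Hash Then
      Close F.Hash   ;* file already exists - nothing to do
   End Else
      Call DSLogWarn("Hashed file " : HashName : " does not exist; creating it.", "EnsureHashedFile")
      Call DSExecute("TCL", "CREATE.FILE " : HashName : " DYNAMIC", Output, ReturnCode)
      If ReturnCode <> 0 Then ErrorCode = ReturnCode
   End
Return
Note this only covers account-based hashed files (the kind registered in the project's VOC); for directory-pathed hashed files you would need to test for the path itself instead.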
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

Our latest approach is:

Code: Select all

DRS (1) --> xfm --> DRS (2) --> hashed_file (2)
             ^
             |
DRS (2) --> hashed_file (1)
This:

1) Ensures hashed_file (1) always exists
2) Is always in sync with DRS (2) (the target table)
3) Ensures new SKs are derived correctly

Pros:
Ensures the hashed file is always in sync with the target table, for example after a database refresh (whether truncating the table or adding rows to it, e.g. a refresh from UAT)

Cons:
Somewhat computationally inefficient, as it's unconditionally refreshing the hashed file when it might not need refreshing

I'm a bit ambivalent about this approach (suggested by a colleague) due to the cons above.

The reason we're using a hashed file instead of a direct lookup against the target table is that the lookups are orders of magnitude faster.

Comments?

P.S.: I've seen no difference in behaviour whether I tick "Create file" or not. DS always seems to create the hashed file if it does not already exist.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

sbass1 wrote:P.S.: I've seen no difference in behaviour whether I tick "Create file" or not. DS always seems to create the hashed file if it does not already exist.
Correct. Ticking 'Create File' just gives you the option to override the defaults used at creation time, plus access to the 'delete before' option. As you've found, it will always attempt to create the hashed file if it doesn't exist regardless of that setting.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

If you don't wish to unconditionally refresh the hashed file, you can replace your DRS (2) with a transformer stage having a stage variable and an output constraint of @FALSE. That will create the hashed file if it does not exist.

Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Relying on auto-create will prove counterproductive if the job design reads from and writes to the same hashed file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

Mike wrote:If you don't wish to unconditionally refresh the hashed file, you can replace your DRS (2) with a transformer stage having a stage variable and an output constraint of @FALSE. That will create the hashed file if it does not exist.

Mike
Thanks Mike, slick trick. The hashed file never gets updated since the constraint is false, but gets created in accordance with the metadata of the link if it does not yet exist.
Last edited by sbass1 on Mon Mar 16, 2009 12:42 am, edited 1 time in total.
sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

Edit/Delete post is only letting me edit, not delete the extra post caused by hitting back in my browser. So sorry for the extra post (this one).
Last edited by sbass1 on Mon Mar 16, 2009 12:44 am, edited 1 time in total.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

IOW ?

Isn't that the noise a cat makes?
:?

Please refrain from all but the most common English abbreviations on DSXchange. Many of our readers don't have English as their first language, and it's tough enough as it is, without having to decode the mental processes behind arcane abbreviations.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

sbass1 wrote:Edit/Delete post is only letting me edit, not delete the extra post caused by hitting back in my browser. So sorry for the extra post (this one).
You can delete a post as long as no-one has replied to it yet, which is why I always check for duplicates right away if there's any delay in posting... delay = multiple postings.
-craig

"You can never have too many knives" -- Logan Nine Fingers
yserrano
Premium Member
Posts: 52
Joined: Thu Jan 31, 2008 1:23 pm
Location: Santo Domingo

Post by yserrano »

sbass1 wrote:Our latest approach is:

Code: Select all

DRS (1) --> xfm --> DRS (2) --> hashed_file (2)
             ^
             |
DRS (2) --> hashed_file (1)
This:

1) Ensures hashed_file (1) always exists
2) Is always in sync with DRS (2) (the target table)
3) Ensures new SKs are derived correctly

Pros:
Ensures the hashed file is always in sync with the target table, for example after a database refresh (whether truncating the table or adding rows to it, e.g. a refresh from UAT)

Cons:
Somewhat computationally inefficient, as it's unconditionally refreshing the hashed file when it might not need refreshing

I'm a bit ambivalent about this approach (suggested by a colleague) due to the cons above.

The reason we're using a hashed file instead of a direct lookup against the target table is that the lookups are orders of magnitude faster.

Comments?

P.S.: I've seen no difference in behaviour whether I tick "Create file" or not. DS always seems to create the hashed file if it does not already exist.
We have used another method very successfully to create surrogate keys: we created a transform routine that receives parameters and performs a max(SK) on the target table. The routine is used as the initial value for a stage variable that gets incremented for each insert.
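Roughly, the stage variable setup looks like this (the routine and variable names here are made up for illustration):

Code: Select all

* Stage variable svNextSK
* Initial value: GetMaxSK(TargetTable, KeyColumn)  ;* the transform routine doing SELECT MAX(SK)
* Derivation (evaluated once per row, so the counter advances for each insert):
svNextSK + 1
The output SK column then derives straight from svNextSK.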
Post Reply