
Ensuring hashed file is created when the job executes

Posted: Wed Mar 11, 2009 10:25 pm
by sbass1
Hi,

I've reviewed viewtopic.php?t=92114&highlight=CREATE+TABLE but still have this question...

I have a job like:


DRS --> xfm --> DRS --> hashed_file (2)
         ^
         |
   hashed_file (1)
where hashed_file (1) and hashed_file (2) are the same physical file.

(sorry for the badly formatted ascii art, hashed_file (1) is a lookup for the xfm transformation)

hashed_file (1) is pre-loaded into memory
hashed_file (2) has create file / clear file ticked

This is all to support surrogate key generation, with code like:

If IsNull(Lookup_SKs.FooSK) Then KeyMgtGetNextValueConcurrent("source.foo") Else Lookup_SKs.FooSK

although I intend to change this to:

If (Lookup_SKs.NOTFOUND) Then ...

My issue is, after physical file cleanup, or migration to a new environment, hashed_file (1) does not exist, and the job aborts.

What is the best practice and syntax to implement the following pseudocode:

If the hashed file #DSStageName# does not exist under directory #$HashFileLocation#, create an empty hashed file at #$HashFileLocation#/#DSStageName#.

This would then allow the job to run, generating new SK's for each input row, then update the hash.

The code should not generate an error if the hashed file already exists. Ideally it would support the DataStage automatic variables noted above (I doubt this will be possible - it would require a before-job routine that can "peek into" the structure of the job), and it should optionally let me write a warning to the log if the file does not exist.
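For readers after the same conditional-create behaviour, here is a minimal shell sketch of the guard logic, suitable for a before-job ExecSH command. All names are assumptions: HASHPATH stands in for #$HashFileLocation#/#DSStageName#, and CREATE_CMD defaults to mkdir -p here purely so the sketch runs standalone - on a real server it would be the engine's hashed-file creation utility (e.g. mkdbfile under $DSHOME/bin, if your install provides it).

```shell
# Hedged sketch of the "create only if missing" guard.
# HASHPATH stands in for #$HashFileLocation#/#DSStageName# (assumption).
HASHPATH="${HASHPATH:-/tmp/demo_Lookup_SKs}"
# On a real DataStage server CREATE_CMD would be the engine's hashed-file
# creation utility; mkdir -p is a stand-in so this sketch is self-contained.
CREATE_CMD="${CREATE_CMD:-mkdir -p}"
if [ ! -d "$HASHPATH" ]; then
    echo "WARNING: hashed file $HASHPATH does not exist - creating it" >&2
    $CREATE_CMD "$HASHPATH"
fi
```

No error is raised when the file already exists, and the warning goes to standard error so DataStage can pick it up in the job log.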

I tried this command from the operating system, with the noted results:

/tmp > /opt2/ds/Ascential/DataStage/DSEngine/bin/create.file DATA TEST DYNAMIC
Unable to open VOC file.
*** Processing cannot continue. ***

Various permutations of this command resulted in the same error.
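(A likely cause, for anyone hitting the same error: "Unable to open VOC file" usually means create.file was not run from inside a DataStage project directory, since it registers the new file in the project's VOC. A hedged sketch of a safer wrapper - the project path is a made-up example:)

```shell
# Hedged sketch: only run create.file from a directory containing a VOC.
PROJECT_DIR="${PROJECT_DIR:-/opt2/ds/Ascential/DataStage/Projects/MyProject}"  # hypothetical path
if [ -f "$PROJECT_DIR/VOC" ]; then
    STATUS=ok
    ( cd "$PROJECT_DIR" && /opt2/ds/Ascential/DataStage/DSEngine/bin/create.file DATA TEST DYNAMIC )
else
    STATUS=no_voc
    echo "No VOC in $PROJECT_DIR - create.file must run inside a project directory" >&2
fi
```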

Thanks,
Scott

Posted: Wed Mar 11, 2009 10:39 pm
by ray.wurlod
Put your ASCII art inside code tags.

Best practice to ensure that a hashed file exists is to create a separate job to create it. This needs only a transformer stage and a hashed file stage.

Posted: Thu Mar 12, 2009 1:54 am
by sbass1
Our latest approach is:


DRS (1) --> xfm --> DRS (2) --> hashed_file (2) 
             ^ 
             | 
DRS (2) --> hashed_file (1)
This:

1) Ensures hashed_file (1) always exists
2) Is always in sync with DRS (2) (the target table)
3) Ensures new SKs are derived correctly

Pros:
Ensures the hashed file is always in sync with the target table, such as after a database refresh (either truncating the table, or adding to the table, such as a refresh from UAT)

Cons:
Somewhat computationally inefficient, as it's unconditionally refreshing the hashed file when it might not need refreshing

I'm a bit ambivalent about this approach (suggested by a colleague) due to the cons above.

The reason we're using a hashed file instead of direct lookup from the target table is it's orders of magnitude faster doing the lookups.

Comments?

P.S.: I've seen no difference in behaviour whether I tick "Create file" or not. DS always seems to create the hashed file if it does not already exist.

Posted: Thu Mar 12, 2009 11:21 am
by chulett
sbass1 wrote:P.S.: I've seen no difference in behaviour whether I tick "Create file" or not. DS always seems to create the hashed file if it does not already exist.
Correct. Ticking 'Create File' just gives you the option to override the defaults used at creation time, plus access to the 'delete before' option. As you've found, it will always attempt to create the hashed file if it doesn't exist regardless of that setting.

Posted: Thu Mar 12, 2009 2:13 pm
by Mike
If you don't wish to unconditionally refresh the hashed file, you can replace your DRS (2) with a transformer stage having a stage variable and an output constraint of @FALSE. That will create the hashed file if it does not exist.

Mike

Posted: Thu Mar 12, 2009 3:28 pm
by ray.wurlod
Relying on auto-create will prove counterproductive if the job design reads from and writes to the same hashed file.

Posted: Sun Mar 15, 2009 11:03 pm
by sbass1
Mike wrote:If you don't wish to unconditionally refresh the hashed file, you can replace your DRS (2) with a transformer stage having a stage variable and an output constraint of @FALSE. That will create the hashed file if it does not exist.

Mike
Thanks Mike, slick trick. The hashed file never gets updated since the constraint is false, but gets created in accordance with the metadata of the link if it does not yet exist.

Posted: Sun Mar 15, 2009 11:05 pm
by sbass1
Edit/Delete post is only letting me edit, not delete the extra post caused by hitting back in my browser. So sorry for the extra post (this one).

Posted: Sun Mar 15, 2009 11:24 pm
by ray.wurlod
IOW ?

Isn't that the noise a cat makes?
:?

Please refrain from all but the most common English abbreviations on DSXchange. Many of our readers don't have English as their first language, and it's tough enough as it is, without having to decode the mental processes behind arcane abbreviations.

Posted: Mon Mar 16, 2009 7:21 am
by chulett
sbass1 wrote:Edit/Delete post is only letting me edit, not delete the extra post caused by hitting back in my browser. So sorry for the extra post (this one).
You can delete a post as long as no-one has replied to it yet, which is why I always check for duplicates right away if there's any delay in posting... delay = multiple postings.

Posted: Wed Aug 26, 2009 4:41 pm
by yserrano
sbass1 wrote:Our latest approach is:


DRS (1) --> xfm --> DRS (2) --> hashed_file (2) 
             ^ 
             | 
DRS (2) --> hashed_file (1)
This:

1) Ensures hashed_file (1) always exists
2) Is always in sync with DRS (2) (the target table)
3) Ensures new SKs are derived correctly

Pros:
Ensures the hashed file is always in sync with the target table, such as after a database refresh (either truncating the table, or adding to the table, such as a refresh from UAT)

Cons:
Somewhat computationally inefficient, as it's unconditionally refreshing the hashed file when it might not need refreshing

I'm a bit ambivalent about this approach (suggested by a colleague) due to the cons above.

The reason we're using a hashed file instead of direct lookup from the target table is it's orders of magnitude faster doing the lookups.

Comments?

P.S.: I've seen no difference in behaviour whether I tick "Create file" or not. DS always seems to create the hashed file if it does not already exist.
We have used another method very successfully to create surrogate keys: we created a transform routine that receives parameters and performs a MAX(SK) on the target table. The routine is used as the initial value for a stage variable that gets incremented for each insert.
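To illustrate that pattern outside DataStage (a sketch only - the table, column, and values are invented), the logic is: fetch MAX(SK) once, seed a counter with it, and increment once per inserted row:

```shell
# Hedged illustration of the MAX(SK)-seeded stage-variable pattern.
# Pretend SELECT MAX(FOO_SK) FROM target returned 100 (invented value).
MAX_SK=100
SK=$MAX_SK
for row in alpha beta gamma; do   # three new rows to insert
    SK=$((SK + 1))                # stage variable incremented per insert
    echo "row $row -> surrogate key $SK"
done
```

Note the caveat implicit in the earlier posts: a simple incrementing stage variable is only safe when a single job instance writes the table; concurrent writers would need something like KeyMgtGetNextValueConcurrent instead.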
