Writing to Hash file

Post questions here related to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

nsm
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

Writing to Hash file

Post by nsm »

Hi,

Q1) I am working with flat files which can have duplicate rows, so what I am doing is writing each record to the hash file and to Oracle at the same time.

In Oracle I commit only after all the records in the file are processed. What happened is that my job failed after processing 5602 records in a file, because the file was somehow broken.

So it didn't write any records to the database, but it did write them to the hash file.

The next time, when I did the full load, it didn't write some records because they were already in the hash file.

Is there any way to prevent this (other than committing after each record)?
I mean I want to write to the hash file only the records that were actually written to the database.


Q2)
I want to delete a file after it has been processed by DataStage (once the job is finished). Is this possible in DataStage?

nsm.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hash file writes are immediate, unless you've got Write Caching turned on, in which case it may take a moment or two for them to actually make it to disk. In any case, there is no concept analogous to "commit levels" with hash files.

I don't really understand how you are using this hash. When you said:
The next time, when I did the full load, it didn't write some records because they were already in the hash file.
Does this mean you are trying to use the information in the hash for restarting the job stream from where it left off? This doesn't really seem to mix well with an all-or-nothing commit strategy. If it fails, start over from the beginning. If you want to do intermediate commits, use something like a MOD function in a constraint so that you only write a record out when a commit happens, not for every row.
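As a rough sketch, assuming the OCI stage is set to commit every 1000 rows (adjust the divisor to whatever transaction size you actually use), the constraint on the link into the hash file could be something like:

   MOD(@INROWNUM, 1000) = 0

@INROWNUM is the Transformer's input row counter, so the hash file only gets touched on rows that line up with a commit boundary rather than on every row.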
I want to delete a file after it has been processed by DataStage (once the job is finished)
No, you don't. Trust me. :) Rename it, move it to another directory, archive it in some fashion, but do not remove it. Use an after-job subroutine of ExecSH to call the command directly, or write a shell script and run it from there.
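If you'd rather wrap it in a routine than pass a raw command to ExecSH, a minimal sketch of an after-job subroutine might look like the following. The routine name, the archive directory and the use of InputArg to carry the file path are all just placeholders to adapt:

   * After-job routine sketch: move the processed file to an archive directory
   * instead of deleting it. InputArg is assumed to hold the full path of the file.
   SUBROUTINE ArchiveSourceFile(InputArg, ErrorCode)
      ErrorCode = 0
      ArchiveDir = "/data/archive"   ;* assumed location - change to suit
      Cmd = "mv " : InputArg : " " : ArchiveDir
      Call DSExecute("UNIX", Cmd, Output, SysRet)
      If SysRet <> 0 Then
         Call DSLogWarn("Archive move failed: " : Output, "ArchiveSourceFile")
         ErrorCode = 1   ;* non-zero will abort the job
      End
   RETURN

Compile it as a before/after routine and name it in the job's after-job subroutine field, passing the file path as the input value.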
-craig

"You can never have too many knives" -- Logan Nine Fingers
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Write Cache will not overwrite a hash record, so on duplicate keys the first record in is the final record. Why not use the hash file to update Oracle? That way you do not update duplicates.
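Leaving the write-cache question aside, at the BASIC level the de-duplication is easy to see: a hashed file keeps exactly one record per key, so a second WRITE with the same key simply replaces whatever is there. A rough sketch, with a made-up file name and key:

   * Write two rows with the same key, then read the file back:
   * only one record per key survives, which is why driving the Oracle
   * update from the hash file avoids updating duplicates.
   OPEN "CustomerHash" TO HashFile ELSE STOP "Cannot open CustomerHash"
   WRITE "Smith" : @FM : "100" ON HashFile, "CUST1"
   WRITE "Smith" : @FM : "200" ON HashFile, "CUST1"   ;* same key - replaces the first
   SELECT HashFile
   LOOP
      READNEXT Id ELSE EXIT
      READ Rec FROM HashFile, Id THEN PRINT Id : " -> " : Rec
   REPEAT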

You can have the hash file deleted and recreated at the start of the job; there is a checkbox to do that. It is easier to do that than to clear it.
Mamu Kim
raju_chvr
Premium Member
Posts: 165
Joined: Sat Sep 27, 2003 9:19 am
Location: USA

Post by raju_chvr »

Check the box 'Clear before writing'. In this case the hash file is completely cleared and rebuilt from scratch.

This is a safer option than deleting the file at the end of the job. Unless you have hard disk space issues, you don't want to delete a hash file after the job.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kduke wrote: Write Cache will not overwrite a hash record, so on duplicate keys the first record in is the final record.
Going back in time a little, but this came up in conversation this morning...

This statement concerned me, so I wrote a little test case to see if it was really true that overwrites don't happen with write caching turned on. Overwrites do still happen, meaning that regardless of the cache setting I always get 'last in' survivorship.

Is this old behaviour that perhaps was 'fixed' or changed in more recent releases? I did my testing under 7.0.1, for what it's worth.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

chulett wrote:
kduke wrote: Write Cache will not overwrite a hash record, so on duplicate keys the first record in is the final record.
Going back in time a little, but this came up in conversation this morning...

This statement concerned me, so I wrote a little test case to see if it was really true that overwrites don't happen with write caching turned on. Overwrites do still happen, meaning that regardless of the cache setting I always get 'last in' survivorship.

Is this old behaviour that perhaps was 'fixed' or changed in more recent releases? I did my testing under 7.0.1, for what it's worth.
Back in the DS 4 days there was a bug that surfaced on large hash files using write-delayed caching if there were duplicates of the primary key. In what appeared to be random occurrences, the last-in row was not the row found in the hash file. I called this bug in, and it took a lot of validation before engineering reproduced it. I believe it was fixed under 4 and has gone away in subsequent releases.

Other than that, the expected functionality is that the last-in row for a given primary key, no matter the caching, should be the final row in the hash file.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Thanks Ken.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

I know that bug was in release 5 as well. I have not checked it in 6 or 7.
Mamu Kim