
Writing to Hash file

Posted: Mon Feb 09, 2004 1:10 pm
by nsm
Hi,

Q1) I am working with flat files which can contain duplicate rows, so I am writing each record to a hash file and to Oracle at the same time.

In Oracle I commit only after all the records in the file have been processed. What happened is that my job failed after processing 5602 records because the file was somehow broken, so it wrote nothing to the database, but it did write to the hash file.

The next time I ran the full load, it skipped some records because they were already in the hash file.

Is there any way to prevent this (other than committing after every record)? I only want the hash file to contain the records that were actually written to the database.


Q2)
I want to delete a file after it has been processed by DataStage (once the job is finished). Is this possible in DataStage?

nsm.

Posted: Mon Feb 09, 2004 1:40 pm
by chulett
Hash file writes are immediate, unless you've got Write Caching turned on, in which case it may take a moment or two for them to actually make it to disk. In any case, there is no concept analogous to "commit levels" with hash files.

I don't really understand how you are using this hash. When you said:
nsm wrote: The next time I ran the full load, it skipped some records because they were already in the hash file.
Does this mean you are trying to use the information in the hash for restarting the job stream from where it left off? This doesn't really seem to mix well with an all-or-nothing commit strategy. If it fails, start over from the beginning. If you want to do intermediate commits, use something like a MOD function in a constraint so that you only write a record out when a commit happens, not for every row.
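
For illustration only, assuming the OCI stage commits every 5000 rows (the 5000 is just an example figure, not something from your job), a constraint on the hash file link could be kept in step with the commit interval, something like:

    MOD(@INROWNUM, 5000) = 0

With that in place only every 5000th input row is written to the hash file, which roughly tracks the commit points instead of recording every row.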
nsm wrote: I want to delete a file after it has been processed by DataStage (once the job is finished)
No, you don't. Trust me. :) Rename it, move it to another directory, archive it in some fashion, but do not remove it. Use an after-job subroutine of ExecSH to call the command directly, or write a shell script and run it from there.
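
If you go the shell script route, here is a minimal sketch (the paths and file name are assumptions, adjust them for your environment):

    #!/bin/sh
    # Archive the processed flat file instead of deleting it.
    # SRC and ARCHIVE are assumed paths.
    SRC=/data/landing/source_file.dat
    ARCHIVE=/data/archive

    mkdir -p "$ARCHIVE"
    mv "$SRC" "$ARCHIVE/$(basename "$SRC").$(date +%Y%m%d%H%M%S)"

Call it from an after-job ExecSH, or pass the file name in as an argument if you prefer not to hard-code it.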

Posted: Mon Feb 09, 2004 1:49 pm
by kduke
Write Cache will not overwrite a hash record, so on duplicate keys the first record in is the one that survives. Why not use the hash file to update Oracle? That way you do not update duplicates.

You can have the hash file deleted and recreated at the start of the job; there is a check box to do that. It is easier to do that than to clear it.

Posted: Mon Feb 09, 2004 2:02 pm
by raju_chvr
Check the 'Clear before writing' box. In this case the hash file is completely cleared and rebuilt from scratch.

This is a safer option than deleting the file at the end of the job. Unless you have hard disk space issues, you don't want to delete a hash file after the job.

Posted: Fri May 21, 2004 9:04 am
by chulett
kduke wrote: Write Cache will not overwrite a hash record, so on duplicate keys the first record in is the one that survives.
Going back in time a little, but this came up in conversation this morning...

This statement concerned me, so I wrote a little test case to see whether overwrites really don't happen with write caching turned on. They do still happen: regardless of the cache setting I always get 'last in' survivorship.

Is this old behaviour that perhaps was 'fixed' or changed in more recent releases? I did my testing under 7.0.1, for what it's worth.

Posted: Fri May 21, 2004 9:26 am
by kcbland
chulett wrote:
kduke wrote: Write Cache will not overwrite a hash record, so on duplicate keys the first record in is the one that survives.
This statement concerned me, so I wrote a little test case to see whether overwrites really don't happen with write caching turned on. They do still happen: regardless of the cache setting I always get 'last in' survivorship. Is this old behaviour that perhaps was 'fixed' or changed in more recent releases? I did my testing under 7.0.1, for what it's worth.
Back in the DS 4 days there was a bug that surfaced on large hash files using write-delayed caching when there were duplicates of the primary key. In what appeared to be random occurrences, the last-in row was not the row found in the hash file. I reported this bug and it took a lot of validation before engineering reproduced it. I believe it was fixed under 4 and has gone away in subsequent releases.

Other than that, the expected behaviour is that the last-in row for a given primary key, no matter the caching, ends up as the final row in the hash file.

Posted: Fri May 21, 2004 9:30 am
by chulett
Thanks Ken.

Posted: Fri May 21, 2004 9:46 am
by kduke
I know that bug was in release 5 as well. I have not checked it in 6 or 7.