Delete data from hashed files

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Never tried that way. :wink:

We've taken two different approaches: One is to rebuild the hashed file such that the deleted rows are automagically removed. Another approach is to add a 'deleted' column and mark them as logically deleted by updating that indicator during the delete process. Your jobs then need to know that a hashed file hit is only a hit if that column isn't set.
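
A minimal sketch of that logical-delete check, assuming a reference link named lnk_Hash and an indicator column DELETED_FLAG (both names made up for the example): the constraint only treats a lookup as a hit when the row was found and the flag isn't set.

Code: Select all

* Transformer constraint: a hit only counts when the row exists
* and has not been logically deleted.
NOT(lnk_Hash.NOTFOUND) AND lnk_Hash.DELETED_FLAG <> 'Y'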

Your mileage may vary.
-craig

"You can never have too many knives" -- Logan Nine Fingers
jozef
Participant
Posts: 7
Joined: Thu Jun 10, 2004 5:46 am
Contact:

Post by jozef »

chulett wrote: Never tried that way. :wink: [...] Your mileage may vary.
Hi Craig,

Thanks. I'm currently doing the rebuild, but the volume of data has grown from 3 million records to more than 8 million, causing my job to run an extra two hours.

The hashed file normally stays much the same, with a few updates, inserts and some records that need to be deleted. Rebuilding the whole hashed file for the sake of 20,000 records seems a bit over the top, hence the search for a better solution.

Regards
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You could always stream the data to an output link using a function in the constraint that isn't really a constraint but does a delete on the hashed file from within the function.

It's quite easy, actually: open the hashed file on the first row passed and put the handle into a COMMON file-handle variable, then use the passed key to issue a DELETE yourfilehandle, key ... ELSE ... statement. It's not pretty, but it saves having to use the UV/ODBC stage and it's a whole lot faster.
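
As a rough sketch of what that constraint could look like (the routine name and column are illustrative; ArndW posts a concrete routine below), the function is called for its side effect and OR @TRUE keeps the expression true so every row still streams down the link:

Code: Select all

* Output link constraint: the delete happens as a side effect
* of the function call; OR @TRUE lets every row through.
DeleteHashedRecord('MyHashedFile', InLink.THE_KEY) OR @TRUE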
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Jozef,

Using SQL in your case to delete a relatively small number of records is going to be very inefficient. You could instead write a routine, used in a derivation, to delete the records. Here is an example (most of the error handling skipped) that is tuned for efficiency but only handles one hashed file.

Code: Select all

FUNCTION DeleteHashedRecord(Filename, Key)
EQUATE ProgramName TO 'DeleteHashedRecord'
COMMON /DeleteCommon/ Initialized, FilePtr

   IF UNASSIGNED(Initialized) THEN Initialized = @FALSE

   IF NOT(Initialized) THEN
      * Open the hashed file once and keep the handle in COMMON
      * so subsequent rows skip the OPEN.
      OPEN Filename TO FilePtr ELSE CALL DSLogFatal('Error, Unable to open "':Filename:'" hashed file with error ':STATUS(), ProgramName)
      Initialized = @TRUE
   END ;** of if this is the first call to the routine in this session

   Ans = 0
   DELETE FilePtr, Key ELSE Ans = 1

RETURN(Ans)
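
A hedged example of calling it from a transformer derivation (the file and column names are hypothetical); the function returns 0 when the key was deleted and 1 when the key was not found:

Code: Select all

* Derivation: delete this row's key from the hashed file.
DeleteHashedRecord('HF_CustomerKeys', InLink.CUST_ID)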