Flat File as intermediate vs Hash Table; which may be faster

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

lclapp
Premium Member
Posts: 21
Joined: Wed May 19, 2004 2:43 pm

Flat File as intermediate vs Hash Table; which may be faster

Post by lclapp »

I am reading and writing to the same mainframe/DB2 table using the DSDB2 stage. The only way I have been successful is by utilizing an intermediate file. I wondered if any users have tried using a hash table instead of a flat file as an intermediate file. Are there any performance improvements using one vs the other?

leslie
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

lclapp,

Normally the sequential file should be faster, as there is much less processing going on in the background. The fastest method would be to not land the data on disk at all, going straight to the output stage writing to DB/2 --> what errors do you get when you try this?
lclapp
Premium Member
Posts: 21
Joined: Wed May 19, 2004 2:43 pm

Post by lclapp »

I agree with your response. I just walked into this account today and was told that was the reason for landing the data in the job. I will post some more information once I dig a bit deeper.

leslie
tcj
Premium Member
Posts: 98
Joined: Tue Sep 07, 2004 6:57 pm
Location: QLD, Australia
Contact:

Post by tcj »

lclapp wrote:I agree with your response. I just walked into this account today and was told that was the reason for landing the data in the job. I will post some more information once I dig a bit deeper.

leslie
Might have something to do with database record locks if you are reading and updating the same record in the same table. Just an idea.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

For staging, where you don't need individual key-based access, a sequential file will always be substantially faster both to write and to read than a hashed file.

The main reason for this is that you can't sort the data by hashed file group number, since the hashing algorithms are not in the public domain. But even then, the overheads in a hashed file (approximately 14 bytes per record, and (100 - SPLIT.LOAD) percent free space per group on average) mean that you end up writing a lot more anyway.
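Ray's figures can be turned into a rough back-of-envelope storage comparison. A minimal Python sketch (the 14 bytes/record and SPLIT.LOAD free-space figures are from the post above; a SPLIT.LOAD of 80 is an assumed value for illustration, and real hashed files will vary):

```python
def seq_file_bytes(n_records, avg_record_bytes):
    # A sequential (flat) file stores just the data: no per-record
    # overhead, no reserved free space per group.
    return n_records * avg_record_bytes

def hashed_file_bytes(n_records, avg_record_bytes,
                      per_record_overhead=14, split_load=80):
    # Per the figures above: ~14 bytes of overhead per record, and on
    # average (100 - SPLIT.LOAD) percent free space per group, so the
    # file on disk is roughly the data divided by SPLIT.LOAD/100.
    # split_load=80 is an assumption, not from the thread.
    data = n_records * (avg_record_bytes + per_record_overhead)
    return data * 100 / split_load

# For 5 million 100-byte records (the volume mentioned later in this
# thread), the hashed file ends up roughly 42% larger on disk:
seq = seq_file_bytes(5_000_000, 100)        # 500,000,000 bytes of raw data
hashed = hashed_file_bytes(5_000_000, 100)  # ~712,500,000 bytes on disk
```

And that is before counting the CPU spent hashing each key, which the sequential file avoids entirely.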
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ArndW wrote:The fastest method would be to not land the data on disk at all, going straight to the output stage writing to DB/2
You might be surprised how often that is not the case in a Server job. :wink:

Can't speak directly to DB2, but take an Oracle job with two OCI stages with a Transformer in between. Nothing fancy - read from one db, write to another. Split it into two jobs - one to land the data and a second to read from that flat file and write to the target - and it can run substantially faster. Not saying it absolutely will run faster, just that it may more often than you'd think.

Plus, you get the added bonus of having a static dataset to load, which can simplify any restart/recovery you may end up having to do.
-craig

"You can never have too many knives" -- Logan Nine Fingers
crouse
Charter Member
Posts: 204
Joined: Sun Oct 05, 2003 12:59 pm
Contact:

Post by crouse »

I agree with chulett

( plus, I just had to get a post in ) :wink:
Craig Rouse
Griffin Resources, Inc
www.griffinresources.com
lclapp
Premium Member
Posts: 21
Joined: Wed May 19, 2004 2:43 pm

Flat File in between is better...???

Post by lclapp »

I am running a test as I type... The current 'best' appears to be to drop a flat file between the read from DB2 and the write to DB2. If I have, say, 5 million rows to process, it will be faster to drop to a flat file before beginning the load to DB2. I can't use the bulk loader because the DB2 is on the mainframe. At first blush I can't see the flat file making things faster, because I still have to write the 5 million records and then read them again... we will see.

Also, the reason for my post turned into a bust. The developers assumed that there would be problems reading and writing to the same DB2 table in the same job, but never tested it.

I will post the results.

leslie
crouse
Charter Member
Posts: 204
Joined: Sun Oct 05, 2003 12:59 pm
Contact:

Post by crouse »

How 'bout writing the flat file, then FTP'ing or ConnectDirect'ing it to the MF and letting the MF bulk load the file? Formatting the flat file on the server before transferring, so that it arrives on the MF in the correct format, may be a bit of a hassle. But worth a look.
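The formatting hassle is typically a matter of padding fields to fixed column widths and converting the character set before transfer. A minimal Python sketch of that step (the field widths and the cp037 EBCDIC code page are assumptions for illustration; use whatever layout and code page your mainframe load job expects):

```python
def to_fixed_width(fields, widths, encoding="cp037"):
    # Pad (or truncate) each field to its fixed column width, then
    # encode the whole record to EBCDIC so it lands on the mainframe
    # ready for the bulk loader. cp037 (US/Canada EBCDIC) is an
    # assumed code page, not something specified in this thread.
    record = "".join(f.ljust(w)[:w] for f, w in zip(fields, widths))
    return record.encode(encoding)

# Example: a 7-byte record built from a 4-byte and a 3-byte column.
rec = to_fixed_width(["AB", "1"], [4, 3])
```

If you translate to EBCDIC on the server like this, the file then has to be transferred in binary mode so FTP doesn't re-translate the bytes; alternatively, leave the file in ASCII and let the transfer do the character conversion.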
Craig Rouse
Griffin Resources, Inc
www.griffinresources.com