Flat File as intermediate vs Hash Table; which may be faster

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

lclapp
Premium Member
Posts: 21
Joined: Wed May 19, 2004 2:43 pm

Flat File as intermediate vs Hash Table; which may be faster

Post by lclapp »

I am reading and writing to the same mainframe/DB2 table using the DSDB2 stage. The only way I have been successful is by utilizing an intermediate file. I wondered if any users have tried using a hash table instead of a flat file as an intermediate file. Are there any performance improvements using one vs the other?

leslie
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

lclapp,

Normally the sequential file should be faster, as there is much less processing going on in the background. The fastest method would be to not land the data on disk at all, going straight to the output stage writing to DB/2 --> what errors do you get when you try this?
lclapp
Premium Member
Posts: 21
Joined: Wed May 19, 2004 2:43 pm

Post by lclapp »

I agree with your response. I just walked into this account today and was told that was the reason for landing the data in the job. I will post some more information once I dig a bit deeper.

leslie
tcj
Premium Member
Posts: 98
Joined: Tue Sep 07, 2004 6:57 pm
Location: QLD, Australia
Contact:

Post by tcj »

lclapp wrote:I agree with your response. I just walked into this account today and was told that was the reason for landing the data in the job. I will post some more information once I dig a bit deeper.

leslie
Might have something to do with database record locks if you are reading and updating the same record in the same table. Just an idea.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

For staging, where you don't need individual key-based access, a sequential file will always be substantially faster both to write and to read than a hashed file.

The main reason for this is that you can't sort the data by hashed file group number, since the hashing algorithms are not in the public domain. But even then, the overheads in a hashed file (approximately 14 bytes per record, and (100 - SPLIT.LOAD) percent free space per group on average) mean that you end up writing a lot more anyway.
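Ray's figures can be turned into a rough back-of-envelope storage comparison. A minimal Python sketch (the 14 bytes/record and SPLIT.LOAD free-space figures are from the post above; a SPLIT.LOAD of 80 is an assumed value for illustration, and real hashed files will vary):

```python
def seq_file_bytes(n_records, avg_record_bytes):
    # A sequential (flat) file stores just the data: no per-record
    # overhead, no reserved free space per group.
    return n_records * avg_record_bytes

def hashed_file_bytes(n_records, avg_record_bytes,
                      per_record_overhead=14, split_load=80):
    # Per the figures above: ~14 bytes of overhead per record, and on
    # average (100 - SPLIT.LOAD) percent free space per group, so the
    # file on disk is roughly the data divided by SPLIT.LOAD/100.
    # split_load=80 is an assumption, not from the thread.
    data = n_records * (avg_record_bytes + per_record_overhead)
    return data * 100 / split_load

# For 5 million 100-byte records (the volume mentioned later in this
# thread), the hashed file ends up roughly 42% larger on disk:
seq = seq_file_bytes(5_000_000, 100)        # 500,000,000 bytes of raw data
hashed = hashed_file_bytes(5_000_000, 100)  # ~712,500,000 bytes on disk
```

And that is before counting the CPU spent hashing each key, which the sequential file avoids entirely.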
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ArndW wrote:The fastest method would be to not land the data on disk at all, going straight to the output stage writing to DB/2
You might be surprised how often that is not the case in a Server job. :wink:

Can't speak directly to DB2, but take an Oracle job with two OCI stages with a Transformer in between. Nothing fancy - read from one db, write to another. Split it into two jobs - one to land the data and a second to read from that flat file and write to the target - and it can run substantially faster. Not saying it absolutely will run faster, just that it may more often than you'd think.

Plus, you get the added bonus of having a static dataset to load, which can simplify any restart/recovery you may end up having to do.
-craig

"You can never have too many knives" -- Logan Nine Fingers
crouse
Charter Member
Posts: 204
Joined: Sun Oct 05, 2003 12:59 pm
Contact:

Post by crouse »

I agree with chulett

( plus, I just had to get a post in ) :wink:
Craig Rouse
Griffin Resources, Inc
www.griffinresources.com
lclapp
Premium Member
Posts: 21
Joined: Wed May 19, 2004 2:43 pm

Flat File in between is better...???

Post by lclapp »

I am running a test as I type... The current 'best' appears to be to drop a flat file between the read from DB2 and the write to DB2. If I have, say, 5 million rows to process, it will be faster to drop to a flat file before beginning the load to DB2. I can't use the bulk loader because the DB2 is on the mainframe. At first blush I can't see the flat file making things faster, because I still have to write the 5 million records and then read them again... we will see.

Also, the reason for my post turned into a bust. The developers assumed that there would be problems reading and writing to the same DB2 table in the same job, but never tested it.

I will post the results.

leslie
crouse
Charter Member
Posts: 204
Joined: Sun Oct 05, 2003 12:59 pm
Contact:

Post by crouse »

How 'bout writing the flat file, then FTP'ing or ConnectDirect'ing it to the MF and letting the MF bulk load the file? Formatting the flat file on the server before transferring, so that it arrives on the MF in the correct format, may be a bit of a hassle. But worth a look.
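The formatting hassle is typically a matter of padding fields to fixed column widths and converting the character set before transfer. A minimal Python sketch of that step (the field widths and the cp037 EBCDIC code page are assumptions for illustration; use whatever layout and code page your mainframe load job expects):

```python
def to_fixed_width(fields, widths, encoding="cp037"):
    # Pad (or truncate) each field to its fixed column width, then
    # encode the whole record to EBCDIC so it lands on the mainframe
    # ready for the bulk loader. cp037 (US/Canada EBCDIC) is an
    # assumed code page, not something specified in this thread.
    record = "".join(f.ljust(w)[:w] for f, w in zip(fields, widths))
    return record.encode(encoding)

# Example: a 7-byte record built from a 4-byte and a 3-byte column.
rec = to_fixed_width(["AB", "1"], [4, 3])
```

If you translate to EBCDIC on the server like this, the file then has to be transferred in binary mode so FTP doesn't re-translate the bytes; alternatively, leave the file in ASCII and let the transfer do the character conversion.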
Craig Rouse
Griffin Resources, Inc
www.griffinresources.com