64-bit hashed file bad performance
Moderators: chulett, rschirm, roy
Hi All,
I have created a 64-bit hashed file:
RESIZE hashfilename 18 14765651 4 64BIT
I am loading data into this hashed file from a table that contains around 50 million records.
The job is running very, very slowly; after 24 hours it has loaded only 10 million records, and it is now loading at 150 records per second.
How can I improve the performance of this job?
Let me know if you need more info.
Thanks
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Where is the slowness? You need to verify that the slowness is in the Hashed File stage rather than in the extraction piece.
Much of the slowness loading large hashed files on disk is the randomness of the operation (about which there is nothing you can do). Each new write has to move the disk heads (seek to a new page), which is unproductive activity. That's why a write cache is so beneficial; random writes to memory carry no seek overhead, and the cache can be flushed to disk sequentially (no seek).
Based on your sizing the hashed file will be 30,240,053,248 bytes in size (assuming no overflow). Unfortunately this is too large to use a write cache, the upper memory limit for which is set in Administrator, but cannot exceed 999MB.
Therefore consider what you can do to make the hashed file smaller. Do you need ALL the columns you are loading into it? (A column that is un-needed will have no line coming out of it in the Transformer stage.) Do you need ALL the rows you are loading into it? (For example, for SCD management you only need current rows from the target table.)
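Ray's size estimate can be reproduced with simple arithmetic: the modulus is the number of groups, and with a separation of 4 each group occupies 4 × 512 = 2048 bytes. A quick sanity check (file header overhead is ignored here):

```python
# Hashed file size estimate: modulus (number of groups) x group size.
# Separation is measured in 512-byte disk blocks, so separation 4 -> 2048-byte groups.
modulus = 14_765_651
separation = 4
group_bytes = separation * 512           # 2048 bytes per group
size_bytes = modulus * group_bytes       # minimum size, assuming no overflow
print(size_bytes)                        # 30,240,053,248 bytes
print(size_bytes / 2**30)                # ~28.2 GiB, far above the 999 MB write-cache cap
```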
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thanks Ray,
The job design is like:
DRS ----> IPC ----> ROW MERGE ----> TRANS ----> HASH
   120RPS     120RPS       120RPS       120RPS
Now it is writing at 120 rows per second.
The problem is that I need all the columns and all the rows from the source table.
Thanks
Forget rows/sec - it is one of the most meaningless metrics ever devised.
What you are doing in writing to this hashed file is random writes to 14,765,651 different disk addresses. You can't guarantee that these are contiguous, either. Striping, RAID and other high-availability technologies only exacerbate matters.
The only way - with a hashed file this size - to get it to go faster is to sort the data by group number. Alas, IBM does not publish the algorithms, so you have no routine that can produce this for you. (It's available via the RECORD command, but this executes a read operation, so would be very slow also.)
On current rates it will take 3.8 days to load this hashed file. Can you put it on a faster and/or less used disk?
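Ray's 3.8-day figure follows directly from the reported rate. Assuming roughly 40 million rows remain (50 million total minus the 10 million already loaded) at 120 rows per second:

```python
# Remaining load time at the observed write rate.
remaining_rows = 50_000_000 - 10_000_000   # rows still to load
rate = 120                                  # rows per second, as reported
seconds = remaining_rows / rate
print(seconds / 86_400)                     # ~3.86 days
```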
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I would also recommend you set the MINIMUM.MODULUS on the large hashed file to a very large number so that it doesn't have to enlarge the modulus continually while writing rows to the file.
I need glasses. Mea culpa. Recommendation retracted.
Devise a data partitioning strategy, create N copies of the hashed file using a multi-instance job with an expression in the constraint and a parameter in the hashed file name.
Instead of one huge hashed file that will perform poorly for reference purposes, how about 10 hashed files of the same structure, each holding 5 million rows?
Now, use the same data partitioning strategy in the job that references the hashed file. You use N instances of the same referencing job with the expression in a constraint to limit the rows processed by that instance (prior to the hashed reference of course).
If you have 16 CPUs, then partitioning your data and building 16 hashed files, each holding 1/16 of your reference data set, will fully utilize all CPUs and minimize your hashing and reference overhead. Each hashed file will have 1/16 of your data, maybe even squeezing you back into 32-bit hashed files, which can use read caching.
Partitioning the job that references the hashed lookup into 16 instances also maximizes your hardware utilization. Your throughput will go through the roof.
This concept of partitioning reference data and aligning transformation with lookups to that reference data is nothing new. It's what PX does automatically and elegantly. For Server, we do it more manually. This technique is simple and has been around for years.
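The key property behind Ken's approach is that the partitioning function is deterministic, so the instance that builds a partition's hashed file and the instance that looks up against it always see the same keys. In DataStage this would be a Transformer constraint expression (for example, something of the form MOD(key, N) + 1 = instance number); the following Python sketch is only an illustration of the idea, not DataStage code:

```python
# Sketch: deterministic mod-based partitioning. The same natural key always
# maps to the same partition, so build-side and lookup-side instances that
# use the same expression never miss each other's data.
N = 16  # number of job instances / hashed file partitions

def partition(key: int, n: int = N) -> int:
    """Return the partition number (1..n) for a numeric natural key."""
    return key % n + 1

# Build-side: route each dimension row to its partition's hashed file.
dimension_keys = [101, 202, 303, 404, 117]
build_side = {k: partition(k) for k in dimension_keys}

# Lookup-side: a fact row carrying key 303 must be processed by the same
# instance that loaded key 303, and it is, because the expression matches.
assert partition(303) == build_side[303]
```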
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Good thinking that man!
(Obviously I can't brain today - I must have the dumb.)
If, for whatever reason, you still need to single-stream the data that are using the lookup, another technique called a Distributed File (a logical "umbrella" over the sixteen hashed files) will allow you to spread the load of lookups. But it would be better if you could similarly partition the stream data.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
How much benefit is there to using a Distributed Hashed File in this situation? I've got a similar issue where multiple copies of a particular hashed file are being created by an MI job that mods the input stream and distributes the result across the X hashed files.
However, the final job that consumes them doesn't currently follow the same rules for some reason, so it ends up with all of them in it. Meaning they all get looked up against in spite of the known fact that only one will get a hit. Of course, it 'adversely impacts' the processing speed and the all-important rows/second metric goes into the toilet.
I've been exploring re-arranging everything to do exactly what Ken stated (in my spare time, ha!) but with the reminder that DHFs exist, I'm wondering if they might be a 'quickie' way to get this back to 'one lookup' in the interim. Is it worth considering? I haven't looked at them in detail for three years, not since I sat through a presentation at the 2004 Ascential World on the subject. Pros/cons?
-craig
"You can never have too many knives" -- Logan Nine Fingers
Ken, a follow-up question.
kcbland wrote: Now, use the same data partitioning strategy in the job that references the hashed file. You use N instances of the same referencing job with the expression in a constraint to limit the rows processed by that instance (prior to the hashed reference of course).
How exactly will a developer know how to constrain the source records? The second hashed file might have a particular record that does not show up in the second instance but is present in the first partition. How do you handle that? Do you collect the NOTFOUNDs and merge them with each consecutive partition?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Let's say you have a very large customer dimension, for which you partition into 16 hashed files based on the customer natural key as the primary key.
Your order fact table needs to lookup the customer dimension by the customer natural key, which it must have.
Logic would dictate that the order fact job run as multi-instance and use the customer natural key as its partitioning value just the same as the customer dimension reference lookup. You must use the same partitioning strategy between a reference lookup and the process doing that reference lookup. The job with N references is a terrible design, sorry Craig. You simply partition the same way, just as in PX you would partition by customer natural key the primary stream to match the partitioning of the reference lookup on its customer natural key.
Just think about it a minute folks. In distributed processing, you must make sure the related rows are co-located on the same processing node at the same time. In PX you have to partition using the same column for non-GLOBAL references.
When you have two very large 'dimension' type lookups, you may have to repartition your primary stream (the 'fact' type). So you partition one way to match a partitioned lookup and spool to file, then concatenate all of the files from the multi-instances, then read the merged file and repartition for the second partitioned lookup. At any opportune time you can lookup the non-partitioned hashed files.
PX excels because it will "concatenate" without landing to disk. In Server, we have to use temporary sequential files when dealing with a repartitioning. For my current customer we have an advantage of using Syncsort for high-performance multi-threaded concatenations. I wrote a replacement for unix cat that uses the same syntax except invokes Syncsort to do it. A 30 minute cat of 10GB of data now takes about 1 minute using Syncsort.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Understood. The main point for this design is to ensure both the reference and source keys end up in the same partition. Beautiful.
Is your Syncsort cat up on your website for public usage, or is it client proprietary material?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Can't share but here's the syncsort logic:
For $FILES we just listed all input files to concatenate. Syntax is /infile FILE1 64000 /infile FILE2 64000 /infile FILE3 64000 etc. The 64000 is just a maximum character row width.
${METHOD} is SUMMARIZE for distinct values and COPY for all
${OUTPUTFILE} is the output file.
I wrote a shell script to simulate cat, which takes the file specifications and expands into the $FILES variable all of the input files. I disallowed output to stdout and forced an output file specification.
Have fun!
Code: Select all
syncsort /workspace /${TEMPSPACE} $FILES /statistics /${METHOD} /outfile ${OUTPUTFILE} overwrite /end
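A minimal sketch of such a cat-replacement wrapper (hypothetical; it only assembles the Syncsort command line in the /infile syntax Ken describes above and does not reproduce all of cat's behavior, and the exact workspace option format is an assumption):

```python
# Sketch of a cat replacement that builds a Syncsort command line instead of
# streaming files itself. Names and option layout follow the example above.
MAX_WIDTH = 64000  # maximum character row width, per the /infile syntax


def syncsort_cat(infiles, outfile, method="COPY", tempspace="/tmp"):
    """Build the Syncsort argument list to concatenate infiles into outfile.

    method is COPY for all rows or SUMMARIZE for distinct values.
    Output to stdout is disallowed: an explicit output file is required.
    """
    if not outfile:
        raise ValueError("an output file specification is required")
    args = ["syncsort", "/workspace", tempspace]
    for f in infiles:  # expands to: /infile FILE1 64000 /infile FILE2 64000 ...
        args += ["/infile", f, str(MAX_WIDTH)]
    args += ["/statistics", "/" + method, "/outfile", outfile, "overwrite", "/end"]
    return args  # pass to subprocess.run(args) to actually execute


cmd = syncsort_cat(["FILE1", "FILE2"], "merged.out")
```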
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle