processing large resultsets

InfoSphere's Quality Product

Moderators: chulett, rschirm

seanc217
Premium Member
Posts: 188
Joined: Thu Sep 15, 2005 9:22 am

processing large resultsets

Post by seanc217 »

Hi there,

I am processing a large resultset (6M+ records) through the USPREP rule set.

My question is: what is the best way to run such a resultset through the Standardize stage efficiently? Right now I'm processing at 1240 rec/sec.

It appears that the Standardize stage is really slow. I can understand why; it's doing a lot of work. I'm just wondering what some of the best practices are for making this run efficiently.

Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If the system is not running out of resources, use a configuration file with more nodes.

It might be possible to write a more efficient rule set, but the benefit is probably not worth the cost.
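
For example, a four-node configuration file might look something like the sketch below; the fastname and the resource paths are placeholders that have to match your own system:

{
    node "node1"
    {
        fastname "etlserver"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node2"
    {
        fastname "etlserver"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node3"
    {
        fastname "etlserver"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node4"
    {
        fastname "etlserver"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
}

Point APT_CONFIG_FILE at the new file and the job runs four-way parallel, provided CPU, memory and disk hold up.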
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
seanc217
Premium Member
Posts: 188
Joined: Thu Sep 15, 2005 9:22 am

Post by seanc217 »

Understood.

Thanks for the reply.

Sean
emeri1md
Participant
Posts: 33
Joined: Tue Jun 17, 2008 10:42 am

Post by emeri1md »

It might be worth taking out any columns that are not being processed and joining them back in later on. It adds a bit of complexity, but if it takes a lot of data out of the Standardize stage, it might be worth it.
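
Outside DataStage, the same split-and-rejoin pattern looks like the sketch below in pandas; standardize_name is just a stand-in for what the Standardize stage does, and the column names are made up for the example:

import pandas as pd

# Stand-in for the Standardize stage / USPREP rule set.
def standardize_name(name: str) -> str:
    return " ".join(name.upper().split())

df = pd.DataFrame({
    "id":      [1, 2, 3],
    "name":    ["  john  smith ", "MARY   JONES", "bob brown"],
    "balance": [100.0, 250.5, 75.25],   # pass-through column, never standardized
})

# Split: send only the key and the columns that need standardizing.
to_standardize = df[["id", "name"]].copy()
passthrough    = df[["id", "balance"]]

# Standardize the narrow dataset.
to_standardize["name"] = to_standardize["name"].map(standardize_name)

# Join the pass-through columns back on the key afterwards.
result = to_standardize.merge(passthrough, on="id")
print(result)

The less data each record carries through the Standardize stage, the less the job has to move and buffer, which is where the saving comes from.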

Matt
seanc217
Premium Member
Posts: 188
Joined: Thu Sep 15, 2005 9:22 am

Post by seanc217 »

Good point. I was thinking the same thing.

Thanks!