
Processing large result sets

Posted: Tue Jun 03, 2008 3:26 pm
by seanc217
Hi there,

I am processing a large result set (6M+ records) through the USPREP rule set.

My question is: what is the best way to run such a result set through the Standardize stage efficiently? Right now I'm processing at 1,240 rec/sec.

It appears that the Standardize stage is really slow. I can understand why; it's doing a lot of work. I'm just wondering what some of the best practices are for making this run efficiently.

Thanks.

Posted: Tue Jun 03, 2008 4:08 pm
by ray.wurlod
If the system is not running out of resources, use a configuration file with more nodes.

It might be possible to write a more efficient rule set, but the benefit is probably not worth the cost.
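For example, a four-node configuration file on a single SMP server might look roughly like the following (the host name and resource paths are placeholders; adjust them to your environment):

{
    node "node1"
    {
        fastname "etl_server"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node2"
    {
        fastname "etl_server"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node3"
    {
        fastname "etl_server"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node4"
    {
        fastname "etl_server"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
}

Point APT_CONFIG_FILE at this file (or set it as a job parameter) and the parallel engine will partition the Standardize work across all four nodes.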

Posted: Thu Jun 05, 2008 8:47 am
by seanc217
Understood.

Thanks for the reply.

Sean

Posted: Tue Jun 17, 2008 2:38 pm
by emeri1md
It might be worth taking out any columns that are not being processed and joining them back in later, as sketched below. It adds a bit of complexity, but if it takes a lot of data out of the Standardize stage, it might be worth it.
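A rough sketch of such a split-and-rejoin design (stage names are illustrative, and a unique key column is assumed for the Join):

    Source --> Copy --> key + name/address columns --> Standardize --> Join --> Target
                   \--> key + passthrough columns ---------------------^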

Matt

Posted: Fri Jun 20, 2008 1:28 pm
by seanc217
Good point. I was thinking the same thing.

Thanks!
