Simple Job Running for long time

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

raj_cipher
Participant
Posts: 90
Joined: Mon Dec 08, 2003 4:48 am
Location: Chennai

Simple Job Running for long time

Post by raj_cipher »

Hi,

I have a job in which both the input and output are sequential files. It uses three transformers and five ODBC lookups.
The data volume is around 9 million rows. Two of the ODBC lookups have about 2.5 million rows each; the rest are in the thousands. This job takes 600 minutes to run.
The server configuration is very good and no other heavy load was running at the time.

Any ideas on how I can look into the job in detail and tune it?
Think Ahead,
Raj.D
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Your 'simple job' is doing 45 million ODBC lookups over the course of its run. :shock: All of them happen one at a time and over the network, unless your lookup sources are co-resident with the DataStage server.

A much better idea is to bring this data into hash files and do the lookups locally against the hashes.
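The difference craig describes can be sketched in a few lines. This is a hypothetical illustration, not DataStage code: the "remote" function stands in for one ODBC round trip per source row, while the dict stands in for a hashed file loaded once up front.

```python
# Hypothetical reference data; each tuple is (key, value).
reference_rows = [("K1", "foo"), ("K2", "bar"), ("K3", "baz")]

def remote_lookup(key):
    # Stands in for one ODBC round trip per source row:
    # the cost is paid again for every single input row.
    for k, v in reference_rows:
        if k == key:
            return v
    return None

# The hash-file approach: load the reference data once,
# then every lookup is a local in-memory hash probe.
local_hash = dict(reference_rows)

source = ["K2", "K3", "K2", "K9"]
results = [local_hash.get(k) for k in source]
print(results)  # ['bar', 'baz', 'bar', None]
```

With 9 million source rows and 5 lookups, eliminating the per-row network round trip is where nearly all the time goes.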
-craig

"You can never have too many knives" -- Logan Nine Fingers
raj_cipher
Participant
Posts: 90
Joined: Mon Dec 08, 2003 4:48 am
Location: Chennai

Post by raj_cipher »

Hi,

My ODBC lookups use queries with <= and >= conditions, and that kind of condition is not possible with hash files. Is there any way I could improve on the current set-up? The network connection is a 1 Gbps VLAN.
Think Ahead,
Raj.D
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

raj_cipher wrote: My ODBC lookups use queries with <= and >= conditions, and that kind of condition is not possible with hash files.
Strictly speaking, this isn't true. Granted, it's not an "out of the box" in-the-GUI kind of thing and you do have to do some work to achieve it, but it can be done. See this post for some ideas that may get you walking down that path.

You could also play the old game of accessing the Hash file using a UV stage, which would allow you to use your SQL syntax on the data. With proper indexing, I would think this would beat the pants off your ODBC lookups. Maybe even without proper indexing. :wink:
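To make the indexed-range idea concrete, here is a hypothetical sketch (plain Python, not DataStage or UV SQL) of what a `low <= key AND key <= high` lookup against an indexed structure does: a binary search on the low bound, then a check of the high bound, instead of scanning every reference row per source row.

```python
import bisect

# Hypothetical ranged reference data: (low, high, value), non-overlapping.
ranges = [(0, 99, "A"), (100, 499, "B"), (500, 999, "C")]
ranges.sort()
lows = [low for low, _, _ in ranges]

def ranged_lookup(key):
    # Binary search on the low bound, then verify the high bound --
    # roughly the work a B-tree index does for a BETWEEN-style query.
    i = bisect.bisect_right(lows, key) - 1
    if i >= 0:
        low, high, value = ranges[i]
        if low <= key <= high:
            return value
    return None

print(ranged_lookup(250))   # B
print(ranged_lookup(1500))  # None
```

An unindexed scan does this comparison work against every reference row for every source row, which is why indexing matters so much here.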

I seem to recall some other posts on this topic recently, but couldn't turn them up. Anyone have any other good posts stashed away in their Favorites that might help? Either that or care to chip in?
-craig

"You can never have too many knives" -- Logan Nine Fingers
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

With proper indexing, I would think this would beat the pants off your ODBC lookups. Maybe even without proper indexing.
No, you pretty much have to have proper indexes when doing stuff like this unless the amount of data is very small. But it's pretty simple to add extra indexes.

Tony
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Well, a single instance of a job reading a seq file, doing a couple of ODBC lookups, and writing to a seq file will not use multiple CPUs to run. By instantiating the job and using a constraint to partition the source data, you can run as many instances as you have CPUs and increase your net throughput.
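The partitioning constraint Ken describes is commonly done with a MOD on a numeric key: each instance keeps only the rows whose key modulo the instance count equals its own instance number, so N instances together cover the source exactly once with no overlap. A hypothetical sketch of that idea (the key and instance count here are illustrative):

```python
# Each job instance evaluates a constraint like MOD(key, N) = instance_num
# and processes only its own slice of the source data.
instance_count = 4
source_keys = list(range(10))  # stand-in for the source rows' numeric keys

partitions = {
    n: [k for k in source_keys if k % instance_count == n]
    for n in range(instance_count)
}

print(partitions)
# Every row lands in exactly one partition: the union of all slices
# is the original source, with no duplicates.
covered = sorted(k for part in partitions.values() for k in part)
assert covered == source_keys
```

In DataStage terms, that constraint lives in the first transformer of each instance of the multi-instance job.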

Unfortunately, your ODBC lookups will slow you down, and chances are the database will struggle to run N times as many similar lookup queries, so you won't see a linear improvement as you run more instances. You MUST move to using hash files, which have an in-memory caching feature.

Now, to use hash files with a ranged lookup, you must create a memory-cacheable construct. This requires building the hash file in a way that supports your effort. This post viewtopic.php?t=84998&highlight=rollup describes my solution for doing such a thing.
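The linked post isn't quoted here, so the following is an assumption about one common way to make a range lookup memory-cacheable: expand each range into discrete keys at hash-file build time, so the runtime lookup becomes a plain equality probe that the read cache can serve.

```python
# Hypothetical build step: one hash record per discrete key in each range,
# so a plain keyed (equality) lookup answers the range question at run time.
ranges = [(1, 3, "low"), (4, 6, "high")]

hash_file = {}  # stands in for the hashed file being built
for low, high, value in ranges:
    for key in range(low, high + 1):
        hash_file[key] = value

print(hash_file.get(5))   # high
print(hash_file.get(99))  # None
```

This only pays off when the key domain is discrete and the ranges are reasonably small; for wide ranges, bucketing the key (e.g. looking up a truncated or rounded key) is the usual refinement.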
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Funny, that was the very post that I linked to above. :wink:

And Tony, you left off my little winky guy when you quoted me. I was kidding... mostly.
-craig

"You can never have too many knives" -- Logan Nine Fingers
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Yeah, I did. Sorry.