Server hit by lots of log queries

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Server hit by lots of log queries

Post by roy »

Hi y'all,

One of the projects I support has a major issue with DS Server load.
They managed to narrow the problem down to lots and lots of short log file sampling processes done by uv processes.
The unique thing about this project is that it uses DS for loading tables at least 5 times an hour (yes, not pure DWH, I know).

So does anyone have any idea what might cause this?
Has anyone had this issue?
Any ideas would be appreciated.
(I'm on a course right now, so I will look into it tomorrow.)

Thanks in advance,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Your situation is not quite clear. If what you are saying is that the uv processes (release 5 DataStage threads) are hitting the database with a lot of individual SQL statements, then yes, that is the way it works.

If you have 1 million rows in a source sequential file that sends data to a transformer with an ODBC/OCI reference lookup, and outputs to a sequential file, then you will have 1 million distinct SQL queries against the database. Even if all 1 million queries are identical, you will have 1 million statements issued. If you're lucky, query cache will kick in. If not, you undergo the query and all the labor involved. If you instantiated the job to divide and conquer the source data, then you have N jobs all issuing 1 million/N queries, but it's still 1 million queries.

This is why I constantly labor to show the benefit of scanning your source data for natural keys, putting those into a work table in the target database, inner joining that work table to the target table, and unloading those rows and ultimately putting them into a hash file. That means 1 query, with the ability to parallelize, with a concentrated stream of spooling data, into a hash file that has read cache capability, shared cache capability, and scales no matter how many instantiated clones are using it.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Thanks Ken,
but my situation is not heavy load on the DB instance; it is the DS engine machine that gets the load problem, from many short queries to the DS log files :(
Any ideas?
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

roy wrote:Thanks Ken,
but my situation is not heavy load on the DB instance; it is the DS engine machine that gets the load problem, from many short queries to the DS log files :(
Any ideas?
Ooohhh, that's what you're talking about... There's nothing you can do about it from a DataStage perspective. Hash files get banged around a lot, so about the only thing you can do is make sure you put projects and work areas on file systems that aren't being traced or transactionally logged. In the past, I had situations with clients where DataStage ran at a snail's pace because of this, so we made sure that our mount points were isolated and on dedicated disks and controllers.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Well,
actually it might help to understand what makes these log hits occur.
And is there a way to minimize them, or at least make the log hits less frequent?
Does anyone know anything about that?
Or is it just how things work?

Thanks again.
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Having a Director (or more than one) open in log view doesn't help. Switch to Status view, or minimize Director, or (best) close Director.

As far as running jobs hitting the logs is concerned (exactly how did you establish that this is happening?), find out what kind of things are being logged and seek ways to minimize that. For example, do you have routines that call DSLogInfo/DSLogWarn but probably don't need to?
Jobs themselves only log a limited number of events, such as jobs and active stages starting and stopping and link reports.
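If you do have such calls that must stay, you could gate them behind a flag, something along these lines (a rough sketch only; the routine name and the DebugLevel argument are inventions, not anything your project necessarily has):

    * Hypothetical transform routine LogIfDebug(Msg, DebugLevel):
    * write to the job log only when the caller passes a non-zero DebugLevel,
    * so routine chatter can be switched off in production.
    If DebugLevel > 0 Then
       Call DSLogInfo(Msg, "LogIfDebug")
    End
    Ans = @TRUE

With DebugLevel set to 0 via a job parameter in production, the routine writes nothing to the job log.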

Keeping your log files purged as much as possible (even if you archive them prior to purging) will also help.

Periodically clean out old entries from the &PH& and &SAVEDLISTS& directories on the server.
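If you want to automate that housekeeping from DataStage itself, one option could be an after-job or batch routine that shells out with DSExecute; a rough sketch only (the project path and the 7-day retention are assumptions to adjust for your own server):

    * Hypothetical housekeeping routine body: delete &PH& entries older than 7 days.
    ProjDir = "/u1/dsadm/Ascential/DataStage/Projects/MyProject"   ;* assumed path
    Cmd = "find '" : ProjDir : "/&PH&' -type f -mtime +7 -exec rm {} \;"
    Call DSExecute("UNIX", Cmd, Output, RetCode)
    If RetCode <> 0 Then Call DSLogWarn("&PH& cleanup failed: " : Output, "CleanPH")
    Ans = RetCode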
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Thanks Ray,
The server actually has no Director open usually, since this is production, and I can also set it to refresh at large intervals.
But come to think of it, running many jobs every 10 minutes that read/write from DB to file and/or back to DB might actually be the reason, in which case there is nothing really I can do.
I'll check about keeping the log writes from the BASIC code to a minimum.
I was searching for info to get a head start on the visit I will make there tomorrow.
I'll post my findings.

Thanks for the quick replies :)
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Ken,
by:
make sure you put projects and work areas on file systems that aren't being traced or transactionally logged. In the past, I had situations with clients where DataStage ran at a snail's pace because of this, so we made sure that our mount points were isolated and on dedicated disks and controllers.
do you mean to build projects on external storage rather than directly on the DS Server's local disk?

Basically it is a single project so far, so spreading projects among several disks might not solve my issue for now.

Thanks in advance,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

roy wrote: do you mean to build projects on external storage rather than directly on the DS Server's local disk?
Always separate projects from data. As for the location of the project, this really shouldn't be an issue regarding the type of disks, as long as the project is local to the server (not on an NFS mounted filesystem).

For the life of me, I can't figure out how you chased this down to being job logs. Job logs are named RT_LOGxxx, and the physical file is located as a subdirectory within the project. Since each job has its own RT_LOGxxx file, no single job should be crippling the system. Even the cumulative effect of multiple instantiated jobs using the same master job log file shouldn't be a big issue. Now, of course, if that log has 2 million entries in it, I can see how it would be a pig on the system to maintain (dynamic resizing and all).

So, if you can tell me how you found out it is an RT_LOGxxx file I sure would appreciate it.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Well Ken, using ps I find the PIDs of my job processes.
Then, using truss, I capture everything that happens for them and their child processes.
This gives me the info: for 10 seconds' worth of job run (for the job and its child processes), about 11,000 reads and about the same number of writes.
Now, if I watch the data going back and forth, it has strings in the format of a log file,
i.e. batch::batchname.invocationid setting nls enus.....
also requesting job run and waiting for job, etc.
This leads me to assume this is log related (unless my assumptions are wrong),
so I asked them to send me a sample of this, and once I get it, it's going to Ascential, so they may explain and see what (if anything) can be done about it.

The reads are done in 2K blocks, though I don't know if there is any need for them to be larger for the info they pass.

That's all I know for now.
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

How are you doing job control? Do you use custom-written BASIC logic, sequencers, or plain old BATCH jobs? The brief sample you cited indicates that it is log messages. Now, I noticed you typed batchname.invocationid, so are you running a zillion instantiations of the same job?

You know that the master job gets messages put into its log by all of the clones. If you are not purging that master log manually, it will fill up with a bunch of log messages. I do believe a log only clears after a job runs. If you run invocations/instances/clones of that job, then the master job actually never runs and I wonder if the master log never clears. That log could be massive by the end of a day. You may want to check that out.
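For reference, the kind of BASIC job control I mean looks roughly like this (a sketch only; the job name, invocation id, and caller name are made up):

    * Attach, run and wait for one invocation of a hypothetical multi-instance job.
    $INCLUDE DSINCLUDE JOBCONTROL.H
    hJob = DSAttachJob("LoadTable.CUSTOMERS", DSJ.ERRFATAL)   ;* one clone of LoadTable
    ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
    ErrCode = DSWaitForJob(hJob)
    Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
    If Status <> DSJS.RUNOK Then Call DSLogWarn("LoadTable.CUSTOMERS ended with status " : Status, "BatchControl")
    ErrCode = DSDetachJob(hJob)

Every clone you attach and run this way also puts messages into the master job's log, which is exactly where the growth I'm describing comes from.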
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Ken,
As you figured out (and the reason I mentioned that message portion), I use BASIC control jobs and they are multi-instance ones.
No sequencers involved, only pure BASIC wrapper jobs.
This is done to dynamically run the required job(s) to populate the requested table, from source to target, making the turnaround time for adding a new table load as short as possible (around 1 hour! from development start to tests/production availability).
The log purge is set to purge run entries older than 1 day.
And from what I saw around 10:00 AM today, the largest RT_LOG file was 32 MB in size; most were reasonably small.
If this is how DS works, then I guess there is not much I can do about it.
I just hope that the expansion planned for this system won't leave the server unable to keep up.
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Okay, but you never really defined what the performance problem is. Are the jobs with "DS server load problems" running slow, and do you believe it's because of the job log interaction? How are you defining "slow"? How are you measuring that? A job of the nature seq --> xfm --> seq should 100% utilize a CPU (single-threaded design, one transformer). If you see the CPU pegged, then what's the problem? Are your designs mixed and matched with DB stages?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Well the "customer" thought the process is not running on the production server as fast as they expected in acordance to the compared configuration enhancements that the productin system has vrs. the development system.
so they started checking what was going on and found this, which seems as heavy load on the server, that needs investigation and perhaps tuning.
so the question is do we have anything to do about it?
in case this is how DS works then the answer would be, we can do nothing except minimize,if possible, (as Ray siad) the number of DSLog calls from the basic code we initiate ourselves.
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

roy wrote:Well, the "customer" thought the process was not running on the production server as fast as they expected, given the configuration enhancements that the production system has versus the development system.
So they started checking what was going on and found this, which looks like heavy load on the server that needs investigation and perhaps tuning.
So the question is: is there anything we can do about it?
If this is how DS works, then the answer would be that we can do nothing except minimize, if possible (as Ray said), the number of DSLog calls from the BASIC code we initiate ourselves.
Aiyee, I should have started this whole discussion by asking you how you measured that a job was underperforming. I have an expression for this type of troubleshooting: "Shooting ducks at night". What it means is that we are taking shots at sounds in the air at night. Hear a quack, fire a shot in the general direction.

You absolutely have to determine if a job is underperforming and why. You need to use "prstat -a" and watch a job like seq --> xfm --> seq. If that job totally consumes a CPU, you have no performance issue with your DataStage logs. I don't think you have any performance issue with DataStage. If you are comparing dev and production throughput as apples to apples, that expectation has to be adjusted. If your jobs have OCI/ODBC reference lookups and you're comparing job execution against small datasets in dev and large datasets in production, your team is making a mistake.

Roy, write a simple job that reads a large sequential file, maybe 200K rows of about 200 bytes per row, passes it through a transformer trimming a few columns and UPCASEing a few others, and writes it to another sequential file. Watch prstat as it runs. If the CPU % shows 1/(number of CPUs on your server, per uname -X), then the job is running at full speed and you have no issue other than proper analysis, tuning of jobs, and setting of performance expectations.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle