Memory Usage by DataStage Server

Post questions here related to DataStage Server Edition, covering such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

dhiraj
Participant
Posts: 68
Joined: Sat Dec 06, 2003 7:03 am

Memory Usage by DataStage Server

Post by dhiraj »

We were running 13 jobs concurrently, each processing about 10 million records and running for several hours. These jobs read from sequential files, do multiple lookups into ODBC sources and hashed files, and finally write the results to sequential files. The jobs run very slowly, and sometimes some of them abort after several hours of processing with the following error:

ds_seqgetnext: Win32 error in ReadFile - Insufficient system
resources exist to complete the requested service.

In order to investigate the error, we examined memory usage and found that at the start of these 13 jobs around 11 GB of free memory is available, and as the jobs progress the free memory decreases steadily until they complete. The free memory just before the jobs completed was around 1.5 GB. After the jobs completed, the memory was returned to the system and free memory stood at 11 GB again. I don't have the CPU usage figures yet to analyze. Now I have the following questions about the memory usage:

1) Why does memory usage increase steadily as the job progresses? Shouldn't it be roughly constant, since we are processing record by record?

2) When sequential files are used as target stages, when is the data flushed from memory to disk? Are there any parameters in DataStage which control this?

And is there anything else I can try to avoid the error message above? I sometimes also get that error when the system is not heavily loaded. Rebooting the server solves the problem. :)
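Regarding question 2: DataStage's exact flush behavior for Sequential File stages isn't documented in this thread, but the general mechanics of buffered sequential writes can be sketched generically in Python (an illustration of OS-level file buffering in general, not of DataStage's implementation). Rows accumulate in a user-space buffer; flushing moves them to the OS page cache, and the OS writes that cache to disk on its own schedule unless a sync is forced:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "target.txt")

# Open with a large user-space buffer: rows accumulate in memory
# until the buffer fills, flush() is called, or the file is closed.
f = open(path, "w", buffering=1024 * 1024)
f.write("row1\n")

# At this point nothing is guaranteed to be on disk yet -- the row
# may still be sitting in the process's own buffer.
f.flush()             # push the user-space buffer to the OS page cache
os.fsync(f.fileno())  # ask the OS to write the page cache to disk now
f.close()

with open(path) as check:
    print(check.read().strip())  # row1
```

The "insufficient system resources" symptom is consistent with the OS holding large amounts of dirty (not-yet-written) cache when many jobs write heavily at once, which is why the write-delay settings mentioned below are worth checking.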


Thanks

Dhiraj
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Re: Memory Usage by DataStage Server

Post by peternolan9 »

Hi Dhiraj,
I know it is no help... but I am pretty sure you are in 'unknown' territory... a 12-CPU box running Windows?? There are not many of those around. Who knows what Windows is doing internally?

Do you have any really compelling reason not to be using Unix with so many processors??
dhiraj wrote:We were running 13 jobs concurrently, each processing about 10 million records and running for several hours. These jobs read from sequential files, do multiple lookups into ODBC sources and hashed files, and finally write the results to sequential files. The jobs run very slowly, and sometimes some of them abort after several hours of processing with the following error:

ds_seqgetnext: Win32 error in ReadFile - Insufficient system
resources exist to complete the requested service.


Thanks

Dhiraj
Best Regards
Peter Nolan
www.peternolan.com
dhiraj
Participant
Posts: 68
Joined: Sat Dec 06, 2003 7:03 am

Post by dhiraj »

Hey Peter,

The only compelling reason is that our clients are particular about it.


Dhiraj
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

Do you have any really compelling reason not to be using Unix with so many processors??
Peter,

Do you really think this developer has any say about this? (in most scenarios they do not). Rather than muddy the water with comments like this we should be focusing on what might be causing this perceived problem. I think the first place to look would be at the job design and then at the OS.

Dhiraj,

Could you please give us more information about your job streams? What types of stages are you using (both active and passive)? Is this the same job running as multiple instances, or 13 different jobs? I have successfully run this number of jobs in parallel and have not witnessed what you describe, so that leads me to believe there may be something in your job or job stream causing this behavior.

Regards,
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

As far as memory leaks go, the memory usage could be related to massive write-delay paging that was set up on the server. As far as DataStage is concerned, unless there's a bug, its memory requirements are rather small. Even hashed files have read/write caching size limits.

I would start with the idea of not running jobs that take hours. You should slice your data into smaller increments: a 2-hour job failing at the 1:59 mark costs all of that time running again. You might find that running 4 x 30 minutes mitigates your issues, and it also gives you logical "commits" in processing so that a single failure doesn't send you all the way back to the beginning.
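The slicing idea above can be sketched generically (file names, chunk sizes, and the `split_file` helper here are illustrative, not DataStage parameters): cut the input sequential file into fixed-size chunks up front, run each chunk as its own job invocation, and rerun only the chunk that failed.

```python
import os
import tempfile

def split_file(src, rows_per_chunk, dst_pattern):
    """Slice a large sequential file into restartable chunks.

    Each chunk becomes an independent unit of work: if a downstream
    job fails, only that chunk's rows need to be reprocessed.
    """
    chunks = []
    buf, idx = [], 0

    def emit(rows, i):
        name = dst_pattern.format(i)
        with open(name, "w") as out:
            out.writelines(rows)
        return name

    with open(src) as f:
        for line in f:
            buf.append(line)
            if len(buf) == rows_per_chunk:
                chunks.append(emit(buf, idx))
                buf, idx = [], idx + 1
    if buf:  # trailing partial chunk
        chunks.append(emit(buf, idx))
    return chunks

# Demo: 10 rows sliced into chunks of 4 -> files of 4, 4, and 2 rows.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "big.txt")
with open(src, "w") as f:
    f.writelines(f"record {i}\n" for i in range(10))

chunks = split_file(src, 4, os.path.join(workdir, "chunk_{:03d}.txt"))
print(len(chunks))  # 3
```

A job sequence can then loop over the chunk files, recording each completed chunk so a restart skips straight to the first unfinished one.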
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
dhiraj
Participant
Posts: 68
Joined: Sat Dec 06, 2003 7:03 am

Post by dhiraj »

Michael,

The job looks like this

Seq-->Xmr-->seq-->Xmr-->IPC-->Xmr-->Seq

Each transformer does a couple of lookups into hashed files/ODBC sources. Inter-process row buffering is enabled. All 13 jobs are physically different jobs to DataStage, but they perform exactly the same function. We are not using the multiple-instance option in DataStage.
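For readers unfamiliar with the IPC stage mentioned in the design above: inter-process row buffering joins two pipeline stages through a bounded buffer, so the upstream stage blocks when the buffer is full and memory use stays capped at the buffer size rather than the data volume. A minimal thread-based sketch of that bounded-buffer idea (a generic illustration, not DataStage's actual implementation):

```python
import queue
import threading

def run_pipeline(n_rows, buffer_size=128):
    """Two pipeline stages joined by a bounded row buffer.

    The buffer size (not the total row count) bounds the memory
    held between the stages at any moment.
    """
    buf = queue.Queue(maxsize=buffer_size)
    consumed = []

    def upstream():
        for i in range(n_rows):
            buf.put(f"row{i}")  # blocks while the buffer is full
        buf.put(None)           # end-of-data marker

    def downstream():
        while True:
            row = buf.get()
            if row is None:
                break
            consumed.append(row)

    t1 = threading.Thread(target=upstream)
    t2 = threading.Thread(target=downstream)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return len(consumed)

print(run_pipeline(1000))  # 1000
```

If memory grew with the row count rather than staying near the buffer size, that would point to something downstream of the buffer retaining rows, which is the kind of behavior worth ruling out in the job design.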

Ken,
How do I view the value for write-delay caching in Windows, and what should the optimum value be?

Thanks for your time

Dhiraj
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Post by peternolan9 »

Hi Michael/All,
Of course, I would expect that a developer (very likely) has no say at all over the hardware purchased... and it is also likely that the customer (or someone) has mandated the environment... but that's not to say the mandate is sensible/reasonable or can even be made to work.

And it might be worth asking the question, 'should we be doing this?'

Why?

About 30 years ago, Frederick P. Brooks, in The Mythical Man-Month, offered a lot of excellent advice for development projects. One piece of it was 'use proven technologies'.

This is an often-ignored piece of advice in systems development and a major cause of development project failure... :-(

I just wonder if the environment Dhiraj is working with can be put into the category of 'proven technologies'. (I know of no DataStage instance running on 12 CPUs on Windows.) And if not, perhaps Dhiraj might need to treat the project environment as 'research', which can be significantly more expensive and, by definition, is not guaranteed to work at the end of the day.

Someone has to pay the bill for 'research' into non-proven technologies... will it be Dhiraj's company or the customer?

I have seen a lot of customers spend a lot of time and money trying to get an environment/project to work that was NEVER EVER going to work. So my view is to ask the question 'should we be doing this?' BEFORE a customer invests the money, rather than continuing to try and solve one problem after another for weeks, months, and in some cases years.

But then again, that's just my opinion... ;-) ...

This advice could have saved a number of my customers multi-millions of dollars had they taken it, but often they don't... ;-) ...such is life.
mhester wrote:
Do you have any really compelling reason not to be using Unix with so many processors??
Peter,

Do you really think this developer has any say about this? (in most scenarios they do not). Rather than muddy the water with comments like this we should be focusing on what might be causing this perceived problem. I think the first place to look would be at the job design and then at the OS.
Best Regards
Peter Nolan
www.peternolan.com