Startup time high in jobs having a DB operation

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Startup time high in jobs having a DB operation

Post by chetan.c »

Hi,

Startup time of parallel jobs that contain Database lookups and/or inserts is high.
Like 40 seconds for startup and production time is 5 seconds.

I read the posts here,and came across &PH& directory cleanup.There were around 1600 files which we cleared.

This did not result in any change on the start up time.

I read about time involved in establishing session,authenticating Username and password.
The same jobs on the development database takes very little starup time like around 8 to 9 seconds.
So, this makes me think that there is something about the DB that is making this happen.

I would like to know if there is someway I can find out that DB is creating the incresed startup time or atleast rule out my assumption?(Unfortunately contacting the DBA is not an option as apparently, I have to be 100% sure about what im saying before I say it!! )

Or anything on the installation box that I need to check out for.
Also some more information on start up time would be really helpful to me.

Thanks,
Chetan.C
PaulVL
Premium Member
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Which Database connection method are you using?

What DBMS platform?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Startup time high in jobs having a DB operation

Post by chulett »

chetan.c wrote:I read the posts here,and came across &PH& directory cleanup.There were around 1600 files which we cleared.

This did not result in any change on the start up time.
It wouldn't affect the "start up time" of the lookups.
chetan.c also wrote:The same jobs on the development database takes very little starup time like around 8 to 9 seconds.
I would imagine this is all about the volume of data in dev versus prod.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Could you be on DB2 by any chance? If so, there is definitely a DB setting (maybe for other DBs too) as we had our Prod machine taking 2+ minutes to start up before the DB guys made some adjustments that sped up connection.

What would be nice is if DataStage created one connection for all queries to use, we have some jobs with many DB2 stages and we have to go through each one making a connection in turn
PaulVL
Premium Member
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

With the DB2 settings you might want to also check Validation of Column Type and Size mismatches (if you are confident that the table def won't change from under you).
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

Hi Paul and Kryt0n ,

Sorry I dint mention the DB.
Its Oracle.

Look up Configuration.
Enable_partioned reads---Yes
Patitioned Read method--Rowid Hash
Record Count ---20000
Array size ---2000
Prefetch count---1
No before or after SQl statements.

Insert Configuration:
Write mode---Insert
Table Action --Append
Record count--2000
Array Size --1
Drop Unmatched Fields ---Yes
No before and after SQL statements.

Hi Craig,

Actually we are using the same load which is used in the Dev.No change absolutely in the load.
Also is there anything that needs to be monitored on the box which can lead to this?

Thanks,
Chetan.C
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

chetan.c wrote:Actually we are using the same load which is used in the Dev.No change absolutely in the load.
If by "load" you mean the data you are loading, that's not what I meant. I meant the amount of data in the target database, in other words the amount of data it needed to cache before the main pipleline can start delivering data.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

Hi Craig,

Actually we are just testing the job runs in this environment,so its not with actual live data.
The data that is maintained in the tables of both the environment is almost the same(Checked record counts).Infact in the new environment its lesser than the dev environment.

I understand its difficult to guess the reason,without actually seeing it.. :) but i'm open to all suggestions.Thanks for your patience.

Thanks,
Chetan.C
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK, good to know. Cross that one off your list. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

chulett wrote:..Cross that one off your list. :wink:
:) Few more things which I can check and cross them off ..?

Thanks,
Chetan.C
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you haven't done so already, add $APT_STARTUP_STATUS and $APT_PM_PLAYER_TIMING to your job set to True in both environments. Compare and contrast.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

chulett wrote:...add $APT_STARTUP_STATUS and $APT_PM_PLAYER_TIMING to your job set to True in both environments. Compare and contrast. ...
Hi Craig,

I added the 2 environment varibales $APT_STARTUP_STATUS and $APT_PM_PLAYER_TIMING to the job and this is the pattern I found for all DB look up tables.

Environment One

Code: Select all

DIM_DATES,0: Operator completed. status: APT_StatusOk  elapsed: 0.73  user: 0.12  sys: 0.06 (total CPU: 0.18 )
DIM_DATES,1: Operator completed. status: APT_StatusOk  elapsed: 0.67  user: 0.09  sys: 0.08 (total CPU: 0.17)
Environment two

Code: Select all

DIM_DATES,0: Operator completed. status: APT_StatusOk  elapsed: 5.28  user: 0.14  sys: 0.06 (total CPU: 0.20)
DIM_DATES,1: Operator completed. status: APT_StatusOk  elapsed: 5.32  user: 0.14  sys: 0.06 (total CPU: 0.20)
The elapsed time appers to be higher for the second environment.
Is it like Datastage is not getting sufficient allocation of CPU because of some other processes?

Also in the second environment I see

Code: Select all

repartition(5),0: Operator completed. status: APT_StatusOk  elapsed: 4.73  user: 0.00  sys: 0.00 (total CPU: 0.00)
About 20 entries of it only in the second environment.
The jobs is the same one migrated from from Env1 to env2 .

What does the above information mean?

Thanks,
Chetan.C
Post Reply