Shell script takes longer to execute in DS than at the command prompt

chetan.c · Post by **chetan.c** » Fri May 25, 2012 1:48 am

Hi,

I have a Unix script which I am executing through DataStage in a sequence job.
The script takes 16 seconds at the command prompt but around 55 seconds when the job is run.

In the sequence job I have only the Execute Command Stage.

What could be the reason for this behaviour?

Thanks,
Chetan.C

ray.wurlod · Post by **ray.wurlod** » Fri May 25, 2012 5:52 am

Startup time of the job, probably. Enable player timings and display of startup to confirm.

chulett · Post by **chulett** » Fri May 25, 2012 6:28 am

Not enough information for anyone to do anything other than guess. What does the script do?

chetan.c · Post by **chetan.c** » Fri May 25, 2012 7:13 am

ray.wurlod wrote:Startup time of the job, probably. Enable player timings and display of startup to confirm. ...

I'm not seeing the startup time for the sequence job. In parallel jobs I can see it.
Will it be displayed for a sequence job after enabling player timings?

Also, I echoed a timestamp to a log file just before the first line of code in the script and again after the last line of code is executed.

The difference between the two timestamps was also 55 seconds.
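A minimal sketch of that kind of in-script timing, assuming bash. The function name `timed_run`, the log location, and the `sleep 1` stand-in command are made up for illustration and are not part of the actual script.

```shell
#!/bin/bash
# Hypothetical helper: run a command and append start/end timestamps,
# elapsed seconds and exit status to a log file.
timed_run() {
    local log_file=$1
    shift
    local start=$SECONDS   # bash counter of seconds since the shell started
    local status
    echo "start $(date '+%Y-%m-%d %H:%M:%S')" >> "$log_file"
    "$@"
    status=$?
    echo "end $(date '+%Y-%m-%d %H:%M:%S') elapsed=$((SECONDS - start))s status=$status" >> "$log_file"
    return "$status"
}

# Stand-in command; the real script body would go here instead of sleep.
timed_run "${TMPDIR:-/tmp}/timing_demo.log" sleep 1
```

Because the timestamps are written from inside the script, the elapsed value excludes whatever time DataStage spends before the shell is started.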

Will do and check.

Thanks,
Chetan.C

chetan.c · Post by **chetan.c** » Fri May 25, 2012 7:15 am

chulett wrote:Not enough information for anyone to do anything other than guess. What does the script do?

Hi Craig,
The script is below.


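#!/bin/bash
# Arguments read: $1 = working directory, $3 = log file path.
# As posted, Archive also reads $1, and $2 is never used.
zipped_file=Smaple.gz
Directory=$1
Archive=$1
Log_File_Path=$3
dir=`date +%Y%m%d`

# Compress every XML file in the working directory.
cd "$Directory"
gzip *.xml

# List the compressed files, then concatenate them into one archive.
ls -1 *.gz>File_Names.txt

file="$Directory/File_Names.txt"

while read line
do
cat "$line">>"$zipped_file"
done<"$file"

# Record compressed size, uncompressed size and name per file;
# sed drops the last line (the totals line when there are several files).
gzip -l *.xml.gz |awk '{print $1"^"$2"^"$4}'|sed '$d'>sample.txt

# Move the combined archive into a dated subdirectory, creating it if needed.
cd "$Archive"
if [ -d "$dir" ];then
mv "$Directory/$zipped_file" "$Archive/$dir"
else
mkdir "$dir"
mv "$Directory/$zipped_file" "$Archive/$dir"
fi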

In the job I have only one Execute Command activity and nothing else.

Kindly let me know if any other information is required.

Thanks,
Chetan.C

ray.wurlod · Post by **ray.wurlod** » Fri May 25, 2012 4:47 pm

Add two more Execute Command stages to capture the time. This will give you the actual script execution time. Anything else is DataStage overhead, and the overhead of starting a new shell for each command.

chetan.c · Post by **chetan.c** » Mon Jul 09, 2012 7:35 am

Hi,
It's been a long time, but we have a workaround for this.

We raised a PMR for this; the reply we got was that it's "normal" for DataStage to take the extra time to load XMETA libraries etc.

What we did was use an External Source stage for this and put a portion of the Unix script there, with a dummy Dataset connected to it.

It's way faster than calling the script from the Execute Command stage of a sequence job.

It may not be the right thing, but it's serving the purpose.

Thanks
Chetan.C

PaulVL · Post by **PaulVL** » Mon Jul 09, 2012 9:10 am

So during your tests you had:

date > logfile.txt; yourscript.sh parm1 parm2 parm3; date>>logfile.txt

and IBM said it took longer because of XMETA libraries loading?

I have a really hard time believing that one.

When you executed on the command line, did you try it from your project directory?

(BTW: What happened to your second parm, $2?)
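One way to catch a skipped parameter early is an argument check at the top of the script. This is only a sketch: `check_args`, the usage text, and the idea that $2 is meant to be the archive directory are assumptions, not something the posted script states.

```shell
#!/bin/bash
# Hypothetical guard; the argument names in the usage text are assumptions.
check_args() {
    if [ "$#" -ne 3 ]; then
        echo "usage: <directory> <archive> <log_file_path>" >&2
        return 1
    fi
    local arg
    for arg in "$@"; do
        # Reject arguments that were passed but are empty strings.
        if [ -z "$arg" ]; then
            echo "error: empty argument" >&2
            return 1
        fi
    done
    return 0
}

# Stand-in values; in a real script this would be: check_args "$@" || exit 1
check_args /data/in /data/archive /data/logs && echo "arguments look complete"
```

A check like this only confirms that three non-empty values arrived; it cannot tell whether the script then reads the right positional parameter for each variable.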

battaliou · Post by **battaliou** » Mon Jul 09, 2012 2:12 pm

Include a pwd in your script to prove you are in the right directory when you run the script from DS.
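Along the same lines, here is a rough sketch that records a little more than the directory, so the command-prompt run and the DS run can be compared side by side. The function name `env_snapshot` and the choice of fields are illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical helper: print where and as whom the script is running.
env_snapshot() {
    echo "pwd=$(pwd)"
    echo "user=$(id -un)"
    echo "PATH=$PATH"
    echo "open_files_limit=$(ulimit -n)"
}

# Prints to stdout; redirect to a log file in each environment to compare.
env_snapshot
```

Run it once from the prompt and once through the Execute Command activity, saving each output to a separate file, then diff the two files to see whether the directory, user, or PATH differ.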