
Job Performance

Posted: Tue Nov 11, 2014 4:48 pm
by Nagac
Hi

I have a DataStage job that performs null validation and duplicate removal, then loads into a Dataset. It takes around 50 minutes to complete (27 million rows). I would like to improve the performance; can someone advise?

ODBC Stage --> Transformer (validation) --> Sort --> Transformer (duplicate removal) --> Dataset.

Apart from these, I have validation-failure links that write to Datasets.
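The validation and duplicate-removal flow can be sketched in plain Python (not DataStage; the column names "id" and "name" are hypothetical, since the job's schema isn't shown):

```python
# A plain-Python sketch of the job's logic: null validation, then
# sort-based duplicate removal.  Column names are hypothetical.
rows = [
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},      # duplicate
    {"id": 3, "name": None},     # fails null validation
]

# Transformer (validation): rows containing nulls go to the reject link
valid = [r for r in rows if all(v is not None for v in r.values())]
rejects = [r for r in rows if any(v is None for v in r.values())]

# Sort, then Transformer (duplicate removal): keep the first row per key.
# This mirrors why the Sort stage must precede the dedup Transformer --
# duplicates are only adjacent once the data is sorted on the key.
valid.sort(key=lambda r: r["id"])
deduped, last_key = [], object()
for r in valid:
    if r["id"] != last_key:
        deduped.append(r)
        last_key = r["id"]
```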

We have 2 nodes

Below are the environment variable settings used at project level:
APT_COMPILEOPT=/D APT_USE_ANSI_IOSTREAMS /D _WIN32 /D _MBCS /nologo /W3 /WX- /Gm- /EHa /MD /GS- /fp:precise /Zc:wchar_t- /Zc:forScope /Gd /TP /Zi /Oy- /c
APT_COMPILER=cl
APT_CONFIG_FILE=E:/IBM/InformationServer/Server/Configurations/default.apt
APT_DEFAULT_TRANSPORT_BLOCK_SIZE=13107200
APT_DISABLE_COMBINATION=1
APT_ERROR_CONFIGURATION=severity, !vseverity, !jobid, moduleid, errorIndex, timestamp, !ipaddr, !nodeplayer, !nodename, opid, message
APT_IO_MAP=1
APT_IO_NOMAP=1
APT_LINKER=link
APT_LINKOPT=/INCREMENTAL:NO /NOLOGO /DLL /DEBUG /SUBSYSTEM:CONSOLE /DYNAMICBASE:NO /MACHINE:X86
APT_MAX_DELIMITED_READ_SIZE=204800
APT_MAX_TRANSPORT_BLOCK_SIZE=104857600
APT_MONITOR_MINTIME=10
APT_NO_IOCOMM_OPTIMIZATION=1
APT_NO_ONE_NODE_COMBINING_OPTIMIZATION=1
APT_OPERATOR_REGISTRY_PATH=K:\DS_Projects\projects\international\INTL_DEV\buildop
APT_ORCHHOME=E:/IBM/InformationServer/Server/PXEngine
APT_PHYSICAL_DATASET_BLOCK_SIZE=9000000
APT_USE_CRLF=1
APT_USE_IPV4=1
DB2INSTANCE=DB2
DS_ENABLE_RESERVED_CHAR_CONVERT=0
DS_OPERATOR_BUILDOP_DIR=buildop
DS_OPERATOR_WRAPPED_DIR=wrapped
DS_OPTIMIZE_FILE_BROWSE=0
DS_PX_RESET=1
DS_TDM_PIPE_OPEN_TIMEOUT=720
DS_TDM_TRACE_SUBROUTINE_CALLS=0
DS_USERNO=-4024
ISUSER=ChalamN
NUTCROOT=C:\PROGRA~2\MKSTOO~1
TMP=C:\Windows\TEMP
UNIVERSE_CONTROLLING_TERM=1
UNIVERSE_PARENT_PROCESS=13308
USER=DSTAGE\dsadm
USERDOMAIN=WORKGROUP
USERNAME=DSTAGE$

Posted: Tue Nov 11, 2014 6:40 pm
by ray.wurlod
Try allocating more memory in the Sort stage.

Posted: Wed Nov 12, 2014 8:49 am
by PaulVL
Look at improving the SQL in your ODBC stage.

Seek to use a native connector stage instead of ODBC if possible.

A partition read will give you more rows per second on that extract, because you will have more connections to the database.

Posted: Wed Nov 12, 2014 2:58 pm
by ArndW
Since the bottleneck portion of your job has not been identified, it is impossible to give specific tuning advice.

If you make a copy of your job and just have your source ODBC into a PEEK stage and nothing else, what speed does that run at? If the speed is similar to your normal job speed then your bottleneck is your source stage and you can look into tuning that. If the job copy is very fast then tuning ODBC will do nothing, since that isn't the source of your problems.

While the bottleneck is most likely your read speed, you will need to check to make sure that this is indeed true.
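The isolation technique above — run the source into a PEEK (a sink that discards rows) and compare against the full job — can be sketched as a timing experiment in Python; the generators are stand-ins, not DataStage stages:

```python
import time

def source():
    # Stand-in for the ODBC read
    for i in range(100000):
        yield i

def transform(rows):
    # Stand-in for the downstream Transformer/Sort stages
    for r in rows:
        yield r * 2

# Time the source alone (the "source into PEEK" copy of the job)...
t0 = time.perf_counter()
for _ in source():
    pass
source_only = time.perf_counter() - t0

# ...then the full pipeline.  If the two times are close, the source
# is the bottleneck; if the full run is much slower, look downstream.
t0 = time.perf_counter()
for _ in transform(source()):
    pass
full = time.perf_counter() - t0
```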

Posted: Thu Nov 13, 2014 11:24 am
by Nagac
Thanks, everyone.

I have increased the RMU and the job looks fine now.

ArndW: the ODBC stage extracts at 21K rows/sec, and when the data goes into the Sort stage it comes down to 16K rows/sec. I need some help increasing the throughput of the Sort stage.

I tried increasing the buffer in the Sort stage but couldn't see any difference. I went through an article which says some performance can be gained by increasing read-ahead (on Unix). Is there an equivalent parameter on Windows?

Thanks
Naga

Posted: Fri Nov 14, 2014 1:01 am
by ArndW
Where are your temporary and scratch spaces located? If they aren't on local disks you might want to change your configuration to increase speed.

Posted: Fri Nov 14, 2014 4:29 am
by Nagac
Arndw

Both are on local disk(s).

Posted: Fri Nov 14, 2014 6:56 am
by battaliou
Try sorting in the ODBC stage.

Posted: Sat Nov 15, 2014 2:09 am
by ArndW
I agree with Martin: if you sort in the database, that might be faster than doing so in DataStage, and if the database is on another machine it would shift the CPU usage from the DataStage server to that box.
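The trade-off can be illustrated in miniature: pushing the sort into the database (ORDER BY) returns the same rows as sorting client-side; only where the CPU work happens differs. SQLite here is just a stand-in for the poster's DB2 source:

```python
import sqlite3

# Toy table; SQLite stands in for the actual source database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(k, f"v{k}") for k in (3, 1, 2)])

# Sort pushed into the database: the DB engine does the work
db_sorted = conn.execute("SELECT k, v FROM t ORDER BY k").fetchall()

# Sort done after extraction: the client (DataStage, here Python) does it
client_sorted = sorted(conn.execute("SELECT k, v FROM t").fetchall())
```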