I am debugging a job running on a single node (DataStage 11.5 / Windows Server) that takes about 5 minutes of production run time. My goal is to identify the bottlenecks and tune the job so it runs more efficiently (in less time).
The job design reads Oracle source data (Connector stages; record counts range from 0.1M to 4M) and uses datasets (500 to 100,000 records) as reference data for Join and Lookup operations, plus several other stages: Sort, Copy, Aggregator, Funnel, and Transformer. The target writes to 2 datasets. (Overall, 50+ stages.)
Transformer LOGIC
** Timestamp to varchar conversion **
If IsNull(DATE_CREATED) Then '' Else If IsValidTimestamp(DATE_CREATED) Then DateToString(TimestampToDate(DATE_CREATED), "%mm/%dd/%yyyy") Else ''

** Decimal conversion **
DecimalToString(val, "fix_zero,suppress_zero")
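For reference, the null-safe timestamp derivation above can be sketched outside DataStage like this (a minimal Python illustration of the same logic; the function name and the assumed input format are mine, not DataStage's):

```python
from datetime import datetime

def timestamp_to_varchar(value):
    """NULL or invalid timestamps map to '', valid ones to MM/DD/YYYY."""
    # NULL input -> empty string, mirroring IsNull(DATE_CREATED).
    if value is None:
        return ""
    # Unparseable input -> empty string, mirroring IsValidTimestamp().
    try:
        ts = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    except (ValueError, TypeError):
        return ""
    # Valid timestamp -> "%mm/%dd/%yyyy"-style string.
    return ts.strftime("%m/%d/%Y")
```

Per-row derivations like these are usually cheap relative to sorts, joins, and buffering, so they are unlikely to be the main cost here.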
10:16:34 AM Starting Job ..
$APT_CONFIG_FILE = C:\IBM\InformationServer\Server\Configurations\1Node.apt
$APT_DISABLE_COMBINATION = True
$APT_SORT_INSERTION_CHECK_ONLY = False
$APT_STRING_PADCHAR = 0x0 (From value file)
$APT_PM_PLAYER_TIMING = False (From value file)
$APT_PM_PLAYER_MEMORY = True (From value file)
$APT_RECORD_COUNTS = True (From value file)
$OSH_PRINT_SCHEMAS = True (From value file)
$APT_DUMP_SCORE = True (From value file)
$APT_STARTUP_STATUS = False (User supplied)
10:16:34 AM Attached Message Handlers:
10:16:37 AM Project message handler: SortStageResort
10:16:37 AM Environment variable settings:
...
Parallel job initiated
OSH script ..
# OSH / orchestrate script for Job
main_program: orchgeneral: loaded
10:16:39 AM orchsort: loaded
10:16:39 AM orchstats: loaded
**oracle connector extract** (elapsed time = 14 sec)
10:16:53 AM ext_CONTACTS_plus: The connector connected to Oracle server oradb1.
**dataset folder**
10:17:54 AM is/isdata/tgt1_ds,0: Heap growth during runLocally(): 86024192 bytes
***Transformer stage messages** (elapsed time = 9 sec)
10:18:47 AM xfm_standardize,0 :Input 0 consumed 520749 records.
10:18:47 AM xfm_standardize,0: Output 0 produced 520749 records.
10:18:56 AM xfm_standardize,0: Heap growth during runLocally(): 85639168 bytes
***LOOKUP stage** (elapsed time = 4 sec)
10:19:06 AM lkp_ob_co_bond_obi,0: Heap growth during runLocally(): 133578752 bytes
10:19:10 AM buffer(1),0: Heap growth during runLocally(): 85590016 bytes
10:19:10 AM buffer(0),0: Heap growth during runLocally(): 85708800 bytes
***funnel stage
10:19:10 AM funl_inv_fiv,0: Heap growth during runLocally(): 85143552 bytes
***sort
10:19:11 AM srt_bond,0: Heap growth during runLocally(): 85983232 bytes
10:19:12 AM buffer(10),0: Heap growth during runLocally(): 83468288 bytes
**oracle connector stage** (elapsed time = 22 sec)
10:19:12 AM ext_plus,0: Number of rows fetched on the current node: 1151731.
10:19:12 AM ext_plus,0: Output 0 produced 1151731 records.
10:19:34 AM ext_plus,0: Heap growth during runLocally(): 82477056 bytes
***copy stage** (elapsed time = 1 minute, 10 sec)
10:19:38 AM cpy_stage,0: Output 0 produced 1151731 records.
10:19:38 AM cpy_stage,0: Input 0 consumed 1151731 records.
10:20:48 AM cpy_stage,0: Input 0 consumed 1151731 records.
***join stage** (left outer join -- input links: Auto partition) (elapsed time = 1 minute, 4 sec)
10:21:33 AM Jnr_Bond_id,0: Input 0 consumed 1151731 records.
10:21:33 AM Jnr_Bond_id,0: Output 0 produced 1151731 records.
10:22:45 AM Jnr_Bond_id,0: Heap growth during runLocally(): 80654336 bytes
10:22:47 AM buffer(23),0: Heap growth during runLocally(): 80687104 bytes
***dataset** (partition/collection: Auto) (elapsed time = 6 sec)
10:22:49 AM ds_tgt1,0: Input 0 consumed 1207442 records.
10:22:49 AM ds_tgt1,0: Output 0 produced 1207442 records.
10:22:55 AM ds_tgt1,0: Heap growth during runLocally(): 79831040 bytes
10:22:55 AM main_program: Step execution finished with status = OK.
10:22:58 AM main_program: Startup time, 1:18; production run time, 4:57.
10:22:58 AM Parallel job reports successful completion
10:22:59 AM Finished Job .
Redesign: I split the original job into 2 jobs.
job1: extract the Oracle data and write it to 2 datasets.
job2: use the 2 datasets as sources for the rest of the original job. This doesn't help either; rather, it becomes worse:
main_program: Startup time, 0:37; production run time, 13:42.
Please give me some insights on how to performance-tune this job.
Thanks.