Oracle to Oracle performance
Moderators: chulett, rschirm, roy
I have a DataStage 7.5.2 parallel job on AIX that reads 10 million rows from one Oracle database, applies minor transformations, and does a truncate and load into another Oracle database. It takes 15 hours.
I have split the job into two jobs. The first job is Oracle -> transformer -> sequential file, and it reads and writes the 10 million rows in 1 hour. The second job reads the sequential file and does the truncate and load into Oracle, handling the 10 million rows in 8 minutes, which is over 20,000 rows/sec.
All the Oracle stages are Oracle Enterprise. Has anyone else run into this? What causes such a difference? Is there a better solution?
Thanks,
Eric
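For reference, the figures quoted above can be sanity-checked with a little arithmetic (a quick sketch; the row count and timings are taken from the post):

```python
# Throughput check for the figures quoted in the post.
rows = 10_000_000

combined_hours = 15        # single Oracle-to-Oracle job
split_read_hours = 1       # Oracle -> transformer -> sequential file
split_load_minutes = 8     # sequential file -> truncate/load into Oracle

load_rate = rows / (split_load_minutes * 60)   # rows per second
print(round(load_rate))                        # ~20,833, i.e. "over 20,000 rows/sec"

speedup = combined_hours / (split_read_hours + split_load_minutes / 60)
print(round(speedup, 1))                       # split jobs are ~13x faster end to end
```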
The job queries one Oracle database and loads into a separate Oracle database. The query is against a view:
SELECT SDATE, LEVEL1, LEVEL2, LEVEL3, LEVEL4, LEVEL5, LEVEL6, CAST(C_PRED AS DECIMAL(38,10)) C_PRED
FROM BIEO_SL_BASELINE_FCST
The view definition has some grouping and joins:
Code:
CREATE OR REPLACE VIEW bieo_sl_baseline_fcst (sdate,
level1,
level2,
level3,
level4,
level5,
level6,
c_pred,
record_type
)
AS
SELECT sdate, level1, level2, level3, level4, level5, level6, c_pred,
record_type
FROM (SELECT MIN (t_dates_list.from_sales_date) AS sdate,
                (CASE
                    WHEN MIN (t_dates_list.from_sales_date)
                         - TO_DATE ('05-16-2010', 'MM-DD-YYYY') <= 0
                    THEN (CASE
                             WHEN MAX (t_dates_list.from_sales_date)
                                  - TO_DATE ('05-16-2010', 'MM-DD-YYYY') <= 0
                             THEN 1
                             ELSE 3
                          END)
                    ELSE 2
                 END) AS record_type,
MAX (comb_id) AS comb_id, t_comb_list.level_id1,
t_comb_list.level_id2, t_comb_list.level_id3,
t_comb_list.level_id4, t_comb_list.level_id5,
t_comb_list.level_id6,
                NVL (SUM (DECODE (branch_data.enter_fore,
                                  NULL, NVL (branch_data.col_for_over,
                                             (NVL (branch_data.manual_stat,
                                                   NVL (branch_data.sim_val_1,
                                                        branch_data.fore_2 * 1))
                                              * (1.00 + NVL (branch_data.manual_fact, 0))
                                              + NVL (branch_data.fixed_prom, 0))
                                             * (1 - NVL (branch_data.rule_coll, 0))
                                             + NVL (branch_data.int_cost, 0)
                                               * NVL (branch_data.rule_coll, 0)),
                                  branch_data.ff)
                          * 1),
                     0) AS c_pred,
(CASE
WHEN MAX (branch_data.note_id) > 0
THEN 1
ELSE 0
END
) AS note_id,
(CASE
WHEN MAX (branch_data.is_promotion) > 0
THEN 1
ELSE 0
END
) AS is_promotion,
(CASE
WHEN MAX (branch_data.is_supply_plan) > 0
THEN 1
ELSE 0
END
) AS is_supply_plan,
(CASE
WHEN MAX (branch_data.is_scenario_resource) > 0
THEN 1
ELSE 0
END
) AS is_scenario_resource,
1 AS is_base_0
FROM sales_data branch_data,
mdp_matrix,
t_ep_item,
t_ep_e1_parent_ad_num,
t_ep_ebs_tp_zone,
t_ep_e1_cust_cat_5,
t_ep_ebs_account,
t_ep_site,
int_comb_602_389 t_comb_list,
int_date_602_389 t_dates_list
WHERE branch_data.sales_date >= t_dates_list.from_sales_date
AND branch_data.sales_date <= t_dates_list.to_sales_date
AND branch_data.item_id = mdp_matrix.item_id
AND branch_data.location_id = mdp_matrix.location_id
AND mdp_matrix.t_ep_item_ep_id = t_ep_item.t_ep_item_ep_id
AND mdp_matrix.t_ep_e1_parent_ad_num_ep_id =
t_ep_e1_parent_ad_num.t_ep_e1_parent_ad_num_ep_id
AND mdp_matrix.t_ep_ebs_tp_zone_ep_id =
t_ep_ebs_tp_zone.t_ep_ebs_tp_zone_ep_id
AND mdp_matrix.t_ep_e1_cust_cat_5_ep_id =
t_ep_e1_cust_cat_5.t_ep_e1_cust_cat_5_ep_id
AND mdp_matrix.t_ep_ebs_account_ep_id =
t_ep_ebs_account.t_ep_ebs_account_ep_id
AND mdp_matrix.t_ep_site_ep_id = t_ep_site.t_ep_site_ep_id
AND t_comb_list.level_id1 = mdp_matrix.t_ep_item_ep_id
AND t_comb_list.level_id2 =
mdp_matrix.t_ep_e1_parent_ad_num_ep_id
AND t_comb_list.level_id3 = mdp_matrix.t_ep_ebs_tp_zone_ep_id
AND t_comb_list.level_id4 =
mdp_matrix.t_ep_e1_cust_cat_5_ep_id
AND t_comb_list.level_id5 = mdp_matrix.t_ep_ebs_account_ep_id
AND t_comb_list.level_id6 = mdp_matrix.t_ep_site_ep_id
GROUP BY t_comb_list.level_id1,
t_comb_list.level_id2,
t_comb_list.level_id3,
t_comb_list.level_id4,
t_comb_list.level_id5,
t_comb_list.level_id6,
t_dates_list.from_sales_date) int_table,
int_comb_602_389
WHERE int_comb_602_389.level_id1 = int_table.level_id1
AND int_comb_602_389.level_id2 = int_table.level_id2
AND int_comb_602_389.level_id3 = int_table.level_id3
AND int_comb_602_389.level_id4 = int_table.level_id4
AND int_comb_602_389.level_id5 = int_table.level_id5
AND int_comb_602_389.level_id6 = int_table.level_id6;
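Unwound into procedural form, the nested DECODE/NVL chain that produces c_pred is easier to follow. The sketch below is one interpretation of that expression in Python (column names kept, NVL modelled as a coalesce); note that DECODE treats two NULLs as equal, so the computed forecast branch is taken only when enter_fore is NULL, and ff is used otherwise:

```python
def nvl(value, fallback):
    """Oracle NVL: return fallback when value is NULL (None)."""
    return fallback if value is None else value

def c_pred_term(r):
    """Per-row contribution to the SUM(...) feeding the c_pred column.

    DECODE(enter_fore, NULL, <computed>, ff): DECODE considers NULL equal
    to NULL, so a NULL enter_fore selects the computed forecast branch.
    """
    if r["enter_fore"] is not None:
        return r["ff"]
    base = nvl(r["manual_stat"], nvl(r["sim_val_1"], r["fore_2"] * 1))
    computed = (
        (base * (1.00 + nvl(r["manual_fact"], 0)) + nvl(r["fixed_prom"], 0))
        * (1 - nvl(r["rule_coll"], 0))
        + nvl(r["int_cost"], 0) * nvl(r["rule_coll"], 0)
    )
    return nvl(r["col_for_over"], computed)

def c_pred(rows):
    """c_pred itself is NVL(SUM(term * 1), 0) over each group."""
    return sum(c_pred_term(r) * 1 for r in rows) if rows else 0
```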
I didn't check for any locks. I have run the 'ora query -> xfm -> seq' job several times, and the query above consistently takes around 45 to 50 minutes to execute. The seq file is 1.5 GB.
The transformer stage straight-copies the 8 columns, adds 1 column set to @NULL, and adds 2 columns set to DSJobStartTimestamp.
Thanks,
Eric
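The Transformer derivations described above are trivial per-row work, so they are unlikely to be the bottleneck on their own. As a rough sketch of what the stage does per row (the output column names are made up for illustration, and a constant captured at job start stands in for DSJobStartTimestamp):

```python
from datetime import datetime

# Stand-in for DSJobStartTimestamp: fixed once for the whole run.
JOB_START = datetime.now()

def transform(row):
    """Mimic the Transformer derivations described in the post."""
    out = dict(row)               # straight copy of the 8 input columns
    out["filler"] = None          # derivation @NULL
    out["created_ts"] = JOB_START # derivations set to DSJobStartTimestamp
    out["updated_ts"] = JOB_START
    return out
```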
I tried a job using a stage variable to set DSJobStartTimestamp. I don't know why, but it aborts after 4 million rows; I ran it again and got the same abort. Then I re-ran the previous query -> xfm -> seq file job and it ran all 10 million rows fine, so it must be a bug in DS 7.5.2. It sounds like a good tip, but I think the main overall issue must have something to do with landing the data to a text file vs. not landing it. I wonder where DS stores the data when it's not landed using a seq file stage.
Thanks,
Eric
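For what it's worth, the stage-variable tip is the standard way to avoid re-evaluating a derivation once per row: the variable is computed once and reused. The effect can be sketched like this (pure illustration in Python, not DataStage code; the timestamp string is a made-up value):

```python
calls = 0

def job_start_timestamp():
    """Stand-in for a per-call derivation such as DSJobStartTimestamp."""
    global calls
    calls += 1
    return "2010-05-16 08:00:00"

rows = range(1000)

# Column derivation calls the function for every row: 1000 evaluations.
per_row = [job_start_timestamp() for _ in rows]

# Stage-variable style: evaluate once, reuse the value for every row.
calls = 0
ts = job_start_timestamp()
hoisted = [ts for _ in rows]

print(calls)  # 1 evaluation instead of 1000, same output rows
```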
DiscGolfer5000 wrote: ...I think the main overall issue must have something to do with landing the data to a text file vs. not landing it. I wonder where DS stores the data when it's not landed using a seq file stage.
I was wondering along that same line: is it possible that the CPU/memory being used is maxing out its resources? The data has to "live" somewhere when it's not being landed to a file, and buffer-to-file is normally going to be faster than buffer-to-Oracle, or so I assume. Maybe this is a hardware performance issue.
A number of years ago, Ken Bland posted several items here about landing data, including this thread. If you search for "pressure" with "kcbland" as the author, you'll also find a number of discussions on the concept of "back-pressure" and why landing between the extract and load addresses the issue that you are seeing.
Ken is one of our Premium Posters here, so most of those threads won't be accessible, I'm afraid, but I did want to let you know they are out there. A search may turn up more; I don't recall off the top of my head.
-craig
"You can never have too many knives" -- Logan Nine Fingers
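The "back-pressure" idea mentioned above can be illustrated with a bounded buffer: when the downstream stage (the load) is slow, the buffer fills and the upstream stage (the extract) blocks, so the whole coupled pipeline runs at the speed of its slowest link. Landing to a file removes that coupling, which is one reason the split jobs can be so much faster. A minimal sketch using plain Python threads, nothing DataStage-specific:

```python
import queue
import threading

rows = list(range(1000))
buf = queue.Queue(maxsize=8)   # small buffer: the producer feels back-pressure
loaded = []

def extract():
    for row in rows:
        buf.put(row)           # blocks whenever the "load" falls behind
    buf.put(None)              # end-of-data marker

def load():
    while True:
        row = buf.get()
        if row is None:
            break
        loaded.append(row)     # a slow sink here throttles extract() too

producer = threading.Thread(target=extract)
consumer = threading.Thread(target=load)
producer.start(); consumer.start()
producer.join(); consumer.join()

print(len(loaded))  # 1000: everything arrives, but at the consumer's pace
```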