Join Stage Dynamics
Posted: Tue May 12, 2015 1:20 am
Hi,
Scenario -
The left link(stream) has 100 records with just 5 keys and the right link(reference) has around 10000 records with 50 keys.
I'm looking to do an Inner Join. The data is key partitioned and sorted.
Question 1 -
Will DataStage bring in all 10000 records into the server? Or will it stop pulling data as soon as it has found data for all 5 keys from the stream link?
Question 2 -
The idea is to bring in data from the reference table only for matching keys in the stream. The reference data is huge and I'm wondering if bringing all the data into DataStage to do a join would be optimal or not. What are some of the other approaches/strategies to achieve optimum performance to do such key based extraction?
I was thinking on the lines of creating a temporary table that would hold all the keys from the stream link. I'd use this temp table and do an in-database join to get only the required data from my reference table.
Appreciate any inputs on this.
Thank you
Scenario -
The left link(stream) has 100 records with just 5 keys and the right link(reference) has around 10000 records with 50 keys.
I'm looking to do an Inner Join. The data is key partitioned and sorted.
Question 1 -
Will DataStage bring in all 10000 records into the server? Or will it stop pulling data as soon as it has found data for all 5 keys from the stream link?
Question 2 -
The idea is to bring in data from the reference table only for matching keys in the stream. The reference data is huge and I'm wondering if bringing all the data into DataStage to do a join would be optimal or not. What are some of the other approaches/strategies to achieve optimum performance to do such key based extraction?
I was thinking on the lines of creating a temporary table that would hold all the keys from the stream link. I'd use this temp table and do an in-database join to get only the required data from my reference table.
Appreciate any inputs on this.
Thank you