PERFORMANCE ???????
Moderators: chulett, rschirm, roy
Hi friends,
I need some suggestions from the King Kongs of DataStage, please.
My setup:
Sources: two tables (Oracle)
Target: one table (Oracle)
Both source tables share 19 matching columns, and I want to populate the target only with rows where the values of all 19 matching columns are equal.
I have tried:
Oracle to Oracle (loading) doing a lookup
Oracle to flat file, then flat file to Oracle doing a lookup
Both show much lower performance than expected.
My source volume would be around 4 million rows per day.
Using DataStage is a must here.
I will appreciate your input, friends.
Thanks.
ShaneMuir wrote: Can you do a user-defined SQL and do the join in your query to give the required output?
No, that won't give me my requirement.
Any other way, please?
Re: PERFORMANCE ???????
karry450 wrote: No, that won't give me my requirement. Any other way, please?
Can I ask why that won't give you your requirement? I would think that some sort of inner join in SQL would be quicker.
Are all 19 fields key fields in each table?
Re: PERFORMANCE ???????
ShaneMuir wrote: Can I ask why that won't give you your requirement? Are all 19 fields key fields in each table?
They are not key fields, and I want to use DataStage here.
Any other way, please?
The 19 columns, what are their datatypes?
Also, are you matching or comparing? Matching means that these 19 columns are the key, whereas comparing means there's a primary key column in both tables that matches and you're comparing the 19 columns to see whether the row is different.
Please clarify, because I think your explanation is not clear enough.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
The fact that 19 columns have to be matched is in itself an indicator that the query is not simple and straightforward; it will have some performance issues. I assume there will be an index on those 19 columns on both tables.
Here is what I would try.
IF AN INDEX EXISTS ON BOTH TABLES:
(1) Join the two source tables on the 19 columns, then bulk load the results into the target. I would prefer manual SQL executed at the source, because then I can run EXPLAIN on my SQL and make sure it is using all the required indexes on both source tables.
IF NO INDEX EXISTS ON THE 19 COLUMNS IN EITHER TABLE:
(1) Dump data from table A into a text file.
(2) Dump data from table B into a text file.
(3) Join the two data files on those 19 columns, then bulk load into the target.
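Assuming both dumps put the 19 match columns in the same order with the same delimiter (so equal rows are byte-identical lines), step (3) can be sketched with standard Unix tools. The file names and sample rows below are hypothetical stand-ins for the real table dumps:

```shell
# Stand-in extracts: in practice these would be the table A / table B
# dumps, one row per line, 19 match columns, same order and delimiter.
printf 'a|1\nb|2\nc|3\n' > table_a.txt
printf 'd|4\nb|2\n'      > table_b.txt

# Sort both extracts so equal rows line up.
sort table_a.txt > table_a.sorted
sort table_b.txt > table_b.sorted

# comm -12 keeps only lines present in BOTH files, i.e. rows whose
# match-column values are all equal; these are the rows to bulk load.
comm -12 table_a.sorted table_b.sorted > matched_rows.txt
```

With the sample data above, matched_rows.txt ends up holding just the row common to both extracts.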
Datastage SE Oracle performance
Hi,
One thing we find that works very well and is easy is some non-intuitive parameters for the Oracle stage. Whatever you are doing, this helps a lot with Server Edition Oracle. I'm not saying it will do a 4 million row table operation in 10 minutes, but these settings help over the defaults of the Oracle SE stage.
For Oracle reads:
set Transaction Isolation to "Read Committed"
set Array Size to 500
For Oracle writes, additionally:
set Rows per Transaction to 500
... Flash Gordon
Flash Gordon
Hyperborean Software Solution
Since you said you want to use DataStage for this process, I'm going to make this suggestion.
I am not sure whether it is clear yet that the 19 fields are a business key (that seems quite high to me from a database engineer's viewpoint), but I'm going to assume that the 19 fields in question determine whether a row already exists in both tables or needs to be inserted into the target. You may need to adjust my suggestion if it even seems relevant.
Extract the 19 fields (and any other pertinent fields) from each table into separate sequential files, ordered by the 19 fields. Then create a script to compare the files using whatever scripting language you are proficient with (Perl would do this pretty efficiently, and you have it standard on Unix). Have the script compare the files, ignoring dupes and outputting new rows to a separate file. Then bulk insert the new rows. If there are updates, add logic in the script to output rows with differing values for existing keys to an update file, and have a separate job to update the target rows.
Call the script as an after-job routine from the second extract job (if you run them sequentially), and that should be very efficient in getting just what you want.
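If both extracts are already ordered by the 19 fields, the compare step doesn't even need Perl. A rough sketch with `comm`, treating the whole line as the comparison key (file names and sample rows are hypothetical):

```shell
# Stand-in sorted extracts: in practice, the source and target dumps
# ordered by the 19 fields, identical delimiter and column order.
printf 'a|1\nb|2\nc|3\n' > source.sorted
printf 'b|2\n'           > target.sorted

# Lines only in the source: new rows, destined for the bulk-insert file.
comm -23 source.sorted target.sorted > new_rows.txt

# Lines in both: existing rows (the dupes to ignore).
comm -12 source.sorted target.sorted > existing_rows.txt
```

Detecting updates (same key, different values) would still need a real script, since that requires splitting each line into key and non-key fields rather than comparing whole lines.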
Bestest!
John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services
"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
Re: PERFORMANCE ???????
karry450 wrote: [original question quoted above]
Try using a Link Collector to collect the data from both sources, then use the Sort and Aggregator stages to merge the data and load only those rows which have a count > 1.
I don't know whether it is the most efficient way to achieve your target, but it's worth a try.
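Assuming both sources are first dumped to text with identical column order and delimiter, the collect-sort-count idea above boils down to this sketch (file names and rows are hypothetical):

```shell
# Stand-in extracts from the two sources; equal rows must be
# byte-identical lines for the count to work.
printf 'a|1\nb|2\n' > table_a.txt
printf 'b|2\nc|3\n' > table_b.txt

# Collect both inputs, sort so equal rows become adjacent, then keep
# only lines occurring more than once (count > 1): rows in both sources.
cat table_a.txt table_b.txt | sort | uniq -d > rows_to_load.txt
```

One caveat, which applies equally to the Aggregator count: a row duplicated *within* a single source would also pass the count > 1 test, so each extract should be de-duplicated first if that can happen.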
SMB