
PERFORMANCE ???????

Posted: Thu Feb 28, 2008 11:00 pm
by karry450
Hi friends,

I need some suggestions from the Kingkongs of DataStage, please.


My sources: two tables (Oracle)
My target: one table (Oracle)

Both source tables have 19 matching columns, and I want to populate the target only where the values of all 19 columns are equal.

I have tried:
Oracle to Oracle (loading) with a lookup
Oracle to flat file, then flat file to Oracle with a lookup

Both show far worse performance than expected.

My source volume would be around 4 million rows per day.

Using DataStage is a must here.

I will appreciate your input, friends.

Thanks.

Posted: Thu Feb 28, 2008 11:15 pm
by ShaneMuir
Can you use user-defined SQL and do the join in your query to give the required output?

PERFORMANCE ???????

Posted: Thu Feb 28, 2008 11:18 pm
by karry450
ShaneMuir wrote: Can you use user-defined SQL and do the join in your query to give the required output?
No, that won't meet my requirement.

Any other way, please?

Re: PERFORMANCE ???????

Posted: Fri Feb 29, 2008 12:18 am
by ShaneMuir
karry450 wrote: No, that won't meet my requirement.

Any other way, please?
Can I ask why that won't meet your requirement? I would think that some sort of inner join in SQL would be quicker.

Are all 19 fields key fields in each table?
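
Something like this sketch, in Python with the cx_Oracle driver; the table names (src_a, src_b), the COL1..COL19 column names, and the connection details are placeholder assumptions, not anything from this thread:

Code:
import cx_Oracle  # assumed Oracle driver; any client that runs SQL would do

# Placeholder names for the 19 matching columns.
MATCH_COLS = ["COL%d" % i for i in range(1, 20)]

# Build an inner join that returns rows only where all 19 values agree.
# Note: plain equality drops rows where any of the 19 columns is NULL.
join_cond = " AND ".join("a.%s = b.%s" % (c, c) for c in MATCH_COLS)
sql = "SELECT a.* FROM src_a a INNER JOIN src_b b ON " + join_cond

conn = cx_Oracle.connect("user", "password", "dbhost/service")  # placeholders
cur = conn.cursor()
cur.arraysize = 500               # bigger fetch array, fewer round trips
for row in cur.execute(sql):
    pass                          # hand each matched row to the load step

Doing the match in one server-side query means non-matching rows never leave the database, which is usually far cheaper than streaming both tables through a lookup.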

Re: PERFORMANCE ???????

Posted: Fri Feb 29, 2008 1:08 am
by karry450
ShaneMuir wrote:
karry450 wrote: No, that won't meet my requirement.

Any other way, please?
Can I ask why that won't meet your requirement? I would think that some sort of inner join in SQL would be quicker.

Are all 19 fields key fields in each table?
They are not key fields, and I want to use DataStage here.

Posted: Fri Feb 29, 2008 1:05 pm
by deva
ShaneMuir wrote: Can you use user-defined SQL and do the join in your query to give the required output?
Hi,
My suggestion is to split those 19 matches into 5 or 6 stages, then join them to each other using hashed files. That gives somewhat better performance.

If possible, increase the array size.


Deva...

Posted: Fri Feb 29, 2008 1:17 pm
by kcbland
The 19 columns, what are their datatypes?

Also, are you matching or comparing? Matching means that these 19 columns are the key, whereas comparing means there's a primary key column in both tables that matches and you're comparing the 19 columns to see if the row is different.

Please clarify, because I think your explanation is not clear enough.
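
To put the two readings side by side, here are hypothetical query shapes as Python strings (tables src_a/src_b, primary key pk, column lists abbreviated; none of these names come from the thread):

Code:
# "Matching": the 19 columns ARE the key, so the join condition is the
# 19-way equality itself.
matching_sql = """
SELECT a.*
FROM   src_a a
JOIN   src_b b
ON     a.col1 = b.col1 AND a.col2 = b.col2  -- ... and so on through col19
"""

# "Comparing": a real primary key pairs the rows, and the 19 columns are
# only checked for differences (a change-detection pattern).
comparing_sql = """
SELECT a.pk
FROM   src_a a
JOIN   src_b b
ON     a.pk = b.pk
WHERE  a.col1 <> b.col1 OR a.col2 <> b.col2  -- ... and so on through col19
"""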

Posted: Fri Feb 29, 2008 1:33 pm
by shamshad
The fact that there are 19 columns to be matched is in itself an indicator that the query is not simple and straightforward; it will have some performance issues. I assume there will be an index on those 19 columns on both tables.

Here is what I would try.

IF INDEX EXISTS ON BOTH TABLES

(1) Join the 2 source tables on the 19 columns and then bulk load the results into the target. I would prefer manual SQL executed from the source, because then I can run explain on my SQL and make sure it is using all the required indexes on both source tables (see the sketch below).

IF NO INDEX EXISTS ON ANY TABLE ON 19 COLUMNS

(1) Dump data from Table A into a text file
(2) Dump data from Table B into a text file
(3) Join the two data files on those 19 columns and then bulk load into the target.
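
Here is a sketch of that explain check from the indexed case, again in Python with the assumed cx_Oracle driver. EXPLAIN PLAN and DBMS_XPLAN are standard Oracle features, but the table and column names are placeholders and the join is abbreviated to one column:

Code:
import cx_Oracle  # assumed driver

conn = cx_Oracle.connect("user", "password", "dbhost/service")  # placeholders
cur = conn.cursor()

# EXPLAIN PLAN records the optimizer's plan in PLAN_TABLE without
# actually running the statement.
cur.execute("EXPLAIN PLAN FOR "
            "SELECT a.* FROM src_a a JOIN src_b b ON a.col1 = b.col1")

# DBMS_XPLAN.DISPLAY formats the recorded plan; look for index access
# (e.g. INDEX RANGE SCAN) rather than two full table scans.
for (line,) in cur.execute(
        "SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())"):
    print(line)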

DataStage SE Oracle performance

Posted: Fri Feb 29, 2008 5:25 pm
by flashgordon
Hi,

One thing we find that works very well and is easy is setting some non-intuitive parameters on the Oracle stage. Whatever you are doing, this helps a lot with Server Edition Oracle. I'm not saying it will do a 4 million row table operation in 10 minutes, but these settings help over the defaults of the SE Oracle stage.

For Oracle reads:
set Transaction Isolation to "Read Committed"
set Array Size to 500

For Oracle writes, additionally:
set Rows per Transaction to 500

... Flash Gordon

Posted: Wed Mar 05, 2008 5:55 pm
by jdmiceli
Realizing you said you want to use DataStage for this process, I'm going to make this suggestion.

I am not sure it is clear yet whether the 19 fields are a business key (that seems quite high to me from a database engineer's viewpoint), but I'm going to assume that the 19 fields in question are what determine whether something already exists in both tables or rows need to be inserted into the target. You may need to adjust my suggestion if it even seems relevant.

Extract the 19 fields (and any other pertinent fields) from each table into separate sequential files, ordered by the 19 fields. Then create a script to parse the files for you using whatever scripting language you are proficient with (Perl would do this pretty efficiently, and you have it standard on Unix). Have the script compare the files, ignoring dupes and outputting new rows to a separate file. Then bulk insert the new rows. If there are updates, add logic in the script to output rows with differing values for existing keys to an update file, and have a separate job to update target rows.

Call the script as an after-job routine from the second extract job (if you run them sequentially), and that should be very efficient in getting just what you want.
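
The post suggests Perl; here is the same sorted-merge comparison sketched in Python instead, with hypothetical file names, assuming both extracts are already sorted on the 19 fields and use identical field order and delimiters:

Code:
def new_rows(file_a, file_b, out_path):
    """Walk two sorted extracts in step and write rows of B missing from A.

    Whole lines are compared, i.e. the line is the 19-field key. Rows
    present in both files are dupes and are skipped, per the suggestion
    above.
    """
    with open(file_a) as fa, open(file_b) as fb, open(out_path, "w") as out:
        a, b = fa.readline(), fb.readline()
        while b:
            if a and a == b:        # present in both: ignore the dupe
                a, b = fa.readline(), fb.readline()
            elif a and a < b:       # A-only row: advance A past it
                a = fa.readline()
            else:                   # B-only row: a new row for the target
                out.write(b)
                b = fb.readline()

new_rows("table_a_extract.txt", "table_b_extract.txt", "new_rows.txt")

An update stream would be a third branch in the same loop: compare on a key prefix of the line and write rows whose non-key portion differs to a separate update file.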

Posted: Wed Mar 05, 2008 6:27 pm
by ray.wurlod
Well, at least now we know who the "Kingkongs of DataStage" are!
:lol:

Posted: Thu Mar 06, 2008 8:25 am
by jdmiceli
Ooo Ooo Ahh Ahh

(imagine hands reaching under to scratch my own armpits :roll: )

Re: PERFORMANCE ???????

Posted: Mon Mar 10, 2008 2:21 am
by baglasumit21
karry450 wrote: Both source tables have 19 matching columns, and I want to populate the target only where the values of all 19 columns are equal. ... My source volume would be around 4 million rows per day.
Try using a Link Collector to collect the data from both sources, then use the Sort and Aggregator stages to merge the data and load only those rows which have a count > 1.

I don't know whether it is the most efficient way to achieve your target, but it's worth a try. :)
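
For what that collect-sort-aggregate idea amounts to outside DataStage, here is a small Python sketch (rows as tuples of the 19 values; the inputs are hypothetical). One caveat worth noting: duplicates within a single source also produce a count > 1, so each source may need de-duplicating first:

Code:
from collections import Counter
from itertools import chain

def rows_in_both(rows_a, rows_b):
    """Combine both sources and keep rows seen more than once, mirroring
    the Link Collector -> Sort -> Aggregator -> count > 1 approach."""
    counts = Counter(chain(rows_a, rows_b))
    return [row for row, n in counts.items() if n > 1]

# e.g. rows_in_both([("x", 1)], [("x", 1), ("y", 2)]) yields [("x", 1)]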