PERFORMANCE ???????

Post questions here related to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

karry450
Participant
Posts: 201
Joined: Wed Nov 14, 2007 11:31 pm
Location: HYD

PERFORMANCE ???????

Post by karry450 »

Hi friends,

I need some suggestions from the Kingkongs of DataStage, please.


My sources: two tables (Oracle)
My target: one table (Oracle)

Both source tables share the same 19 matching columns, and I want to populate the target where the values of all 19 matching columns are equal.

I have tried:
Oracle to Oracle (loading), doing a lookup
Oracle to flat file, then flat file to Oracle, doing a lookup

Both show far worse performance than expected.

My source would be around 4 million rows per day.

Using DataStage is a must here.

I will appreciate your input, friends.

Thanks.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Can you use user-defined SQL and do the join in your query to give the required output?
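Something along these lines, using hypothetical table names SRC_A and SRC_B (only three of the 19 columns shown):

    -- Sketch only: inner join the two sources on all 19
    -- matching columns (three shown for brevity).
    SELECT A.COL1, A.COL2, A.COL3 -- ... through COL19
    FROM   SRC_A A
    INNER JOIN SRC_B B
            ON A.COL1 = B.COL1
           AND A.COL2 = B.COL2
           AND A.COL3 = B.COL3
           -- ... AND A.COL19 = B.COL19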
karry450
Participant
Posts: 201
Joined: Wed Nov 14, 2007 11:31 pm
Location: HYD

PERFORMANCE ???????

Post by karry450 »

ShaneMuir wrote:Can you use user-defined SQL and do the join in your query to give the required output?
No, that won't meet my requirement.

Any other way, please?
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Re: PERFORMANCE ???????

Post by ShaneMuir »

karry450 wrote: No, that won't meet my requirement.

Any other way, please?
Can I ask why that won't meet your requirement? I would think that some sort of inner join in SQL would be quicker.

Are all 19 fields key fields in each table?
karry450
Participant
Posts: 201
Joined: Wed Nov 14, 2007 11:31 pm
Location: HYD

Re: PERFORMANCE ???????

Post by karry450 »

ShaneMuir wrote:
karry450 wrote: No, that won't meet my requirement.

Any other way, please?
Can I ask why that won't meet your requirement? I would think that some sort of inner join in SQL would be quicker.

Are all 19 fields key fields in each table?
They are not key fields, and I want to use DataStage here.
deva
Participant
Posts: 104
Joined: Fri Dec 29, 2006 1:54 pm

Post by deva »

ShaneMuir wrote:Can you use user-defined SQL and do the join in your query to give the required output?
Hi,
My suggestion is to split the 19-column match across 5 or 6 stages and join them to each other using hash files. That gives somewhat better performance...

If possible, increase the array size.


Deva...
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

The 19 columns, what are their datatypes?

Also, are you matching or comparing? Matching means that these 19 columns are the key, whereas comparing means there is a primary key column in both tables that matches and you are comparing the 19 columns to see whether the row is different.

Please clarify, because I think your explanation is not clear enough.
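To illustrate the distinction with hypothetical names (TABLE_A, TABLE_B, primary key ID), "comparing" would look something like this in SQL, whereas "matching" would join on all 19 columns directly:

    -- "Comparing": join on the primary key, then test the 19
    -- columns for differences (two shown; NULLable columns would
    -- need NVL handling in Oracle, since NULL <> NULL is not true).
    SELECT A.ID
    FROM   TABLE_A A
    JOIN   TABLE_B B ON A.ID = B.ID
    WHERE  A.COL1 <> B.COL1
       OR  A.COL2 <> B.COL2
       -- ... OR A.COL19 <> B.COL19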
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
shamshad
Premium Member
Posts: 147
Joined: Wed Aug 25, 2004 1:39 pm
Location: Detroit,MI

Post by shamshad »

The fact that there are 19 columns that have to be matched is in itself an indicator that the query is not simple and straightforward; it will have some performance issues. I assume there will be an index on those 19 columns on both tables.

Here is what I would try.

IF INDEX EXISTS ON BOTH TABLES

(1) Join the two source tables on the 19 columns and then bulk load the results into the target. I would prefer manual SQL executed from the source, because then I can run explain on the SQL and make sure it is using all the required indexes on both source tables (see the sketch below).

IF NO INDEX EXISTS ON ANY TABLE ON 19 COLUMNS

(1) Dump data from Table A into a text file
(2) Dump data from Table B into a text file
(3) Join the two data files on those 19 columns and then bulk load into the target.
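For the explain step in (1), a sketch with hypothetical table names (DBMS_XPLAN ships with Oracle 9i and later):

    EXPLAIN PLAN FOR
    SELECT A.COL1, A.COL2 -- ... all 19 columns
    FROM   TABLE_A A
    JOIN   TABLE_B B
           ON  A.COL1 = B.COL1
           AND A.COL2 = B.COL2; -- ... and the rest

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);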
flashgordon
Premium Member
Posts: 99
Joined: Tue Aug 17, 2004 7:50 am
Location: Boulder, Colorado

DataStage SE Oracle performance

Post by flashgordon »

Hi,

One thing we find works very well and is easy is a set of non-intuitive parameter settings for the Oracle stage. Whatever you are doing, these help a lot with Server Edition and Oracle. I'm not saying they will get a 4 million row table operation done in 10 minutes, but they help over the defaults of the Oracle stage.

For Oracle reads:
set Transaction Isolation to "Read Committed"
set Array Size to 500

For Oracle writes, additionally:
set Rows per Transaction to 500

... Flash Gordon
Flash Gordon
Hyperborean Software Solution
jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

Realizing you said you want to use DataStage for this process, I'm going to make this suggestion.

I am not sure whether it is clear yet that the 19 fields are a business key (that seems quite high to me from a database engineer's viewpoint), but I'm going to assume that the 19 fields in question are what determine whether something already exists in both tables or new rows need to be inserted into the target. You may need to adjust my suggestion if it even seems relevant.

Extract the 19 fields (and any other pertinent fields) from each table into separate sequential files, ordered by the 19 fields. Then create a script to parse the file for you using whatever scripting language you are proficient with (Perl would do this pretty efficiently and you have it standard on Unix). Have the script compare the files, ignoring dupes and outputting new rows to a separate file. Then bulk insert the new rows. If there are updates, then you just add logic in the script to output rows with differing values for existing keys to an update file, and have a separate job to update target rows.

Call the script as an after job routine from the second extract job (if you run them sequentially) and that should be very efficient in getting just what you want.
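For reference, the set of new rows such a script would output is just the set difference of the two extracts. The suggestion itself is script-based, but expressed in Oracle SQL (hypothetical names, columns abbreviated) that logic is:

    -- Rows present in the source extract but not in the target
    -- extract, compared across the 19 fields (two shown). MINUS
    -- treats NULLs as equal, which suits a whole-row compare.
    SELECT COL1, COL2 -- ... COL19
    FROM   TABLE_A
    MINUS
    SELECT COL1, COL2 -- ... COL19
    FROM   TABLE_B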
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Well, at least now we know who the "Kingkongs of DataStage" are!
:lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

Ooo Ooo Ahh Ahh

(imagine hands reaching under to scratch my own armpits :roll: )
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
baglasumit21
Participant
Posts: 132
Joined: Wed Mar 01, 2006 11:12 pm
Location: Pune

Re: PERFORMANCE ???????

Post by baglasumit21 »

karry450 wrote:Hi friends,

I need some suggestions from the Kingkongs of DataStage, please.


My sources: two tables (Oracle)
My target: one table (Oracle)

Both source tables share the same 19 matching columns, and I want to populate the target where the values of all 19 matching columns are equal.

I have tried:
Oracle to Oracle (loading), doing a lookup
Oracle to flat file, then flat file to Oracle, doing a lookup

Both show far worse performance than expected.

My source would be around 4 million rows per day.

Using DataStage is a must here.

I will appreciate your input, friends.

Thanks.
Try using a Link Collector to collect the data from both sources, then use the Sort and Aggregator stages to merge the data and load only those rows which have a count > 1.

I don't know whether it is the most efficient way to achieve your target, but it's worth a try. :)
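In SQL terms, the logic those stages implement is roughly the following (hypothetical names; note it assumes neither source has duplicates on the 19 columns, or the count test would misfire):

    -- Stack both sources, then keep each 19-column combination
    -- that appears more than once, i.e. exists in both sources
    -- (two columns shown for brevity).
    SELECT COL1, COL2
    FROM (
        SELECT COL1, COL2 FROM SRC_A
        UNION ALL
        SELECT COL1, COL2 FROM SRC_B
    )
    GROUP BY COL1, COL2
    HAVING COUNT(*) > 1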
SMB