Adavantages of using db2 query or data stage??????

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ahmedwaseem2000
Premium Member
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Adavantages of using db2 query or data stage??????

Post by ahmedwaseem2000 »

Hi,


Could any one of you let me know what is more advantageous? joining in the query itself, or using a lookup in data stage? having multiple case statements in DB2 or if conditions using transformers???

Please support your answers with factual information or do let me know the right place to dig this information.

Many Thanks,
Waseem
stavneg
Participant
Posts: 1
Joined: Fri May 27, 2005 9:29 am
Location: Dublin

Post by stavneg »

Its difficult to generalise about this question, it depends on the query, data demographics, and the flavour and set-up of the database etc. Generally I'd say its better to do joins and lookups in the RDBMS where possible. Its what they're designed for and it means there's less moving around of data between environments etc. That's providing the data you want to join or use as a lookup reference is in the DB already. If its in a file or input stream then you would have to weigh up the overheads of loading data to a temp table and joining in the DB versus using a transformer stage. If its just a few rows then a transformer may well be quicker, while if its a lot then the row-by-row lookup could be very slow and it might be better to do a bulk-load and inner join.
------------------
John Thompson
ahmedwaseem2000
Premium Member
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

how about for the data of 10 million records and joining 8-10 tables, and the DB is DB2
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Again it depends. Ask your DBA. Depending upon how heavily the database server is loaded and/or depending upon the temp space your DBA will give you and Ok or not. If the DBA agrees, nothing like it. You can pass your sql join and get the result for further processing. Else do it in datastage.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
csrazdan
Participant
Posts: 127
Joined: Wed May 12, 2004 6:03 pm
Location: Chicago IL

Post by csrazdan »

Well it is difficult it generilize and would depend on case to case basis. Following is some guideline which I follow:
If you are joing table A and table B and the ratio of records in between the tables:

1. Is 1:1 definitly do the job in database.
2. Is 1000:1 consider doing the join in DataStage.
3. Is less than 1000:1 consider doing the join in database.

Hope it helps....
Assume everything I say or do is positive
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Anything that can leverage indexes do in the database. This can include restrictions, joins, some groupings and sorting (if there is a B-tree index on the column being sorted).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ahmedwaseem2000
Premium Member
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

Well, I know my question is vague and not precise. However, I would still want to compare if it is better to have 50 case statements in one query or to have a transformer where i can define those 50 IF conditions??? and what would be the max case statements should be used incase, if the data is of high volume.
snt_ds
Premium Member
Premium Member
Posts: 280
Joined: Wed Oct 18, 2006 11:53 am
Location: Pune, India

For 8-10 million rows

Post by snt_ds »

In ideal world, it would be better to use a lookup stage in such scenario.
But based on how your database/table is loaded, you can also code it in the SQL statement.

However do not use a transformer as it has no specific advantage over a lookup.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I dislike any transformations in the SQL select statement as they are mostly invisible to metadata reporting and can be hard to document and maintain. I would argue both the 50 case statement and 50 IF statement are both flawed. You should be looking at creating some type of business rule mapping table on your ETL server that transposes the cases to values stored in a table. If you really need 50 lines of code for a single transformation then consider a custom coded operator in C++ that can be properly built and tested.

I do support joining tables on a SQL select in order to reduce the volume of rows being brought onto the ETL server. Just as good is putting a layer of database views over the source tables that have been properly designed and optimised.
Post Reply