Page 1 of 1

Join

Posted: Tue May 11, 2004 10:08 am
by Mark j
HI
How can i join between 2 tables one is from db2 and one is sqlserver (driver file) which stage should i use to do this???

thanks
Mark

Posted: Tue May 11, 2004 4:06 pm
by ray.wurlod
ODBC stage to extract from SQL Server to feed the stream input.

ODBC stage or DB2 stage to extract from DB2 to feed the reference input.

Perhaps better (from a performance point of view) would be to pre-load a hashed file with only the needed columns and rows from DB2 and to use the hashed file stage to feed the reference input.

SQL Server is a trademark of Microsoft Corporation.
DB2 is a registered trademark of International Business Machines Corporation.

Posted: Wed May 12, 2004 8:11 am
by Mark j
but, if i preload to a hash file with 30 million records then its taking around 5 hrs to populate the hash file... which i guess is not improving the performance that is very much required in my case as, this job run on daily basis.....

so loading to hash file is 5 hrs and processing the lookup is another 3 hrs with million records from driver file , so altogether its 8 hrs which is the same time if i use the db2 stage for lookup directly instead of using preloading to hash file....

since we are capturing CCDs data in the driver file and hash file always changes on the daily basis..

so can anyone suggest the ways to improve performance


Thanks in advance
Mark

Posted: Wed May 12, 2004 10:19 am
by Mark j
and how to join between 2 tables if they are from different databases
main file is from mssqlserver and reference file is from db2udb ... i need to do this using sql.... is it possible

Posted: Wed May 12, 2004 4:33 pm
by ray.wurlod
Yes, it's possible, but you're going to need to spend a lot more money on gateway software. :cry:

It's not possible with DataStage. One stage connects to one data source.

Posted: Wed May 12, 2004 7:58 pm
by vmcburney
Some thoughts on improving performance:
- Do a DB2 fast export of your 30 million rows to a sequential file, find a fast way to get it to your ETL server such as ftp and process it into a hash file from there. You should find loading 30 million records from a local sequential file is a lot faster then from a DB2 stage.
- Do parallel loads of your hash file using multiple instance jobs.
- Do delta loads of your 30 million row hash file. You got a date modified field on that source table?
- Play around with the hash file settings using the hash file configuration tool on your unsupported utilities folder on the DataStage CD. There are a lot of threads on this site regarding configuring large hash files.
- Can you preload your hash file independantly at an earlier time? Does it need to be loaded daily or can it be loaded weekly?

Posted: Thu May 13, 2004 7:18 am
by Mark j
But its daily process , then how can i automate using db2fast export onto a sequential file...

and what i wanted to do is get the mssqlserver driver file into db2 and then do a join ..... i haven't tested the performance but i think it should imporve the performance...


Thanks
Mark

Posted: Thu May 13, 2004 7:00 pm
by vmcburney
A join between two tables in a database is usually faster then a reference lookup in a DataStage job. You can use a DataStage job to load your SQL Server table straight into DB2. I believe there are some features in DB2 that let you view or attach a table from another database type so it appears in the DB2 schema. Have you spoken to your DB2 support team about this? The data would remain in SQL Server but could be joined in DB2. You still get a network overhead from this join but it might be fast enough.