Join

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Mark j
Participant
Posts: 20
Joined: Tue Apr 20, 2004 9:26 am

Join

Post by Mark j »

HI
How can i join between 2 tables one is from db2 and one is sqlserver (driver file) which stage should i use to do this???

thanks
Mark
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

ODBC stage to extract from SQL Server to feed the stream input.

ODBC stage or DB2 stage to extract from DB2 to feed the reference input.

Perhaps better (from a performance point of view) would be to pre-load a hashed file with only the needed columns and rows from DB2 and to use the hashed file stage to feed the reference input.

SQL Server is a trademark of Microsoft Corporation.
DB2 is a registered trademark of International Business Machines Corporation.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Mark j
Participant
Posts: 20
Joined: Tue Apr 20, 2004 9:26 am

Post by Mark j »

but, if i preload to a hash file with 30 million records then its taking around 5 hrs to populate the hash file... which i guess is not improving the performance that is very much required in my case as, this job run on daily basis.....

so loading to hash file is 5 hrs and processing the lookup is another 3 hrs with million records from driver file , so altogether its 8 hrs which is the same time if i use the db2 stage for lookup directly instead of using preloading to hash file....

since we are capturing CCDs data in the driver file and hash file always changes on the daily basis..

so can anyone suggest the ways to improve performance


Thanks in advance
Mark
Mark j
Participant
Posts: 20
Joined: Tue Apr 20, 2004 9:26 am

Post by Mark j »

and how to join between 2 tables if they are from different databases
main file is from mssqlserver and reference file is from db2udb ... i need to do this using sql.... is it possible
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes, it's possible, but you're going to need to spend a lot more money on gateway software. :cry:

It's not possible with DataStage. One stage connects to one data source.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Some thoughts on improving performance:
- Do a DB2 fast export of your 30 million rows to a sequential file, find a fast way to get it to your ETL server such as ftp and process it into a hash file from there. You should find loading 30 million records from a local sequential file is a lot faster then from a DB2 stage.
- Do parallel loads of your hash file using multiple instance jobs.
- Do delta loads of your 30 million row hash file. You got a date modified field on that source table?
- Play around with the hash file settings using the hash file configuration tool on your unsupported utilities folder on the DataStage CD. There are a lot of threads on this site regarding configuring large hash files.
- Can you preload your hash file independantly at an earlier time? Does it need to be loaded daily or can it be loaded weekly?
Mark j
Participant
Posts: 20
Joined: Tue Apr 20, 2004 9:26 am

Post by Mark j »

But its daily process , then how can i automate using db2fast export onto a sequential file...

and what i wanted to do is get the mssqlserver driver file into db2 and then do a join ..... i haven't tested the performance but i think it should imporve the performance...


Thanks
Mark
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

A join between two tables in a database is usually faster then a reference lookup in a DataStage job. You can use a DataStage job to load your SQL Server table straight into DB2. I believe there are some features in DB2 that let you view or attach a table from another database type so it appears in the DB2 schema. Have you spoken to your DB2 support team about this? The data would remain in SQL Server but could be joined in DB2. You still get a network overhead from this join but it might be fast enough.
Post Reply