Page 1 of 1

Join in server job

Posted: Thu Jul 09, 2009 7:40 am
by keshav0307
I have to join five large tables (8 millions and about 4-5 GB data in each), and then insert records into another table.

currently we are doing it using procedures and the procedures takes almost 1 day for processing; so now wanted to do it in Datastage.

i am new to server job. can't find any stage like join, lookup.
hash file also has limit of 2 GB....

any suggestion please.

Posted: Thu Jul 09, 2009 7:51 am
by chulett
If these "five large tables" are all in the same database then your join is a sql join in the source db stage. Otherwise, yes a hashed file can be used to effect a join. That 2GB limit is only for 32bit (default) hashed files, they can be created as 64bit so there is no such limit.

Would need more details about the nature of this "join" you are doing.

Posted: Thu Jul 09, 2009 7:55 am
by Sainath.Srinivasan
Server join are from static files.

You can do lookup, but you can join in database and use DataStage for other transformation activities.

Posted: Thu Jul 09, 2009 8:00 am
by chulett
The 'static files' comment would be a Server merge operation.

Posted: Thu Jul 09, 2009 8:14 am
by keshav0307
currently the tables are left outer joined with master table, and inserted into a temporay table,
then join this temporary table to another table and load another temporary table,
then join this second temporary table to another table and load into third temporary table, .................like wise...

a cursor is used to select and insert records, because of larger volume a single SQL Select .......insert fail for rollback segment..

Posted: Thu Jul 09, 2009 8:19 am
by chulett
Okay... has anyone looked into doing all that in a single query? Should be perfectly feasible.

Posted: Thu Jul 09, 2009 8:22 am
by Sainath.Srinivasan
Can you split the rows into multiple groups - something like if tableCustomer returns any rows then follow tableCustomerDetails else follow tableInternalCustomer etc.

If you can split by either logic or subject area, you can break the SQL into multiple jobs with simpler logic.

This way it will also be easy for testing.