Lookup job hanging up

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

iq_etl
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Lookup job hanging up

Post by iq_etl »

I've got a job that starts by reading a table via the Oracle Connector. This table has 171,217 rows.

There are then about 15 tables of various sizes used as lookups. Two have about 500,000 rows and two have about 6.5 million. The job hangs whether I'm reading 200 or 20 records from the source table (limited volume for initial testing). By "hanging" I mean it runs for over 24 hours with no result: all links go blue, and the target stage shows rows arriving on its input link, but no rows are loaded.

So, I have two questions.

1) I understand that reference tables with large row counts should be split out into separate Lookup stages. Accordingly, I'm giving each of the two 6.5-million-row tables its own Lookup stage, with all the others sharing one Lookup stage (three Lookup stages total). Should the large ones be in an entirely different job? That is, should I load the target table first using only the low-count lookup tables, then go back in a second job and reference the large tables? What's the best approach here?

2) Somewhat related: our clients have given us some codes and values that aren't in any table, so I've created a text file for each set (three files) and I'm feeding them into a Lookup stage via a Sequential File stage. Should I use a Lookup File Set for these instead? My understanding is that a Lookup File Set can only be written to within a job. If that's the case, would one job write the file set and a later job read it? If so, I may as well leave things as they are.

Thanks!
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia


Post by ssnegi »

First, run the source Oracle stage's query directly against the database and verify that it retrieves rows. Then check that the metadata defined for the query columns conforms to the source table, and do a "View Data" to confirm the source Oracle stage itself retrieves rows.
1) Use a Join stage instead of a Lookup for the large tables (6.5 million rows).
2) Using a Sequential File stage for those small lookups is not a problem. If they contained a large number of rows, you would use a Join stage for them as well.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

When you say "large" rows, do you mean "wide" rows (many columns) or large number of rows?

It is a fundamental requirement for the Lookup stage that the whole volume of reference data must be able to be loaded into memory. Therefore you should avoid selecting any rows that you will not require, and you should also avoid selecting any columns that you will not require.
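To illustrate the point above, here is a rough Python analogy (not DataStage code; all names are illustrative) of the Lookup stage's memory model: the entire reference link is materialized into an in-memory hash table before any source row is probed, so memory use grows with reference volume, which is why dropping unneeded rows and columns matters.

```python
def lookup_join(source_rows, reference_rows, key):
    # Build phase: load ALL reference data into memory, keyed on the lookup key.
    # This dict is the analogue of the Lookup stage's in-memory reference table.
    ref = {row[key]: row for row in reference_rows}

    # Probe phase: each source row is matched with a constant-time dict probe.
    out = []
    for row in source_rows:
        match = ref.get(row[key])
        if match is not None:
            merged = dict(row)
            # Copy over the reference columns (except the duplicated key).
            merged.update({k: v for k, v in match.items() if k != key})
            out.append(merged)
    return out
```

Every row and column kept out of `reference_rows` shrinks the build-phase dict, which is the in-memory structure that blows up when the reference data is too large.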
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
iq_etl
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

We have removed columns not referenced, rows not needed, and use outer joins to reference the tables that have millions of rows.

The job no longer hangs up and takes only about a minute to execute.
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia


Post by ssnegi »

When using the Join stage, please make sure the inputs are hash-partitioned and sorted on the join keys. A Join can silently produce incorrect results if hash partitioning and sorting are not done.
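A hedged Python sketch (illustrative only, not DataStage internals) of why a sort-merge join needs its inputs sorted on the join key: the merge walks each input forward exactly once and never looks back, so an out-of-order key is silently skipped rather than matched, which is the kind of wrong-but-no-error result described above.

```python
def merge_join(left, right, key):
    # Both inputs MUST already be sorted on `key`; the merge assumes it
    # and only ever advances its cursors, never rewinds them.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk == rk:
            merged = dict(left[i])
            merged.update(right[j])
            out.append(merged)
            i += 1  # simple 1:1 merge, for illustration only
            j += 1
        elif lk < rk:
            i += 1  # left key can no longer match anything: move past it
        else:
            j += 1  # right key can no longer match anything: move past it
    return out
```

With a sorted right input every key matches; shuffle the right input and a key that has already been passed over is simply lost, with no error raised. Hash-partitioning plays the same role across parallel nodes: it guarantees matching keys land on the same node so the per-node sorted merge can see them.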
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

iq_etl wrote:We have removed columns not referenced, rows not needed, and use outer joins to reference the tables that have millions of rows.

The job no longer hangs up and takes only about a minute to execute.
You have been well advised. :wink:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.