Lookup job hanging up

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

iq_etl
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Lookup job hanging up

Post by iq_etl »

I've got a job that starts by reading a table via the Oracle Connector. This table has 171,217 rows.

There are then about 15 tables of various sizes used as lookups. Two have about 500,000 rows and two have about 6.5 million. The job hangs whether I'm reading 200 or 20 records from the source table (limited volume for initial testing). By "hanging" I mean it runs for over 24 hours with no result: all links go blue, and the target stage shows rows arriving on its input link, but no rows are loaded.

So, I have two questions.

1) I understand that reference tables with large row counts should be split out into separate Lookup stages. Accordingly, I'm giving each of the two 6.5-million-row tables its own Lookup stage, with all the others sharing one Lookup stage (three Lookup stages total). Should the large ones be in an entirely different job? That is, should I load the target table first using only the low-count lookup tables, then go back in a second job and reference the large tables? What's the best approach here?

2) Somewhat related: our clients have given us some codes and values that aren't in any table, so I've created a text file for each set (three files) and I'm feeding them into a Lookup stage via a Sequential File stage. Should I use a Lookup File Set for these instead? My understanding is that a Lookup File Set can only be written to within a job. If that's the case, would one job write the file set and a later job read it? If so, I may as well leave things as they are.

Thanks!
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia


Post by ssnegi »

First, run the source Oracle stage's query directly against the database and verify that it retrieves rows. Then check that the metadata defined for the query columns conforms to the source table, and do a "View Data" to confirm the source Oracle stage itself retrieves rows.
1) Use a Join stage instead of a Lookup for the large tables (6.5 million rows).
2) Using a Sequential File stage for those small lookups is not a problem. If they contained a large number of rows, you would use a Join stage for them as well.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

When you say "large" rows, do you mean "wide" rows (many columns) or large number of rows?

It is a fundamental requirement for the Lookup stage that the whole volume of reference data must be able to be loaded into memory. Therefore you should avoid selecting any rows that you will not require, and you should also avoid selecting any columns that you will not require.
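To illustrate the point above, here is a rough Python analogy (not DataStage code; all names are illustrative) of the Lookup stage's memory model: the entire reference link is materialized into an in-memory hash table before any source row is probed, so memory use grows with reference volume, which is why dropping unneeded rows and columns matters.

```python
def lookup_join(source_rows, reference_rows, key):
    # Build phase: load ALL reference data into memory, keyed on the lookup key.
    # This dict is the analogue of the Lookup stage's in-memory reference table.
    ref = {row[key]: row for row in reference_rows}

    # Probe phase: each source row is matched with a constant-time dict probe.
    out = []
    for row in source_rows:
        match = ref.get(row[key])
        if match is not None:
            merged = dict(row)
            # Copy over the reference columns (except the duplicated key).
            merged.update({k: v for k, v in match.items() if k != key})
            out.append(merged)
    return out
```

Every row and column kept out of `reference_rows` shrinks the build-phase dict, which is the in-memory structure that blows up when the reference data is too large.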
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
iq_etl
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

We have removed columns not referenced, rows not needed, and use outer joins to reference the tables that have millions of rows.

The job no longer hangs up and takes only about a minute to execute.
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia


Post by ssnegi »

When using the Join stage, please make sure the inputs are hash-partitioned and sorted on the join keys. A Join can silently produce incorrect results if hash partitioning and sorting are not done.
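A hedged Python sketch (illustrative only, not DataStage internals) of why a sort-merge join needs its inputs sorted on the join key: the merge walks each input forward exactly once and never looks back, so an out-of-order key is silently skipped rather than matched, which is the kind of wrong-but-no-error result described above.

```python
def merge_join(left, right, key):
    # Both inputs MUST already be sorted on `key`; the merge assumes it
    # and only ever advances its cursors, never rewinds them.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk == rk:
            merged = dict(left[i])
            merged.update(right[j])
            out.append(merged)
            i += 1  # simple 1:1 merge, for illustration only
            j += 1
        elif lk < rk:
            i += 1  # left key can no longer match anything: move past it
        else:
            j += 1  # right key can no longer match anything: move past it
    return out
```

With a sorted right input every key matches; shuffle the right input and a key that has already been passed over is simply lost, with no error raised. Hash-partitioning plays the same role across parallel nodes: it guarantees matching keys land on the same node so the per-node sorted merge can see them.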
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

iq_etl wrote:We have removed columns not referenced, rows not needed, and use outer joins to reference the tables that have millions of rows.

The job no longer hangs up and takes only about a minute to execute.
You have been well advised. :wink:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.