I've got a job that starts by reading a table via the Oracle Connector. This table has 171,217 rows.
There are then about 15 tables of various sorts used as lookups. Two have about 500,000 rows and two have about 6.5 million. The job hangs whether I'm reading 200 or 20 records from the source table (limited stress for initial testing); I mean it runs for over 24 hours with no result (all links go blue, and the target table's 'in' link shows a row count, but no rows are loaded).
So, I have two questions.
1) I understand that reference tables with large rows should be split out into separate Lookup stages. With that in mind, I'm giving the two 6.5-million-row tables their own Lookup stages, with all the others in one Lookup stage (so three Lookup stages total). Should they be in an entirely different job? That is, should I load the target table first, referencing the low-count lookup tables, then go back in a second job and reference those large tables? What's the best approach here?
2) Sort of related: our clients have given us some codes and values that aren't in any table, so I've created txt files for them (three of these) and am using a Sequential File stage to feed them into a Lookup stage. Should I use a Lookup File Set for these? I understand that those are written to only. If that's the case, would one job write the file and a later job read it? If so, I can leave it as is.
Thanks!
Lookup job hanging up
First, run the source Oracle stage's query directly against the database and verify that it retrieves rows. Then check that the metadata defined for the query columns conforms to the source table. Finally, do a "View Data" in the stage to verify that the source Oracle stage itself is retrieving rows.
1> Use a Join stage instead of a Lookup for the large tables (6.5 million rows).
2> Using a Sequential File stage for those lookups is not a problem. If they contain a large number of rows, then use a Join stage there too.
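The reason a Join scales better is the access pattern: a Lookup stage builds the whole reference table in memory and probes it per row, while a Join reads two key-sorted streams and never holds the full reference set at once. A minimal Python sketch of the two patterns (the tiny data set and keys are invented for illustration; imagine millions of entries in the reference):

```python
# Sketch: in-memory lookup vs. sort-merge join (illustrative data only).

# --- Lookup pattern: entire reference table loaded into a dict ---
reference = {1: "A", 2: "B", 3: "C"}          # imagine 6.5 million entries here
source = [(2, "row-x"), (3, "row-y"), (9, "row-z")]
lookup_out = [(k, v, reference.get(k)) for k, v in source]

# --- Join pattern: both inputs sorted on the key, merged as streams ---
def merge_join(left, right):
    """Left outer merge join of two key-sorted lists of (key, value) pairs."""
    out, j = [], 0
    for k, v in left:
        # Advance the right-hand cursor; neither input is held fully in memory
        # in a real streaming implementation.
        while j < len(right) and right[j][0] < k:
            j += 1
        match = right[j][1] if j < len(right) and right[j][0] == k else None
        out.append((k, v, match))
    return out

join_out = merge_join(sorted(source), sorted(reference.items()))
```

Both produce the same rows; the join just trades an upfront sort (which DataStage can do on disk) for a bounded memory footprint.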
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
When you say "large" rows, do you mean "wide" rows (many columns) or large number of rows?
It is a fundamental requirement for the Lookup stage that the whole volume of reference data must be able to be loaded into memory. Therefore you should avoid selecting any rows that you will not require, and you should also avoid selecting any columns that you will not require.
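A back-of-envelope check shows why this matters at 6.5 million rows. Assuming (purely for illustration) an average in-memory row width of around 200 bytes, each of those reference tables wants on the order of a gigabyte of memory by itself:

```python
# Rough memory estimate for one Lookup stage's reference data.
# The 200-byte average row width is an assumption for illustration only.
rows = 6_500_000
bytes_per_row = 200                         # assumed average width
total_mb = rows * bytes_per_row / 1024 / 1024
print(f"~{total_mb:,.0f} MB per 6.5M-row reference table")
```

With two such tables that is well over 2 GB of reference data per partition, which is exactly why pruning unneeded rows and columns, or switching to a Join stage, is the standard advice.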
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
iq_etl wrote: We have removed columns not referenced, rows not needed, and use outer joins to reference the tables that have millions of rows. The job no longer hangs and takes only about a minute to execute.

You have been well advised.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.