Hi all,
I'm new to Server jobs, coming from a Parallel background. I'm using a Transformer stage to perform a lookup: two inputs (both DB2 stages) feed the Transformer, and there is one output (also a DB2 stage). Although the inputs match on the key column, the output column from the second (reference) link is zero, which suggests no match was found.
For example,
Data from stream is:
ID Name
1 XxX
2 YyY
Data from reference is:
ID Country_Name
1 Egypt
2 US
All data go to the reject link as follows:
ID Name Country_Name
1 XxX 0
2 YyY 0
Any suggestions regarding this behaviour?
Regards.
So... a direct database lookup rather than a hashed file? The latter is preferred and much more consistent, but you can do it directly like that as long as you understand the performance considerations. And since it's a SQL query, it behaves the same as if you had run the "select x from y where z" in your favorite query tool. That, and you can structure it to return whatever you like, hit or miss.
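In other words, a direct reference lookup amounts to running something like the following once per input row (a sketch only; country_lookup is a hypothetical table name standing in for your actual reference table):

    -- Hedged sketch of what a direct database lookup does per input row.
    -- DataStage binds the stream row's key value to the parameter marker;
    -- zero rows back means a lookup miss for that input row.
    SELECT Country_Name
    FROM   country_lookup
    WHERE  ID = ?;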
Not sure what you're asking - why is the lookup failing? We'd have to see the sql you are using and know your key v. non-key columns to be of any help there.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
chulett wrote: why is the lookup failing?
That's exactly what I'm asking. The expected result should be:
ID Name Country_Name
1 XxX Egypt
2 YyY US
What I didn't mention is that the lookup key column is 'ID', with data type Integer, while 'Name' and 'Country_Name' are non-key columns of type VarChar(50), not Numeric, Sainath.
The SQL statements are:
SELECT ID, Country_Name FROM <table_name>;
SELECT ID, Name FROM <table_name>;
Each table has only 2 records. When I used a hashed file on the reference link, it worked. I'm also wondering: if my reference table has millions of records, should I use a direct database lookup or a hashed file instead?
Regards.
SQL Statements, plural? Do you have two lookup stages?
Have you run the job through the debugger to see what happens? The Server product actually has a pretty nice one, something the [cough] other one doesn't. Or at least didn't the last time I was using it.
Just because you have "millions of rows" in your lookup table doesn't mean you need all of them on each job run, I assume? A common technique (which works very well, I might add) is what I've heard called a "key bucket table" or some such: you take the natural keys from your current input data and "bucket" them in a work table, then join that work table to the lookup table to drive the hashed file build. Essentially, each run the hashed file holds "only what it needs" to fulfill its obligations for that run. Process 10,000 records and there are at most 10,000 records in the hashed file. 5 records? No more than five. Etc. Part of the "Only What You Need" mantra. (See the sketch below.)
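Sketched in SQL, the key bucket idea looks something like this. The table names key_bucket and country_lookup are made up for illustration; in practice the bucket load and the extraction query would be driven by your DataStage jobs:

    -- 1. Stage the natural keys from the current input into a work table.
    --    (A DataStage job would normally load this; shown as plain SQL here.)
    CREATE TABLE key_bucket (ID INTEGER NOT NULL);
    INSERT INTO key_bucket (ID) VALUES (1);
    INSERT INTO key_bucket (ID) VALUES (2);

    -- 2. Join the bucket to the big lookup table so the hashed file build
    --    extracts only the rows this particular run actually needs.
    SELECT l.ID, l.Country_Name
    FROM   country_lookup l
    INNER JOIN key_bucket b ON b.ID = l.ID;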
Me, I don't recall the last time I did a direct database lookup in a Server job... maybe for a very small volume where speed was not an issue. Otherwise it's hashed files all the way. As I've said here many times, hashed files are (IMHO) the Heart And Soul of the Server product. Learn to love them, as they are your best friend in that world.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers