Lookup Trannsformer column dropped

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Lookup Trannsformer column dropped

Post by roy »

Hi Yall,
I have a job with:

Code: Select all

                          DB
                           |
lookup_file - -> lookup_Transformer
                           |
                        Transformer
      several output links
I'm geting for a column derived from the lookup file the following warning:
APT_CombinedOperatorController,0: Field 'field name' from input dataset '0' is NULL. Record dropped.
my constraint is continue anyway and the column is null able
the strange thing is that I have, in another job, a lookup file with 2 columns where it seems to work and in this one
it fails to pass rows where no lookup was found.
any ideas on what I'm missing?
I've rebuilt the lookup transformer and the output link.
same result.

my problem is that in this case as mentioned in bold above no rows are returned from the lookup transformer for ones that no lookup was found for in the lookup file.

Am I missing anything? :roll:

Thanks in advance,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Did you add a reject link?
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Adding a reject link captured the rows.
But is this the way it should be done?
can't I pass it along the same link as regular lookups that were found and figure it out later?
to function like an outer join?
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
rsrikant
Participant
Posts: 58
Joined: Sat Feb 28, 2004 12:35 am
Location: Silver Spring, MD

Post by rsrikant »

Hi Roy,
APT_CombinedOperatorController,0: Field 'field name' from input dataset '0' is NULL. Record dropped.
Whenever the key columns in the lookup is null this warning comes. It's quite common and we used to get this warning in our jobs.
can't I pass it along the same link as regular lookups that were found and figure it out later?
AFAIK this is not possible in datastage.
The option continue anyway will only continue the lookup process with the next row. When you select Output option the row will be rejected in to some output stage. If not both, the job will be aborted.

Regards,
Srikanth
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Use 'Continue'. Also make sure that the data from the lookup matches the field formats. Otherwise the output object will reject them.
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

EVERYONE!

Pay attention.
APT_CombinedOperatorController,0: Field 'field name' from input dataset '0' is NULL. Record dropped.
If you are getting the above message, you are NOT programming correctly in DataStage EE.

This is NOT a case of...
It's quite common and we used to get this warning in our jobs.
NEVER EVER EVER EVER EVER EVER EVER EVER EVER EVER get used to this message.

You are DROPPING records due to this message. Records that your company is most likely needing for their future processing.

NEVER EVER EVER EVER... *deep breaths*... EVER IGNORE THIS MESSAGE!

:evil:

* * *

Okay... now that I am done ranting... Pay attention:

The most common cause of this job is due to the Transformer stage. When you try to handle some data in a certain way, and it have NULL as a value, DataStage take one look at you and go, 'Hell no if I am going to handle your bad programming. RECORD DROPPED!'

Why is it bad programming? Lets assume here that you have an input field: Foo. You want to transform it to Bar, with the following function:

Code: Select all

DecimalToString(input.Foo)
The function expects that there be an actual data there. However, if it is NULL... how can the function handle that?

Simply, it doesn't. Absolutely NO functions within DataStage handle NULL values (except, well... see below). It is fully expected that the Transformer pay attention to the NULL values. So the Transformer follows these rules:

1. If you have a derivative, check to see if the input data have NULL.
2. If the input data is NULL, well... see the entire record say "good bye!" on its way to the /dev/null in the sky.

So what do we do if we REALLY want that record? We HANDLE the NULL ourselves.

Code: Select all

If IsNull(input.Foo) Then SetNull() Else DecimalToString(input.Foo)
Yes, we have to explicitly tell it, "Hey, if this is a NULL value... pass it along as... a Null, otherwise, do the derivative."

Of course, if the output field is Not Nullable, you will need to supply an actual value.

* * *

Why is Null so special in DataStage. That is because the actual data field CAN be Null while the data field's metadata is NOT Null. You can actually pass an Ascii(0), and have DataStage consider it as a VALID value. The Null flag is set elsewhere within that field's metadata.

So, thus, unlike C/C++ where you can be lazy (sometimes), in DataStage, you must handle the Null field whenever it's possible and you're doing a derivative.

* * *

Is NULL BAD? No. In Kimball, however, you should minimize the possibility of NULL fields. Not to the point where you must provide a dummy Null value, mind you. But don't give EVERY database field the ability to contain Null if you know it will never have Null.

* * *

What is:
APT_CombinedOperatorController,0
You may ask?

It is DataStage's way of reducing the amount of processing by combining several stages together into one big stream all handled in memory. This allows for less disk I/O, and yield better performance. Woo woo!

But when you debug, it sure is a pain to identify the cause, eh? That's where $APT_DISABLE_COMBINATION comes in play. Set the variable in your Job parameter as "True", recompile, and let it rip. Suddenly, the offending stage's name will pop up where APT_CombinedOperatorController is.

Also, ",0" means the first node.

* * *

Now in summary, what have you learned today?

1. NEVER ignore Null messages.
2. NEVER ignore messages that says "dropping records."
3. Null values must be explicitly handled when you do derivatives.
4. APT_DISABLE_COMBINATION works pretty well for debugging.

Please. The warning messages are there for a reason. Learn what those reasons are by experimenting with your job. Above all, some warning messages are pretty mild, but other can mean your job.

After all, what is there to stop your company from firing you for dropping 25% of their production data for several months simply because it have a NULL somewhere that is not handled?

That's it for today. Have fun!
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Thanks for the lesson :),
first I Wouldn't think of ignoring this warning message if the entire world told me to do so.

OK now for solving this:
Sorry All :oops: Wrong answer was:
it seems that I had a column/s with the same names in the delta and lookp files not all being part of the actual key.
after rebuilding the lookup file to have different coloumn names for none key columns it doesn't give the dropped the rows warning anymore
right answer is:
I rebuilt the lookup file with another name, by the way also renamed the columns, and changed the table definition save/load to match the one used in the job making the lookup file and it seems to work I have no idea what was wrong to begin with ; (I guess it one of those that a simple rebuild is faster then anything)

I wonder what we could do with complex sql columns that DS complains about decimal(38,10) being put inside a int32 type simply since the column is a computed one (even if integer by nature).
I'm trying to find the best method to handle it so no warnings will pop out.
any ideas would help.

Thanks in advance,
Last edited by roy on Tue Mar 08, 2005 11:55 pm, edited 1 time in total.
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

If you have a metadata of "Number" -- it defaults to (38,0) (or in some versions of DataStage - (38,10)). You will need to do a manual conversion (DecimalToString() - and handle the null if there's one) if you want to use it as an integer.

It's just the way things are with certain databases (Oracle, I presume?)
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Yes oracle it is.
No I'm talking of warnings on columns that have things like case, decode in the select list and get theese warnings.
If I use strait forward select a,b,c ... no warnings are logged.



Any Ideas?
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Aha!

Hmm.

Something tells me that Oracle is switching metadata on you. I am not sure why, or how. I do not have access to Oracle at the moment to play around with it. Best solution - Switch the metadata to whatever Oracle forces DataStage to use, and the warnings will go away.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
If only it was that simple,
when I specify a decimal data type for a, lets say, real type of decimla(x,0) or numeric(x,0) DS forces a dot along with the number in my data files messing lookup and such. :(
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Oh great. Heh. I wish I have access to Oracle to help. Sorry.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Actually I think it is expected since when you impot a table definition of such columns you get an Integer type and in comments Decimla(10,0) ?
so I guess this is how it works.
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
Post Reply