Strategy for Null handling for sequential file processing

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pxr87
Premium Member
Posts: 16
Joined: Thu Oct 27, 2005 9:19 am

Strategy for Null handling for sequential file processing

Post by pxr87 »

I am trying to create some EE parallel jobs after working in Server Edition for some time. I am looking for a null handling strategy for our data warehouse, which basically processes sequential files. I thought this approach could work well, but I want to confirm.
----
Create a separate parallel job (it doesn't need to be a separate job, but I decided to go with one for other reasons) with a Sequential File input stage, and use a Modify stage to create the output sequential file or dataset, with the following specification in the Modify stage:

outputcolumn = handle_null ( inputcolumn, value)

where value is 01-01-2999 for date fields, -9999999999 for integer fields, and '' for character fields. Hopefully the data doesn't actually contain these values.
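To make the intent concrete, here is a minimal Python sketch (not DataStage code) of what the handle_null() substitution above does: each null is replaced by a per-type sentinel. The SENTINELS table and column type names are illustrative assumptions matching the values proposed above.

```python
# Hypothetical per-type sentinel values, mirroring the proposal above.
SENTINELS = {
    "date": "01-01-2999",
    "integer": -9999999999,
    "char": "",
}

def handle_null(value, column_type):
    """Replace a null (None) with the sentinel for the column's type."""
    if value is None:
        return SENTINELS[column_type]
    return value

# Example row: (value, column_type) pairs, some of them null.
row = [(None, "date"), (42, "integer"), (None, "char")]
cleaned = [handle_null(v, t) for v, t in row]
print(cleaned)  # ['01-01-2999', 42, '']
```

The risk, as noted above, is that a sentinel collides with a legitimate data value, so the sentinels must be chosen outside the business domain of each column.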

If I use this approach, in the subsequent jobs I can define all the columns as 'Not Null', and then I don't need to specify a 'Null Field Value' or wonder whether a field is null or not (as our data suppliers never follow their interface protocol). So I shouldn't need to bother about null issues in the subsequent jobs.
But I still have doubts about new fields created by stages like the Aggregator. Do I still need to handle nulls for an output field that comes out of an Aggregator stage fed from not-null input fields?
I appreciate your comments/suggestions.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Your null handling should consider the business rules of your organization.
In most cases there are default values you can use, some in the future and some in the past for dates;
in some cases a zero will do, and in others an empty string.
In short, it varies depending on each column's individual circumstances, and what works for one field won't necessarily work for another.

Bear in mind that when reading/writing sequential files you're actually handling strings, which may not be the case when you process the read values later on in your jobs (conversions may be needed).

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

A not-null input field doesn't require null handling on the Aggregator output, but it is likely to need a data type conversion. :wink:
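A short Python sketch of the point above (an illustrative assumption, not DataStage's internal semantics): aggregating values that are all not-null cannot itself produce a null, but the result may come out in a different type than the input, which is where the conversion comes in.

```python
# A not-null integer column being aggregated.
values = [10, 20, 25]

total = sum(values)                   # still an integer: no null, same type
average = sum(values) / len(values)   # a float now: type conversion needed
print(total, average)
```

This is analogous to an Aggregator producing, say, a double mean from an integer input column: the output is never null, but downstream metadata must match the new type.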
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis
Contact:

Post by trobinson »

This is free advice and worth every penny, but... could you not define the null handling for each and every column in the Sequential File input stage's Columns tab => Properties, and eliminate the Modify stage?