I am trying to create some EE parallel jobs after working with the Server edition for some time. I am looking for a null-handling strategy for our data warehouse, which mostly processes sequential files. I think the approach below could work, but I want to confirm.
----
Create a separate parallel job (it doesn't have to be a separate job, but I decided to go that way for other reasons) with a Sequential File input stage, and use a Modify stage to produce the output sequential file or dataset, with the following specification in the Modify stage:
outputcolumn = handle_null(inputcolumn, value)
Here value is 01-01-2999 for date fields, -9999999999 for integer fields, and '' for character fields. The assumption is that real data never contains these values.
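As a concrete illustration, the Modify stage specifications might look like the sketch below. Column names and types are hypothetical, and the exact specification syntax should be checked against the Modify stage documentation for your version:

```
order_date:date = handle_null(in_order_date, "2999-01-01")
order_qty:int64 = handle_null(in_order_qty, -9999999999)
cust_name:string = handle_null(in_cust_name, "")
```

One specification per output column, so every nullable input gets an explicit sentinel before downstream jobs see it.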
With this approach, in the subsequent jobs I can define all the columns as 'Not Null', and then I don't need to set 'Null Field Value' or wonder whether a given field is null or not (our data suppliers never follow their interface protocol). Does this mean I don't need to worry about null issues in the subsequent jobs at all?
But I still have doubts about new fields created by stages such as the Aggregator. Do I still need to handle nulls for an output field that comes out of an Aggregator stage when all of its input fields are Not Null?
I appreciate your comments/suggestions.
Strategy for Null handling for sequential file processing
Hi,
Your null handling should consider the business rules of your organization.
In most cases there are default values you can use, some in the future and some in the past for dates;
in some cases a zero will do, and in others an empty string.
In short, it varies with each column's individual circumstances, and what works for one field won't necessarily work for another.
Bear in mind that when reading from or writing to sequential files you're actually handling strings, which may not be the case when you process those values later on in your jobs (conversions may be needed).
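For instance, turning the string read off a sequential file back into a real date is typically a Modify stage conversion. The sketch below assumes hypothetical column names, and the exact conversion name and format-string syntax should be verified against your version's documentation:

```
order_date:date = date_from_string[%yyyy-%mm-%dd](order_date_str)
```

If you substituted a sentinel like 2999-01-01 upstream, the conversion will happily produce a valid (far-future) date, so downstream logic still needs to recognize the sentinel as "no value".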
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
![Image](http://www.worldcommunitygrid.org/images/logo.gif)