Reading sequential file empty columns NULL vs. Empty String

sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Reading sequential file empty columns NULL vs. Empty String

Post by sbass1 »

Hi,

I'm reading a tilde delimited file like so:

Col1~Col2~Col3 << header
A~~C

So, the value for Col2 is missing.

I need to validate the data in Col2 against data in another table (SQL Server; I'll call this table Lookup, and the lookup column Col2_NK). So I use a reference lookup in the transformer.

Both columns are defined as char(6), matching the type of Col2_NK in the Lookup table.

The complication is that a blank value is allowed in the incoming data. My data validation rule is "a blank value is allowed only if the corresponding row in the Lookup table has a blank value", which it does in this case. Strange that blank is allowed, but there you go.

My problem is the lookup never gets a hit, and all blank values from the input data get rejected.

I found the problem by stepping through the job in the debugger: Col2 = NULL when read from the sequential file, while Col2_NK = 6 spaces in the Lookup table.

So, my questions:

1. What is the best practice for dealing with this issue? Random (untested) thoughts:

* Proper configuration in the format tab of the sequential file
* Cast both columns to varchar
* Fiddling with TRIM functions in the lookup

I was hoping someone could recommend best practice rather than me hacking around until I get something to "work".
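For illustration, here is a rough (untested) sketch of the kind of key expression I have in mind on the reference link, mapping an incoming NULL to the six-space value stored in Col2_NK (SeqIn is a placeholder link name; further trimming or padding may still be needed depending on how the char(6) value arrives):

Code:

* Untested sketch - SeqIn is a placeholder link name.
* Key expression for the reference lookup: treat a NULL read from the
* sequential file as the char(6) all-spaces value held in Col2_NK.
If IsNull(SeqIn.Col2) Then Space(6) Else SeqIn.Col2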

2. In the format tab of the sequential file output link, what's the difference between "Map empty string" and "Pad with empty string"?

3. What's the default padding field used for, and does it apply here?

(I have read the help file on all these tabs and skimmed the doc before posting. Either I'm not finding the proper hits or just not getting what the help file is trying to say.)

Thanks,
Scott
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you need to handle "" differently from NULL, you need to specify something other than "" as the representation of NULL when reading from the sequential file. I suggest the character string "NULL" unless that is likely to occur in your data. Don't worry about padding for now.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

Update:

1. The documentation is woefully inadequate in defining the difference between "Map empty string" and "Pad with empty string". I searched all the documents under C:\Program Files\Ascential\DataStage7.5.1\Docs and got only one hit, in the Server Job Developer's Guide.

Here is an excerpt:

Missing columns action. Allows you to specify the action to take when a column is missing from the input data. Choose Pad with SQL null, Map empty string, or Pad with empty string from the drop-down list.

Ummm, yeah... and? So what does each choice actually do? This is very much a circular definition.

I don't get any additional information using Google.

2. After much hacking, this is what works:

Leave both columns defined as char(6)

On the sequential file output link, format tab:

All checkboxes are unticked
Missing columns action: Pad with SQL NULL
Delimiter: ~
Quote character: "
Spaces between columns: N/A (grayed out)
Default NULL string: 020 (hex code for space) <<< this is the key
Default padding: blank

In fact, it appears I can put any non-blank characters in here: NULL, asdfasdf, FUBAR, 009, etc. all work. Perhaps it isn't being interpreted as a hex code at all, but simply as the literal string "020"?

The setting of Missing columns action appears to be irrelevant - all three selections work as long as Default NULL string is not blank.

The help for Default NULL string states:

Contains characters which, when encountered in a sequential file being read, are interpreted as the SQL null value (this can be overridden for individual column definitions in the Columns tab).

TBH, I have no idea why the above settings work :-/ I would have thought Missing columns action and Default padding would have been the relevant fields to set.
sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

Revisiting this topic...

If my tilde delimited sequential file has data like:

Code:

FOO~ ~BAR~ ~ ~BLAH

are there settings in the format tab of the sequential file stage that will convert the embedded spaces to an empty string, i.e.

Code:

'' rather than ' '

In the format tab of my sequential file stage, I'm trying:

On the LHS: Map empty string, and on the RHS I've tried:

~
000
0
NULL
#

Thanks,
Scott
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No. You can represent NULL as a space if you like, but transformation is done in the Transformer stage.
sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

Thanks. Does DS have the equivalent of:

Code:

s/~\s+~/~~/g

i.e. replace tilde, one or more spaces, tilde with tilde-tilde, in a transformer? Otherwise I'd use a sed script to pre-process the file.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not directly, but the extended form of the Trim() function can shrink multiple spaces to a single space.
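For illustration (untested; InLink is a placeholder link name), the Trim() forms in question:

Code:

* Untested sketch - InLink is a placeholder link name.
* One-argument form: strips leading/trailing spaces and tabs and
* reduces runs of internal spaces to a single space.
Trim(InLink.Col2)

* Extended form: type "R" removes leading, trailing and redundant
* occurrences of the given character.
Trim(InLink.Col2, " ", "R")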
sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

Several approaches:

1. Process the input line as one long string, which I'm already doing to validate that the source file contains the required number of columns. For this conversion, the match needs to be exactly tilde, one or more spaces, tilde, replaced with tilde-tilde. Other fields may have one or more embedded spaces which need to be preserved. Oh, for OOTB regex support in DS :wink:

2. After parsing the columns downstream, trim all columns with Trim(column," ","E"). Con: I'd have to add this code to numerous output columns.

3. Convince the provider of the source file to produce a "trimmed" version.

4. Pre-process the file using sed.
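A rough sketch of that sed pre-processing (untested; in.txt and out.txt are placeholder file names). A single global substitution like s/~\s+~/~~/g would miss the second of two adjacent blank fields such as "~ ~ ~", because the closing tilde of the first match is consumed, so the substitution is looped until nothing changes:

Code:

# Untested sketch - in.txt / out.txt are placeholder names.
# Collapse fields containing only spaces into truly empty fields.
# The :a / t a loop repeats the substitution until no change occurs,
# which handles adjacent blank fields such as "~ ~ ~".
sed -e ':a' -e 's/~ \{1,\}~/~~/g' -e 't a' in.txt > out.txt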

I'll take option #4 and mark this thread resolved unless I hear otherwise.

FYI, the reason I'm doing this is that I need to save the columns trimmed for later lookup validation against hashed files where the values are stored trimmed. I wish there were a way to convince a lookup that "FOO " = "FOO" (with minimal coding across multiple lookups).
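(For anyone following along, an untested illustration of why that fails: the lookup key is compared as an exact string, and in BASIC string comparison trailing spaces are significant.)

Code:

* Untested illustration: key matching is an exact comparison, so a
* space-padded value does not equal its trimmed counterpart.
PRINT ("FOO   " = "FOO")          ;* 0 - no match
PRINT (Trim("FOO   ") = "FOO")    ;* 1 - trimming first gives a match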

Thanks for the help, much appreciated.

P.S. So in the sequential file stage, what exactly does "Missing columns action: Map empty string" mean/do?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

2. Derivation substitution is your friend.

Select all the affected column derivations, right-click, and choose Derivation Substitution from the menu. The rest is intuitive for someone who lusts after regexps (if that's the correct plural).
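For example (untested; link and column names are placeholders), substituting the whole expression across the selected columns has this effect:

Code:

* Untested sketch - SeqIn is a placeholder link name.
* Before substitution:
SeqIn.Col1
SeqIn.Col2
* After substituting a trimmed form of the whole expression:
Trim(SeqIn.Col1, " ", "E")
Trim(SeqIn.Col2, " ", "E")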