Padding sequential file data with trailing spaces

sbass1 · Post by **sbass1** » Fri Feb 27, 2009 12:52 am

Say you have a sequential file as follows:

Col1 << name of column
ABC
DEF
GHI

The column is specified as char(6) in the columns tab

Say you have a database table "Lookup" as follows:

Col1 <<< name of column
ABC
GHI
JKL

The column is specified as char(6) in the columns tab

My job is using table Lookup as a reference lookup to validate the data in the sequential file. The transform code looks like:

StageVar1 = If IsNull(Lookup.Col1) Then "Invalid Col1" Else ""
Audit = StageVar1:StageVar2:...:StageVarN
InvalidRecord = Len(Audit) ne 0

So in the example above I would accept row 1 & 3 and reject row 2 from the sequential file.

My problem is that the data at runtime is:

Seq.Col1 = "ABC"

and

Lookup.Col1 = "ABC "

DS treats trailing spaces as significant (I sure wish I could set some option to force trailing spaces to be insignificant), so the lookups always fail.

Now, I know I can fiddle with TrimB to get this to work, but 1) it takes more coding to set this up, and 2) would likely affect performance.

I specified that Col1 was char(6) in the columns tab for the sequential file. So what magic incantation do I need in the Format tab to actually make the data comply with my datatype, i.e. be padded with spaces for the width of the column?

Please address answers to the question asked, not "use TrimB" or "recast your data to VarChar". If there's no way to get the sequential stage to pad the data with spaces, just let me know that.

Thanks,
Scott

ray.wurlod · Post by **ray.wurlod** » Fri Feb 27, 2009 3:38 am

The Sequential File stage can not read spaces that aren't there in the data. Simply specifying Char(6) in the metadata does not magically make it so. If you want the trailing spaces you need to add them either in the file itself or in a Transformer stage within your job - a Fmt() function with "6L" as the second argument.

sbass1 · Post by **sbass1** » Tue Mar 03, 2009 12:49 am

ray.wurlod wrote:The Sequential File stage can not read spaces that aren't there in the data. Simply specifying Char(6) in the metadata does not magically make it so.

Hi Ray,

Thanks for the reply. I'm not wanting DS to "magically make it so." However, I was hoping it would honour the datatype and length settings I'd set in the sequential file stage.

In another ETL product I've used, defining the metadata characteristics for a sequential file determine the internal memory buffers that get defined at runtime. So, if I read "ABC" into a numeric buffer, I'll get a null value and error message (unless I suppress it). If I read "ABC" into a char(6) buffer, I'll get "ABC^^^". Since a sequential file is essentially just text without datatype, the datatype is determined by the metadata you associate with the columns. This strict data typing makes my life as the programmer easier, as I don't have to write code (i.e. Fmt(blah,"6L") ) to enforce data typing, I just define the correct (or desired) metadata and the product "does my work for me."

I did some experimentation/hacking, and determined that, for sequential files, the metadata is useless. I imported a combination of char, int, and float data into any type of columns - num, int, char, varchar, binary. I even defined length = 0. Nothing made a difference. Same thing on output to sequential file.

I did brief testing on output to a DRS stage (SQL Server), and there DS did a better job on enforcing data type based on the metadata. For example, I would get warning messages if I tried to write char data to an int output, and the message was definitely coming from DS and not SQL Server.

It seems the only thing that matters for a sequential file is the column name, and that's really just a convenience to map your column position to a "friendly name" for mapping.

I wish DS enforced stricter data typing for sequential files, as I feel it would make my job as a programmer easier. And, if nothing else, DS should just gray out the columns in the column definition window (i.e. key, SQL type, Length, etc) for the sequential stage if they are just ignored anyway.

ray.wurlod · Post by **ray.wurlod** » Tue Mar 03, 2009 4:01 am

If it's any consolation, a lot of server developers making the transition hate the way that parallel jobs strictly enforce data types!