Copying VARCHAR columns into CHAR columns in target database

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Copying VARCHAR columns into CHAR columns in target database

Post by TonyInFrance »

Hi guys

I have a generic job that loads datasets into target tables. The issue I have is that the source file contains VARCHARs, which are loaded into datasets using schema files. In the schema files I therefore have no choice but to define these fields as string[max=n]. However, the target table has the same column defined as CHAR(n).
Thus when the dataset is loaded into the target table (DB2) using a DB2 Connector, it is (not surprisingly) throwing a warning:

When writing column VARCHAR(min=0,max=n) into database column CHAR(min=n,max=n), truncation, loss of precision or data corruption can occur.

Is there any way I can avoid this other than defining a message handler? I tried the environment variable APT_STRING_PADCHAR, but (as I read in Craig's post in another thread) this won't help me, since this variable only pads CHARs and NOT VARCHARs.

Is there any other environment variable similar to this?
FYI, I tried putting in a Transformer stage, but to no avail.

Thanks guys

Tony
Tony
BI Consultant - Datastage
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

If you have defined the datatype as char(n) in the schema file, then the APT_STRING_PADCHAR value would have been appended as padding.
Alternatively, right-pad the column with spaces in a Transformer.
Last edited by prasannakumarkk on Wed Mar 27, 2013 10:33 am, edited 1 time in total.
Thanks,
Prasanna
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Instead of defining string[max=20] in the schema file, you can define string[20] so that the column is treated as Char.
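To illustrate the difference, here is a schema-file sketch (COCO is just an example field name; the trailing comments are annotations for this post, not schema syntax):

```
COCO: nullable string[20]     { null_field='' };  // fixed-length: imported as Char(20)
COCO: nullable string[max=20] { null_field='' };  // variable-length: imported as VarChar(20)
```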
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

prasannakumarkk wrote:In the schema file if you have defined the datatype as char(n) then the APT_STRING_PADCHAR would have got appended.
Or in transformer right pad the column with spaces
APT_STRING_PADCHAR does not pad VARCHARs, even if it is defined in the job that uses a schema file to parse each line. I just tested this, and a line containing 2 characters (instead of the 3 required by the schema file) is rejected. Here's the example:

213;213;205;TK08007537590;2012-09-30;N;;2012-09-30 00:00:00.000000
213;213;53;978C0000032409579280;2012-09-30;N;;2012-09-30 00:00:00.000000

The corresponding schema file is:

record {delim=';', final_delim='end_of_record'} (
  COETF:  nullable string[3]      { null_field='' };
  COETB:  nullable string[3]      { null_field='' };
  CTCGE:  nullable string[3]      { null_field='' };
  COCO:   nullable string[max=20] { null_field='' };
  DAEXTI: nullable date           { date_format='%yyyy-%mm-%dd', null_field=' ' };
  COIDT:  nullable string[max=12] { null_field='' };
  LICPT:  nullable string[25]     { null_field='' };
  ZTSXTI: nullable timestamp      { timestamp_format='%yyyy-%mm-%dd %hh:%nn:%ss.6', null_field='' };
)

Please note that the 3rd column, 'CTCGE', is defined as 3 characters, whereas the second line of my data above has only 2 characters in this column. This second line gets rejected despite APT_STRING_PADCHAR being set to 0x20.

The Transformer solution is not an option since I have to use RCP: this job is generic and processes around 500 files in a loop, each containing different columns as per their respective schema files.
priyadarshikunal wrote:Instead of defining string[max=20] in schema file you can define string[20] to make the column treated as Char.
If I do this, as I have done in the example above, the line is rejected, which is exactly what I'm trying to avoid.
Tony
BI Consultant - Datastage
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

Are you getting the same warning again?
When writing column VARCHAR(min=0,max=n) into database column CHAR(min=n,max=n), truncation, loss of precision or data corruption can occur.
Thanks,
Prasanna
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

You mean if I include APT_STRING_PADCHAR in my parsing job? No. If I do that, the lines are rejected with the corresponding warning, which is undesirable.
Tony
BI Consultant - Datastage
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Hi Tony,

first of all, I guess you're not loading datasets into the database but flat files (that is why you can specify a schema file, which you can't do on a dataset: the dataset carries that information in its descriptor file, and you should not change the information therein).

Of course, when you specify string[3] for a column containing only two bytes of information, DataStage considers your string column fixed-length, and this costs you one of your column delimiters, which in turn leads to the row being rejected.

Otherwise you get type conversion warnings, because a variable-length string of maximum length 3 may actually be shorter than 3 bytes. This is not a problem in your case, and without runtime column propagation this conversion actually works fine.

Did you consider using a Modify stage and generating the type-conversion command? Example command: COETB:string[3] = COETB
The problem is that you have to use at least one command in it, so you would have to generate a keep command for all columns when there is nothing else to change...
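For illustration, a Modify stage specification targeting the variable-length columns in the schema file quoted earlier in this thread might look like the following (a sketch only; the actual column list depends on each file's schema):

```
COCO:string[20] = COCO;
COIDT:string[12] = COIDT
```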
"It is not the lucky ones who are grateful.
It is the grateful ones who are happy." Francis Bacon
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Hello Roland,

Many thanks for that.
I could go through the Modify stage workaround, but can I do that without specifying any column names inside? My job is generic and COETB is not always present as a column; the columns have different names depending on the file.

In the file they are all VARCHAR with a maximum length as specified in the schema file. In the database the same column is CHAR with the same length.

So could I just insert a Modify stage, preferably in the job that loads the dataset into the table?

To give you finer details: I load a flat file into a dataset using a schema file and a Column Import stage. Then this dataset is loaded into a table.

Thanks & regards

Tony
Tony
BI Consultant - Datastage
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Hi Tony,
I have already played the scenario through with the data provided in your post (schema file and data). I first reproduced your warnings and then added a Modify stage using RCP with no column list.
You have to pass the modify command to the job via a parameter. You could generate the command string in a server job that runs before the copy job and writes it to the user status variable. You'll need some metadata storage to provide the information on which columns to modify.
As already stated, the Modify stage expects at least one command, so if there is nothing to change you'll have to provide a keep command listing all columns.
"It is not the lucky ones who are grateful.
It is the grateful ones who are happy." Francis Bacon
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Hi Roland

While I understand what to do, I confess I do not know how to implement it.

I understand that I have to add a Modify stage between my dataset and the DB2 stage? FYI, right now the design is DATASET -> COPY STAGE -> DB2 Connector. So can I just replace the Copy stage with a Modify stage?
Since I have to use RCP, I cannot specify any columns inside. So under Options -> Specifications, how can I specify that ALL columns should be retained?

I confess I do not quite understand:
how to pass the modify command to the job by parameter, or
how to generate the command string in a server job running before the copy job and write it to the user status variable.
Tony
BI Consultant - Datastage
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Hi Tony,

You understood quite correctly. Replace the Copy stage with the Modify stage and activate RCP on the Output tab of the stage. Do not specify any columns on the stage; RCP will move the complete input to the output link if you do not specify a keep or a drop command.

The change-datatype command I gave you an example of will perform an explicit conversion of the input column without changing the column name:
COETB:string[3] = COETB
i.e. Output-columnname:new_datatype = Input-columnname

This is documented. You can concatenate as many change-datatype commands as you need in a specific job by using a semicolon as a separator. Try it; it works just fine.
"It is not the lucky ones who are grateful.
It is the grateful ones who are happy." Francis Bacon
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Sorry Tony,

I just read your final questions. Create a user-defined server function with the following code and give it a name that suits you:

$include DSINCLUDE JOBCONTROL.H
Equate RoutineName To "LogUserStatus"

Deffun DSRMessage(A1, A2, A3) Calling "*DataStage*DSR_MESSAGE"

* UserStatus is the routine's input argument
InputMsg = "UserStatus: " : UserStatus

If IsNull(InputMsg) Then
   InputMsg = " "
End

* Log the value and store it in the job's user status area
Message = DSRMessage("DSTAGE_TRX_E_0017", "USER INFO: %1 JOB=> %2 STAGE=> %3 LINK=> %4", InputMsg : @FM : DSJobName : @FM : DSStageName : @FM : DSLinkName)
Call DSLogInfo(Message, RoutineName)
Call DSSetUserStatus(UserStatus)
Ans = 0
Then create a server job that reads, from a metadata table, the columns and the datatypes they need to be converted to.
You will have to create this table yourself and insert the metadata you need. Concatenate the commands in the format I gave you.
If there is no column to convert, generate a keep command instead: keep column_name1, column_name2 [, column_name3...]
listing all the columns from your input.
Within a Transformer, pass the generated string as input to the function you created. This sets the user status variable of the job.
Finally, add a parameter MODIFY_COMMAND to your copy job and set it to the activity variable ServerJobName.$UserStatus.
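The string-assembly logic itself can be sketched outside DataStage. This hypothetical Python fragment shows a modify specification being built from metadata rows of (column, length) pairs, with the keep fallback described above; the function name and metadata layout are illustrative assumptions, not part of any DataStage API:

```python
def build_modify_command(conversions, all_columns):
    """Build a Modify stage specification string.

    conversions: list of (column_name, fixed_length) pairs that need an
        explicit VarChar -> Char conversion, e.g. [("COCO", 20)].
    all_columns: every column on the input link, used for the keep
        fallback when there is nothing to convert.
    """
    if conversions:
        # One change-datatype command per column, semicolon-separated,
        # in the form output_column:new_datatype = input_column
        return "; ".join(
            "{col}:string[{n}] = {col}".format(col=col, n=n)
            for col, n in conversions
        )
    # The Modify stage requires at least one command, so keep all columns.
    return "keep " + ", ".join(all_columns)
```

The returned string would then be written to the user status variable and passed to the copy job as the MODIFY_COMMAND parameter.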
"It is not the lucky ones who are grateful.
It is the grateful ones who are happy." Francis Bacon
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Thanks Roland.
I will try and implement this for my next delivery.
The current release was urgent and thus I had to go through the 'message handler' route.

Cheers for the help. I really appreciate your time.

T
Tony
BI Consultant - Datastage
Post Reply