Copying VARCHAR columns into CHAR columns in target database

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Copying VARCHAR columns into CHAR columns in target database

Post by TonyInFrance »

Hi guys

I have a generic job that loads datasets into target tables. The issue I have is that the source file contains VARCHARs, which are loaded into datasets using schema files. In the schema files I therefore have no choice but to define these fields as string[max=n]. However, the target table has the same column defined as CHAR(n).
Thus when the dataset is loaded into the target table (DB2) using a DB2 Connector, it is (not surprisingly) throwing a warning:

When writing column VARCHAR(min=0,max=n) into database column CHAR(min=n,max=n), truncation, loss of precision or data corruption can occur.

Is there any way I can avoid this other than defining a message handler? I tried the environment variable APT_STRING_PADCHAR, but (as I read in Craig's post in another thread) this won't help me, since this variable only pads CHARs and NOT VARCHARs.

Is there any other environment variable similar to this?
FYI, I tried putting in a Transformer stage, but to no avail.

Thanks guys

Tony
Tony
BI Consultant - Datastage
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

If you have defined the datatype as char(n) in the schema file, then the APT_STRING_PADCHAR value would have been appended as padding.
Alternatively, right-pad the column with spaces in a Transformer.
Last edited by prasannakumarkk on Wed Mar 27, 2013 10:33 am, edited 1 time in total.
Thanks,
Prasanna
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Instead of defining string[max=20] in the schema file, you can define string[20] so that the column is treated as Char.
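To illustrate the difference, here is a schema-file sketch (COCO is just an example field name; the trailing comments are annotations for this post, not schema syntax):

```
COCO: nullable string[20]     { null_field='' };  // fixed-length: imported as Char(20)
COCO: nullable string[max=20] { null_field='' };  // variable-length: imported as VarChar(20)
```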
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

prasannakumarkk wrote:In the schema file if you have defined the datatype as char(n) then the APT_STRING_PADCHAR would have got appended.
Or in transformer right pad the column with spaces
APT_STRING_PADCHAR does not pad VARCHARs, even if it is defined in the job that uses a schema file to parse each line. I just tested this, and a line containing 2 characters (instead of the 3 required by the schema file) is rejected. Here's the example:

213;213;205;TK08007537590;2012-09-30;N;;2012-09-30 00:00:00.000000
213;213;53;978C0000032409579280;2012-09-30;N;;2012-09-30 00:00:00.000000

The corresponding schema file is:

record {delim=';', final_delim='end_of_record'} (
  COETF:  nullable string[3]      { null_field='' };
  COETB:  nullable string[3]      { null_field='' };
  CTCGE:  nullable string[3]      { null_field='' };
  COCO:   nullable string[max=20] { null_field='' };
  DAEXTI: nullable date           { date_format='%yyyy-%mm-%dd', null_field=' ' };
  COIDT:  nullable string[max=12] { null_field='' };
  LICPT:  nullable string[25]     { null_field='' };
  ZTSXTI: nullable timestamp      { timestamp_format='%yyyy-%mm-%dd %hh:%nn:%ss.6', null_field='' };
)

Please note that the 3rd column, 'CTCGE', is defined as 3 characters, whereas the second line of my data above has only 2 characters in this column. This second line gets rejected despite APT_STRING_PADCHAR being set to 0x20.

The Transformer solution is not an option since I have to use RCP: this job is generic and processes around 500 files in a loop, each containing different columns as per their respective schema files.
priyadarshikunal wrote:Instead of defining string[max=20] in schema file you can define string[20] to make the column treated as Char.
If I do this, as I have done in the example above, the line is rejected, which is exactly what I'm trying to avoid.
Tony
BI Consultant - Datastage
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

Are you getting the same warning again?
When writing column VARCHAR(min=0,max=n) into database column CHAR(min=n,max=n), truncation, loss of precision or data corruption can occur.
Thanks,
Prasanna
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

You mean if I include APT_STRING_PADCHAR in my parsing job? No. If I do that, the lines are rejected with the corresponding warning, which is undesirable.
Tony
BI Consultant - Datastage
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Hi Tony,

first of all, I guess you're not loading datasets into the database but flat files (that is why you can specify a schema file, which you can't do on a dataset: the dataset carries that information in its descriptor file, and you should not change the information therein).

Of course, when you specify string[3] for a column containing only two bytes of information, DataStage considers your string column fixed-length, and this costs you one of your column delimiters, which in turn leads to the row being rejected.

Otherwise you get type conversion warnings, because a variable-length string of maximum length 3 may actually be shorter than 3 bytes. This is not a problem in your case, and without runtime column propagation this conversion actually works fine.

Did you consider using a Modify stage and generating the type-conversion command? Example command: COETB:string[3] = COETB
The problem is that you have to use at least one command in it, so you would have to generate a keep command for all columns when there is nothing else to change...
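For illustration, a Modify stage specification targeting the variable-length columns in the schema file quoted earlier in this thread might look like the following (a sketch only; the actual column list depends on each file's schema):

```
COCO:string[20] = COCO;
COIDT:string[12] = COIDT
```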
"It is not the lucky ones who are grateful.
It is the grateful ones who are happy." Francis Bacon
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Hello Roland,

Many thanks for that.
I could go through the Modify stage workaround, but can I do that without specifying any column names inside? My job is generic and COETB is not always present as a column; the columns have different names depending on the file.

In the file they are all VARCHAR with a maximum length as specified in the schema file. In the database the same column is CHAR with the same length.

So could I just insert a Modify stage, preferably in the job that loads the dataset into the table?

To give you finer details: I load a flat file into a dataset using a schema file and a Column Import stage. Then this dataset is loaded into a table.

Thanks & regards

Tony
Tony
BI Consultant - Datastage
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Hi Tony,
I have already played the scenario through with the data provided in your post (schema file and data). I first reproduced your warnings and then added a Modify stage using RCP with no column list.
You have to pass the modify command to the job via a parameter. You could generate the command string in a server job that runs before the copy job and writes it to the user status variable. You'll need some metadata storage to provide the information on which columns to modify.
As already stated, the Modify stage expects at least one command, so if there is nothing to change you'll have to provide a keep command listing all columns.
"It is not the lucky ones who are grateful.
It is the grateful ones who are happy." Francis Bacon
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Hi Roland

While I understand what to do, I confess I do not know how to implement it.

I understand that I have to add a Modify stage between my dataset and the DB2 stage? FYI, right now the design is DATASET -> COPY STAGE -> DB2 Connector. So can I just replace the Copy stage with a Modify stage?
Since I have to use RCP, I cannot specify any columns inside. So under Options -> Specifications, how can I specify that ALL columns should be retained?

I confess I do not quite understand:
how to pass the modify command to the job by parameter, or
how to generate the command string in a server job running before the copy job and write it to the user status variable.
Tony
BI Consultant - Datastage
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Hi Tony,

You understood quite correctly. Replace the Copy stage with the Modify stage and activate RCP on the Output tab of the stage. Do not specify any columns on the stage; RCP will move the complete input to the output link if you do not specify a keep or a drop command.

The change-datatype command I gave you an example of will perform an explicit conversion of the input column without changing the column name:
COETB:string[3] = COETB
i.e. Output-columnname:new_datatype = Input-columnname

This is documented. You can concatenate as many change-datatype commands as you need in a specific job by using a semicolon as a separator. Try it; it works just fine.
"It is not the lucky ones who are grateful.
It is the grateful ones who are happy." Francis Bacon
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Sorry Tony,

I just read your final questions. Create a user-defined server function with the following code and give it a name that suits you:

$include DSINCLUDE JOBCONTROL.H
Equate RoutineName To "LogUserStatus"

Deffun DSRMessage(A1, A2, A3) Calling "*DataStage*DSR_MESSAGE"

* UserStatus is the routine's input argument
InputMsg = "UserStatus: " : UserStatus

If IsNull(InputMsg) Then
   InputMsg = " "
End

* Log the value and store it in the job's user status area
Message = DSRMessage("DSTAGE_TRX_E_0017", "USER INFO: %1 JOB=> %2 STAGE=> %3 LINK=> %4", InputMsg : @FM : DSJobName : @FM : DSStageName : @FM : DSLinkName)
Call DSLogInfo(Message, RoutineName)
Call DSSetUserStatus(UserStatus)
Ans = 0
Then create a server job that reads, from a metadata table, the columns and the datatypes they need to be converted to.
You will have to create this table yourself and insert the metadata you need. Concatenate the commands in the format I gave you.
If there is no column to convert, generate a keep command instead: keep column_name1, column_name2 [, column_name3...]
listing all the columns from your input.
Within a Transformer, pass the generated string as input to the function you created. This sets the user status variable of the job.
Finally, add a parameter MODIFY_COMMAND to your copy job and set it to the activity variable ServerJobName.$UserStatus.
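The string-assembly logic itself can be sketched outside DataStage. This hypothetical Python fragment shows a modify specification being built from metadata rows of (column, length) pairs, with the keep fallback described above; the function name and metadata layout are illustrative assumptions, not part of any DataStage API:

```python
def build_modify_command(conversions, all_columns):
    """Build a Modify stage specification string.

    conversions: list of (column_name, fixed_length) pairs that need an
        explicit VarChar -> Char conversion, e.g. [("COCO", 20)].
    all_columns: every column on the input link, used for the keep
        fallback when there is nothing to convert.
    """
    if conversions:
        # One change-datatype command per column, semicolon-separated,
        # in the form output_column:new_datatype = input_column
        return "; ".join(
            "{col}:string[{n}] = {col}".format(col=col, n=n)
            for col, n in conversions
        )
    # The Modify stage requires at least one command, so keep all columns.
    return "keep " + ", ".join(all_columns)
```

The returned string would then be written to the user status variable and passed to the copy job as the MODIFY_COMMAND parameter.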
"It is not the lucky ones who are grateful.
It is the grateful ones who are happy." Francis Bacon
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Thanks Roland.
I will try and implement this for my next delivery.
The current release was urgent and thus I had to go through the 'message handler' route.

Cheers for the help. I really appreciate your time.

T
Tony
BI Consultant - Datastage
Post Reply