Target Excel File renaming column names

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Target Excel File renaming column names

Post by ShaneMuir »

Hi All

I am having an issue whereby when using a target Unstructured File Stage to write to an Excel file, with RCP enabled on all stages, that the target stage is renaming the column names.

The job itself is quite simple:

Code: Select all

DB  ----->  TFM ------>  EXCEL
The DB executes a query
The TFM is a placeholder incase we need to do anything on common fields
The EXCEL stage writes the excel file.

The job is fully parameterised, ie the SQL statement and target filenames / directories.

The issue is as follows: Where the SQL output column name is something like "Overdue_1-3_Month" the column is output in the Excel file as "CC_27_Overdue_1__3_Month".

It would seem that the stage does not like the hyphen (assuming that it is getting interpreted as a minus operator?) and as such renames the column (CC_27 seems to be column number 27 of the output then it has replaced the hyphen with a double underscore).

I have tried enabling quoted identifiers, and have checked the output from the SQL and it is passing the expected value "Overdue_1-3_Month", so its definitely occurring internally within the Excel file creation.

So my question is two fold:
Does anybody know if there is a way to override this auto renaming (or if indeed that is what is actually happening)?
Can anybody point me in the direction of some documentation which details in what circumstances this renaming can occur?

PS: I have checked with the target audience about changing the target column name - turns out the spreadsheet is used downstream by other external processes which rely on the column name being correct. Not saying that it cannot happen, but they would rather we try and fix it first.
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

If you are not using any major features of excel, you might get around this with a CSV file (plain text file with commas between columns).

You might double check that the columns are not set to parameterized in the unstructured stage.

You might also be able to work around it by brute force via adding the column headers first "as data not headers" then modifying the existing sheet to append the data portion to that file?

These are just workarounds, though. I didnt find any documentation on this behavior nor a way to control it.
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Thanks for the thoughts.

I too am looking at the generate the column headers first route as it would seem the only logical way to get around it. But rather than modify the existing file, feed it in via a funnel, its just a matter of determining how to generate the column headers from the input SQL/stored procedures.

I was also thinking about going to csv then using a script to convert to xlsx (this would probably actually offer better performance as the unstructured file stage does seem a little slow (if not resource intensive)).

Thanks again for the input.
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

The column names can be peeled off a variety of ways, one quick crude approach is to write a flat file with 1 record and column names enabled, then read that and tell it the flat file does not have headers and throw away the second record. You can do similar things without the file if you want to string together enough stages to make that happen.
Post Reply