Handling extra tabs in a tab delimited files.

SwathiVineela · Post by **SwathiVineela** » Thu Sep 24, 2009 12:17 am

Hi,

I have a txt file which is tab delimited.
And now if you have some extra tabs with in the data of a particular column then my sequential stage is giving an error saying
"Trim_Test..Source.Extraction: read_delimited() - row 1, too many columns in record"

For Example: If I have EMP_NAME CITY as a header of my sequential file
Both the columns were separated by a tab which acts as a delimiter.Above mentioned error was coming if the data for the above header is coming with some extra tabs in the EMP_NAME like "SW THI" "CI TY".

How to delete the extra tabs in the datausing a sequential stage in Datastage server edition?

I dont have previlages for running some commands. So please suggest me something which can be done in the stages.

Thanks in advance.

Swathi Vineela

SwathiVineela · Post by **SwathiVineela** » Thu Sep 24, 2009 5:00 am

Could any one please take this case ASAP?

priyadarshikunal · Post by **priyadarshikunal** » Thu Sep 24, 2009 5:16 am

ASAP is not for a volunteered site. They post when they get time.

search is a good utility to save time. Search it.

and also tell us what you have tried so far.

n.parameswara.reddy@accen · Thu Sep 24, 2009 5:18 am

SwathiVineela wrote:Could any one please take this case ASAP?

In the parallel job, inside the format tab ,field delimiter = TAB
and quotes = double options available.

Same option is there in the server job also. check the format option once

kumar_s · Post by **kumar_s** » Thu Sep 24, 2009 7:13 am

One method would be to pre process the file by converting "/t" to ",".
Convert the table between two double qoutes to comma. And read the file as comma seperated. And you can easily trim the tab using tr command or within datastage too.

virupakshagm · Post by **virupakshagm** » Thu Sep 24, 2009 4:07 pm

Any one of the following will give the solution to you.

1. What is the line terminator of the sorce file. whether it is Unix style or DOS style.

2. define a character set map in Sequential File stage using the NLS tab to UTF8.

3. In the Incomplete column, select 'Replace' to pad a short column with SQL null, or act in accordance with Missing columns action if missing.

Thanks,
Viru

jgreve · Post by **jgreve** » Thu Sep 24, 2009 6:23 pm

What is the source of your text file?
It sounds like somebody typed out the data "incorrectly"
e.g. instead of city="Vasco da Gama" they typed city="Vasco\tda\tGama"

Can you persuade your data source to correct the issue?

If not, what kumar_s said sounds good.

You'll need to change "\t" to ","

Or possibly (in perl): s/([^"])\t([^"])/$1 $2/g;
remove tabs that are not enclosed in quotes.

Code: Select all


note - 0x09=tab character, 0x10=newline, 0x20=space character

before: " apple " 0x09 " banana 0x09 cherry " 0x09 0x10
                   |             |             |___one leading quote, not completely bordered, leave alone.
                   |             |
                   |             |___no border quotes; repace with space char.
                   |
                   |___ bordered by quotes; leave this tab alone.

after: " apple " 0x09 " banana 0x20 cherry " 0x09 0x10

A small example:

Code: Select all

$ cat grik.pl
my $a = ""apple"\t"banana\tcherry"\t\n";
print  $a;
$a =~ s/([^"])\t([^"])/$1 $2/g;
print  $a;
$ perl grik.pl
"apple" "banana cherry"
"apple" "banana cherry"
$ perl grik.pl | od -xc
0000000     2261    7070    6c65    2209    2262    616e    616e    6109
           "   a   p   p   l   e   "  \t   "   b   a   n   a   n   a  \t
0000020     6368    6572    7279    2209    0a22    6170    706c    6522
           c   h   e   r   r   y   "  \t  \n   "   a   p   p   l   e   "
0000040     0922    6261    6e61    6e61    2063    6865    7272    7922
          \t   "   b   a   n   a   n   a       c   h   e   r   r   y   "
0000060     090a
          \t  \n
0000062
$

However...
Do you have any records with unmatched quotes?
How does your source system escape quotes as data?

I know this is dsxchange, not perlxchange

*shrug* I just don't know how to easily do something
like this in datastage.
Good luck!
John G.

btw, in general - can datastage handle quoted data?
In particular, can datastage handle delimiters inside of quoted fields?

I have not had good results with that and would describe datastage's
quote support as weak.

SwathiVineela wrote:Hi,

I have a txt file which is tab delimited.
And now if you have some extra tabs with in the data of a particular column then my sequential stage is giving an error saying
"Trim_Test..Source.Extraction: read_delimited() - row 1, too many columns in record"

DSXchange

Handling extra tabs in a tab delimited files.

Handling extra tabs in a tab delimited files.

Re: Handling extra tabs in a tab delimited files.