Handling extra tabs in a tab delimited files.

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
SwathiVineela
Participant
Posts: 9
Joined: Mon Mar 03, 2008 1:30 am
Location: India

Handling extra tabs in a tab delimited files.

Post by SwathiVineela »

Hi,

I have a txt file which is tab delimited.
And now if you have some extra tabs with in the data of a particular column then my sequential stage is giving an error saying
"Trim_Test..Source.Extraction: read_delimited() - row 1, too many columns in record"

For Example: If I have EMP_NAME CITY as a header of my sequential file
Both the columns were separated by a tab which acts as a delimiter.Above mentioned error was coming if the data for the above header is coming with some extra tabs in the EMP_NAME like "SW THI" "CI TY".

How to delete the extra tabs in the datausing a sequential stage in Datastage server edition?

I dont have previlages for running some commands. So please suggest me something which can be done in the stages.

Thanks in advance.


Swathi Vineela
SwathiVineela
Participant
Posts: 9
Joined: Mon Mar 03, 2008 1:30 am
Location: India

Post by SwathiVineela »

Could any one please take this case ASAP?
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

ASAP is not for a volunteered site. They post when they get time.

search is a good utility to save time. Search it.

and also tell us what you have tried so far.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
n.parameswara.reddy@accen
Participant
Posts: 40
Joined: Mon May 18, 2009 5:22 am

Post by n.parameswara.reddy@accen »

SwathiVineela wrote:Could any one please take this case ASAP?
In the parallel job, inside the format tab ,field delimiter = TAB
and quotes = double options available.

Same option is there in the server job also. check the format option once
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

One method would be to pre process the file by converting "/t" to ",".
Convert the table between two double qoutes to comma. And read the file as comma seperated. And you can easily trim the tab using tr command or within datastage too.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
virupakshagm
Participant
Posts: 18
Joined: Tue Dec 18, 2007 5:25 am
Location: Bangalore

Post by virupakshagm »

Any one of the following will give the solution to you.

1. What is the line terminator of the sorce file. whether it is Unix style or DOS style.

2. define a character set map in Sequential File stage using the NLS tab to UTF8.

3. In the Incomplete column, select 'Replace' to pad a short column with SQL null, or act in accordance with Missing columns action if missing.

Thanks,
Viru
jgreve
Premium Member
Premium Member
Posts: 107
Joined: Mon Sep 25, 2006 4:25 pm

Re: Handling extra tabs in a tab delimited files.

Post by jgreve »

What is the source of your text file?
It sounds like somebody typed out the data "incorrectly"
e.g. instead of city="Vasco da Gama" they typed city="Vasco\tda\tGama"

Can you persuade your data source to correct the issue?

If not, what kumar_s said sounds good.

You'll need to change "\t" to ","

Or possibly (in perl): s/([^"])\t([^"])/$1 $2/g;
remove tabs that are not enclosed in quotes.

Code: Select all


note - 0x09=tab character, 0x10=newline, 0x20=space character

before: " apple " 0x09 " banana 0x09 cherry " 0x09 0x10
                   |             |             |___one leading quote, not completely bordered, leave alone.
                   |             |
                   |             |___no border quotes; repace with space char.
                   |
                   |___ bordered by quotes; leave this tab alone.

after: " apple " 0x09 " banana 0x20 cherry " 0x09 0x10
A small example:

Code: Select all

$ cat grik.pl
my $a = ""apple"\t"banana\tcherry"\t\n";
print  $a;
$a =~ s/([^"])\t([^"])/$1 $2/g;
print  $a;
$ perl grik.pl
"apple" "banana cherry"
"apple" "banana cherry"
$ perl grik.pl | od -xc
0000000     2261    7070    6c65    2209    2262    616e    616e    6109
           "   a   p   p   l   e   "  \t   "   b   a   n   a   n   a  \t
0000020     6368    6572    7279    2209    0a22    6170    706c    6522
           c   h   e   r   r   y   "  \t  \n   "   a   p   p   l   e   "
0000040     0922    6261    6e61    6e61    2063    6865    7272    7922
          \t   "   b   a   n   a   n   a       c   h   e   r   r   y   "
0000060     090a
          \t  \n
0000062
$
However...
Do you have any records with unmatched quotes?
How does your source system escape quotes as data?

I know this is dsxchange, not perlxchange :)
*shrug* I just don't know how to easily do something
like this in datastage.
Good luck!
John G.

btw, in general - can datastage handle quoted data?
In particular, can datastage handle delimiters inside of quoted fields?

I have not had good results with that and would describe datastage's
quote support as weak.

SwathiVineela wrote:Hi,

I have a txt file which is tab delimited.
And now if you have some extra tabs with in the data of a particular column then my sequential stage is giving an error saying
"Trim_Test..Source.Extraction: read_delimited() - row 1, too many columns in record"
Post Reply