Hi,
I have a txt file which is tab delimited.
And now if you have some extra tabs with in the data of a particular column then my sequential stage is giving an error saying
"Trim_Test..Source.Extraction: read_delimited() - row 1, too many columns in record"
For Example: If I have EMP_NAME CITY as a header of my sequential file
Both the columns were separated by a tab which acts as a delimiter.Above mentioned error was coming if the data for the above header is coming with some extra tabs in the EMP_NAME like "SW THI" "CI TY".
How to delete the extra tabs in the datausing a sequential stage in Datastage server edition?
I dont have previlages for running some commands. So please suggest me something which can be done in the stages.
Thanks in advance.
Swathi Vineela
Handling extra tabs in a tab delimited files.
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 9
- Joined: Mon Mar 03, 2008 1:30 am
- Location: India
-
- Participant
- Posts: 9
- Joined: Mon Mar 03, 2008 1:30 am
- Location: India
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
-
- Participant
- Posts: 40
- Joined: Mon May 18, 2009 5:22 am
One method would be to pre process the file by converting "/t" to ",".
Convert the table between two double qoutes to comma. And read the file as comma seperated. And you can easily trim the tab using tr command or within datastage too.
Convert the table between two double qoutes to comma. And read the file as comma seperated. And you can easily trim the tab using tr command or within datastage too.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
- Participant
- Posts: 18
- Joined: Tue Dec 18, 2007 5:25 am
- Location: Bangalore
Any one of the following will give the solution to you.
1. What is the line terminator of the sorce file. whether it is Unix style or DOS style.
2. define a character set map in Sequential File stage using the NLS tab to UTF8.
3. In the Incomplete column, select 'Replace' to pad a short column with SQL null, or act in accordance with Missing columns action if missing.
Thanks,
Viru
1. What is the line terminator of the sorce file. whether it is Unix style or DOS style.
2. define a character set map in Sequential File stage using the NLS tab to UTF8.
3. In the Incomplete column, select 'Replace' to pad a short column with SQL null, or act in accordance with Missing columns action if missing.
Thanks,
Viru
Re: Handling extra tabs in a tab delimited files.
What is the source of your text file?
It sounds like somebody typed out the data "incorrectly"
e.g. instead of city="Vasco da Gama" they typed city="Vasco\tda\tGama"
Can you persuade your data source to correct the issue?
If not, what kumar_s said sounds good.
You'll need to change "\t" to ","
Or possibly (in perl): s/([^"])\t([^"])/$1 $2/g;
remove tabs that are not enclosed in quotes.
A small example:
However...
Do you have any records with unmatched quotes?
How does your source system escape quotes as data?
I know this is dsxchange, not perlxchange
*shrug* I just don't know how to easily do something
like this in datastage.
Good luck!
John G.
btw, in general - can datastage handle quoted data?
In particular, can datastage handle delimiters inside of quoted fields?
I have not had good results with that and would describe datastage's
quote support as weak.
It sounds like somebody typed out the data "incorrectly"
e.g. instead of city="Vasco da Gama" they typed city="Vasco\tda\tGama"
Can you persuade your data source to correct the issue?
If not, what kumar_s said sounds good.
You'll need to change "\t" to ","
Or possibly (in perl): s/([^"])\t([^"])/$1 $2/g;
remove tabs that are not enclosed in quotes.
Code: Select all
note - 0x09=tab character, 0x10=newline, 0x20=space character
before: " apple " 0x09 " banana 0x09 cherry " 0x09 0x10
| | |___one leading quote, not completely bordered, leave alone.
| |
| |___no border quotes; repace with space char.
|
|___ bordered by quotes; leave this tab alone.
after: " apple " 0x09 " banana 0x20 cherry " 0x09 0x10
Code: Select all
$ cat grik.pl
my $a = ""apple"\t"banana\tcherry"\t\n";
print $a;
$a =~ s/([^"])\t([^"])/$1 $2/g;
print $a;
$ perl grik.pl
"apple" "banana cherry"
"apple" "banana cherry"
$ perl grik.pl | od -xc
0000000 2261 7070 6c65 2209 2262 616e 616e 6109
" a p p l e " \t " b a n a n a \t
0000020 6368 6572 7279 2209 0a22 6170 706c 6522
c h e r r y " \t \n " a p p l e "
0000040 0922 6261 6e61 6e61 2063 6865 7272 7922
\t " b a n a n a c h e r r y "
0000060 090a
\t \n
0000062
$
Do you have any records with unmatched quotes?
How does your source system escape quotes as data?
I know this is dsxchange, not perlxchange
*shrug* I just don't know how to easily do something
like this in datastage.
Good luck!
John G.
btw, in general - can datastage handle quoted data?
In particular, can datastage handle delimiters inside of quoted fields?
I have not had good results with that and would describe datastage's
quote support as weak.
SwathiVineela wrote:Hi,
I have a txt file which is tab delimited.
And now if you have some extra tabs with in the data of a particular column then my sequential stage is giving an error saying
"Trim_Test..Source.Extraction: read_delimited() - row 1, too many columns in record"