We have a job which extracts from database and writes to a file. Problem is that one of the column had comma as part of the value for some of the records. When written to the file delimter is ,(comma) and quote charecter is 000. so now there is one extra column for some of the records. How can I make these two columns as one column.
I can't re extract from the table as the table is changed. I got to fix this file now.
I can find out the rows which has the extra columns but don't know how to seggragate them from the file. If I can seperate them then I can use concatanate function to merge the two columns.
Please advise...
Thanks
Commas in data value...How to fix the existing file ?
Moderators: chulett, rschirm, roy
Commas in data value...How to fix the existing file ?
Last edited by kaps on Wed Nov 21, 2007 12:05 pm, edited 1 time in total.
Re: Commas in data value...How to fix ?
check for Incomplete column specification like "Retain","Replace","Error".............,
hope that may help.
hope that may help.
-
- Premium Member
- Posts: 252
- Joined: Mon Sep 19, 2005 10:28 pm
- Location: Melbourne, Australia
- Contact:
You can get the rows with too many commas as follows:
Replace the "25" with a number that is 1 more than the "correct" number of commas.
You can get the rows with the right number of commas as follows:
Cat them together after you fix the bad ones.
Code: Select all
sed -e '/\(.*,\)\{25\}/ p' -e 'd' myfile.csv > toomany.csv
You can get the rows with the right number of commas as follows:
Code: Select all
sed -e '/\(.*,\)\{25\}/ d' myfile.csv > justright.csv
Ross Leishman
Re: Commas in data value...How to fix ?
You could do the seperation in datastage if you wanted. Read the file in with file delimeter 000 quote delimeter 000. Then use the Count($,',')="Correct Number Of Commas" in a constraint and write the results out, with file delimeter 000 quote delimeter 000. You can put the invalid records in a reject file with a slight mod to the constraint or reject row.
Is the field that has the problem consistently in the same position, or do multiple fields have the issue? If it is always the 17th field, for example, then you could use a little Perl script to find rows that have too many commas and then remove the comma after the beginning of that problem field.
Something like this should work:
Now, I just kind of did this off the top of my head and haven't really tested it other than for syntax, so test it out. Just run it like this: perl scriptname.pl filename. You also need to identify the number of commas in a good row and replace the zero associated with $good_count with that value. Do the same thing with $bad_comma to tell the program which is the bad position on rows that have more commas than what is in $good_count.
If you decide to use this and need assistance, just ask away!
Hope that helps!
Something like this should work:
Code: Select all
#!/usr/bin/perl -w
my $count = 0;
my $comma_count = 0; # used for the number of commas found in row
my $good_cnt = 0; # replace this with the total number of commas for a good row
my $bad_comma = 10; # replace this with the positional count for which #comma is bad above for for each line involved
my $line = '';
my $element = '';
my @array = '';
my $file = "$ARGV[0]";
my $outfile = "fixed_" . "$file";
open (IN, "< $file") || die "Could not open $file for reading!\n";
open (OUT, "> $outfile") || die "Could not open $outfile for writing!\n";
while ($line = <IN>)
{
chomp $line;
$comma_count = tr/\,//c; # counts the number of commas on a line
$count = -1; #perl counts starting at zero, so need to be one less to match array
if ($comma_count > $good_cnt)
{
@array = split (/\,/,$line);
$line = ''; # can reuse this now that array has the values
foreach $element(@array)
{
$count++;
if ($count == $bad_comma)
{
$line = "$line"; # no comma - effectively joins errored field
} else
{
$line = "$line,";
}
}
chop $line; # remove the final comma
}
print OUT "$line\n";
}
close IN;
close OUT;
If you decide to use this and need assistance, just ask away!
Hope that helps!
Bestest!
John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services
"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services
"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"