Commas in data value...How to fix the existing file ?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Commas in data value...How to fix the existing file ?

Post by kaps »

We have a job which extracts from database and writes to a file. Problem is that one of the column had comma as part of the value for some of the records. When written to the file delimter is ,(comma) and quote charecter is 000. so now there is one extra column for some of the records. How can I make these two columns as one column.

I can't re extract from the table as the table is changed. I got to fix this file now.

I can find out the rows which has the extra columns but don't know how to seggragate them from the file. If I can seperate them then I can use concatanate function to merge the two columns.

Please advise...

Thanks
Last edited by kaps on Wed Nov 21, 2007 12:05 pm, edited 1 time in total.
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Re: Commas in data value...How to fix ?

Post by sachin1 »

check for Incomplete column specification like "Retain","Replace","Error".............,
hope that may help.
rleishman
Premium Member
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

You can get the rows with too many commas as follows:

Code: Select all

sed -e '/\(.*,\)\{25\}/ p' -e 'd' myfile.csv > toomany.csv
Replace the "25" with a number that is 1 more than the "correct" number of commas.

You can get the rows with the right number of commas as follows:

Code: Select all

sed -e '/\(.*,\)\{25\}/ d' myfile.csv > justright.csv
Cat them together after you fix the bad ones.
Ross Leishman
NickH
Participant
Posts: 6
Joined: Mon Jul 26, 2004 7:46 am

Re: Commas in data value...How to fix ?

Post by NickH »

You could do the seperation in datastage if you wanted. Read the file in with file delimeter 000 quote delimeter 000. Then use the Count($,',')="Correct Number Of Commas" in a constraint and write the results out, with file delimeter 000 quote delimeter 000. You can put the invalid records in a reject file with a slight mod to the constraint or reject row.
kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Post by kaps »

Nick - I am having a problem when I have constraint like count($,',')=21 then it's giving me error as Variable '$' not defined. How can I say take the whole record and count ?

rleishman - I have executed your command. It's still working. I shall post the results.

Thanks for the help...
jdmiceli
Premium Member
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

Is the field that has the problem consistently in the same position, or do multiple fields have the issue? If it is always the 17th field, for example, then you could use a little Perl script to find rows that have too many commas and then remove the comma after the beginning of that problem field.

Something like this should work:

Code: Select all


#!/usr/bin/perl -w

my $count = 0;
my $comma_count = 0; # used for the number of commas found in row
my $good_cnt = 0;  # replace this with the total number of commas for a good row
my $bad_comma = 10;  # replace this with the positional count for which #comma is bad above for for each line involved
my $line = '';
my $element = '';
my @array = '';

my $file = "$ARGV[0]";
my $outfile = "fixed_" . "$file";

open (IN, "< $file") || die "Could not open $file for reading!\n";
open (OUT, "> $outfile") || die "Could not open $outfile for writing!\n";

while ($line = <IN>)
{
    chomp $line;    
    $comma_count = tr/\,//c; # counts the number of commas on a line
    
    $count = -1; #perl counts starting at zero, so need to be one less to match array
    if ($comma_count > $good_cnt)
    {
        @array = split (/\,/,$line);
        $line = ''; # can reuse this now that array has the values
        
        foreach $element(@array)
        {
            $count++;
            if ($count == $bad_comma)
            {
                $line = "$line"; # no comma - effectively joins errored field
            } else
            {
                $line = "$line,";
            }
        }
        chop $line; # remove the final comma
    }
    print OUT "$line\n";
}

close IN;
close OUT;

Now, I just kind of did this off the top of my head and haven't really tested it other than for syntax, so test it out. Just run it like this: perl scriptname.pl filename. You also need to identify the number of commas in a good row and replace the zero associated with $good_count with that value. Do the same thing with $bad_comma to tell the program which is the bad position on rows that have more commas than what is in $good_count.

If you decide to use this and need assistance, just ask away!

Hope that helps!
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
Post Reply