Reading Flat file (CSV)

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

h4harry1
Participant
Posts: 16
Joined: Sat Mar 19, 2011 8:01 am

Reading Flat file (CSV)

Post by h4harry1 »

Hi All,
I have a flat file with a comma as the column delimiter. I am reading data from this file using the Sequential File stage (delimiter: comma). The only problem is that one of the column values includes a comma within it. I don't know how to handle this. Any help?
Thanks in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are the character strings quoted?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sampath
Participant
Posts: 21
Joined: Tue Apr 05, 2005 2:03 am

Re: Reading Flat file (CSV)

Post by Sampath »

You can read the full row as a single record, then use a Transformer stage with the Field() function to split it into columns.
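For anyone unfamiliar with Field(), here is a rough Python analogue of the idea (DataStage's Field() also takes an optional count of fields to return; this simplified sketch returns just one, with 1-based numbering as in DataStage):

```python
# Simplified Python analogue of DataStage's Field(string, delimiter, occurrence):
# return the Nth delimited piece of a string (1-based, like Field()).
def field(s, delim, n):
    parts = s.split(delim)
    return parts[n - 1] if n <= len(parts) else ""

print(field("101,Smith,Denver,CO", ",", 2))   # -> Smith
```

Reading the whole row into one column and applying logic like this in a Transformer gives you full control over how the delimiters are interpreted.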
h4harry1
Participant
Posts: 16
Joined: Sat Mar 19, 2011 8:01 am

Post by h4harry1 »

ray.wurlod wrote:Are the character strings quoted? ...
Thanks for the reply...

NO .... Not Quoted...
blewip
Participant
Posts: 81
Joined: Wed Nov 10, 2004 10:55 am
Location: London

Post by blewip »

This has to be fixed in the input file.

Possibly use a different delimiter.

DataStage cannot guess which comma is the delimiter and which is data.
Modern Life is Rubbish - Blur
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Is the <non-delimiter> comma always present in the column, or just sometimes?

If always present, one workaround is to just adjust the schema to import that column as two columns and then concatenate them later in a transformer.

If not always present, read the entire record as a single column (as suggested above), count the number of commas present, and parse accordingly.
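The count-and-reparse approach can be sketched in Python (a Transformer would express the same logic with Field() and Count(); the column layout and the assumption that all stray commas fall inside one known free-text column are hypothetical):

```python
# Read the whole row as one string, count the commas, and re-split.
# Assumes a 4-column record where only the 2nd column (index 1) may
# legitimately contain commas.
def parse_row(row, expected_cols=4, free_text_col=1):
    parts = row.split(",")
    extra = len(parts) - expected_cols          # number of stray commas
    if extra <= 0:
        return parts                            # clean record, split as-is
    # Re-join the pieces that belong to the free-text column.
    merged = ",".join(parts[free_text_col:free_text_col + extra + 1])
    return parts[:free_text_col] + [merged] + parts[free_text_col + extra + 1:]

print(parse_row("101,Smith, John,Denver,CO"))
# -> ['101', 'Smith, John', 'Denver', 'CO']
```

Note this only works if you know which column the extra commas can appear in; if two different columns can contain them, the record is genuinely ambiguous.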

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
h4harry1
Participant
Posts: 16
Joined: Sat Mar 19, 2011 8:01 am

Post by h4harry1 »

jwiles wrote:Is the <non-delimiter> comma always present in the column, or just sometimes?

If always present, one workaround is to just adjust the schema to import that column as two columns and then concatenate them later in a transformer.

If not always present, read the entire record as a single column (as suggested above), count the number of commas present, and parse accordingly.

Regards,
Thanks for the reply
The comma (non-delimiter) is not present in all of the columns, only in a few records. My CSV contains 90 columns, and if I go with your second suggestion I think it will take a lot of time...
Any suggestion ?
Regards
h4harry1
Participant
Posts: 16
Joined: Sat Mar 19, 2011 8:01 am

Post by h4harry1 »

blewip wrote:This has to be fixed in the input file.

Possibly use a different delimiter.

DataStage cannot guess which is the delemiter and which is data.
Any other suggestions, please? I'm not allowed to change the source file at all.
Thanks
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Your fields are not quoted.
You can't change the input file.
You have delimiters within the columns.
It is not a fixed-width file (it's comma delimited).

How can we make DataStage guess where the string ends?
blewip
Participant
Posts: 81
Joined: Wed Nov 10, 2004 10:55 am
Location: London

Post by blewip »

h4harry1 wrote:
blewip wrote:This has to be fixed in the input file.

Possibly use a different delimiter.

DataStage cannot guess which comma is the delimiter and which is data.
Any other suggestions, please? I'm not allowed to change the source file at all.
Thanks
The source file is in error, simple as that.

You cannot use a delimiter that is also valid data for the fields.
Modern Life is Rubbish - Blur
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

h4harry1 wrote:Any other suggestions, please? I'm not allowed to change the source file at all.
Thanks
Whoever is producing the file needs to "change" it. What you are receiving right now is invalid.
-craig

"You can never have too many knives" -- Logan Nine Fingers
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

h4harry1 wrote:if I go with your second suggestion I think it will take a lot of time...
Not necessarily... you can either parse it out within a Transformer (which would be less efficient) or split the stream to two different column importers and follow one with a Transformer to "fix" the extra column, then funnel the streams back together. Thinking it takes a lot of time is not the same as knowing; you won't know for certain until you test it.

Craig's suggestion is the preferred method...the data should be fixed at the source before you receive it. Make a request that the string columns be quoted: "Last Name, First Name","City Name","State" so you can properly import the data with a minimum of extra work and processing time. The worst they can say is "no", at which point you'll need to implement one of the workarounds that's been suggested.
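To see why quoting fixes the problem at the source: any standard CSV reader treats commas inside quoted fields as data, not delimiters, so no special parsing logic is needed downstream. A quick Python illustration (the sample rows are made up):

```python
# Quoted string fields let a standard CSV parser handle embedded commas.
import csv
import io

data = '"Last Name, First Name","City Name","State"\n' \
       '"Smith, John","Denver","CO"\n'
rows = list(csv.reader(io.StringIO(data)))
print(rows[1])   # -> ['Smith, John', 'Denver', 'CO']
```

The Sequential File stage does the same thing once you set its Quote property to double, which is why getting the producer to quote the fields is the clean fix.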

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
blewip
Participant
Posts: 81
Joined: Wed Nov 10, 2004 10:55 am
Location: London

Post by blewip »

The second option wouldn't work if there were two commas in a field.

If it's free text and commas are allowed, why wouldn't there be two?
Modern Life is Rubbish - Blur
MarkB
Premium Member
Posts: 95
Joined: Fri Oct 27, 2006 9:13 am

Post by MarkB »

blewip wrote:The second option wouldn't work if there were two commas in a field.

If it's free text and commas are allowed, why wouldn't there be two?
...which is why, in this case, the only logical solution is Craig's - the file is bad ... go back to the source and have them give you the file in the correct format.