Reading Flat file (CSV)

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

h4harry1
Participant
Posts: 16
Joined: Sat Mar 19, 2011 8:01 am

Reading Flat file (CSV)

Post by h4harry1 »

Hi All,
I have a flat file with a comma as the column delimiter. I am reading data from this file using the Sequential File stage (delimiter: comma). The only problem is that one of the column values includes a comma within it. I don't know how to handle this. Any help?
Thanks in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are the character strings quoted?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sampath
Participant
Posts: 21
Joined: Tue Apr 05, 2005 2:03 am

Re: Reading Flat file (CSV)

Post by Sampath »

You can read the full row as a single record, then use a Transformer stage with the Field() function to split it into columns.
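For anyone unfamiliar with Field(), here is a rough Python analogue of the idea (DataStage's Field() also takes an optional count of fields to return; this simplified sketch returns just one, with 1-based numbering as in DataStage):

```python
# Simplified Python analogue of DataStage's Field(string, delimiter, occurrence):
# return the Nth delimited piece of a string (1-based, like Field()).
def field(s, delim, n):
    parts = s.split(delim)
    return parts[n - 1] if n <= len(parts) else ""

print(field("101,Smith,Denver,CO", ",", 2))   # -> Smith
```

Reading the whole row into one column and applying logic like this in a Transformer gives you full control over how the delimiters are interpreted.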
h4harry1
Participant
Posts: 16
Joined: Sat Mar 19, 2011 8:01 am

Post by h4harry1 »

ray.wurlod wrote:Are the character strings quoted? ...
Thanks for the reply...

NO .... Not Quoted...
blewip
Participant
Posts: 81
Joined: Wed Nov 10, 2004 10:55 am
Location: London

Post by blewip »

This has to be fixed in the input file.

Possibly use a different delimiter.

DataStage cannot guess which comma is the delimiter and which is data.
Modern Life is Rubbish - Blur
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Is the <non-delimiter> comma always present in the column, or just sometimes?

If always present, one workaround is to just adjust the schema to import that column as two columns and then concatenate them later in a transformer.

If not always present, read the entire record as a single column (as suggested above), count the number of commas present, and parse accordingly.
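The count-and-reparse approach can be sketched in Python (a Transformer would express the same logic with Field() and Count(); the column layout and the assumption that all stray commas fall inside one known free-text column are hypothetical):

```python
# Read the whole row as one string, count the commas, and re-split.
# Assumes a 4-column record where only the 2nd column (index 1) may
# legitimately contain commas.
def parse_row(row, expected_cols=4, free_text_col=1):
    parts = row.split(",")
    extra = len(parts) - expected_cols          # number of stray commas
    if extra <= 0:
        return parts                            # clean record, split as-is
    # Re-join the pieces that belong to the free-text column.
    merged = ",".join(parts[free_text_col:free_text_col + extra + 1])
    return parts[:free_text_col] + [merged] + parts[free_text_col + extra + 1:]

print(parse_row("101,Smith, John,Denver,CO"))
# -> ['101', 'Smith, John', 'Denver', 'CO']
```

Note this only works if you know which column the extra commas can appear in; if two different columns can contain them, the record is genuinely ambiguous.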

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
h4harry1
Participant
Posts: 16
Joined: Sat Mar 19, 2011 8:01 am

Post by h4harry1 »

jwiles wrote:Is the <non-delimiter> comma always present in the column, or just sometimes?

If always present, one workaround is to just adjust the schema to import that column as two columns and then concatenate them later in a transformer.

If not always present, read the entire record as a single column (as suggested above), count the number of commas present, and parse accordingly.

Regards,
Thanks for the reply
The comma (non-delimiter) is not present in all of the columns, only in a few records. My CSV contains 90 columns, and if I go with your second suggestion I think it will take a lot of time...
Any suggestion ?
Regards
h4harry1
Participant
Posts: 16
Joined: Sat Mar 19, 2011 8:01 am

Post by h4harry1 »

blewip wrote:This has to be fixed in the input file.

Possibly use a different delimiter.

DataStage cannot guess which is the delemiter and which is data.
Any other suggestions, please? I'm not allowed to change the source file at all.
Thanks
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Your fields are not quoted.
You can't change the input file.
You have delimiters within the columns.
It is not a fixed-width file (it's comma delimited).

How can we make DataStage guess where the string ends?
blewip
Participant
Posts: 81
Joined: Wed Nov 10, 2004 10:55 am
Location: London

Post by blewip »

h4harry1 wrote:
blewip wrote:This has to be fixed in the input file.

Possibly use a different delimiter.

DataStage cannot guess which comma is the delimiter and which is data.
Any other suggestions, please? I'm not allowed to change the source file at all.
Thanks
The source file is in error, simple as that.

You cannot use a delimiter that is also valid data for the fields.
Modern Life is Rubbish - Blur
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

h4harry1 wrote:Any other suggestions, please? I'm not allowed to change the source file at all.
Thanks
Whoever is producing the file needs to "change" it. What you are receiving right now is invalid.
-craig

"You can never have too many knives" -- Logan Nine Fingers
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

h4harry1 wrote:if I go with your second suggestion I think it will take a lot of time...
Not necessarily... you can either parse it out within a Transformer (which would be less efficient) or split the stream to two different column importers and follow one with a Transformer to "fix" the extra column, then funnel the streams back together. Thinking it takes a lot of time is not the same as knowing; you won't know for certain until you test it.

Craig's suggestion is the preferred method...the data should be fixed at the source before you receive it. Make a request that the string columns be quoted: "Last Name, First Name","City Name","State" so you can properly import the data with a minimum of extra work and processing time. The worst they can say is "no", at which point you'll need to implement one of the workarounds that's been suggested.
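To see why quoting fixes the problem at the source: any standard CSV reader treats commas inside quoted fields as data, not delimiters, so no special parsing logic is needed downstream. A quick Python illustration (the sample rows are made up):

```python
# Quoted string fields let a standard CSV parser handle embedded commas.
import csv
import io

data = '"Last Name, First Name","City Name","State"\n' \
       '"Smith, John","Denver","CO"\n'
rows = list(csv.reader(io.StringIO(data)))
print(rows[1])   # -> ['Smith, John', 'Denver', 'CO']
```

The Sequential File stage does the same thing once you set its Quote property to double, which is why getting the producer to quote the fields is the clean fix.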

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
blewip
Participant
Posts: 81
Joined: Wed Nov 10, 2004 10:55 am
Location: London

Post by blewip »

The second option wouldn't work if there were two commas in a field.

If it's free text and commas are allowed, why wouldn't there be two?
Modern Life is Rubbish - Blur
MarkB
Premium Member
Posts: 95
Joined: Fri Oct 27, 2006 9:13 am

Post by MarkB »

blewip wrote:The second option wouldn't work if there were two commas in a field.

If it's free text and commas are allowed, why wouldn't there be two?
...which is why, in this case, the only logical solution is Craig's - the file is bad ... go back to the source and have them give you the file in the correct format.