problem with CSV data.

Jaleel
Participant
Posts: 13
Joined: Wed Jul 19, 2006 3:57 am

Post by Jaleel »

Hi,

We receive data in CSV files. The files use "," as the field delimiter and double quotes around each value, as usual. However, the data itself contains embedded commas and quotes. We are not allowed to change the CSV files, but we still need to import the data successfully.

Can anyone tell me which file stage I should use to import this type of data?

Or is there any other way to import such files?

Thanks in advance,
Jaleel
Thanks n Regards,
Jaleel :-)
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

PX has problems when the source csv file is of the format

1,2,"He,llo","",,"World"
The 3rd column is interpreted correctly, but if the 5th is a string column then it isn't imported as a "null" but causes an error. If this is your problem then I understand your frustration, as I've gone through it as well. I've given up trying to get it sorted out and have ended up using a Server job to reformat the data into a fixed-length padded format and then had PX read that.
Last edited by ArndW on Thu Aug 24, 2006 9:55 am, edited 2 times in total.
thumsup9
Charter Member
Posts: 168
Joined: Fri Feb 18, 2005 11:29 am

Post by thumsup9 »

What did you specify for the Quote Character under the Format tab of the Sequential File stage?
thumsup9
Charter Member
Posts: 168
Joined: Fri Feb 18, 2005 11:29 am

Post by thumsup9 »

In fact, it's not a bad idea to ask for this problem to be fixed at the source. You can ask whoever sends you the data to change the delimiter to a pipe, and life will be easy.
Jaleel
Participant
Posts: 13
Joined: Wed Jul 19, 2006 3:57 am

Post by Jaleel »

thumsup9 wrote:What did you specify for the Quote Character under the Format tab of the Sequential File stage?
I have tried both " and 000, but neither worked. Could you suggest any other solution?
Thanks n Regards,
Jaleel :-)
ushasunkara
Participant
Posts: 23
Joined: Wed Jan 18, 2006 10:43 am

Post by ushasunkara »

Hi ArndW,

How did you sort it out in the Server job? Can you please give an example?

Thank you,
Usha.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

In my case I just used a sequential file stage to read the file (DS Server interprets the ",," nullable input correctly) and then outputted a fixed width sequential that PX Server then reads. With an appropriate null pad character and fixed lengths the PX read is very easy to do.

I'm not sure that I understood the original problem; perhaps the poster could add one sample line of data that illustrates the problem. It might just be a matter of defining quote and separator characters.
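
For readers without a Server job to hand, the same reformatting idea (read well-formed CSV, write fixed-width padded records) can be sketched outside DataStage. This is only an illustration in Python, not the Server job described above; the column widths, the pad character and the function name `to_fixed_width` are all assumptions. Note that Python's `csv` module does not distinguish an empty quoted field (`""`) from a missing one (`,,`), so both come out as all-pad columns here.

```python
import csv
import io

# Hypothetical column widths: pick values at least as long as the
# longest value expected in each column.
WIDTHS = [5, 5, 10, 10, 10, 10]
PAD = " "  # assumed pad character; use whatever the downstream reader expects


def to_fixed_width(csv_text, widths=WIDTHS, pad=PAD):
    """Convert well-formed CSV text into a list of fixed-width records."""
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != len(widths):
            raise ValueError(f"expected {len(widths)} columns, got {len(row)}: {row!r}")
        cells = []
        for value, width in zip(row, widths):
            if len(value) > width:
                raise ValueError(f"value {value!r} is longer than width {width}")
            # Left-justify and pad each value to its column width.
            cells.append(value.ljust(width, pad))
        records.append("".join(cells))
    return records


sample = '1,2,"He,llo","",,"World"\n'
for record in to_fixed_width(sample):
    print(repr(record))
```

This only works when the input is legal CSV, as in the `1,2,"He,llo","",,"World"` example; it does not help with unescaped quotes inside quoted fields.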
Jaleel
Participant
Posts: 13
Joined: Wed Jul 19, 2006 3:57 am

Post by Jaleel »

ArndW wrote:In my case I just used a sequential file stage to read the file (DS Server interprets the ",," nullable input correctly) and then outputted a fixed width sequential that PX Server then reads. With an appropriate null pad character and fixed lengths the PX read is very easy to do.

I'm not sure that I understood the original problem; perhaps the poster could add one sample line of data that illustrates the problem. It might just be a matter of defining quote and separator characters.
The sample data in the CSV file looks like this:

"The Landmark "US" @ One Market, Suite 300","100","John"

The issue is in the 1st column: it contains both commas and quotes within the data.
Even if I specify the quote character as 000 to ignore the quotes, the column is split at the ',' and the data is interpreted as 4 columns even though there are only 3.
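
The four-column result can be reproduced outside DataStage. The short Python sketch below is only an illustration of what happens when quotes are ignored and the line is split on every comma; it does not model the Sequential File stage itself.

```python
line = '"The Landmark "US" @ One Market, Suite 300","100","John"'

# With no quote handling, every comma is treated as a field delimiter,
# including the one inside the address.
pieces = line.split(",")

for number, piece in enumerate(pieces, start=1):
    print(number, piece)

print("columns found:", len(pieces))
```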
Thanks n Regards,
Jaleel :-)
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

OK, I was afraid of that. In this case DS (both server and PX) has no way of knowing where to split the columns. The file is not machine-readable (unless you can guarantee that the string `","` does not show up in the text and that all columns are double-quote delimited). Have the program that generates the .CSV either use a quote character that does not appear in the text, use fixed lengths, or correctly handle embedded quotes in the strings.
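
If that guarantee does hold (every column is wrapped in double quotes and the three-character sequence `","` never occurs inside a value), a pre-processing step could split on that sequence instead of on the bare comma. The following Python sketch rests entirely on those two assumptions; the function name `split_fully_quoted` is made up for illustration, and it will silently mis-split any line that violates the guarantee.

```python
def split_fully_quoted(line):
    """Split a line in which every field is double-quoted and the
    sequence '","' never appears inside a field value (assumption)."""
    line = line.rstrip("\r\n")
    if len(line) < 2 or not (line.startswith('"') and line.endswith('"')):
        raise ValueError(f"line is not fully quoted: {line!r}")
    # Drop the outer quotes, then treat '","' as the real field separator.
    return line[1:-1].split('","')


sample = '"The Landmark "US" @ One Market, Suite 300","100","John"'
fields = split_fully_quoted(sample)
for field in fields:
    print(field)
```

The resulting fields could then be written back out as fixed-width records or with a delimiter that does not occur in the data, which is the kind of input DataStage can read reliably.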
Jaleel
Participant
Posts: 13
Joined: Wed Jul 19, 2006 3:57 am

Post by Jaleel »

ArndW wrote:OK, I was afraid of that. In this case DS (both server and PX) has no way of knowing where to split the columns. The file is not machine-readable (unless you can guarantee that the string `","` does not show up in the text and that all columns are double-quote delimited). Have the program that generates the .CSV either use a quote character that does not appear in the text, use fixed lengths, or correctly handle embedded quotes in the strings.

Thanks ArndW,

I would like to know how we can specify fixed lengths in a CSV file.
Thanks n Regards,
Jaleel :-)
meena
Participant
Posts: 430
Joined: Tue Sep 13, 2005 12:17 pm

Post by meena »

Hi Jaleel,
Jaleel wrote:I want to know how can we specify fixed lengths in a csv file
There is an option in the Format tab: "Fixed width column".
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

"Fixed width" and "comma-separated values" are mutually exclusive terms and mutually exclusive technologies. You have one or the other.

Require "them" to provide legal delimited files. If the data contains commas, get them to use a different delimiter. Or pre-process the file using a server job. You might be surprised how fast this can be.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.