Problem in reading .csv file

dnat · Post by **dnat** » Thu Nov 09, 2017 4:25 pm

Hi,

I have a .csv format file and it comes in the following way

For ex it has 4 fields

a,345,cd,678
b,"6,878",cd,"9,870"
c,"7,989",fg,"7,880"
d,567,cd,989

The second and the fourth field are numeric fields and if it is more than 3 digit number the it comes with a comma but within double quotes.
When i try view data, i am not able to view the 2nd and 3rd record. The issue is because how the final delimiter string is stored.
If i give Final delimiter as end then 1st and 4th record are viewable. If make Final delimiter as \n then 2nd and 3rd record are viewable. Any idea how to handle it?

chulett · Post by **chulett** » Thu Nov 09, 2017 5:16 pm

Not seeing off the top of my head how this is related to the final delimiter. However, I'm curious how did you define the metadata? To me they should all be defined as strings and then the 2nd and 4th fields will need to be converted inside the job post-read...

dnat · Post by **dnat** » Thu Nov 09, 2017 5:28 pm

Yes, they are all defined as Strings. And i will be converting those fields as required inside the job.

But now the problem is with reading the file itself. While i view data in sequential file stage the 2nd and 3rd record get dropped if i specify the Final delimiter as "end" or "none". If i specify the Final delimiter as "\n" then 1st and 4th record get dropped.

The issue is definitely in the source file as they said that we will be sending the file with whatever formats they already have. But is there a way to handle this?

UCDI · Post by **UCDI** » Fri Nov 10, 2017 10:01 am

several things you may be able to do...

- you could not use numeric commas (123456 is written 123456 not 123,456)
- you could use something other than , for the fields (tab, |, #, whatever)
- you could parse the entire line by hand

the first 2 options may mean writing a pre-processor to convert the files. This is probably a simple line or 2 of unix magic commands or could be a short C program or a datastage transformer (which becomes the 3rd option).

ray.wurlod · Post by **ray.wurlod** » Fri Nov 10, 2017 5:12 pm

Try using a Server Sequential File stage (perhaps in a Server Shared Container). This stage is much more flexible about how it handles multiple formats.

dnat · Post by **dnat** » Mon Nov 13, 2017 12:05 pm

Thanks for all your inputs. I added the "Record delimiter string" to DOS format and was able to read the file as it was.