Issue with pipe delimiter

krisna · Post by **krisna** » Thu Nov 04, 2010 11:03 pm

Hi,

I have a requirement where my source is a pipe-delimited sequential file. In the source file, the description column contains a value that itself includes a pipe.

For example, second_column has the value car|care

The record looks like this:

1|car|care|UK

where car|care is a single column value.

The problem is that when the file is read, car|care is treated as two separate values, whereas it should be a single column value.

i.e. first_column = 1
second_column = car|care
third_column = UK

Looking for a solution.

Thanks in advance.

ray.wurlod · Post by **ray.wurlod** » Fri Nov 05, 2010 12:14 am

There isn't one unless each string field is quoted.

You can read the entire line as a single string and parse it according to your own rules, for example in a Transformer stage.

This will actually give improved performance for large volumes because your parsing is being performed in parallel rather than sequentially.
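As a rough illustration of "parse it according to your own rules", here is a minimal Python sketch (not DataStage code). It assumes the record always has exactly three logical columns and that only the middle column can contain an embedded pipe; the function name `parse_record` is hypothetical. The same idea (take the text before the first delimiter, the text after the last delimiter, and treat whatever is left as the middle column) could be expressed with string functions in a Transformer derivation.

```python
def parse_record(line: str, delimiter: str = "|") -> tuple[str, str, str]:
    """Split a line into three columns, letting the middle one keep any pipes.

    Assumption: the first and last columns never contain the delimiter.
    Raises ValueError if the line has fewer than three fields.
    """
    # Everything before the first delimiter is the first column.
    first, rest = line.rstrip("\n").split(delimiter, 1)
    # Everything after the last delimiter is the third column;
    # whatever remains in between is the second column.
    second, third = rest.rsplit(delimiter, 1)
    return first, second, third


print(parse_record("1|car|care|UK"))
```

This rule only works while a single column can hold embedded pipes. If two or more columns might contain the delimiter, the line is ambiguous and no parsing rule can recover the original values.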

chulett · Post by **chulett** » Fri Nov 05, 2010 7:09 am

Right. Get the source file corrected; as it stands, it is invalid. With the string field quoted, the record would look like this:

1|"car|care"|UK
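To show why quoting removes the ambiguity, here is a small Python sketch that uses the standard `csv` module as a stand-in for any quote-aware reader. The variable names are illustrative only, and this is not how the DataStage stage itself is configured.

```python
import csv
import io

# One record in the corrected format: the second field is wrapped in
# double quotes, so the pipe inside it is data, not a delimiter.
sample = io.StringIO('1|"car|care"|UK\n')

# Tell the reader that fields are pipe-delimited and may be double-quoted.
rows = list(csv.reader(sample, delimiter="|", quotechar='"'))

print(rows)
```

A quote-aware reader returns three fields for this record, with car|care intact as the second one.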