
Reading a pipe-delimited file, but some columns are missing

Posted: Fri Oct 04, 2013 2:57 am
by karumudi7
Hi,

I have a pipe-delimited file with 3 columns, but for some records the fields at the end are missing and the pipe delimiters are not present either.
How can I read that file without losing any records?
Sample File:

Code: Select all

 Name|Email|Country
 ABC|abc@gmail.com|USA
 DEF|
 GHXY||INDIA
Here the problem is with the second record: while reading the file I am getting only 2 records. The second one is rejected with the error "Email" column not found...

I need to load all 3 records into the table; if data is not available, NULL should be loaded for the particular column.

Thanks!

Posted: Fri Oct 04, 2013 3:28 am
by ArndW
This was easier in Server, but there should be a way to do this in parallel jobs. Is this sequential file defined with variable field/record length and with a <cr><lf> or <lf> line terminator?

Posted: Fri Oct 04, 2013 3:41 am
by ArndW
I experimented a bit, and while there is no equivalent to the Server "missing columns" mode, you can put in the following filter command:

Code: Select all

awk 'BEGIN {FS = OFS = "|"} {$3 = $3; print}'
This will pad each input line to make exactly 3 pipe delimited fields; then you need to handle the empty strings appropriately in your job.
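
For example, running the sample file through that filter (the file name sample.txt is only for illustration) should produce three well-formed records, with empty strings where the fields were missing:

Code: Select all

$ awk 'BEGIN {FS = OFS = "|"} {$3 = $3; print}' sample.txt
Name|Email|Country
ABC|abc@gmail.com|USA
DEF||
GHXY||INDIA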

Posted: Fri Oct 04, 2013 4:05 am
by karumudi7
ArndW wrote:I experimented a bit, and while there is no equivalent to the Server "missing columns" mode, you can put in the following filter command:

Code: Select all

awk 'BEGIN {FS = OFS = "|"} ...

Unable to read it; it's premium content. :(

Posted: Fri Oct 04, 2013 4:37 am
by arunkumarmm
Never mind the column delimiter. Read the whole record as one single column and split it up inside the Transformer using the Field() function.
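
A minimal sketch of the Transformer output derivations for that approach, assuming the file is read into a single VarChar column called RecordCol on an input link called lnkIn (both names are made up here) and the target columns are nullable; Field() should return an empty string when the requested occurrence is not there, which the expression then converts to NULL:

Code: Select all

Name    : If Field(lnkIn.RecordCol, '|', 1) = '' Then SetNull() Else Field(lnkIn.RecordCol, '|', 1)
Email   : If Field(lnkIn.RecordCol, '|', 2) = '' Then SetNull() Else Field(lnkIn.RecordCol, '|', 2)
Country : If Field(lnkIn.RecordCol, '|', 3) = '' Then SetNull() Else Field(lnkIn.RecordCol, '|', 3)
Name, Email and Country on the left are just the output columns each derivation belongs to.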

Posted: Fri Oct 04, 2013 5:15 am
by ray.wurlod
Use a server Sequential File stage in a server Shared Container and make use of its excellent missing columns functionality.

Or, if it's a small to medium sized file, use a server job entirely.

Posted: Sun Oct 06, 2013 10:17 am
by rsomiset
Did you try reading the whole record as one single column in the source stage and then splitting it in a Transformer stage using the Field function with the delimiter (pipe in this case)?