Reading pipe delimited; but some columns missing
Posted: Fri Oct 04, 2013 2:57 am
by karumudi7
Hi,
I have a pipe delimited file with 3 columns. But for some records the trailing fields are missing, and the pipe delimiters are not available either.
How can I read that file without losing any records?
Sample File:
Code: Select all
Name|Email|Country
ABC|abc@gmail.com|USA
DEF|
GHXY||INDIA
Here the problem is with the second record: while reading the file I am getting only 2 records. The second one is rejected with the error "Email" column not found.
I need to load all 3 records into the table; if data is not available, NULL should be loaded for the particular column.
Thanks!
Posted: Fri Oct 04, 2013 3:28 am
by ArndW
This was easier in Server, but there should be a way to do this in parallel jobs. Is this sequential file defined with variable field/record length and with a <cr><lf> or <lf> line terminator?
Posted: Fri Oct 04, 2013 3:41 am
by ArndW
I experimented a bit and while there is no equivalent to the server "missing columns" mode, you can put in the following filter condition:
Code: Select all
awk 'BEGIN {FS = OFS = "|"} {$3 = $3; print}'
This will pad each input line to make exactly 3 pipe delimited fields; then you need to handle the empty strings appropriately in your job.
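To see the filter in action outside the job, here is a small stand-alone demo (the file name `sample.txt` is just illustrative). The assignment `$3 = $3` forces awk to extend the record to 3 fields and rebuild the line with the output separator, so short records gain the missing pipes:

```shell
# Recreate the sample file from the original post
printf 'Name|Email|Country\nABC|abc@gmail.com|USA\nDEF|\nGHXY||INDIA\n' > sample.txt

# Pad every line to exactly 3 pipe-delimited fields:
# touching $3 sets NF to 3 and rebuilds $0 using OFS ("|")
awk 'BEGIN {FS = OFS = "|"} {$3 = $3; print}' sample.txt
```

The short record `DEF|` comes out as `DEF||`, so the downstream Sequential File stage sees three columns on every row, with empty strings where data was missing.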
Posted: Fri Oct 04, 2013 4:05 am
by karumudi7
ArndW wrote:I experimented a bit and while there is no equivalent to the server "missing columns" mode, you can put in the following filter condition:
Code: Select all
awk 'BEGIN {FS = OFS = "|"} ...
Unable to read, Premium content. :(
Posted: Fri Oct 04, 2013 4:37 am
by arunkumarmm
Never mind the column delimiter. Read the whole record as one single column and split it up inside the Transformer using the Field function.
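Outside DataStage, the same "read as one string, split by position" idea can be sketched with `cut` as a rough stand-in for the Field function (the variable names here are only illustrative). The key property is that asking for a field that is not there simply returns an empty string, so no record is rejected:

```shell
# One whole record read as a single string; the trailing fields are missing
line='DEF|'

# Extract each field by delimiter and position, as Field(line, "|", n) would
name=$(printf '%s\n' "$line" | cut -d'|' -f1)
email=$(printf '%s\n' "$line" | cut -d'|' -f2)
country=$(printf '%s\n' "$line" | cut -d'|' -f3)

# Missing fields come back empty instead of causing a rejection
echo "name=$name email=$email country=$country"
```

In the Transformer you would map each empty result to NULL before loading the table.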
Posted: Fri Oct 04, 2013 5:15 am
by ray.wurlod
Use a server Sequential File stage in a server Shared Container and make use of its excellent missing columns functionality.
Or, if it's a small to medium sized file, use a server job entirely.
Posted: Sun Oct 06, 2013 10:17 am
by rsomiset
Did you try reading the whole record as one single column in the source stage and then splitting it with the Field function in a Transformer stage, using the delimiter (pipe in this case)?