Reading data from flat file
Moderators: chulett, rschirm, roy
Reading data from flat file
Hi
My flat file has data in this form:
"Korea" "321" "FY2003" ....
These are tab delimited. How to mention this in sequential stage properties. Quote character is "
My flat file has data in this form:
"Korea" "321" "FY2003" ....
These are tab delimited. How to mention this in sequential stage properties. Quote character is "
Pradeep Kumar
Thanks a lot. Where can we get that code information from.
I am facing with one more issue. My data is having some other fields in flat file which is as follows:
"Korea" "321" "FY2003" #SI #SI 212.89 #SI 3SI 63.90
"ASIA" "211" "FY2002" #SI 32.98 #SI #SI 321.90
The Quotes are only for first few fields. So when I mentioned Quote character as " it was giving an error "read_delimited() invalid quotes"
Could anyone let me know how to solve this?
I am facing with one more issue. My data is having some other fields in flat file which is as follows:
"Korea" "321" "FY2003" #SI #SI 212.89 #SI 3SI 63.90
"ASIA" "211" "FY2002" #SI 32.98 #SI #SI 321.90
The Quotes are only for first few fields. So when I mentioned Quote character as " it was giving an error "read_delimited() invalid quotes"
Could anyone let me know how to solve this?
Pradeep Kumar
Any ASCII chart, there are tons of sites on the web - take your pick. Note that you'll see the TAB character referred to as HT or Horizontal Tab on some sites, same thing.pradkumar wrote:Thanks a lot. Where can we get that code information from.
That shouldn't be because not all fields are quoted... it should mean there is a field with 'unbalanced quotes' in it - meaning an odd number. Check the records it complains about for that issue and let us know. May look something like this:pradkumar also wrote:I am facing with one more issue. My data is having some other fields in flat file which is as follows:
"Korea" "321" "FY2003" #SI #SI 212.89 #SI 3SI 63.90
"ASIA" "211" "FY2002" #SI 32.98 #SI #SI 321.90
The Quotes are only for first few fields. So when I mentioned Quote character as " it was giving an error "read_delimited() invalid quote"
Code: Select all
"ASIA" "211" "FY2002" #SI 32.98 #SI" #SI 321.90
Code: Select all
"ASIA" "211" "FY2002" #SI 32.98 "#"SI" #SI 321.90
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Does it complain about every record or only some of the records? All it takes is one stray character in one record to give you that message. The error should also have told you which field had the problem.
Setting the quote character to " doesn't mean every field needs to be surrounded by those particular quote characters. The records you posted shouldn't be any problem to read... except the first record seems to have six fields in it and the second record seems to have seven... which is it?
Setting the quote character to " doesn't mean every field needs to be surrounded by those particular quote characters. The records you posted shouldn't be any problem to read... except the first record seems to have six fields in it and the second record seems to have seven... which is it?
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Yes Craig.
It is compaining for each and every record. i deleted few records and tested it, but it was of no use . For the unequal number of columns, I am handling it by using "REPLACE" option in Incomplete Column field.
Only few fields in each record has quote character. The amount fields are not having any quote character
It is compaining for each and every record. i deleted few records and tested it, but it was of no use . For the unequal number of columns, I am handling it by using "REPLACE" option in Incomplete Column field.
Only few fields in each record has quote character. The amount fields are not having any quote character
Pradeep Kumar
I tried removing Quotes for the fields. I removed them. But when I am trying to read the file, the entire record is coming in the firsr field only.
Now the i/p record is :
asia 211 FY2001 890.00 987.0 78.90
Delimiter: 009
Quote: 000
In Output, the entire record is fitted into the first column only.
Now the i/p record is :
asia 211 FY2001 890.00 987.0 78.90
Delimiter: 009
Quote: 000
In Output, the entire record is fitted into the first column only.
Pradeep Kumar
Your last two statements are perfectly fine - that's normal for a sequential file. We're rapidly approaching a place where I can't help you any more and you'll need to figure this out on your own. If it is complaining about every record, then something is fundamentally wrong with your metadata in the stage. Not being there to peer over your shoulder makes it hard to know exactly what you may have wrong.pradkumar wrote:Yes Craig.
It is compaining for each and every record. i deleted few records and tested it, but it was of no use . For the unequal number of columns, I am handling it by using "REPLACE" option in Incomplete Column field.
Only few fields in each record has quote character. The amount fields are not having any quote character
There's no reason why records like you've posted can't be read. Heck, I created a file with your two example records in it and had no issues reading it other than the missing column. Make sure you have the Delimiter set to '009', the Quote character to a double-quote and you have not changed any of the Default values there and you should be fine.
You do realize 'missing' columns are only missing from the end of the record, not from the middle, yes? I am assuming there are four text fields then three numeric fields in each record, a correctly formatted flat file where the third text field as 'missing' would look like this:
Code: Select all
"asia"<tab>"211"<tab>"FY2001"<tab><tab>890.00<tab>987.0<tab>78.90
"UK"<tab>"432"<tab>"FY2003"<tab>#SI<tab>89.00<tab>87.00<tab>12.00
Code: Select all
"asia"<tab>"211"<tab>"FY2001"<tab>890.00<tab>987.0<tab>78.90<tab>[missing!]
"UK"<tab>"432"<tab>"FY2003"<tab>#SI<tab>89.00<tab>87.00<tab>12.00
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
There was no reason to remove the quotes, nothing you've shown us indicates anything other than your settings in the stage are wrong. Besides, are you planning on manually editing these files before you load them each time?pradkumar wrote:I tried removing Quotes for the fields. I removed them. But when I am trying to read the file, the entire record is coming in the firsr field only.
Now the i/p record is :
asia 211 FY2001 890.00 987.0 78.90
Delimiter: 009
Quote: 000
In Output, the entire record is fitted into the first column only.
Now you've managed to remove the tabs from the file as well or everything wouldn't have been read into the first field.
Start over.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Metadata - meaning the settings you've used in the stage do not match what it finds in the file. If they did all would be well. 'Everything in the first column' simply means you've got the delimiter wrong. I've explained what 'Invalid Quotes' mean.pradkumar wrote:I tried to read with delimiter 009 and quote character ". The result was that entire input record moved into a single output column..
It also says invalid quotes.
Is this a problem with Metadata? or soemthing else.
This isn't all that hard, but you do need to know the gory details of the file you are trying to read. Make sure all settings are correct and the file will read in fine. You may need a hex editor to see what the delimiter actually is in that file...
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers