Reading data from flat file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Reading data from flat file

Post by pradkumar »

Hi

My flat file has data in this form:
"Korea" "321" "FY2003" ....

These are tab delimited. How to mention this in sequential stage properties. Quote character is "
Pradeep Kumar
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The TAB character is '009' and that's how you represent it in the stage.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Thanks a lot. Where can we get that code information from.

I am facing with one more issue. My data is having some other fields in flat file which is as follows:

"Korea" "321" "FY2003" #SI #SI 212.89 #SI 3SI 63.90
"ASIA" "211" "FY2002" #SI 32.98 #SI #SI 321.90

The Quotes are only for first few fields. So when I mentioned Quote character as " it was giving an error "read_delimited() invalid quotes"

Could anyone let me know how to solve this?
Pradeep Kumar
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

pradkumar wrote:Thanks a lot. Where can we get that code information from.
Any ASCII chart, there are tons of sites on the web - take your pick. :wink: Note that you'll see the TAB character referred to as HT or Horizontal Tab on some sites, same thing.
pradkumar also wrote:I am facing with one more issue. My data is having some other fields in flat file which is as follows:

"Korea" "321" "FY2003" #SI #SI 212.89 #SI 3SI 63.90
"ASIA" "211" "FY2002" #SI 32.98 #SI #SI 321.90

The Quotes are only for first few fields. So when I mentioned Quote character as " it was giving an error "read_delimited() invalid quote"
That shouldn't be because not all fields are quoted... it should mean there is a field with 'unbalanced quotes' in it - meaning an odd number. Check the records it complains about for that issue and let us know. May look something like this:

Code: Select all

"ASIA" "211" "FY2002" #SI 32.98 #SI" #SI 321.90
Or this:

Code: Select all

"ASIA" "211" "FY2002" #SI 32.98 "#"SI" #SI 321.90
Both will throw that error.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

I went through the entire file.
The problem is not because of unbalanced quotes. Every record is of the same type.

"asia" "211" "FY2001" 890.00 987.0 78.90
"UK" "432" "FY2003" #SI 89.00 87.00 12.00
Pradeep Kumar
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Does it complain about every record or only some of the records? :? All it takes is one stray character in one record to give you that message. The error should also have told you which field had the problem.

Setting the quote character to " doesn't mean every field needs to be surrounded by those particular quote characters. The records you posted shouldn't be any problem to read... except the first record seems to have six fields in it and the second record seems to have seven... which is it?
-craig

"You can never have too many knives" -- Logan Nine Fingers
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Yes Craig.
It is compaining for each and every record. i deleted few records and tested it, but it was of no use :( . For the unequal number of columns, I am handling it by using "REPLACE" option in Incomplete Column field.

Only few fields in each record has quote character. The amount fields are not having any quote character
Pradeep Kumar
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

I tried removing Quotes for the fields. I removed them. But when I am trying to read the file, the entire record is coming in the firsr field only.

Now the i/p record is :
asia 211 FY2001 890.00 987.0 78.90

Delimiter: 009
Quote: 000

In Output, the entire record is fitted into the first column only.
Pradeep Kumar
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

pradkumar wrote:Yes Craig.
It is compaining for each and every record. i deleted few records and tested it, but it was of no use :( . For the unequal number of columns, I am handling it by using "REPLACE" option in Incomplete Column field.

Only few fields in each record has quote character. The amount fields are not having any quote character
Your last two statements are perfectly fine - that's normal for a sequential file. We're rapidly approaching a place where I can't help you any more and you'll need to figure this out on your own. If it is complaining about every record, then something is fundamentally wrong with your metadata in the stage. Not being there to peer over your shoulder makes it hard to know exactly what you may have wrong.

There's no reason why records like you've posted can't be read. Heck, I created a file with your two example records in it and had no issues reading it other than the missing column. Make sure you have the Delimiter set to '009', the Quote character to a double-quote and you have not changed any of the Default values there and you should be fine.

:!: You do realize 'missing' columns are only missing from the end of the record, not from the middle, yes? I am assuming there are four text fields then three numeric fields in each record, a correctly formatted flat file where the third text field as 'missing' would look like this:

Code: Select all

"asia"<tab>"211"<tab>"FY2001"<tab><tab>890.00<tab>987.0<tab>78.90 
"UK"<tab>"432"<tab>"FY2003"<tab>#SI<tab>89.00<tab>87.00<tab>12.00 
Assuming the equivalent of #SI is what is 'missing' from the first record. Otherwise, what you'll get this this:

Code: Select all

"asia"<tab>"211"<tab>"FY2001"<tab>890.00<tab>987.0<tab>78.90<tab>[missing!]  
"UK"<tab>"432"<tab>"FY2003"<tab>#SI<tab>89.00<tab>87.00<tab>12.00 
The numerics will all slide down one position and mess you up. See the difference? Can you bring the file up in some kind of Windows editor that can show you the tabs, something akin to UNIX / vi where it can show them a ^I characters? If everything in the file is correct, then there should be no reason on your part to handle any 'missing columns'.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

pradkumar wrote:I tried removing Quotes for the fields. I removed them. But when I am trying to read the file, the entire record is coming in the firsr field only.

Now the i/p record is :
asia 211 FY2001 890.00 987.0 78.90

Delimiter: 009
Quote: 000

In Output, the entire record is fitted into the first column only.
There was no reason to remove the quotes, nothing you've shown us indicates anything other than your settings in the stage are wrong. Besides, are you planning on manually editing these files before you load them each time?

Now you've managed to remove the tabs from the file as well or everything wouldn't have been read into the first field. :evil:

Start over.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

I tried to read with delimiter 009 and quote character ". The result was that entire input record moved into a single output column..

It also says invalid quotes.
Is this a problem with Metadata? or soemthing else.
Pradeep Kumar
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

pradkumar wrote:I tried to read with delimiter 009 and quote character ". The result was that entire input record moved into a single output column..

It also says invalid quotes.
Is this a problem with Metadata? or soemthing else.
Metadata - meaning the settings you've used in the stage do not match what it finds in the file. If they did all would be well. 'Everything in the first column' simply means you've got the delimiter wrong. I've explained what 'Invalid Quotes' mean.

This isn't all that hard, but you do need to know the gory details of the file you are trying to read. Make sure all settings are correct and the file will read in fine. You may need a hex editor to see what the delimiter actually is in that file...
-craig

"You can never have too many knives" -- Logan Nine Fingers
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi Craig

I got it. I downloaded hex editor and found the delimiter. Thanks for your valuable suggestions.

I used &h20 for delimiter and quotes: "
It worked perfectly

Thanks once again
Pradeep Kumar
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hex 20 is a space, btw. So this works on the original file? Any issues with the 'missing' column or is all good there as well?
-craig

"You can never have too many knives" -- Logan Nine Fingers
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

There was an issue with missing columns. I used in the Incomplete column description "REPLACE". It worked fine
Pradeep Kumar
Post Reply