Page 1 of 2

Selecting only specific columns from a file in unix

Posted: Fri Sep 04, 2009 2:58 am
by swerajan
Hi
I have a file available in UNIX box. There are many columns in it. I want datastage to read only selected columns from the file. How can this be done? It is tedious process to load the metadata of the file to Sequential file stage. Please post me the answer as soon as possible

Re: Selecting only specific columns from a file in unix

Posted: Fri Sep 04, 2009 3:34 am
by Sainath.Srinivasan
swerajan wrote:Please post me the answer as soon as possible
as soon as possible

Posted: Fri Sep 04, 2009 3:35 am
by Sainath.Srinivasan
Is it a fixed width file?

How many columns are there in number ?

Posted: Fri Sep 04, 2009 3:35 am
by ArndW
Sainath beat me to it.... But I'll add some information - You can declare the whole line as one VarChar() field and parse out what you need instead of entering a lot of column metadata.

Posted: Fri Sep 04, 2009 3:47 am
by swerajan
Its a comma delimiter file. and the column number is 40

Posted: Fri Sep 04, 2009 3:48 am
by swerajan
Sainath... u r tooooooooo brilliant to gve me such an answer.. Thanx ;) :wink:

Posted: Fri Sep 04, 2009 3:50 am
by swerajan
Arnd... Can u b bit more specific with ur answer. datastage sounds like greek nd latin nw.. !!

Posted: Fri Sep 04, 2009 3:51 am
by ArndW
I'll let Sainath assist you further on this thread, now that the two of you have hit it off so well. :wink:

Posted: Fri Sep 04, 2009 3:58 am
by Sainath.Srinivasan
In your column definition, define only one field, say fullRecord, with datatype Varchar and give it maximum length possible for the record as whole.

Search for field command to split and process further.

Arnd = :twisted:

Posted: Fri Sep 04, 2009 4:02 am
by swerajan
To use field function i need to use transformer and dat wld eat up performance rt?
One point in reading specific fields is with an aim to improve performance in addition to avoid manual typin of fields.. :)

Posted: Fri Sep 04, 2009 4:18 am
by Sainath.Srinivasan
But your columns do not have fixed layout to locate them.

Reading as Varchar will be useful to avoid translating unnecessary fields.

PS : Is it your keyboard having trouble recognising characters typed or you sending in cryptic format ? It is better to type full words.

Posted: Fri Sep 04, 2009 4:21 am
by kiran259
[quote="swerajan"]To use field function i need to use transformer and dat wld eat up performance rt?
Why don't you try and check the "performance"?If not,you look at Column Import stage.

Posted: Fri Sep 04, 2009 4:23 am
by swerajan
The format is same.. but i need only limited fields.

I dint get wat you mean to say by this....
"PS : Is it your keyboard having trouble recognising character ...[/quote]"

Posted: Fri Sep 04, 2009 4:49 am
by ArndW
swerajan - most of the participants on this forum are not native English speakers, so using "text-speak" or "chatspeak" with lots of cryptic abbreviations is not good and certainly not professional. Likewise, this is a volunteer forum and thus the response to your request for immediate and expedited service was not well received.

Sequential files need to be read completely by the system in any case, even if only one field is to be processed. Thus your most efficient method will be to declare the field metadata correctly. Second best is to use one column and then a transform stage to parse out the part that you want.

In the words of the author Robert Heinlein : "TANSTAAFL".

Posted: Fri Sep 04, 2009 7:21 am
by chulett
As noted, you cannot 'read selective columns' from sequential media, it's not in its nature. Also, I wouldn't call 40 columns 'many' or too many to make it any kind of difficult to get the metadata right. Sheesh. How long does it take to import the metadata for the file and then load it into the job? Moments. Sure, you may need to do a little typing if the file does not contain a header record but it's worth the small amount of effort so you have the proper metadata.

In the time you've been here or futzed around trying to find something 'easier' you could have done it the right way and have moved on.