Selecting only specific columns from a file in unix

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

swerajan
Participant
Posts: 42
Joined: Tue Aug 05, 2008 4:34 am

Selecting only specific columns from a file in unix

Post by swerajan »

Hi
I have a file available in UNIX box. There are many columns in it. I want datastage to read only selected columns from the file. How can this be done? It is tedious process to load the metadata of the file to Sequential file stage. Please post me the answer as soon as possible
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Re: Selecting only specific columns from a file in unix

Post by Sainath.Srinivasan »

swerajan wrote:Please post me the answer as soon as possible
as soon as possible
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Is it a fixed width file?

How many columns are there in number ?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Sainath beat me to it.... But I'll add some information - You can declare the whole line as one VarChar() field and parse out what you need instead of entering a lot of column metadata.
swerajan
Participant
Posts: 42
Joined: Tue Aug 05, 2008 4:34 am

Post by swerajan »

Its a comma delimiter file. and the column number is 40
swerajan
Participant
Posts: 42
Joined: Tue Aug 05, 2008 4:34 am

Post by swerajan »

Sainath... u r tooooooooo brilliant to gve me such an answer.. Thanx ;) :wink:
swerajan
Participant
Posts: 42
Joined: Tue Aug 05, 2008 4:34 am

Post by swerajan »

Arnd... Can u b bit more specific with ur answer. datastage sounds like greek nd latin nw.. !!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I'll let Sainath assist you further on this thread, now that the two of you have hit it off so well. :wink:
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

In your column definition, define only one field, say fullRecord, with datatype Varchar and give it maximum length possible for the record as whole.

Search for field command to split and process further.

Arnd = :twisted:
swerajan
Participant
Posts: 42
Joined: Tue Aug 05, 2008 4:34 am

Post by swerajan »

To use field function i need to use transformer and dat wld eat up performance rt?
One point in reading specific fields is with an aim to improve performance in addition to avoid manual typin of fields.. :)
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

But your columns do not have fixed layout to locate them.

Reading as Varchar will be useful to avoid translating unnecessary fields.

PS : Is it your keyboard having trouble recognising characters typed or you sending in cryptic format ? It is better to type full words.
kiran259
Participant
Posts: 48
Joined: Thu Aug 16, 2007 11:17 pm
Location: United States
Contact:

Post by kiran259 »

[quote="swerajan"]To use field function i need to use transformer and dat wld eat up performance rt?
Why don't you try and check the "performance"?If not,you look at Column Import stage.
Kiran Vaduguri

As soon as the fear approaches near, attack and destroy it.
swerajan
Participant
Posts: 42
Joined: Tue Aug 05, 2008 4:34 am

Post by swerajan »

The format is same.. but i need only limited fields.

I dint get wat you mean to say by this....
"PS : Is it your keyboard having trouble recognising character ...[/quote]"
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

swerajan - most of the participants on this forum are not native English speakers, so using "text-speak" or "chatspeak" with lots of cryptic abbreviations is not good and certainly not professional. Likewise, this is a volunteer forum and thus the response to your request for immediate and expedited service was not well received.

Sequential files need to be read completely by the system in any case, even if only one field is to be processed. Thus your most efficient method will be to declare the field metadata correctly. Second best is to use one column and then a transform stage to parse out the part that you want.

In the words of the author Robert Heinlein : "TANSTAAFL".
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted, you cannot 'read selective columns' from sequential media, it's not in its nature. Also, I wouldn't call 40 columns 'many' or too many to make it any kind of difficult to get the metadata right. Sheesh. How long does it take to import the metadata for the file and then load it into the job? Moments. Sure, you may need to do a little typing if the file does not contain a header record but it's worth the small amount of effort so you have the proper metadata.

In the time you've been here or futzed around trying to find something 'easier' you could have done it the right way and have moved on.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply