Hi Folks,
Here is my dilemma:
I have 200 columns in the input source but need only 100 for downstream processing. Should I read all 200 with a Data Set and drop the redundant columns using a Copy/Modify stage as the next step, or should I restrict the metadata definition on the input to read only the 100 selected columns (by altering the column definition/layout)?
I am looking at this from a code-maintainability and, most importantly, a performance point of view.
Appreciate your help.
Selective reading Metadata versus using a copy stage/Modify
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Depends on your source. If it's a sequential file (in which you must read past every byte to get to the next) then you have to read them all. In this case you can use the column property Drop On Input. Otherwise, select only those columns that you actually need from the source table.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
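The trade-off described above can be sketched outside DataStage with a small, hypothetical Python example (the file contents and column names are invented for illustration): with a flat sequential file every record must be parsed in full regardless, so dropping columns at read time versus dropping them in a later stage produces the same result, and the saving from dropping early is narrower records flowing downstream.

```python
import csv
import io

# Hypothetical flat-file source with 4 columns; only 2 are needed downstream.
raw = "id,name,notes,ts\n1,a,x,2020\n2,b,y,2021\n"
wanted = ["id", "name"]

# Approach 1: read every column, then drop the unwanted ones in a later
# step (analogous to a Copy/Modify stage after the read).
full_rows = list(csv.DictReader(io.StringIO(raw)))
dropped_later = [{k: r[k] for k in wanted} for r in full_rows]

# Approach 2: drop at read time (analogous to the Drop On Input property).
# Each record is still fully parsed -- a sequential file offers no way to
# skip past bytes -- but only the wanted columns are kept and passed on.
dropped_on_input = [
    {k: r[k] for k in wanted} for r in csv.DictReader(io.StringIO(raw))
]

# Both approaches yield identical downstream data.
assert dropped_later == dropped_on_input
```

Either way the file is read in full; what dropping early buys is smaller records on the downstream links, which matters when 100 of 200 columns are redundant.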
-
- Participant
- Posts: 5
- Joined: Fri Dec 22, 2006 2:11 am
Both approaches will work. First, you can read all 200 columns from the sequential file and then drop the 100 unwanted columns with a Copy stage.
Second, you can directly read only the 100 required columns.
Either way the read is driven by the metadata: when you load the definitions for the 100 required columns, the data is read according to the column definitions, not the order of the columns.
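The point about matching on column definition rather than column order can be illustrated with a short hypothetical sketch (the sources and column names are invented): when records carry named fields, a reader that selects by name is insensitive to the order in which the source lays the columns out.

```python
import csv
import io

# Two hypothetical sources with the same columns in different orders.
src_a = "id,name,notes\n1,a,x\n"
src_b = "notes,id,name\nx,1,a\n"
wanted = ["id", "name"]

def read_selected(text, cols):
    """Select columns by name (definition), not by position."""
    return [{c: row[c] for c in cols}
            for row in csv.DictReader(io.StringIO(text))]

# Because selection is name-based, source column order is irrelevant.
assert read_selected(src_a, wanted) == read_selected(src_b, wanted)
```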