Metadata and Dataset

Posted: Fri Jun 05, 2009 10:47 am
by wahi80
Hi,

Suppose a dataset is created in Job1 with 20 columns.

In Job2 we read from this dataset, but the metadata is defined for only 10 columns; the remaining 10 columns are never defined when reading the dataset.

The job executes successfully, but I wanted to know whether data integrity will be maintained. Currently the data looks fine, but I'm not sure about large data loads.

Has anyone faced similar issues?

What if I have a Job3 and I define only 5 columns out of 20 in metadata?

Regards
Ankur

Posted: Fri Jun 05, 2009 10:54 am
by nagarjuna
Hi,
When you create a dataset, its schema is stored in the descriptor file. In your case, you created the dataset with 20 columns. In another job, even if you specify only 5 columns, you will still be able to view the data. But if you change any data type, you won't be able to read the data.
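
To make the point concrete, here is a rough Python sketch of the behaviour described above. It is only a toy model, not DataStage code and not the real descriptor format; the column names, `STORED_SCHEMA`, `STORED_ROWS` and `read_dataset` are all made up for illustration. It assumes the stored schema lives alongside the data, that a reader may ask for any subset of columns, and that asking for a column with a different type is rejected.

```python
# Toy model only: NOT DataStage code and NOT its on-disk format.
# All names below are hypothetical and exist purely for illustration.
STORED_SCHEMA = {"id": int, "name": str, "amount": float}
STORED_ROWS = [
    {"id": 1, "name": "a", "amount": 10.5},
    {"id": 2, "name": "b", "amount": 20.0},
]


def read_dataset(requested):
    """Return only the requested columns.

    Raises KeyError for a column that is not in the stored schema and
    TypeError when the requested type differs from the stored type.
    """
    for col, typ in requested.items():
        if col not in STORED_SCHEMA:
            raise KeyError(f"column {col!r} is not in the stored schema")
        if STORED_SCHEMA[col] is not typ:
            raise TypeError(
                f"column {col!r}: stored as {STORED_SCHEMA[col].__name__}, "
                f"requested as {typ.__name__}"
            )
    # Build new rows containing just the requested columns;
    # the stored rows themselves are never modified.
    return [{col: row[col] for col in requested} for row in STORED_ROWS]


# Reading 2 of the 3 stored columns works and leaves the stored data as it was.
subset = read_dataset({"id": int, "amount": float})
```

In this sketch, calling `read_dataset({"id": str})` raises `TypeError`, which mirrors the "change the data type and the read fails" case, while the subset read leaves `STORED_ROWS` unchanged.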

Posted: Fri Jun 05, 2009 3:39 pm
by ray.wurlod
Is Runtime Column Propagation enabled? What (precisely) do you mean by "data integrity" here?

In no stage type (except Sequential File stage*) do you need to read all columns from source.

* Even in Sequential File stage there is a column property "drop on import" available. But you still have to read every byte in the file to get to the next.

Posted: Fri Jun 05, 2009 3:52 pm
by wahi80
Runtime Column Propagation is not enabled. By data integrity I meant whether the data could get corrupted.

Posted: Sun Jun 07, 2009 2:31 pm
by ray.wurlod
Nothing will happen to the data unless YOU program DataStage to make those changes.

Posted: Mon Jun 08, 2009 12:10 am
by ajay.vaidyanathan
Hi,

As Nagarjuna rightly said, unless you change the metadata of the dataset in the first job, you can go ahead and read whichever subset of its columns you need in subsequent jobs.

Posted: Mon Jun 08, 2009 6:17 am
by sjfearnside
If the source data you are reading has mandatory columns, i.e. columns that must have a valid value, and you drop one or more of those columns, you will have a potential data integrity problem.
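
A quick way to picture that risk is a check of the columns a job reads against the columns the target insists on. The Python below is only an illustrative sketch with made-up names (`REQUIRED_BY_TARGET`, `missing_mandatory`, and the column names are all hypothetical); it is not something DataStage provides.

```python
# Hypothetical names throughout; a toy illustration, not DataStage code.
REQUIRED_BY_TARGET = {"order_id", "customer_id"}


def missing_mandatory(columns_read):
    """Return the mandatory target columns the job did not read, sorted."""
    return sorted(REQUIRED_BY_TARGET - set(columns_read))


# A job that reads only two columns and leaves out a mandatory one.
print(missing_mandatory(["order_id", "amount"]))  # prints ['customer_id']
```

An empty list means every mandatory column was carried through; anything else is a column whose values would be lost downstream.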