Metadata and Dataset

wahi80 · Post by **wahi80** » Fri Jun 05, 2009 10:47 am

Hi,

Suppose a dataset is created in Job1 with 20 columns.

In Job2 we read from this dataset, but the metadata is defined for only 10 columns; the remaining 10 columns are never defined in Job2.

The job executes successfully, but I wanted to know whether data integrity will be maintained. Currently the data looks fine, but I'm not sure about large data loads.

Has anyone faced similar issues?

What if I have a Job3 and I define only 5 columns out of 20 in metadata?

Regards
Ankur

nagarjuna · Post by **nagarjuna** » Fri Jun 05, 2009 10:54 am

Hi,
When you create a dataset, its schema is stored in the descriptor file. In your case, you created the dataset with 20 columns. In another job, even if you specify only 5 columns, you will still be able to view the data. But if you change any data type, you won't be able to read the data.
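
To illustrate the idea, here is a minimal sketch in plain Python. It is only an analogy, not DataStage code or any DataStage API: `stored_schema`, `stored_rows` and `read_columns` are hypothetical names standing in for the descriptor file, the data files and a reading job. It assumes that a reader may ask for any subset of the stored columns, and that a declared type which differs from the stored one is rejected.

```python
# Plain-Python analogy only; none of these names come from DataStage.
from typing import Dict, List


def read_columns(schema: Dict[str, type],
                 rows: List[tuple],
                 wanted: Dict[str, type]) -> List[tuple]:
    """Return only the wanted columns, checked against the stored schema."""
    names = list(schema)
    for name, declared in wanted.items():
        if name not in schema:
            raise KeyError(f"column {name!r} is not in the stored schema")
        if schema[name] is not declared:
            # Mirrors the point above: a changed data type cannot be read.
            raise TypeError(f"column {name!r} is stored as "
                            f"{schema[name].__name__}, not {declared.__name__}")
    positions = [names.index(name) for name in wanted]
    # Build new tuples; the stored rows themselves are never modified.
    return [tuple(row[i] for i in positions) for row in rows]


# Hypothetical stand-ins for the descriptor (schema) and the stored data.
stored_schema = {"id": int, "name": str, "amount": float}
stored_rows = [(1, "a", 10.5), (2, "b", 20.0)]

# A later "job" that defines only two of the three columns.
subset = read_columns(stored_schema, stored_rows, {"id": int, "amount": float})
```

Reading a subset only projects the columns that were asked for; the stored rows are left as they were, which is why the later jobs see consistent data.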

ray.wurlod · Post by **ray.wurlod** » Fri Jun 05, 2009 3:39 pm

Is Runtime Column Propagation enabled? What (precisely) do you mean by "data integrity" here?

In no stage type (except Sequential File stage*) do you need to read all columns from source.

* Even in Sequential File stage there is a column property "drop on import" available. But you still have to read every byte in the file to get to the next.
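
As a rough analogy for the footnote above, the sketch below uses Python's standard `csv` module rather than anything from DataStage. The sample text and the `read_keeping` helper are hypothetical. It shows that with a sequential, delimited file every record still has to be parsed in full before the unwanted fields can be discarded.

```python
# Plain-Python analogy only; this is not the Sequential File stage.
import csv
import io
from typing import List


def read_keeping(text: str, keep: List[str]) -> List[List[str]]:
    """Parse every record in full, then keep only the named columns."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    positions = [header.index(name) for name in keep]
    # Each row is fully parsed by csv.reader before fields are dropped.
    return [[row[i] for i in positions] for row in reader]


# Hypothetical sample file contents.
sample = "id,name,amount\n1,a,10.5\n2,b,20.0\n"
kept = read_keeping(sample, ["id", "amount"])
```

Dropping `name` here saves nothing on the read itself; the whole line is consumed either way, and only the output is narrower.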

wahi80 · Post by **wahi80** » Fri Jun 05, 2009 3:52 pm

Runtime Column Propagation is not enabled. By data integrity I meant whether the data could get corrupted.

ray.wurlod · Post by **ray.wurlod** » Sun Jun 07, 2009 2:31 pm

Nothing will happen to data unless YOU program DataStage to make those changes.

ajay.vaidyanathan · Post by **ajay.vaidyanathan** » Mon Jun 08, 2009 12:10 am

Hi,

As Nagarjuna rightly said, unless you change the metadata of the dataset in the first job, you can go ahead and use whichever subset of its columns you need in subsequent jobs.

sjfearnside · Post by **sjfearnside** » Mon Jun 08, 2009 6:17 am

If the source data you are reading has mandatory columns, i.e. must have a valid value, and you drop one or more of those columns, you will have a potential data integrity problem.