Dynamic Metadata

dsdev750
Charter Member
Posts: 16
Joined: Sat Jun 04, 2005 10:19 am

Dynamic Metadata

Post by dsdev750 »

Hi,

I have a requirement to load a file containing dynamic metadata. For one thing, the column names are included in the same file that holds the data. Thus, the sequential file (info.20060710) looks like:

FILE_START
FILE_NAME=info.20060710
REC_TYPE=NEW_EMPLOYEE

FIELDS_START
ID
NAME
DOB
CITY
#Comment

#
MARITAL_STATUS
FIELDS_END

DATA_START
1|John Her|19710101|PENN| |
REC_COUNT=1
DATA_END


FILE_END

The problem is that tomorrow the file info.20060711 could have the CITY column missing. The next day MARITAL_STATUS could be missing. This could happen to any field except the ID and NAME fields.

I understand that it is difficult to handle dynamic metadata in DataStage. Is it possible to handle such a file in DataStage? Is it advisable to do such jobs in DataStage?


Thanks.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

DataStage is metadata driven, or so I have heard time and again. So it would be advisable to do this kind of manipulation outside DataStage, to make the file ready for DataStage to pick up.
Maybe a shell script that detects which columns are missing and then adds them accordingly. This way you can create a job that has static metadata, and even if the file changes, the script can force it back to that static layout.
Might not be the best way, but it is one of the ways.
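
Something along these lines, sketched in Python rather than shell just to show the logic. The full column list and the default value are placeholders that you would have to agree with the business:

Code:

#!/usr/bin/env python3
# Rough sketch of the pre-processing idea above, in Python rather than shell.
# EXPECTED and DEFAULT are placeholders -- the real values are a business decision.

EXPECTED = ["ID", "NAME", "DOB", "CITY", "MARITAL_STATUS"]  # full static layout
DEFAULT = ""                                                # value for any missing column

def normalise(in_path, out_path):
    with open(in_path) as f:
        lines = [line.rstrip("\n") for line in f]

    # Collect the column names listed between FIELDS_START and FIELDS_END,
    # skipping comments (#...) and blank lines.
    fields, in_fields = [], False
    for line in lines:
        if line == "FIELDS_START":
            in_fields = True
        elif line == "FIELDS_END":
            in_fields = False
        elif in_fields and line.strip() and not line.startswith("#"):
            fields.append(line.strip())

    # Rewrite the data rows so every EXPECTED column is present, in order.
    with open(out_path, "w") as out:
        in_data = False
        for line in lines:
            if line == "DATA_START":
                in_data = True
            elif line == "DATA_END":
                in_data = False
            elif in_data and "|" in line:
                values = dict(zip(fields, line.split("|")))
                line = "|".join(values.get(col, DEFAULT) for col in EXPECTED)
            out.write(line + "\n")

normalise("info.20060710", "info.20060710.fixed")

The fixed-up file then feeds a job with ordinary static metadata.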
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The quote from Chuck Reid in DSGuru2B's post is entirely apposite here.

Can you create a specification of how you might determine what is present and what is missing? If so, you might be able to determine techniques from that.

How would it affect your source-to-target data mapping if this were the case? What business rules apply? Would you need to notify end users that their LOCATION dimension is missing CITY? How do you think they would react?

What you really need to do is to beat up on the data providers so that data are never missing - only whole rows are ever missing. DataStage can deal extremely efficiently with those! :lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
anujgarg
Participant
Posts: 38
Joined: Sun Jun 26, 2005 11:17 pm

Post by anujgarg »

You can use Runtime Column Propagation (RCP) for this.
But you first need to separate the metadata from the records, then create a record schema file from the metadata and use that schema file for the dataset.
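
For example, something along these lines could build the schema file from the FIELDS_START/FIELDS_END block. This is only a rough Python sketch; the property clause and the all-string types are assumptions, so compare it against a schema file exported from one of your own jobs:

Code:

# Rough sketch: build a record schema file from the FIELDS_START/FIELDS_END block,
# treating every column as a nullable string. The property clause and the type
# choices are assumptions -- check them against a schema exported from your own job.

def extract_fields(path):
    """Return the column names listed between FIELDS_START and FIELDS_END."""
    fields, in_fields = [], False
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if line == "FIELDS_START":
                in_fields = True
            elif line == "FIELDS_END":
                break
            elif in_fields and line and not line.startswith("#"):
                fields.append(line)
    return fields

def write_schema(fields, schema_path):
    with open(schema_path, "w") as out:
        out.write("record\n  {final_delim=end, delim='|'}\n(\n")
        for name in fields:
            out.write(f"  {name}: nullable string[max=255];\n")
        out.write(")\n")

write_schema(extract_fields("info.20060710"), "info.20060710.schema")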
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not sure how RCP would help here; the problem is that the column itself is missing from the data (if I read the question correctly). RCP is for when you don't want to physically include the column definition in the metadata for a link.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

It is possible to deal with this type of problem in DataStage but it is preferable to have fixed metadata.

I'm not sure about the format of your file. Is there a header row with the file details, then a row stating what the columns are, then data rows?

We have written jobs that read in a file and, based on the list of supplied columns, map the data rows to a fixed-format output. This fixed-format file is then used in subsequent jobs. We hold the mapping of possible columns to the fixed format in the database and perform the mapping in a BuildOp.

Obviously the columns that may appear in the file one day and not the next would have to be nullable in the fixed format, or at least you would need some rules about what value to default them to.
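
Purely as an illustration of the per-row logic (the real thing is a BuildOp, and the column lists come from the database), here is a rough Python sketch in which FIXED_LAYOUT and its defaults stand in for that database table:

Code:

# Illustration only: the real mapping runs in a BuildOp, with the column mapping
# held in a database table. FIXED_LAYOUT and its defaults are placeholders
# standing in for that table.

FIXED_LAYOUT = {              # target column -> default when the file omits it
    "ID":             None,   # mandatory, no default
    "NAME":           None,   # mandatory, no default
    "DOB":            "",
    "CITY":           "",
    "MARITAL_STATUS": "",
}

def map_row(supplied_columns, raw_row, delim="|"):
    """Map one delimited data row onto the fixed output layout."""
    values = dict(zip(supplied_columns, raw_row.split(delim)))
    out = []
    for col, default in FIXED_LAYOUT.items():
        if col in values:
            out.append(values[col].strip())
        elif default is not None:
            out.append(default)
        else:
            raise ValueError(f"mandatory column {col} missing from input")
    return delim.join(out)

# A day when CITY is absent from the file:
print(map_row(["ID", "NAME", "DOB", "MARITAL_STATUS"], "1|John Her|19710101|S"))
# -> 1|John Her|19710101||S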
dsdev750
Charter Member
Posts: 16
Joined: Sat Jun 04, 2005 10:19 am

Post by dsdev750 »

Thanks everybody for responding.
I guess the final answer is to somehow find out the missing columns and update the file with default values for those columns. Once the file has been updated, design the DS job in such a way that it reads the data section with the complete static metadata. Right?

Thanks again.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Static metadata certainly makes the DataStage stuff much easier to do.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Or read the file and write a schema file for its structure. Use an RCP-enabled job, with the schema file that changes every day as the source.

Or define the metadata as ID, NAME and the rest all varchar(225). Have a transformer detect the missing columns and distribute the values to the right columns accordingly.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'