Dynamic Metadata
Moderators: chulett, rschirm, roy
Dynamic Metadata
Hi,
I have a requirement to load a file containing dynamic metadata. For one, the column names are included in the same file which has the data. Thus, the seq file (info.20060710 ) looks like :
FILE_START
FILE_NAME=info.20060710
REC_TYPE=NEW_EMPLOYEE
FIELDS_START
ID
NAME
DOB
CITY
#Comment
#
MARITAL_STATUS
FIELDS_END
DATA_START
1|John Her|19710101|PENN| |
REC_COUNT=1
DATA_END
FILE_END
The problem is that tomorrow the file info.20060711 could have the column CITY missing. Next day the MARITAL_STATUS could be missing. This could happen for any field except the ID and the NAME fields.
I understand that it is difficult to handle dynamic metadata in datastage. Is it possible in datastage to handle such a file? Is it advisable to do such jobs in datastage?
Thanks.
I have a requirement to load a file containing dynamic metadata. For one, the column names are included in the same file which has the data. Thus, the seq file (info.20060710 ) looks like :
FILE_START
FILE_NAME=info.20060710
REC_TYPE=NEW_EMPLOYEE
FIELDS_START
ID
NAME
DOB
CITY
#Comment
#
MARITAL_STATUS
FIELDS_END
DATA_START
1|John Her|19710101|PENN| |
REC_COUNT=1
DATA_END
FILE_END
The problem is that tomorrow the file info.20060711 could have the column CITY missing. Next day the MARITAL_STATUS could be missing. This could happen for any field except the ID and the NAME fields.
I understand that it is difficult to handle dynamic metadata in datastage. Is it possible in datastage to handle such a file? Is it advisable to do such jobs in datastage?
Thanks.
Datastage is metadata driven, as so i have heard time and again. So it would be advisable to do this kind of manipulation outside datastage to make the file ready for datastage to pickup.
Like maybe a shell script that detects what columns are missing and then accordingly adds it. This way you can create a job that has a static metadata, and even if the file changes, the script can force it to be static.
Might not be the best way, but one of the ways.
Like maybe a shell script that detects what columns are missing and then accordingly adds it. This way you can create a job that has a static metadata, and even if the file changes, the script can force it to be static.
Might not be the best way, but one of the ways.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The quote from Chuck Reid in DSGuru2B's post is entirely apposite here.
Can you create a specification of how you might determine what is present and what is missing? If so, you might be able to determine techniques from that.
How would it affect your source-to-target data mapping if this were the case? What business rules apply? Would you need to notify end users that their LOCATION dimension is missing CITY? How do you think they would react?
What you really need to do is to beat up on the data providers so that data are never missing - only whole rows are ever missing. DataStage can deal extremely efficiently with those!![Laughing :lol:](./images/smilies/icon_lol.gif)
Can you create a specification of how you might determine what is present and what is missing? If so, you might be able to determine techniques from that.
How would it affect your source-to-target data mapping if this were the case? What business rules apply? Would you need to notify end users that their LOCATION dimension is missing CITY? How do you think they would react?
What you really need to do is to beat up on the data providers so that data are never missing - only whole rows are ever missing. DataStage can deal extremely efficiently with those!
![Laughing :lol:](./images/smilies/icon_lol.gif)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Not sure how RCP would help here; the problem here is that the column itself is missing from the data (if I read the question correctly). RCP is for when you don't want to physically include the column definition in the metadata for a link.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
It is possible to deal with this type of problem in DataStage but it is preferable to have fixed metadata.
I'm not sure about the format of your file. Is there a header row with the file details, then a row stating what the columns are, then data rows?
We have written jobs that read in a file and based on the list of supplied columns they map the data rows to a fixed format output. This fixed format file is then used in subsequent jobs. We hold the mapping of possible columns to the fixed format in the database and perform the mapping in a build-op.
Obviously the columns that may appear in the file one day and not the next would have to be nullable in the fixed format, or at least you would need some rules about what value to default them to.
I'm not sure about the format of your file. Is there a header row with the file details, then a row stating what the columns are, then data rows?
We have written jobs that read in a file and based on the list of supplied columns they map the data rows to a fixed format output. This fixed format file is then used in subsequent jobs. We hold the mapping of possible columns to the fixed format in the database and perform the mapping in a build-op.
Obviously the columns that may appear in the file one day and not the next would have to be nullable in the fixed format, or at least you would need some rules about what value to default them to.
Thanks everybody for responding.
I guess the final answer seems to be to somehow find out the missing columns amd update the file with default values for those columns. Once the file has been updated, Design the DS job in such a way that it reads the data section with the complete static metadata. Right?
Thanks again.
I guess the final answer seems to be to somehow find out the missing columns amd update the file with default values for those columns. Once the file has been updated, Design the DS job in such a way that it reads the data section with the complete static metadata. Right?
Thanks again.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Or read the file and write a Schema file for its structure. User RPC Enabled job. Use the Schema file which changes everyday as the soruce.
Or have a metadata as ID,Name and rest all varchar(225). Have a transformer, detect the missing column and distribute the columns accordingly.
Or have a metadata as ID,Name and rest all varchar(225). Have a transformer, detect the missing column and distribute the columns accordingly.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'