We are planning to design a generic component in DataStage with the following specifications:
It has to read the metadata of a file at run time.
It will contain different keys for different file definitions.
It then has to perform aggregations at run time based on those file definitions.
The field to which the aggregation is applied is also dynamic.
We plan to build this component by first creating dynamic schema files, which should be read in a custom stage.
The custom stage then has to concatenate all the key fields into one generic key, which will be used for the aggregation.
Can somebody throw some light on the following points, assuming that we receive a CSV input file?
1) How to generate dynamic schema files based on the different input files we receive.
I have gone through the Parallel Job Developer Guide to create a schema file, which can be done when we know the format in advance. But we cannot create one at run time by inspecting the input file's metadata.
2) How can we concatenate only key fields if their position and number is different for each input file?
For example, the first input file may contain 5 columns, say A, B, C, D, E, of which C and D are the key columns.
The second input file may contain 10 columns M, N, O, P, Q, R, S, T, U, V, of which M, P, S and U are the key columns.
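Since a CSV file with a header row already carries its column names, one way to approach point 1 is a small pre-job script that reads the first row and writes out an Orchestrate-style schema file for the custom stage. A rough sketch in Python, assuming every column can be treated as a bounded varchar (the header gives no type information, and the 255 length and the record-level properties are assumptions you would tune to your import settings):

```python
import csv

def make_schema_file(csv_path, schema_path):
    """Read the header row of a CSV file and emit a DataStage
    (Orchestrate) schema file that describes every column as a
    bounded string, since the header carries no type information."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))  # first row = column names

    lines = ["record {final_delim=end, delim=',', quote=double} ("]
    for name in header:
        # string[max=255] is an arbitrary assumption; widen as needed
        lines.append(f"    {name}: string[max=255];")
    lines.append(")")

    with open(schema_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

The generated schema file could then be passed to the job (e.g. via a job parameter) before the run, so the stage picks up whatever columns that client's file happens to have.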
Dynamic aggregation.
Generally, this component is to be used for the clients whose data we process.
Suppose we have 5 clients today; then we will have 5 different file formats, and the number of formats will grow as the number of clients grows.
The input files can be CSV files with the first row as column names. Can we read this first row and create a schema file for the metadata of the custom stage?
Regarding the key columns we will create a reference file/table to specify the key column names and the custom stage has to read the key columns from the reference and concatenate the fields with that name from the input file.
Can you please let me know how feasible is this solution?
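Because the custom stage resolves key columns by name from the reference file, their position and count no longer matter. A minimal sketch in Python of the concatenate-and-aggregate logic, assuming the key column names have already been looked up from your reference file/table for that client, and assuming the aggregation is a sum over one measure column (the `|` separator and the choice of sum are my assumptions, not part of your spec):

```python
import csv
from collections import defaultdict

def aggregate(csv_path, key_names, measure_name):
    """Concatenate the named key columns of each row into one
    generic key and sum the measure column per key. Columns are
    resolved by name from the header row, so their positions may
    differ from one client file to the next."""
    totals = defaultdict(float)
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)  # maps each row by column name
        for row in reader:
            generic_key = "|".join(row[k] for k in key_names)
            totals[generic_key] += float(row[measure_name])
    return dict(totals)
```

For your two examples, the same function handles both: `aggregate(file1, ["C", "D"], measure)` and `aggregate(file2, ["M", "P", "S", "U"], measure)`, with the key lists read from the reference per client.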
Should be do-able. Your custom stage will need to refer to the reference file/table, of course. And you will need to come up with a convention for naming the schema file and getting this into the job(s). Main problem is that, if you need to do any transformation, you must make reference to a specific column/field name.
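As one possible example of such a naming convention (the directory, extension, and derivation rule here are all assumptions, not an established standard), the schema file could simply share the input file's base name:

```python
import os

def schema_path_for(input_path, schema_dir="/etc/schemas"):
    """One possible convention: the schema file shares the input
    file's base name, with a .schema extension, in a fixed
    directory. All three choices are assumptions to adapt."""
    base = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.join(schema_dir, base + ".schema")
```

A before-job routine could then derive the schema path from the incoming file name and hand it to the job as a parameter.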
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.