One to many column mapping

Harini
Premium Member
Posts: 80
Joined: Tue Mar 16, 2010 1:32 am

One to many column mapping

Post by Harini »

Hi all,

I am trying to map one column from the source to two columns in the output, and most of the stages don't seem to allow it, including the Column Generator. Right now, I have two sequential files which are getting funneled together, and I am trying to equate the metadata before funneling. Both files require different columns, which are actually copies of existing columns. I don't want to add a Transformer just to do a 'one-to-many' mapping.

Any other simple way to do this?

Thanks.
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Re: One to many column mapping

Post by BI-RMA »

Harini wrote:I don't want to add a Transformer just to do a 'one-to-many' mapping.
Why not? Do you fear dramatic performance issues? I would consider that unlikely.

I have to admit I do not fully understand what you are actually trying to do from your description.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
suse_dk
Participant
Posts: 93
Joined: Thu Aug 11, 2011 6:18 am
Location: Denmark

Post by suse_dk »

Well, you can use the Modify stage to create a duplicate column.
But my question is the same as Ronald's... why not use the Transformer stage?
_________________
- Susanne
Harini
Premium Member
Posts: 80
Joined: Tue Mar 16, 2010 1:32 am

Re: One to many column mapping

Post by Harini »

Hmm, yeah, though not really now... if the load is heavy, and if the number of source files increases, I would have to put in that many Transformers.

Okay .. My design is like this ..

Code:

SEQ -> "Some stage to equate metadata" -> 
                                               Funnel -> output
SEQ -> "Some stage to equate metadata"  ->
* Some stage to equate metadata: here I want to add some columns to both of the sequential files' outputs so that they get funneled properly. These columns are mostly copies of existing columns. I can use a Column Import for this, but if I have three copy columns, then I have to add three Column Imports, which I want to avoid.

"Well, you can use the modify stage for creating a duplicate column."

Okay, please let me know the specification for it.


Thanks.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

So what you are saying is that you have differently formatted input files, and you wish to funnel them together?

And of course, to do this the columns of the files have to be the same.

Do your files have standardised column names? Or can your inputs be standardised?

I suggest looking at file schemas. Have a file schema that contains the entire list of columns that you require. After you have read a file in, use a Merge stage to merge your input file with an empty feed that carries your complete schema. Set the Unmatched Masters option to Keep (I think).

This should result in an output of all your starting columns plus any extra columns required to get to your total required column list.

I have glossed over quite a few details, but I hope that I have given enough (and hopefully correct) information to get you started.
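A minimal sketch of what such a complete schema file might look like, in OSH schema syntax (the column names here are illustrative, not from this thread); the empty master feed would carry this record definition, so the Merge stage pads each input up to the full column list:

Code:

record (
  customer_id: int32;
  start_date: date;
  valid_date: nullable date;
)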
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

If I understand you correctly, you have a number of input files with almost identical structures that you want to send through a Funnel. FileA has a column 'abc' that contains practically the same information, with the same column metadata, as column 'def' from FileB. Now you want to duplicate both columns 'abc' and 'def' to get identical column lists on both input streams of the Funnel. Is that right?

Did you think about renaming one of the columns to make the column names identical?

If you absolutely have to fill a target with duplicated information, you could still do that with a single Transformer behind the Funnel.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
Harini
Premium Member
Posts: 80
Joined: Tue Mar 16, 2010 1:32 am

Post by Harini »

No, it is not renaming. It is that a column in file A is not present in file B, and vice versa. For example, I have 'start_date' and 'valid_date' in file A, and 'start_date' in file B. 'valid_date' is nothing but 'start_date' itself... So, before funneling files A and B, I need to create 'valid_date' in a Transformer and map 'start_date' to it. Is there a way I can do this without a Transformer?
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

You can use a Column Generator stage and set the property "Percent null" to 100. This will give you an added column containing nothing but NULLs. You will still have to use a Transformer behind the Funnel to replace the NULL values with values from other columns. That way you have only got one Transformer in which to apply your business logic, but that Transformer has to apply null handling to your data, which is significantly more complex than just distributing the content of one input column to multiple output columns.
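As a sketch of the null replacement this implies (the link name lnk_funnel and the column names are illustrative), the derivation for the output column valid_date in that Transformer would be something like:

Code:

If IsNull(lnk_funnel.valid_date) Then lnk_funnel.start_date Else lnk_funnel.valid_date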

You will have to find out which solution gives you better performance by trying all the options.

I do not know of any option other than the Transformer to directly propagate the contents of an input column to multiple output columns within the same stream. (You could, of course, use a Copy stage to distribute one input column to multiple output streams with different column names, but you would have to re-join your output streams before funneling them - hardly more performant than a simple Transformer.)
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

I still think that what you want can be achieved using RCP and schemas. If you just read your files in and pass a schema over them, you can name the columns whatever you like. DataStage will take the names and metadata from the schema. As long as you have a schema for each file structure, you can have one job that can read any sequential file you like.
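For example (a hedged sketch; the field names and format properties are invented for illustration): with runtime column propagation enabled, each file structure gets its own schema file, referenced through the Sequential File stage's "Schema File" property, something like:

Code:

record {delim=',', final_delim=end} (
  customer_id: int32;
  start_date: date;
)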
vishal_rastogi
Participant
Posts: 47
Joined: Thu Dec 09, 2010 4:37 am

Post by vishal_rastogi »

You can read the entire record as one VarChar field, then funnel it, and later use a Transformer to segregate the data.
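If you went this way, the Transformer after the Funnel would have to carve each column out of that single VarChar with field/substring logic; for a delimited record, a derivation might look like this (link and column names are illustrative):

Code:

Field(lnk_in.whole_record, ",", 3)

which extracts the third comma-delimited field.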
Vish
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

ShaneMuir wrote:I still think that what you want can be achieved using RCP and schemas. If you just read your files in and pass a schema over them, you can name the columns whatever you like. DataStage will take the names and metadata from the schema. As long as you have a schema for each file structure, you can have one job that can read any sequential file you like.
What difference does it make whether you define a file structure in a schema file or in specific column definitions within the job design? It only allows you to run the same job with different column definitions without changing the design - if so needed. But that is not the case here.

It will not allow you to add a column to a stream that is not present in the source file - you can't tell DataStage to read the same part of the stream data twice and distribute it to multiple output columns just by modifying an OSH schema.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

vishal_rastogi wrote:You can read the entire record as one VarChar field, then funnel it, and later use a Transformer to segregate the data.
You would need to apply different substring constructs depending on each source file's structure to achieve this. For every new file you add to the job, you would have to modify the logic in the Transformer. I would consider that (almost) impossible to maintain.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

BI-RMA wrote:What difference does it make whether you define a file structure in a schema file or in specific column definitions within the job design? It only allows you to run the same job with different column definitions without changing the design - if so needed. But that is not the case here.

It will not allow you to add a column to a stream that is not present in the source file - you can't tell DataStage to read the same part of the stream data twice and distribute it to multiple output columns just by modifying an OSH schema.
But you can leverage it to build a common schema further down the flow, into which the missing columns can be added. That allows for more flexibility when receiving the file, which is one of the things the OP seemed to want.
Harini wrote:* Some stage to equate metadata: here I want to add some columns to both of the sequential files' outputs so that they get funneled properly. These columns are mostly copies of existing columns. I can use a Column Import for this, but if I have three copy columns, then I have to add three Column Imports, which I want to avoid.
I don't think the OP can get around using a Transformer at some point.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Going back to this question:
Harini wrote:No, it is not renaming. It is that a column in file A is not present in file B, and vice versa. For example, I have 'start_date' and 'valid_date' in file A, and 'start_date' in file B. 'valid_date' is nothing but 'start_date' itself... So, before funneling files A and B, I need to create 'valid_date' in a Transformer and map 'start_date' to it. Is there a way I can do this without a Transformer?
As has been mentioned before, the Modify stage can do this and is well documented. One possible specification: "valid_date=start_date;start_date=start_date"
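Entered as separate Specification properties on the Modify stage, that would look something like this (an untested sketch using the column names above; mapping start_date to itself presumably keeps the original column, since a bare rename-style specification would otherwise drop it):

Code:

valid_date = start_date
start_date = start_date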

To simply create columns, the Column Generator is appropriate, and it can be provided a schema.
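A sketch of the kind of schema you might hand the Column Generator for the valid_date example (assuming the column is generated nullable and filled afterwards):

Code:

record (
  valid_date: nullable date;
)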

If all you need to do, even as you add new files in the future, is create new columns or copies of existing columns (such as valid_date above), create a Shared Container comprising SeqFile, Column Generator and Modify stages, and parameterize all three: provide schemas to the SeqFile and Column Generator, and specifications to Modify. The output of the shared container will be your "common" schema and will feed the Funnel stage. Add as many shared containers as you have input files.

If you're going to need to do some data validation/manipulation/etc., you'll probably need Transformers or other custom code. You could include a Transformer after each shared container as needed, or after the Funnel.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

jwiles wrote:As has been mentioned before, the Modify stage can do this and is well documented. One possible specification: "valid_date=start_date;start_date=start_date"
I have to admit I had not used it for this purpose so far, but it is intriguingly simple. I read the mention by suse_dk but did not actually go back to the manual. Good point in this context.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon