Problem in Implementing the logic using buildop

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


DEVESHASTHANA
Participant
Posts: 47
Joined: Thu Sep 16, 2004 5:26 am
Location: India


Post by DEVESHASTHANA »

Hi,

I am facing a problem implementing the following logic. We are building a BuildOp, and our requirement is:

We want to change the table structure of the input interface at runtime, while my output interface schema is fixed. Is this possible in a BuildOp stage, or is there any other way to solve this problem? In short, I want to change the input schema at run time.

Problem:

As input I have various schema files:
file1 columns are: A B C D E F
file2 columns are: A B D F C
file3 columns are: A B C D F
.....

As output I have a fixed schema file:

columns are: A B K F

Here K = C + E; if C is not available then only E, and vice versa, and if both are available then C + E.
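The combining rule for K can be sketched in plain Python (an illustration only, not DataStage code; the behaviour when both C and E are missing is my assumption, since the post does not specify it):

```python
def compute_k(c, e):
    """K = C + E when both are present, otherwise whichever one exists.

    Returns None when neither is available (an assumption -- the post
    does not say what K should be in that case).
    """
    if c is not None and e is not None:
        return c + e
    return c if c is not None else e

print(compute_k(10, 5))     # both present -> 15
print(compute_k(10, None))  # only C -> 10
print(compute_k(None, 5))   # only E -> 5
```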

Please help me solve this problem.

Regards,

Devesh Asthana
cyh
Participant
Posts: 18
Joined: Tue Jan 20, 2004 3:23 am

Post by cyh »

In our project, we force the upstream party to restructure the file before passing it to the BuildOp (or Transformer).

For example:
file1 columns are: A B C D E F -> no change
file2 columns are: A B D F C -> A B C D *E F
file3 columns are: A B C D F -> A B C D *E F

As you know, the order does not matter to the actual processing, and a Column Generator can help to add a NULL field (e.g. E).
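As a plain-Python illustration of this idea (the column names are the ones from the thread; a dict lookup defaulting to None stands in for the Column Generator):

```python
# Pad and reorder every record to one standard layout, A..F, inserting
# None (a NULL stand-in) for any column the source file does not supply.
STANDARD = ["A", "B", "C", "D", "E", "F"]

def normalize(record):
    return {col: record.get(col) for col in STANDARD}

# file2-style input (A B D F C): E comes back as None, order is fixed.
print(normalize({"A": 1, "B": 2, "D": 4, "F": 6, "C": 3}))
# -> {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': None, 'F': 6}
```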

Therefore, there will be only one single format for the input file of your BuildOp.

HTH.
DEVESHASTHANA
Participant
Posts: 47
Joined: Thu Sep 16, 2004 5:26 am
Location: India

Post by DEVESHASTHANA »

I don't think that will solve my problem, as I need to add columns for some of the retailers (files). Moreover, I cannot change the columns, as these are the retailers' files and we will be using the files as a source.

Output columns are: A B K F

Here K = C + E; if C is not available then only E, and vice versa, and if both are available then C + E.

Can this be done? If yes, please share your experience and knowledge.

Regards,

Devesh
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

I think what cyh is recommending is to force your different input files to look the same. You may not be able to have the retailers reformat the file, but once you receive them, your DataStage program can reformat them any way you want.

Create one job for each type of file. You can call the same buildop from each of them as long as the input stream has the same schema.

import -> "restructure" -> buildop -> output

The "restructure" stage (or stages) is where you take your input schema and reformat it to look the same as what the buildop needs. Make sure all fields are accounted for (rename with Modify, or add field(s) with the Column Generator). Then call the buildop - it will be the same for each job.
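A toy Python sketch of this per-file "restructure" step (a rename map stands in for the Modify stage, and a None default for the Column Generator; the retailer column name `AMT_1` is purely hypothetical):

```python
STANDARD = ["A", "B", "C", "D", "E", "F"]

def restructure(record, rename_map):
    """Rename source columns to their standard names, then pad any
    columns this retailer's file lacks with None."""
    renamed = {rename_map.get(name, name): value
               for name, value in record.items()}
    return {col: renamed.get(col) for col in STANDARD}

# A retailer whose file calls column C "AMT_1" and has no E:
print(restructure({"A": 1, "B": 2, "D": 4, "F": 6, "AMT_1": 3},
                  {"AMT_1": "C"}))
# -> {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': None, 'F': 6}
```

After this step every job feeds the BuildOp an identical schema, which is the point of Brad's pipeline.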

Brad.
DEVESHASTHANA
Participant
Posts: 47
Joined: Thu Sep 16, 2004 5:26 am
Location: India

Post by DEVESHASTHANA »

Thanks for your input, but it is not possible for me to write a job for every retailer, as there can be 800+ retailers. What I want to achieve with this job is:
1: parameterising the input with the retailer's schema file, and
2: mapping them to the fixed-structure output layout in the BuildOp stage. (Is there any way to parameterise the "Interface > Input" in the BuildOp stage, so that we can supply a different retailer's schema file at runtime?)


Problem:

As input I have various schema files:
file1 columns are: A B C D E F
file2 columns are: A B D F C
file3 columns are: A B C D F
.....

As output I have a fixed schema file:

columns are: A B K F

Here K = C + E; if C is not available then only E, and vice versa, and if both are available then C + E.

This is the problem description. :cry:

Regards,

Devesh
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

DEVESHASTHANA wrote: is there any way to parameterise the "Interface> Input" in buildop stage,so that we can give different retailers schema file at runtime)
Alas, no, which is going to make your task rather difficult.

You will need to design an approach that uses a standard schema, but which allows for columns to be missing (null?). And you will need to handle the transition from individual retailers' record layouts to your standard layout. Perhaps a Switch stage on retailer type (there must be SOME overlaps!) feeding different Copy stages for different types.

Without seeing/knowing your full requirement it is difficult to provide focussed suggestions, but if you think along these lines I think there is a chance that you will solve your problem, indeed without needing recourse to writing a BuildOp.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DEVESHASTHANA
Participant
Posts: 47
Joined: Thu Sep 16, 2004 5:26 am
Location: India

Post by DEVESHASTHANA »

Ray,

I will be more than happy :lol: if this can be done without the BuildOp stage.

For now my problem is to generalize the job for all the retailers (parameterising the job); if there are other approaches to solve this problem, please do suggest them.

Again, I will explain my requirements:

We have to design a DataStage utility (I say utility because we don't want to make a job specific to each retailer) which will work for all the retailers (800+). There is no business logic or transformation required. We have the retailers' input files with some columns, and in output our layout is a standard, fixed number of columns.

So what we want to achieve through this utility is that it takes the retailer name as a parameter, fetches its file, and maps it to the output columns.

E.g.
As input I have various files:
file1 columns are: A, B, C, D, E, F
file2 columns are: A, B, D, F, C
file3 columns are: A, B, C, D, F
file4 columns are: A, B, C, D, F
.....

As output I have a fixed schema file:
output columns are: A B K F
Here K = C + E; if E is not available then only C, and if both are available then C + E (this is the only logic that needs to be applied, and only if required for a particular retailer).

This is the exact description of our requirement. :cry:

Is there any way through which we can do transformations on the schema file, which is fetched by a Sequential File stage, without mentioning the columns in the Sequential File stage in the PX job?



Regards,

Devesh
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Without having given it a lot of (unpaid) thought, I would tend to plan along the following lines. There is a small number of different input file layouts. I would design a separate job for each of these then, in a job sequence, make the decision about which of these ought to be used based upon the particular retailer.

It's not that much additional development work, since most of it is copy and paste.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DEVESHASTHANA
Participant
Posts: 47
Joined: Thu Sep 16, 2004 5:26 am
Location: India

Post by DEVESHASTHANA »

Thanks for the inputs, everyone (Ray, bcarlson, cyh).

Ray, I don't think that will work for us, as we have more than a hundred different layouts in the input.

Is there any way through which we can do transformations on the schema file, which is fetched by a Sequential File stage, without mentioning the column definitions in the Sequential File stage in the PX job? In other words, can I see the columns of the schema file and use them for transformation in the PX job?

I know we can do a one-to-one data transfer to the output file using a schema file, but I want to pass only N columns to the output while more than N columns are coming from the source files. Is the mapping of columns possible without mentioning the column names (in the column definitions) in the Sequential File stage that reads the schema file and a data file with the same structure as the schema file? :cry:



Regards,
Devesh
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Okay, here's kind of a weird and probably radical idea.

1. If possible, create a list of fields that could be missing. Sum the lengths of all of them. For the sake of conversation, let's say there is a total of 100 bytes.
2. Add a 150-byte field to EVERY input; call the field 'MISSING_FIELDS' (use the Column Generator and set it to spaces, NULL, whatever). The extra 50 bytes allow for expansion later. Use the datatype 'UNKNOWN'.
3. You mentioned getting a schema file from the retailers for their files. Create a program (DataStage, Unix script, C program, whatever) that determines which fields are missing from the input file. The default schema would just be the 150-byte FILLER.
4. Dynamically build a schema file (just a text file), with a maximum 'record' length of 150 bytes (see step 2), that incorporates those missing fields (with proper datatypes, lengths, nullability, etc.) and a final FILLER field.
5. Use the Column Import stage (from the Restructure group), and set the Column Method option to 'Schema File'. This can be parameterized, so your job can pass the name of the schema file you created in step 4.
6. After the Column Import, your input stream should have all necessary fields. Use a Modify stage to KEEP all fields required by the buildop.

I hope this makes sense. It makes sense in my mind, but I'm not sure how well I am communicating it in writing. It is kind of thinking outside the box, but then again, when has programming ever really been straightforward?
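A small Python sketch of steps 3-4 above, generating the schema text per retailer (the field names, lengths, and `record { ... }` syntax here are illustrative assumptions, not a tested DataStage schema):

```python
BLOCK = 150                   # fixed MISSING_FIELDS length from step 2
CATALOG = {"C": 40, "E": 60}  # hypothetical lengths of fields that may be missing

def build_schema(missing_fields):
    """Emit schema-file text covering the given missing fields plus a
    FILLER that pads the record out to the fixed 150-byte block."""
    used = 0
    lines = ["record {"]
    for name in missing_fields:
        lines.append("  %s: string[%d];" % (name, CATALOG[name]))
        used += CATALOG[name]
    lines.append("  FILLER: string[%d];" % (BLOCK - used))
    lines.append("}")
    return "\n".join(lines)

# A retailer missing only E: its 60 bytes plus a 90-byte FILLER.
print(build_schema(["E"]))
```

The job then passes this generated file's name to the Column Import stage's Schema File option, as in step 5.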

HTH,

Brad.
DEVESHASTHANA
Participant
Posts: 47
Joined: Thu Sep 16, 2004 5:26 am
Location: India

Post by DEVESHASTHANA »

Thanks everyone,

My problem is solved by using the transform operator in a Generic stage.

Regards,

Devesh :)