Simple BuildOp

SettValleyConsulting
Premium Member
Posts: 72
Joined: Thu Sep 04, 2003 5:01 am
Location: UK & Europe

Simple BuildOp

Post by SettValleyConsulting »

I am trying to write a simple buildop and find myself disappearing in ever decreasing circles ...

Basically I have to apply some business rules to an input row to determine its Classification and write this value to a CLAS column. Every other column should be propagated straight through untransformed.

I have no problem with coding the logic in C++, but I cannot get a combination of reading, writing and transferring that works. At present my CLAS column gets populated correctly but every other field is Null in the output record.

As I understand it, the Buildop reads a record into an input buffer, transfers it to an output buffer and writes it. These operations can be done automatically using the auto-read, auto-write and auto-transfer properties, or explicitly in code using the readRecord(), doTransfer() and writeRecord() macros. I think I've tried every combination of these and get the same result with each: columns not explicitly set in the output record in code end up as Null.
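To make the intended read, transfer, set, write cycle concrete, here is a minimal standalone C++ sketch. It is only a model of the idea, not the buildop API: `Record`, `transferAll`, `classify` and `processRecord` are hypothetical names, the `AMOUNT` column and the classification rule are invented for illustration, and the real macros are not called anywhere.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a record: column name -> value.
// This is NOT the DataStage buildop API, only a model of the idea.
using Record = std::map<std::string, std::string>;

// Models the transfer step: every input column is copied to the output.
Record transferAll(const Record& in) {
    return in;
}

// Invented business rule; the real classification logic is not shown in the post.
std::string classify(const Record& in) {
    auto it = in.find("AMOUNT");
    return (it != in.end() && it->second == "0") ? "EMPTY" : "ACTIVE";
}

// One per-record cycle: transfer everything, then set CLAS explicitly.
Record processRecord(const Record& in) {
    Record out = transferAll(in);   // pass-through columns
    out["CLAS"] = classify(in);     // the only column set in code
    return out;
}
```

In this model a record holding `GCDU_CUST_ID` and `AMOUNT` comes out with both columns unchanged plus `CLAS`, which is the behaviour being asked for.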

Another area where the Developer's Guide is not very clear is how you should set the link metadata. You can assign a table definition from the repository to your input and output links (the guide says you don't need to do this, but is silent on what you would do otherwise).

Here is what I have found: as my input and output have identical metadata, I first tried using the same table definition for each. When I tested the Buildop I got our old friend the 'When checking operator: Dropping component "GCDU_CUST_ID" because of prior component with same name' error message for every column.

This implies that you should name your output columns differently to your inputs, eg FIELDA to FIELDA_OUT, but then as there is no mapping page on a BuildOp, how does PX 'know' that FIELDA should be transferred to FIELDA_OUT?

I could work around this in code by writing

outrec.FIELDA_OUT = inrec.FIELDA;
outrec.FIELDB_OUT = inrec.FIELDB;
outrec.FIELDC_OUT = inrec.FIELDC;

and so on for all the fields mapped straight through, but this seems unduly labour-intensive and will be a pain as there are over a hundred columns in the row. Surely this is what doTransfer() should be doing for me?

Could Runtime Column Propagation help here?

If I may cheekily ask for some free consultancy, has anyone done something similar and got it to work, and could post some code? I will willingly write the results up and post them here for future reference as the Developer's Guide seems to have significant gaps on this topic.

Thanks

Phil Clarke.
SettValleyConsulting
Premium Member
Posts: 72
Joined: Thu Sep 04, 2003 5:01 am
Location: UK & Europe

Post by SettValleyConsulting »

Nothing like writing a problem down for enabling you to see the bleedin' obvious solution staring you in the face.

The answer is indeed to use Runtime Column Propagation, and I will write up the steps I went through to get it working when the time pressure eases off again.

Thanks, DSXchange - sometimes you help just by being there ...
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

I posted a sample buildop in an unrelated topic, "convert datetime to timestamp".

viewtopic.php?t=93884

It has all the components listed, including a datafile to test it with. That may provide what you need. Let me know if you have any issues getting it to run.

We haven't used Runtime Column Propagation very often, at least not with buildops. When we have, the only fields we define on the input are the fields that we need to handle explicitly. For example, we have some files that have hundreds of fields. For the most part, all we have to change are 4 or 5 fields - so why list all of them? We list just the fields we need in the input schema and set RCP to true for the input. Then the output schema lists only the new output field(s) and sets RCP to true for the output. The logic just has the transformations needed for the explicitly listed fields. All other fields are automatically transferred.

In general, we don't use RCP for the buildops. We explicitly list the full input and output schemas, and all output fields must be accounted for in the logic. Any output field not set in the logic will get set to NULL - and the job will fail if it is a non-nullable field.
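The non-RCP behaviour described above can be modelled in a few lines of standalone C++. This is a sketch under stated assumptions, not buildop code: `Field`, `OutRecord`, `makeOutput`, `setField` and `isWritable` are hypothetical names, and the NULL handling is only an imitation of what the engine does.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one output column; isNull == true stands for NULL.
struct Field {
    bool isNull = true;
    std::string value;
};

using OutRecord = std::map<std::string, Field>;

// Build an output record from an explicit schema: every column starts as NULL.
OutRecord makeOutput(const std::vector<std::string>& schema) {
    OutRecord out;
    for (const auto& col : schema) {
        out[col] = Field();
    }
    return out;
}

// Models an explicit assignment in the buildop logic.
void setField(OutRecord& out, const std::string& col, const std::string& v) {
    out[col].isNull = false;
    out[col].value = v;
}

// Returns false when a non-nullable column was never set in the logic.
bool isWritable(const OutRecord& out, const std::set<std::string>& nonNullable) {
    for (const auto& kv : out) {
        if (kv.second.isNull && nonNullable.count(kv.first) > 0) {
            return false;
        }
    }
    return true;
}
```

With a schema of `CLAS` and `FIELDA` and only `CLAS` set, `FIELDA` stays NULL in this model, and `isWritable` reports a failure if `FIELDA` is declared non-nullable.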

Hope this helps. We have found buildops to be enormously helpful - they are very fast, flexible, and they allow us to keep ETL and business rules centralized instead of spread out over multiple stages and/or jobs.

Brad.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Good on you Brad, I missed that thread first time around and it is very good to see real examples of buildop code. If anyone else has buildop examples out there, please post them. :D