Java stage - Multiple records processing

ag_ram
Premium Member
Posts: 524
Joined: Wed Feb 28, 2007 3:51 am

Post by ag_ram »

Hi
Can I pass multiple records (I mean all of the source records) to the Java Transformer stage as input and get multiple records as output?

I think the Java Transformer processes record by record. Is that correct?
Thanks...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Why do you think that?

Pipeline parallelism is automatically implemented in parallel jobs through the mechanisms of virtual Data Sets (on each link) and transport buffers (used primarily for repartitioning).

Of course, internally, every stage processes record by record, but they are consuming and producing buffers of records (by default up to 3MB worth).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Many releases ago I remember trying this, to simulate an aggregator. You should be able to do a readRow() and then return the "output status ready" status (after saving your row somewhere). When invoked again, do another readRow(), and continue as desired until you decide to do a writeRow() to the output link.
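
The read, save, return, and write-later pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class, the status constants, and the private readRow()/writeRow() helpers are hypothetical stand-ins written for this example, not the JavaPack classes, and the input is assumed to arrive sorted by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a row-at-a-time stage; these are NOT the JavaPack classes.
class GroupSumSketch {
    // Assumed status codes, invented for this sketch.
    static final int STATUS_READY = 0;
    static final int STATUS_END_OF_DATA = 1;

    private final Iterator<int[]> input;                  // each row is {key, amount}
    private final List<int[]> output = new ArrayList<>(); // each row is {key, total}
    private Integer savedKey = null;                      // state kept between calls
    private int savedTotal = 0;

    GroupSumSketch(List<int[]> rows) {
        this.input = rows.iterator();
    }

    // Stand-in for readRow(): returns null once the input is exhausted.
    private int[] readRow() {
        return input.hasNext() ? input.next() : null;
    }

    // Stand-in for writeRow(): appends one aggregated row to the output.
    private void writeRow(int key, int total) {
        output.add(new int[] {key, total});
    }

    // Invoked repeatedly; reads exactly one input row per call.
    int process() {
        int[] row = readRow();
        if (row == null) {
            if (savedKey != null) {
                writeRow(savedKey, savedTotal); // flush the last group
                savedKey = null;
            }
            return STATUS_END_OF_DATA;
        }
        if (savedKey != null && savedKey != row[0]) {
            writeRow(savedKey, savedTotal);     // key changed: emit the finished group
            savedTotal = 0;
        }
        savedKey = row[0];
        savedTotal += row[1];
        return STATUS_READY;                    // ask to be invoked again
    }

    // Drives process() the way a framework might, until end of data.
    static List<int[]> run(List<int[]> rows) {
        GroupSumSketch stage = new GroupSumSketch(rows);
        while (stage.process() == STATUS_READY) {
            // all work happens inside process()
        }
        return stage.output;
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(
            new int[] {1, 10}, new int[] {1, 5}, new int[] {2, 7});
        for (int[] out : run(rows)) {
            System.out.println(out[0] + " -> " + out[1]);
        }
    }
}
```

The point of the sketch is that many input rows can be consumed before a single output row is written, even though each call handles one row at a time.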

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
ag_ram
Premium Member
Posts: 524
Joined: Wed Feb 28, 2007 3:51 am

Post by ag_ram »

Hi
Thank you for all the posts. They are quite informative for me, as I am new to the Java stage.
I have one more query regarding the performance of the stage. Can it handle 200-250 million records? We are also going to use a reference link for this Java Transformer stage. Will it buckle at this volume?
If not, can I get an approximate time for processing this volume?
Note: Each record is 62 bytes.
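
For a rough sense of scale, the figures in this thread (250 million records, 62 bytes each, and the default buffer of up to 3MB mentioned earlier) can be combined as in the sketch below. It assumes 3MB means 3 x 1024 x 1024 bytes and ignores any per-record overhead. The rows-per-second rate is a placeholder to be replaced with a figure measured on your own system, not a benchmark of the Java stage.

```java
// Back-of-the-envelope sizing; all names here are made up for this example.
class VolumeEstimateSketch {
    // Raw payload size, ignoring any per-record overhead.
    static long totalBytes(long rows, int bytesPerRow) {
        return rows * bytesPerRow;
    }

    // Whole records that fit in one buffer of the given size.
    static long rowsPerBuffer(long bufferBytes, int bytesPerRow) {
        return bufferBytes / bytesPerRow;
    }

    // Number of buffers needed to carry all rows (rounded up).
    static long buffersNeeded(long rows, long rowsPerBuffer) {
        return (rows + rowsPerBuffer - 1) / rowsPerBuffer;
    }

    // Elapsed seconds at a throughput that you have measured yourself.
    static double secondsAt(long rows, double rowsPerSecond) {
        return rows / rowsPerSecond;
    }

    public static void main(String[] args) {
        long rows = 250_000_000L;
        int bytesPerRow = 62;
        long bufferBytes = 3L * 1024 * 1024;   // assumption: "3MB" read as 3 MiB
        double placeholderRate = 100_000.0;    // placeholder only, measure your own

        long perBuffer = rowsPerBuffer(bufferBytes, bytesPerRow);
        System.out.println("Total bytes:     " + totalBytes(rows, bytesPerRow));
        System.out.println("Rows per buffer: " + perBuffer);
        System.out.println("Buffers needed:  " + buffersNeeded(rows, perBuffer));
        System.out.println("Seconds at placeholder rate: " + secondsAt(rows, placeholderRate));
    }
}
```

The payload alone comes to about 15.5 GB. The elapsed time depends entirely on the throughput you measure with a representative sample.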

I would also greatly appreciate any information that would argue against using this stage.

Thanks..
ag_ram
Premium Member
Posts: 524
Joined: Wed Feb 28, 2007 3:51 am

Post by ag_ram »

The Java Transformer stage is able to accept multiple reference links as input. The PDF documentation does not discuss processing records from a reference link.
- Can anyone point me to the API that supports processing reference data?
- I can see that the Java Transformer stage is capable of parallel processing and partitioning. Can I implement Lookup/Join stage behaviour in the Java Transformer stage?

thanks..
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

JavaPack has been mostly used in "extension" scenarios, when a bit of logic needs to be reused to access a remote proprietary source or target, or a business function in an EJB or other java implementation, or something that DS can't otherwise do (write BLOBs, for example). It has been little used in a performance scenario, because that wouldn't be the right approach. No one expects it to perform, nor handle large volumes -- such things are better done in C with a custom plugin or build-op. Can it? I suppose, but using JavaPack for such things would not be my first recommendation. Use it when you need "quick implementation" of "extended functionality".

Ernie
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

No, Java Transformer Stage doesn't support multiple reference links. You can use Java Client Stage as a Lookup Stage to perform lookup functions for a built-in Transformer stage and it's connected to the built-in Transformer stage using a reference link.
ag_ram
Premium Member
Posts: 524
Joined: Wed Feb 28, 2007 3:51 am

Post by ag_ram »

lstsaur wrote: No, Java Transformer Stage doesn't support multiple reference links. You can use Java Client Stage as a Lookup Stage to perform lookup functions for a built-in Transformer stage and it's connected to the built-in Transformer stage using a reference link.
Hi
But I have tried adding multiple reference links to the transformer stage, and it accepts them. I am not sure whether it can process them or not!

I accept that the Java Client stage can accept reference links. How can I read rows from the main source and rows from the reference link separately? The Java API doc has a single readRow() method, which does not accept any parameter such as a link name. Can you suggest something? Basically, can we implement lookup behaviour in the Java Client stage? If yes, how will partitioning be handled?

thanks..
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

lstsaur is correct. You cannot have a reference "input" link. Both JavaPack stage types can have only one input link. As lstsaur noted, it is possible for JavaClient to have an output reference link, which would lead to another Stage that is performing the lookup, but that's a different thing: the Java stage is not performing the lookup activity, merely being the "object" of the Lookup.
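
To illustrate the division of roles only, here is a minimal sketch in plain Java. Every name in it is hypothetical and none of it is the JavaPack API: one piece of code answers key requests (the "object" of the lookup), while the caller, standing in for the stage that performs the lookup, decides what to do with a match or a miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration; not the JavaPack API.
class ReferenceObjectSketch {
    private final Map<String, String> referenceData = new HashMap<>();

    // Loads one reference row (key and value) into memory.
    void put(String key, String value) {
        referenceData.put(key, value);
    }

    // The "object" role: answer a key request, with no join logic here.
    Optional<String> fetch(String key) {
        return Optional.ofNullable(referenceData.get(key));
    }

    // The "lookup" role, played by the caller: combine a main row with the answer.
    static String enrich(String mainKey, ReferenceObjectSketch reference) {
        return mainKey + "," + reference.fetch(mainKey).orElse("NOT_FOUND");
    }

    public static void main(String[] args) {
        ReferenceObjectSketch reference = new ReferenceObjectSketch();
        reference.put("A1", "Sydney");
        System.out.println(enrich("A1", reference));
        System.out.println(enrich("B2", reference));
    }
}
```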

What are you trying to accomplish?

Ernie
ag_ram
Premium Member
Posts: 524
Joined: Wed Feb 28, 2007 3:51 am

Post by ag_ram »

Thank you very much for all the information. I am keeping this thread open for a couple of days, as I may need some more general information on the Java stage.
Once again thank you.