Java stage - Multiple records processing

Posted: Fri Jan 16, 2009 1:43 pm
by ag_ram
Hi
Can I pass multiple records (I mean all the source records) to the Java Transformer stage as input and get multiple records as output?

I think the Java Transformer processes data record by record. Does this hold good?
Thanks...

Posted: Fri Jan 16, 2009 3:46 pm
by ray.wurlod
Why do you think that?

Pipeline parallelism is automatically implemented in parallel jobs through the mechanisms of virtual Data Sets (on each link) and transport buffers (used primarily for repartitioning).

Of course, internally, every stage processes record by record, but they are consuming and producing buffers of records (by default up to 3MB worth).
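
To make the distinction concrete, here is a minimal sketch in plain Java of records travelling in buffers while the stage logic still runs once per record. It is an illustration only: the class and method names are made up for this example and are not DataStage APIs, the buffer limit is a parameter rather than the real default, and record size is approximated as one byte per character.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Illustration only: none of these names come from DataStage. */
class BufferedTransportSketch {

    /** Packs records into transport buffers holding at most maxBytes each. */
    public static List<List<String>> fillBuffers(List<String> records, int maxBytes) {
        List<List<String>> buffers = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int used = 0;
        for (String record : records) {
            int size = record.length(); // assumption: one byte per character
            if (!current.isEmpty() && used + size > maxBytes) {
                buffers.add(current);   // buffer is full, hand it on
                current = new ArrayList<>();
                used = 0;
            }
            current.add(record);
            used += size;
        }
        if (!current.isEmpty()) {
            buffers.add(current);       // flush the last, partly filled buffer
        }
        return buffers;
    }

    /** A "stage": it receives whole buffers but applies its logic one record at a time. */
    public static List<String> runStage(List<List<String>> buffers, UnaryOperator<String> perRecord) {
        List<String> output = new ArrayList<>();
        for (List<String> buffer : buffers) {
            for (String record : buffer) {
                output.add(perRecord.apply(record));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("aa", "bb", "cc", "dd", "ee");
        List<List<String>> buffers = fillBuffers(records, 4); // tiny limit, for the demo only
        System.out.println(buffers.size() + " buffers");
        System.out.println(runStage(buffers, String::toUpperCase));
    }
}
```

The per-record function never sees the buffer boundaries, which is why a stage can be written record by record and still be fed in bulk.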

Posted: Sat Jan 17, 2009 4:38 pm
by eostic
Many releases ago I remember trying this, to simulate an aggregator....you should be able to do a readrow() and then return output status ready (after saving your row somewhere) and then when invoked again, do another readrow(), and continue as desired until you decide to do a writerow() to the output link.

Ernie
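
The read, save, return-ready, and write-later pattern described above could be sketched roughly as follows. This is a self-contained simulation, not JavaPack code: the status constants, the `readRow()` and `writeRow()` stand-ins, and the `run()` driver are all hypothetical and exist only so the example compiles and runs on its own. It sums rows in groups to mimic a simple aggregator.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Stand-in for a stage that an engine invokes repeatedly; not the JavaPack API. */
class AccumulatingStageSketch {
    // Hypothetical status codes handed back to the caller after each invocation.
    static final int READY = 0;
    static final int END_OF_DATA = 1;

    private final Deque<Integer> input;                     // stands in for the input link
    private final List<Integer> output = new ArrayList<>(); // stands in for the output link
    private final int groupSize;
    private int saved = 0;       // rows saved since the last write
    private int runningSum = 0;  // the state kept "somewhere" between invocations

    AccumulatingStageSketch(List<Integer> rows, int groupSize) {
        this.input = new ArrayDeque<>(rows);
        this.groupSize = groupSize;
    }

    private Integer readRow() { return input.poll(); }      // null once the input is exhausted
    private void writeRow(int value) { output.add(value); }

    /** One invocation: read one row, save it, and write only when a group is complete. */
    int process() {
        Integer row = readRow();
        if (row == null) {
            if (saved > 0) {          // flush a final, incomplete group
                writeRow(runningSum);
                saved = 0;
                runningSum = 0;
            }
            return END_OF_DATA;
        }
        runningSum += row;
        saved++;
        if (saved == groupSize) {     // enough rows saved: now emit one output row
            writeRow(runningSum);
            saved = 0;
            runningSum = 0;
        }
        return READY;                 // ask to be invoked again
    }

    /** Drives process() the way an engine might, until end of data is reported. */
    public static List<Integer> run(List<Integer> rows, int groupSize) {
        AccumulatingStageSketch stage = new AccumulatingStageSketch(rows, groupSize);
        while (stage.process() == READY) {
            // keep invoking the stage
        }
        return stage.output;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3, 4, 5), 2));
    }
}
```

The point of the design is that the number of rows read and the number of rows written are decoupled: state lives in instance fields between invocations, so many inputs can collapse into one output.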

Posted: Sun Jan 18, 2009 12:39 am
by ag_ram
Hi
Thank you for all the posts. They are quite informative for me, as I am new to the Java stage.
I have one more query regarding the performance of the stage. Can it handle 200-250 million records? We are also going to use a reference link for this Java Transformer stage. Will it buckle at this volume?
If not, can I get an approximate time for processing this volume?
Note: Each record is 62 bytes.

I would greatly appreciate any information about this stage, including any reasons not to use it.

Thanks..
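
For a rough sense of scale, the figures in this post (200-250 million records of 62 bytes each) can be turned into a data volume and, given a throughput, an elapsed time. The sketch below does only that arithmetic. The rows-per-second value is a placeholder assumption, not a measurement of the Java stage; the real figure would have to come from timing a sample run.

```java
/** Back-of-the-envelope sizing; the throughput figure is an assumption, not a benchmark. */
class VolumeEstimateSketch {

    /** Total payload in bytes for a row count and a fixed record size. */
    public static long totalBytes(long records, int bytesPerRecord) {
        return records * bytesPerRecord;
    }

    /** Elapsed seconds at an assumed sustained throughput in rows per second. */
    public static double secondsAt(long records, double rowsPerSecond) {
        return records / rowsPerSecond;
    }

    public static void main(String[] args) {
        int recordSize = 62;                             // bytes, from the post
        long[] volumes = {200_000_000L, 250_000_000L};   // row counts, from the post
        double assumedRowsPerSecond = 50_000.0;          // placeholder: measure this on a sample

        for (long rows : volumes) {
            double gib = totalBytes(rows, recordSize) / (1024.0 * 1024.0 * 1024.0);
            double hours = secondsAt(rows, assumedRowsPerSecond) / 3600.0;
            System.out.printf("%d rows: %.1f GiB, about %.1f h at %.0f rows/s%n",
                    rows, gib, hours, assumedRowsPerSecond);
        }
    }
}
```

Swapping in a throughput measured on a few million rows gives a first estimate for the full volume.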

Posted: Tue Jan 20, 2009 10:59 am
by ag_ram
The Java Transformer stage is able to accept multiple reference links as input. The PDF document does not talk about processing the records from a reference link.
- Can anyone point me to the API for processing reference data?
- I can see that the Java Transformer stage is capable of parallel processing and also partitioning. Can I implement the Lookup/Join stage behaviour in the Java Transformer stage?

thanks..

Posted: Tue Jan 20, 2009 11:43 am
by eostic
JavaPack has been mostly used in "extension" scenarios, when a bit of logic needs to be reused to access a remote proprietary source or target, or a business function in an EJB or other java implementation, or something that DS can't otherwise do (write BLOBs, for example). It has been little used in a performance scenario, because that wouldn't be the right approach. No one expects it to perform, nor handle large volumes -- such things are better done in C with a custom plugin or build-op. Can it? I suppose, but using JavaPack for such things would not be my first recommendation. Use it when you need "quick implementation" of "extended functionality".

Ernie

Posted: Tue Jan 20, 2009 12:20 pm
by lstsaur
No, the Java Transformer stage doesn't support multiple reference links. You can use the Java Client stage as a lookup source for a built-in Transformer stage; it is connected to the built-in Transformer stage using a reference link.

Posted: Tue Jan 20, 2009 12:52 pm
by ag_ram
lstsaur wrote: No, the Java Transformer stage doesn't support multiple reference links. You can use the Java Client stage as a lookup source for a built-in Transformer stage; it is connected to the built-in Transformer stage using a reference link.
Hi
But I have tried attaching multiple reference links to the Java Transformer stage, and it accepts them. I am not sure whether it can process them or not!

I accept that the Java Client stage is able to work with reference links. How can I read rows from the main source and rows from the reference link separately? There is one method, readRow(), in the Java API doc, and it does not accept any parameter such as a link name. Can you suggest something? Basically, can we implement the lookup behaviour in the Java Client stage? If yes, how will partitioning be handled?

thanks..

Posted: Tue Jan 20, 2009 4:07 pm
by eostic
lstsaur is correct. You cannot have a reference "input" link. Both JavaPack stage types can have only one input link. As lstsaur noted, it is possible for JavaClient to have an output reference link, which would lead to another Stage that is performing the lookup....but that's a different thing --- the Java stage is not performing the lookup activity, merely being the "object" of the Lookup.

What are you trying to accomplish?

Ernie

Posted: Wed Jan 21, 2009 12:05 pm
by ag_ram
Thank you very much for all the information. I am keeping this thread open for a couple of days, as I may need some more general information on the Java stage.
Once again thank you.