Random Shuffling of input stream data in DS MVS jobs


Moderators: chulett, rschirm, roy

mouni
Charter Member
Posts: 49
Joined: Tue Jul 11, 2006 11:30 pm


Post by mouni »

Hi there,

I am trying to explore the following features in Datastage MVS Edition (7.5x2) for Mainframes:

1) A routine to generate random numbers. Server jobs and parallel jobs have routines that generate random numbers, but I am unable to find an equivalent in MVS jobs (or did I not check properly?). One option is to code this in COBOL. Since I am not a pro in COBOL, it will take some time for me to learn and code it.
Before doing this, can anybody tell me whether there are any extra plug-ins for DS that I can install that provide these routines?

2) I want to shuffle the input stream in some random order using Datastage MVS Jobs.
Ex:
Input:

CustId,Name
0001,Steve
0002,Bush
0003,Tony
0004,Stalin

I will shuffle CustId and Name separately.

If I run the job once the output would look like:

CustId,Name
0003,Bush
0001,Stalin
0004,Steve
0002,Tony

If I run the job for the second time the output would be entirely different from the 1st run:

CustId,Name
0002,Stalin
0004,Bush
0003,Steve
0001,Tony

i.e., the output would be randomly shuffled. I am unable to do this in DataStage, so we thought of doing it in COBOL (even COBOL seems to have some serious limitations). Is there any way this could be done in DS MVS?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

In PX you could build your own random/pseudo-random method to do this in various ways. You could write a buildop to get a real C++ random function with all the advantages that has (repeatability, real pseudo-random distribution, etc.), or you could do something simple: take the milliseconds portion of the current time, divide it into some numeric column in your data (or into the previous milliseconds result), and use a modulo on that result to "randomly" distribute rows across your output links.
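
As a rough illustration of the "simple" approach, here is a minimal Python sketch of one reading of it: divide a numeric column value by the milliseconds portion of the current time and take the result modulo the number of output links. The function name `pick_link` and its parameters are hypothetical, and this is not DataStage code, only the arithmetic.

```python
import time


def pick_link(numeric_value, n_links, millis=None):
    """Pick an output link index in the range 0..n_links-1.

    numeric_value: some numeric column from the row (assumed to be an integer).
    millis: milliseconds portion of the current time; read from the clock
            when not supplied. Passing it in makes the result repeatable.
    """
    if millis is None:
        millis = int(time.time() * 1000) % 1000
    divisor = max(millis, 1)  # guard against dividing by zero at .000
    return (numeric_value // divisor) % n_links


# Repeatable call with a fixed milliseconds value: 123456 // 250 = 493, 493 % 4 = 1
print(pick_link(123456, 4, millis=250))
```

Note that when the column value is smaller than the milliseconds value the quotient is 0, so every such row goes to link 0. The scheme only spreads rows when the numeric column is comfortably larger than 999, and it is not a uniform distribution.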
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

But this is Mainframe edition. Surely there's a RND function in a standard COBOL library somewhere!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

The COBOL function is called RANDOM. Mainframe jobs have limited transformation functionality. There is no built-in function or combination of functions that I can think of to do this.

I think your only mainframe job option is likely to be to code a COBOL subroutine and invoke it with the External Routine stage.

You might want to weigh the option of using a server or parallel job for this requirement.

No need to fear COBOL. Go to IBM's website and search for the latest COBOL Language Reference. You could also enlist the help of a really "old" (i.e. experienced :wink: ) co-worker.

Oh. You don't necessarily have to code the subroutine in COBOL. You can code it in any language that can be called by a COBOL program (C for instance).

Mike
mouni
Charter Member
Posts: 49
Joined: Tue Jul 11, 2006 11:30 pm

Post by mouni »

Thanks, guys. We now have a mainframe expert on our team who is helping to code the COBOL routines.

To shuffle the data, we have now implemented a method similar to the Knuth shuffle. It seems to work fine for small data volumes, but we still need to check the performance for larger volumes.
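
For readers unfamiliar with the technique, here is a minimal Python sketch of a Knuth (Fisher-Yates) shuffle applied to each column independently, as in the CustId/Name example from the first post. It is not the COBOL routine we wrote, only an illustration of the idea; the function names are hypothetical.

```python
import random


def knuth_shuffle(items, rng):
    """Return a shuffled copy of items using the Knuth (Fisher-Yates) shuffle."""
    out = list(items)
    # Walk from the last position down to the second, swapping each element
    # with a randomly chosen element at or before it.
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        out[i], out[j] = out[j], out[i]
    return out


def shuffle_columns(rows, rng=None):
    """Shuffle every column on its own, then zip the columns back into rows."""
    rng = rng or random.Random()
    if not rows:
        return []
    columns = list(zip(*rows))
    shuffled = [knuth_shuffle(col, rng) for col in columns]
    return list(zip(*shuffled))


rows = [("0001", "Steve"), ("0002", "Bush"), ("0003", "Tony"), ("0004", "Stalin")]
print("CustId,Name")
for cust_id, name in shuffle_columns(rows):
    print(f"{cust_id},{name}")
```

Each run prints the same CustIds and Names in a different pairing. The shuffle does one swap per row, but it needs the whole column in memory at once, which is probably what matters most at larger volumes.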