Address Shuffle

Post questions here related to DataStage Enterprise/PX Edition in areas such as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

kennyapril
Participant
Posts: 248
Joined: Fri Jul 30, 2010 9:04 am

Address Shuffle

Post by kennyapril »

I have a source file with 1M records that contain addresses.

Can anyone suggest a way to shuffle the addresses within the file? Each record must still end up with an address from the same state.


I sorted on the state_cd field and generated a key change column to identify where the state changes.

Can anyone help me with the next step, or suggest another approach?
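For illustration only, here is what a key change column on the sorted data amounts to: a flag set to 1 on the first record of each state and 0 on the rest. This is a plain Python sketch, not DataStage code; the field names (`state_cd`, `keyChange`) and the function name are assumptions. The flag only marks state boundaries, so the shuffle itself still needs a further step.

```python
# Sketch of a key change flag on records already sorted by state_cd.
# keyChange = 1 on the first record of each state group, 0 otherwise.
# Field and function names are hypothetical, for illustration only.
from typing import Dict, List, Optional


def add_key_change(records: List[Dict[str, str]], key: str = "state_cd") -> List[Dict[str, object]]:
    """Return copies of the (pre-sorted) records with a keyChange flag added."""
    out: List[Dict[str, object]] = []
    previous: Optional[str] = None
    for rec in records:
        flagged: Dict[str, object] = dict(rec)
        flagged["keyChange"] = 1 if rec[key] != previous else 0
        previous = rec[key]
        out.append(flagged)
    return out


if __name__ == "__main__":
    rows = sorted(
        [{"name": "Ronny", "state_cd": "NY"},
         {"name": "John", "state_cd": "IL"},
         {"name": "Anthony", "state_cd": "IL"}],
        key=lambda r: r["state_cd"],
    )
    for r in add_key_change(rows):
        print(r["name"], r["state_cd"], r["keyChange"])
```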
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What do you mean by "shuffle"? Sort? Use a Sort stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Shuffle = Randomize, in a sense. As in re-arrange who has which address within a given state.
Last edited by chulett on Thu Dec 06, 2012 2:34 pm, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
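To make that definition concrete, here is a minimal in-memory sketch in Python (not a DataStage job): group the rows by state and randomly permute the addresses inside each group, leaving names and states where they are. The record layout (`name`, `address`, `state`) and the function name are assumptions. A plain random permutation can leave some records with their own address.

```python
# Minimal sketch: shuffle addresses among records of the same state.
# Assumes each record is a dict with hypothetical keys "name", "address", "state".
import random
from collections import defaultdict
from typing import Dict, List, Optional


def shuffle_within_state(records: List[Dict[str, str]], seed: Optional[int] = None) -> List[Dict[str, str]]:
    rng = random.Random(seed)
    # Collect the row positions that belong to each state.
    positions: Dict[str, List[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        positions[rec["state"]].append(i)
    result = [dict(rec) for rec in records]
    for idxs in positions.values():
        addresses = [records[i]["address"] for i in idxs]
        rng.shuffle(addresses)  # random permutation within this state only
        for i, addr in zip(idxs, addresses):
            result[i]["address"] = addr
    return result
```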
ray.wurlod

Post by ray.wurlod »

I think we need to wait for the OP's answer on this one.
elsont
Participant
Posts: 16
Joined: Wed Oct 08, 2008 1:20 am
Location: Chicago

Re: Address Shuffle

Post by elsont »

I will try to explain using an example.
Suppose your records look like this:
"Name Address State"
You want to shuffle Name and Address within each State.

Answer: Split the records into two streams:

1: Name + State
2: Address + State

Now add a new column "Order" to both streams and populate it with a random function. (I haven't used the random function in DataStage. It must not produce the same sequence for both streams: if it does, both streams sort identically and every name joins back to its own address, so we would have to find another way to get a different sequence each time.) Then partition on "State" only and sort on "State, Order". This should give a different order in each stream. Now add another column "Key" to both streams and assign the values 0, 1, 2, etc. within each State (simply assigning @INROWNUM should also work).
Now join both streams on the "State" and "Key" columns, and the output will be shuffled.
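The steps above can be sketched in plain Python to check the logic before building the job. This is only an illustration under assumptions: the field names, the function names, and the use of Python's `random` module in place of a DataStage random function are all hypothetical. Both streams draw from one generator, so each gets its own random "Order" values.

```python
# Sketch of the split / random Order / sort / per-State Key / join approach.
# Field names (name, address, state) and function names are hypothetical.
import random
from typing import Dict, List, Optional, Tuple


def ranked_stream(rows: List[Tuple[str, str]], rng: random.Random) -> Dict[Tuple[str, int], str]:
    """rows are (value, state) pairs; returns {(state, key): value}."""
    # Add a random "Order" column to every row.
    with_order = [(state, rng.random(), value) for value, state in rows]
    # Sort on State, Order (partitioning by State is implicit in memory).
    with_order.sort(key=lambda t: (t[0], t[1]))
    # Number rows 0, 1, 2, ... within each State to form "Key".
    keyed: Dict[Tuple[str, int], str] = {}
    counters: Dict[str, int] = {}
    for state, _order, value in with_order:
        key = counters.get(state, 0)
        counters[state] = key + 1
        keyed[(state, key)] = value
    return keyed


def shuffle_by_join(records: List[Dict[str, str]], seed: Optional[int] = None) -> List[Dict[str, str]]:
    rng = random.Random(seed)
    # Stream 1: Name + State; stream 2: Address + State.
    names = ranked_stream([(r["name"], r["state"]) for r in records], rng)
    addresses = ranked_stream([(r["address"], r["state"]) for r in records], rng)
    # Join both streams on (State, Key); both have the same keys per State.
    return [
        {"name": names[k], "address": addresses[k], "state": k[0]}
        for k in sorted(names)
    ]
```

Because every State has the same row count in both streams, each (State, Key) pair exists on both sides, so the join drops no rows.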
kennyapril

Post by kennyapril »

Thanks very much!

I will try this approach and let you know.
Regards,
Kenny
kennyapril

Post by kennyapril »

Just to be clear, my requirement is:

Before:
1) John, 123 rew dr, chicago, IL
2) Anthony, 456 qwe dr, springfield, IL
3) Ronny, 789 hjg dr, queens, NY
4) Joseph, 345 kli dr, nyc, NY

After:
1) John, 456 qwe dr, springfield, IL
2) Anthony, 123 rew dr, chicago, IL
3) Ronny, 345 kli dr, nyc, NY
4) Joseph, 789 hjg dr, queens, NY

Thank you!
Regards,
Kenny
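As a hypothetical alternative to a random shuffle, rotating the addresses by one position within each state guarantees that nobody keeps their own address whenever a state has at least two records. The Python sketch below (the tuple layout and function name are assumptions) turns the Before list into exactly the After list. A state with a single record keeps its own address, and a plain rotation is predictable, so randomizing the order inside each state first would make the result less guessable.

```python
# Hypothetical alternative: rotate street/city by one position within each state.
from typing import Dict, List, Tuple

Record = Tuple[str, str, str, str]  # (name, street, city, state)


def rotate_within_state(records: List[Record]) -> List[Record]:
    out = list(records)
    groups: Dict[str, List[int]] = {}
    for i, rec in enumerate(records):
        groups.setdefault(rec[3], []).append(i)
    for idxs in groups.values():
        n = len(idxs)
        for j, i in enumerate(idxs):
            # Take the address of the next record in the same state (wrapping).
            src = records[idxs[(j + 1) % n]]
            out[i] = (records[i][0], src[1], src[2], records[i][3])
    return out


if __name__ == "__main__":
    before = [
        ("John", "123 rew dr", "chicago", "IL"),
        ("Anthony", "456 qwe dr", "springfield", "IL"),
        ("Ronny", "789 hjg dr", "queens", "NY"),
        ("Joseph", "345 kli dr", "nyc", "NY"),
    ]
    for rec in rotate_within_state(before):
        print(", ".join(rec))
```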
kennyapril

Post by kennyapril »

In the scenario above, I was trying to partition on State only. Can you please suggest how to partition on just one field? After that, I used a Sort stage to sort on state code and Order.

Thank you!
Regards,
Kenny
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

In the Transformer, run the flow in parallel and select Hash partitioning on that one field, without selecting the sort option there; then, in the next step, sort on the other columns.

That should do it
Thanks,
Surya
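The effect of hash partitioning on State and then sorting within each partition can be sketched in plain Python. This is an illustration under assumptions: `zlib.crc32` stands in for the partitioner's hash function (DataStage's actual hash is not reproduced), and the field and function names are hypothetical. The point is that every record of a given State lands in the same partition, so a later sort on State, Order within each partition keeps each state's rows together.

```python
# Sketch: hash-partition on a single key (State), then sort each partition.
# zlib.crc32 is a stand-in hash; names and fields are hypothetical.
import zlib
from typing import Dict, List, Sequence

Row = Dict[str, object]


def hash_partition(records: List[Row], key: str, num_partitions: int) -> List[List[Row]]:
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for rec in records:
        # Same key value -> same hash -> same partition.
        bucket = zlib.crc32(str(rec[key]).encode("utf-8")) % num_partitions
        partitions[bucket].append(rec)
    return partitions


def sort_partitions(partitions: List[List[Row]], keys: Sequence[str]) -> List[List[Row]]:
    # Sort inside each partition only, e.g. on ("state", "order").
    return [sorted(p, key=lambda r: tuple(r[k] for k in keys)) for p in partitions]
```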