Surrogate Key stage not working

vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am

Post by vamsi.4a6 »

I updated the state file value to 100. Then I ran the job below:

Sequential File --> Surrogate Key stage --> Sequential File (output)

The output I am getting:
col1,col2,col3
"5","e","101"
"1","a","1101"
"6","f","2101"
"4","d","2102"
"2","b","3101"
"3","c","3102"


Expected output:
col1,col2,col3
1,a,101
2,b,102
3,c,103
4,d,104
5,e,105
6,f,106

Col3 is my surrogate key column.
Poovalingam
Participant
Posts: 111
Joined: Mon Nov 30, 2009 7:21 am
Location: Bangalore

Post by Poovalingam »

I think your question is why the generated keys are not in order? You are probably using a 4-node APT configuration file, so DataStage generated the surrogate keys in 4 different sequences.

If you need the surrogate keys in one sequence, you may need to run the stage in sequential mode. I haven't worked much with the Surrogate Key stage, so it's better to wait for another expert to comment.

Cheers,
Poova.
jerome_rajan
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

That's not the issue here. Col3 is the surrogate key column. The problem here looks two-fold.

1. The SK is generated with a different pattern than a simple one-up increment.
2. The rows come out in a jumbled order.
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
Poovalingam
Participant
Posts: 111
Joined: Mon Nov 30, 2009 7:21 am
Location: Bangalore

Post by Poovalingam »

Parallelism is the cause of both issues. As I said, running in sequential mode will resolve the problem, but you will lose the parallelism. What is the problem if the surrogate keys don't follow the same pattern? As I understand it, a surrogate key is just a key with no business meaning, so it can follow a different pattern. DataStage ensures that the same key will not be generated again in later runs.

Cheers,
Poova.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Is the expected output what is actually required? Or to ask in another way: Is the actual output incorrect for the business rules being implemented?

Poova's analysis is correct...the output looks like it does because: 1) the SKG stage is running in parallel and 2) the block size within SKG is probably set to 1000. Within a partition, SKG assigns keys in order from that partition's own block of numbers. With the state file at 100, that works out to p0: 101, 102, ...; p1: 1101, 1102, ...; p2: 2101, 2102, ...; p3: 3101, 3102, ..., which matches the keys in the posted output.
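
To make the block arithmetic concrete, here is a rough Python simulation (not DataStage code). It assumes a block size of 1000, a last-used key of 100, and that each of the 4 partitions takes one consecutive block; the split of rows across partitions is inferred from the posted output, and `assign_keys` is a made-up helper name. The real stage may hand out blocks differently.

```python
def assign_keys(partitions, last_key=100, block_size=1000):
    """Number rows within each partition from that partition's own key block.

    Simplifying assumption: partition p gets the block that starts at
    last_key + 1 + p * block_size.
    """
    keyed = []
    for p, rows in enumerate(partitions):
        block_start = last_key + 1 + p * block_size
        for offset, (col1, col2) in enumerate(rows):
            keyed.append((col1, col2, block_start + offset))
    return keyed


# Row-to-partition split inferred from the posted output (an assumption).
partitions = [
    [(5, "e")],            # p0
    [(1, "a")],            # p1
    [(6, "f"), (4, "d")],  # p2
    [(2, "b"), (3, "c")],  # p3
]

rows = assign_keys(partitions)
for row in rows:
    print(row)
```

Under those assumptions the simulation reproduces the col3 values from the first post: 101, 1101, 2101, 2102, 3101, 3102.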

Why "jumbled up"? SeqFile writes rows out in the order they arrive at the stage. When running in parallel, you're not guaranteed which partition will deliver its data first, so output order is not guaranteed to match input order unless you specifically write the job to guarantee it, either by running it sequentially (as suggested) or by sorting and collecting the rows so the output order matches the input order. This job does none of that, based upon the description given.
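
The two options can be sketched in plain Python (again only an illustration, not DataStage code; `sequential_keys` is a made-up helper and the starting value of 100 comes from the first post). Running on a single partition yields the one-up keys the poster expected, while sorting the parallel output on col1 restores row order but leaves the keys in their per-partition blocks.

```python
def sequential_keys(rows, last_key=100):
    """Single partition: every row takes the next key after last_key."""
    return [(col1, col2, last_key + i)
            for i, (col1, col2) in enumerate(rows, start=1)]


input_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e"), (6, "f")]

# Option 1: run the key generation sequentially.
sequential = sequential_keys(input_rows)

# Option 2: keep the parallel keys (copied from the posted output) and
# sort on col1 before writing the file.
parallel_output = [
    (5, "e", 101), (1, "a", 1101), (6, "f", 2101),
    (4, "d", 2102), (2, "b", 3101), (3, "c", 3102),
]
resorted = sorted(parallel_output, key=lambda row: row[0])

print(sequential)
print(resorted)
```

Either way the keys stay unique; only the sequential run makes them consecutive.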

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.