Hi,
I have been observing some strange behavior in my jobs. Can someone explain the following for me?
1> I have 2 jobs, a and b. In job a, 2 datasets are created, partitioned on key x and sorted on key x and key y. (The relation between key x and key y is one to many, and between y and x it is also one to many, e.g. branch and account.)
In job b, I join the 2 created datasets on keys x,y with the partitioning set to Same. The join does not return the correct record count.
When I do an explicit hash partition on key x in the join stage for both input links, the count comes out correctly.
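To illustrate one way a count like this could drop, here is a minimal Python sketch (not DataStage code). It assumes a parallel join matches rows only within the same partition, so both inputs must place equal key values in the same partition. The helper names `partition` and `join_per_partition`, the sample rows, and the choice of 2 partitions are all hypothetical. It does not claim this is the actual cause in job b, only that inputs which are not truly co-partitioned on x would lose matches.

```python
from collections import defaultdict

def partition(rows, nparts, key_fn):
    """Distribute rows into nparts buckets using key_fn(row) % nparts."""
    parts = [[] for _ in range(nparts)]
    for row in rows:
        parts[key_fn(row) % nparts].append(row)
    return parts

def join_per_partition(left_parts, right_parts):
    """Inner join on (x, y), evaluated independently inside each partition."""
    total = 0
    for lp, rp in zip(left_parts, right_parts):
        index = defaultdict(int)
        for x, y in rp:
            index[(x, y)] += 1
        for x, y in lp:
            total += index[(x, y)]
    return total

# Hypothetical sample data: 4 values of x, 3 values of y, identical on both sides.
left = [(x, y) for x in range(4) for y in range(3)]
right = list(left)
NPARTS = 2

# Both inputs partitioned on x: every (x, y) pair meets its match.
co_partitioned = join_per_partition(
    partition(left, NPARTS, lambda r: r[0]),
    partition(right, NPARTS, lambda r: r[0]),
)

# Right input laid out differently (here on y): some matches land in
# different partitions and are never compared.
mismatched = join_per_partition(
    partition(left, NPARTS, lambda r: r[0]),
    partition(right, NPARTS, lambda r: r[1]),
)

print(co_partitioned, mismatched)
```

In this sketch the co-partitioned join finds all 12 matches, while the mismatched layout finds fewer, which is the same symptom as a join on Same partitioning that only gives the right count after an explicit hash on x.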
2> In a job, the flow is source --> sort(a,b) --> remove duplicates(key a,b) --> target.
There is a sort stage on keys (a,b) followed by a remove duplicates stage on keys (a,b). On execution I was getting the warning "downstream operator does not fulfill the requirement".
As a workaround, I deleted the link between the sort and remove duplicates stages and replaced it with a new one. The warning went away.
3> In another job, I have more than 2 inputs to a single join stage, and the record count comes out wrong.
When I replace that single join stage with multiple join stages of only 2 inputs each, the count comes out correctly.
DataStage strange behaviors