Search found 107 matches
- Thu Sep 01, 2011 2:26 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Join stage takes a long time
- Replies: 9
- Views: 13663
Yes Sandeep, your understanding of my job design is right. 1. Data should be sorted on the key columns being joined, with hash partitioning applied. I added the sort stages after the input of the sequential file stages. The first join produces the same record count as befo...
- Thu Sep 01, 2011 1:11 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Join stage takes a long time
- Replies: 9
- Views: 13663
1) The OS line is meant to be the OS your DataStage instance is running on. The OS is AIX 5.3. 2) My job design: a sequential file input stage feeds into a transformer to filter based on some conditions and then feeds into the first join stage. A second sequential file stage serves as the right li...
- Thu Sep 01, 2011 7:22 am
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Join stage takes a long time
- Replies: 9
- Views: 13663
Join stage takes a long time
I am designing parallel jobs to convert business logic from SAS to DataStage to improve the total run time. In one of the jobs, we have two flat files as input, each with about 40,000 records. A join stage (inner join) is used, and it produces about 1,120,000 records. The ...
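An inner join that turns two 40,000-row inputs into roughly 1,120,000 rows means the join key repeats on both sides: each key contributes (left occurrences) × (right occurrences) output rows. A minimal sketch of that arithmetic, using made-up keys rather than the poster's actual data:

```python
from collections import Counter

def inner_join_row_count(left_keys, right_keys):
    """Predict inner-join output size: each key emits
    (occurrences on the left) x (occurrences on the right) rows."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    return sum(left[k] * right[k] for k in left if k in right)

# "a" appears 2x1 times, "b" appears 1x2 times: 2 + 2 = 4 output rows.
print(inner_join_row_count(["a", "a", "b"], ["a", "b", "b"]))  # -> 4
```

Running this count on the real key columns would show whether the 28-fold row explosion comes from duplicate keys rather than from a stage misconfiguration.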
- Thu Sep 01, 2011 7:06 am
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Need to retain unique records
- Replies: 9
- Views: 4702
Thanks Sandeep. But one of the key columns can be nullable according to the business logic, so I cannot replace null with a value. The aggregator drops some records, and I then use a filter stage and a join stage to get the non-key columns back. After the join stage, I get the same record co...
- Wed Aug 31, 2011 2:16 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Need to retain unique records
- Replies: 9
- Views: 4702
- Wed Aug 31, 2011 1:01 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Need to retain unique records
- Replies: 9
- Views: 4702
Thanks DSGuru. I followed Shrikant's suggestion from the link below: http://www.dsxchange.com/viewtopic.php?t=106508 I am able to get the correct record count of the unique records after the filter stage, but the aggregator stage drops some of the non-unique records. Any suggestions on why the aggega...
- Wed Aug 31, 2011 10:36 am
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Need to retain unique records
- Replies: 9
- Views: 4702
Need to retain unique records
Hello, I have a requirement where I need to retain unique records (records that are present only once in the source) and discard all records that appear two or more times in the source. If a record appears more than once, I need to discard all of those records, not just remove duplicates of that partic...
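The requirement above (keep only keys that occur exactly once; drop every record of any key that repeats, not just the extra copies) can be sketched outside DataStage as a count-then-filter pass, which mirrors the aggregator-plus-filter design discussed in this thread. Record layout and key choice here are illustrative, not the poster's:

```python
from collections import Counter

def unique_only(records, key):
    """Keep only records whose key occurs exactly once; if a key
    appears two or more times, drop every record with that key."""
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] == 1]

rows = [("1", "x"), ("2", "y"), ("1", "z"), ("3", "w")]
# Key "1" appears twice, so both of its records are discarded.
print(unique_only(rows, key=lambda r: r[0]))  # -> [('2', 'y'), ('3', 'w')]
```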
- Thu Aug 25, 2011 3:06 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Job aborts due to error in writeBlock - could not write
- Replies: 2
- Views: 4925
An update: with $APT_DISABLE_COMBINATION set to False, I get the following info when I run the job: APT_CombinedOperatorController(1),0: Numeric string expected for stage variable 'StageVar0_StageVar2'. Use default value. So I changed $APT_DISABLE_COMBINATION to True, and when I run the job, I get the...
- Fri Aug 19, 2011 5:03 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Job aborts due to error in writeBlock - could not write
- Replies: 2
- Views: 4925
Job aborts due to error in writeBlock - could not write
Hello, the parallel job has a sequential file input and a lookup stage that looks up two files and outputs data to a sequential file. The following error happens occasionally. The job might abort the first time, but if the job is recompiled and run again it runs fine, and the process can be repeat...
- Thu Aug 18, 2011 9:58 am
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Join stage: Need to retain the column key from both tables
- Replies: 4
- Views: 1865
- Wed Aug 17, 2011 4:03 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Join stage: Need to retain the column key from both tables
- Replies: 4
- Views: 1865
That was quick, Ray. My inputs are flat (sequential) files, not tables as previously mentioned; sorry about that. I used a Copy stage after the Sequential file stage to create an additional column that is a copy of the key column, but it looks like the same input column cannot be mapped to mo...
- Wed Aug 17, 2011 3:37 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Join stage: Need to retain the column key from both tables
- Replies: 4
- Views: 1865
Join stage: Need to retain the column key from both tables
Hello,
I have a join stage joining data from Table A and Table B based on join key Column A. Is there a way by which Column A from both Table A and Table B can be retained at the output of the Join stage?
Thanks.
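The question above (keeping Column A from both sides of an inner join) maps to the Copy-stage idea raised later in the thread: copy the key into a new column on one input before joining, so both versions survive. A minimal pure-Python sketch with hypothetical column names (`col_a`, `a_val`, `b_val` are made up for illustration):

```python
def join_keep_both_keys(left, right, key):
    """Inner join two lists of dicts on `key`, copying the right-hand
    key into a new field so both copies survive the join."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    out = []
    for l in left:
        for r in index.get(l[key], []):
            row = dict(l)
            # Bring over the right side's non-key columns...
            row.update({k: v for k, v in r.items() if k != key})
            # ...and keep the right side's key under a new name.
            row[key + "_right"] = r[key]
            out.append(row)
    return out

a = [{"col_a": 1, "a_val": "x"}]
b = [{"col_a": 1, "b_val": "p"}]
print(join_keep_both_keys(a, b, "col_a"))
# -> [{'col_a': 1, 'a_val': 'x', 'b_val': 'p', 'col_a_right': 1}]
```

For an inner join the two key values are equal by definition, so the copy only becomes informative in outer joins, where one side may be null.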
- Wed Aug 17, 2011 10:31 am
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Job aborts due to heap size allocation problem
- Replies: 4
- Views: 2535
There are about 13 million records in one of the files and about 30,000 records in the other file and the order of the links does not seem to matter as I get the same error even after changing the link ordering. So, I guess I have a lot of duplicates in either file. I split the 13 million record fil...
- Wed Aug 17, 2011 7:21 am
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Job aborts due to heap size allocation problem
- Replies: 4
- Views: 2535
Please see below for the results from running the ulimit command inside the job. Command: ulimit -S My output: unlimited Command: ulimit -H My output: unlimited Command: ulimit -a time(seconds) unlimited file(blocks) unlimited data(kbytes) 786432 stack(kbytes) 4194304 memory(kbytes) 32768 coredump(blo...
- Tue Aug 16, 2011 3:45 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Job aborts due to heap size allocation problem
- Replies: 4
- Views: 2535
Job aborts due to heap size allocation problem
Hello, the parallel job has a join stage that is joining millions of records from two data sets, and the job aborts with the following error: Join_8,2: The current soft limit on the data segment (heap) size (805306368) is less than the hard limit (2147483647), consider increasing the heap size limit ...
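The error above is about the process's soft limit on the data segment (here 805306368 bytes, about 768 MB) being below the hard limit (2147483647, about 2 GB); a process may raise its own soft limit up to the hard limit without privileges. A sketch of inspecting and raising that limit with Python's standard `resource` module (values and behavior are OS-dependent; this is illustrative, not the DataStage fix itself, which is typically done via ulimit or LDR_CNTRL on AIX):

```python
import resource

# RLIMIT_DATA is the data-segment (heap) limit named in the error.
soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
print(f"soft={soft} hard={hard}")

# Raise the soft limit to the hard limit; a process is always
# allowed to move its soft limit anywhere up to the hard limit.
if soft != hard:
    resource.setrlimit(resource.RLIMIT_DATA, (hard, hard))
```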