C++ routine Vs Aggregator stage
Hi,
I have a requirement to count the rows in an input file and write the final count to the output. The solution I came up with is a C++ routine that counts the number of lines and returns the total. It is called from DataStage, works fine, and is quite fast.
My customer is proposing to use an Aggregator stage instead, which I feel is a bit heavy just for counting input rows, especially if the input file is very large.
I would like to know the pros and cons of a C++ routine vs. the Aggregator stage. Please give your views.
In that case you will be using the 7.5x2 version of DataStage.
Below is one of the many alternatives to using a routine or the Aggregator stage:
- In the Sequential File stage, generate the "Row Number Column".
- Add a Tail stage after the Sequential File stage and run it sequentially.
- Use the Tail stage to read a single record, the last one.
- Read the value of the "Row Number Column" from that record to get the number of rows in the file.
You might be surprised how slick the Aggregator is for counting. You don't use the Column for Calculation method, you simply use Count. It's a very fast transit through the code.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I think this depends on what you want to do with it.
If you have a complex process and you want to check that the number of records at the end matches the number at the beginning, I would not use the Aggregator. The moment you read a file in DataStage, it may reject records that don't have the correct format. In that case I would go for wc -l before starting.
unix code on windows
mouni wrote:
Our coding is on an AIX server, but the code will later be ported to Windows. So we want to make sure that we do not use anything Unix-specific that may not work on Windows.

If you have to deploy on Windows, your server's DataStage install is going to include the MKS Toolkit (the parallel stuff requires it). I bet that common Unix utilities like "wc" will run just fine in either environment.
In that scenario, I would worry more about hardcoding path references into your jobs or helper scripts than
Speaking of deploying on Windows: do you have other C++ custom code in your system? If not, you might be able to escape from getting a Windows C++ compiler (of course, if you already have one, then it doesn't matter).
John G.
Thanks guys for the help.
Telenet - Our customer is hesitant to use wc -l on Windows even though it works fine with the MKS Toolkit installed, so we were looking for an alternative method. They also wanted us to do this using vanilla DataStage.
jgreve - We have a C++ compiler installed that is compatible with DataStage, and we have several complex routines coded in C++ that are used by DataStage.
We have now settled on using the Tail stage. Every record coming into the Tail stage will carry @INROWCOUNT and @OUTROWCOUNT, so the Tail stage gives us the final count of input and output records. This seems to work fine with a single partition, which solves our problem.