Hi there
I am loading a massive volume of records into a table. The source is a dataset and the target is SQL Server 2008, via the ODBC Connector, using UPSERT (insert then update).
When I tested against a sample table I managed to get the load up to 23K rows/sec, whereas on the real table it starts at 10K and drops to 2,200 rows/sec once the volume crosses 100 million!! I suspect it could be due to the partitions and indexes.
But my concern is that the rows/sec figure I see in the Designer screen differs from the number the Director Monitor shows.
If the rate really is 2,200 rows/sec, I would expect the row counts on the screen to keep changing, but for a while they don't.
How accurate are the rows/sec counts in DataStage?
How can I find the real load counts?
Any idea?
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
Sampling is at 10 second intervals, unless you've changed this via Options. This is pretty coarse sampling, particularly for small volumes. Further, rows/sec can include wait time for the generating (extraction) query to start delivering rows, which can be substantial (for example if the query includes an ORDER BY clause). Various other overheads are included in the denominator of this calculation too. I tend not to regard rows/sec as a particularly useful metric except when comparing apples with apples (usually different runs of the same job with the same data).
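To make the point about wait time concrete, here is a hypothetical back-of-the-envelope sketch (the numbers are invented): the monitor divides rows by total elapsed time, and any up-front wait before the source query delivers its first row sits in that denominator.

```python
# Hypothetical illustration: reported rows/sec is rows divided by total
# elapsed time, which includes any up-front wait (e.g. while an ORDER BY
# sorts on the source before the first row arrives).
def reported_rps(rows, wait_seconds, transfer_seconds):
    """Cumulative rows/sec as a monitor would compute it."""
    return rows / (wait_seconds + transfer_seconds)

rows = 1_000_000
transfer = 50.0  # seconds actually spent moving rows (20K rows/sec raw)

print(reported_rps(rows, 0, transfer))    # no wait: 20000.0 rows/sec
print(reported_rps(rows, 150, transfer))  # 150 s sort wait: 5000.0 rows/sec
```

The raw transfer rate is identical in both cases; only the reported figure changes, which is why the metric is mainly useful for comparing runs of the same job.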
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
The RPS can be misleading, especially with long-running front-end queries, since timing starts the moment the query is sent. Taking that into account, though, it is moderately accurate overall.
DataStage keeps reducing the "report" rate as the number of rows grows, which is why you see it start reporting in 100k's or millions instead of exact row counts. However, the end number reported on a completed job (on the designer as well as the director log) is accurate.
There are numerous reasons a job would run slower with millions of rows instead of a small test set. You can usually use the RPS to isolate where the slowdown is occurring (note the drop in RPS after a stage) to get an idea where to start.
It's important to note (in my mind) that it is not the RPS at any particular instant but rather the average RPS up to that point. And as noted, since it includes wait time, that is why during periods with 'no activity' the only thing changing is the RPS: going down.
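A rough sketch of that behavior with invented sample numbers: because the figure is a cumulative average, a stall freezes the row count while elapsed time keeps growing, so the displayed rate can only fall.

```python
# Sketch of a cumulative-average rows/sec display. During the stall
# (rows stop arriving) the average can only decline, even though
# nothing else on the screen changes.
samples = [
    # (elapsed_seconds, total_rows) -- invented monitor snapshots
    (10, 220_000),
    (20, 440_000),
    (30, 440_000),  # stall: no new rows delivered
    (40, 440_000),
]
for elapsed, rows in samples:
    print(f"{elapsed:>3}s  avg {rows / elapsed:,.0f} rows/sec")
```

Note the instantaneous rate in the stalled interval is zero, yet the display still shows a nonzero, slowly decaying average.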
-craig
"You can never have too many knives" -- Logan Nine Fingers
Rows per second is moderately accurate; it depends on various factors such as the source, the target, and the transformations in between.
Thanks
Rao Kuntamukkala
www.networkzindia.com
Thanks
Ravi