Fluctuating Performance from Enterprise ODBC stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

StuM
Participant
Posts: 17
Joined: Thu Mar 23, 2006 10:45 am
Location: London

Fluctuating Performance from Enterprise ODBC stage

Post by StuM »

Hi All,

I am suffering performance issues with the EE ODBC stage that I hope you might be able to help me out with. I have had a search, but couldn't find the solution.

Basically, we have several (Parallel) jobs that run sequentially, each loading either sequential files or datasets into MS-SQL tables via the EE ODBC stage. The files run to anything up to 50k rows, and the 'widest' one is 413 columns :shock:

Now, I have been running tests using cut-down files of 500 rows, and have seen some odd performance degradation. Taking a single job as an example: if I drop the table, then run a job to create and populate it, I get 500 rows/second. If I then amend the job to run as Upsert, performance falls away dramatically - three runs saw it fall to 167, then 19 and finally 25 rows per second.
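As a way of taking DataStage out of the picture, a minimal pyodbc sketch can run the same comparison through the driver directly. The DSN (my_sqlserver_dsn) and table test_tbl(k, v) below are made-up placeholders, and the per-row update-then-insert loop is only a rough analogue of Upsert, not the stage's exact mechanism:

Code:

import time
import pyodbc

rows = [(i, "value-%d" % i) for i in range(500)]

conn = pyodbc.connect("DSN=my_sqlserver_dsn", autocommit=False)
cur = conn.cursor()

def timed(label, fn):
    start = time.time()
    fn()
    conn.commit()
    print("%s: %.0f rows/s" % (label, len(rows) / (time.time() - start)))

# Plain insert into an empty table.
timed("insert", lambda: cur.executemany(
    "INSERT INTO test_tbl (k, v) VALUES (?, ?)", rows))

# Per-row upsert: try the UPDATE first, fall back to INSERT when no row
# matched - roughly what an Upsert does for every single row.
def upsert():
    for k, v in rows:
        cur.execute("UPDATE test_tbl SET v = ? WHERE k = ?", v, k)
        if cur.rowcount == 0:
            cur.execute("INSERT INTO test_tbl (k, v) VALUES (?, ?)", k, v)

timed("upsert", upsert)
conn.close()

If the second timing collapses the same way outside DataStage, the problem is on the database/driver side rather than in the job.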

Can anyone think what might cause this? Some of the jobs now seem to be running consistently at 250 rows/second (although the functionality is identical), though earlier runs (prior to dropping the tables) got down to single figures in rows per second.

Thanks in advance for any help that you might be able to offer.

Kind regards,
Stu
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

UPDATE is always much slower than INSERT. Split the stream into insert only and update only.
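A rough sketch of that idea outside a job design, again with a hypothetical DSN and table target_tbl(k, v): fetch the existing keys once, route each incoming row to an insert set or an update set, and batch each set separately:

Code:

import pyodbc

conn = pyodbc.connect("DSN=my_sqlserver_dsn", autocommit=False)
cur = conn.cursor()

incoming = [(1, "a"), (2, "b"), (3, "c")]   # (key, value) rows to load

# Fetch the existing keys once, then route each incoming row.
existing = {k for (k,) in cur.execute("SELECT k FROM target_tbl")}
inserts = [(k, v) for k, v in incoming if k not in existing]
updates = [(v, k) for k, v in incoming if k in existing]

# Batched statements avoid the per-row try-update-then-insert round trips.
if inserts:
    cur.executemany("INSERT INTO target_tbl (k, v) VALUES (?, ?)", inserts)
if updates:
    cur.executemany("UPDATE target_tbl SET v = ? WHERE k = ?", updates)
conn.commit()
conn.close()

In job-design terms, the rough equivalent would be a Lookup against the target's keys feeding two separate ODBC stages, one set to Insert and one to Update.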
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
StuM
Participant
Posts: 17
Joined: Thu Mar 23, 2006 10:45 am
Location: London

Post by StuM »

Thanks Ray, I will give that a go. Strangely enough, our retrieval from the tables is now displaying similar levels of poor performance.

Cheers,
Stu
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The poor select performance might be due to 'back pressure' from your target database stage. If you replace the target with a dataset, you should see the select speed up dramatically.
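For intuition, 'back pressure' is just a bounded pipeline buffer filling up: once it is full, the producer (the select) can only run as fast as the consumer (the write) drains it. A toy Python illustration, not DataStage code:

Code:

import queue
import threading
import time

buf = queue.Queue(maxsize=100)          # bounded, like a pipeline buffer

def producer(n=1000):
    start = time.time()
    for i in range(n):
        buf.put(i)                      # blocks once the buffer is full
    print("producer: %.0f rows/s" % (n / (time.time() - start)))
    buf.put(None)                       # sentinel: end of stream

def consumer():
    while buf.get() is not None:
        time.sleep(0.001)               # a slow target stage

t = threading.Thread(target=consumer)
t.start()
producer()
t.join()

The producer's observed rate settles at the consumer's rate, which is why a slow write stage makes the upstream select look slow.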
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
StuM
Participant
Posts: 17
Joined: Thu Mar 23, 2006 10:45 am
Location: London

Post by StuM »

Thanks DSguru2B - I will replace the ODBC stage with a write to a DataSet and see how we go.

Cheers,
StuM

[edited to correct who it was addressed to!]
Last edited by StuM on Fri Jun 01, 2007 10:28 am, edited 1 time in total.
StuM
Participant
Posts: 17
Joined: Thu Mar 23, 2006 10:45 am
Location: London

Post by StuM »

Hi again,

I think I misunderstood your reply, DSguru2B.

Basically, I have a requirement that at any one time the table holds the latest data for a number of rows. Each day when the suite is run, I update the table where the row already exists, insert new rows, and leave untouched any rows which have not changed. At the same time, I update another table with the key of each row that came in 'today'. Later in the process, we select rows from the main table where the key matches a key on the 'changed today' table, and this subset goes forward for processing. A further complication is that the 'changed today' table is updated by 6 separate jobs. Sorry if this is unclear - pictures are so much easier than words! :-)
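Expressed as plain SQL, the daily pattern described above amounts to three statements run against a staging table. All of the object names here (stg_daily, main_tbl, changed_today, columns k and v) are hypothetical stand-ins for the real tables:

Code:

import pyodbc

conn = pyodbc.connect("DSN=my_sqlserver_dsn", autocommit=False)
cur = conn.cursor()

# 1. Update rows that already exist in the main table.
cur.execute("""
    UPDATE m SET m.v = s.v
    FROM main_tbl m JOIN stg_daily s ON m.k = s.k
""")

# 2. Insert rows that are new today.
cur.execute("""
    INSERT INTO main_tbl (k, v)
    SELECT s.k, s.v FROM stg_daily s
    WHERE NOT EXISTS (SELECT 1 FROM main_tbl m WHERE m.k = s.k)
""")

# 3. Record the keys that arrived today for the downstream jobs.
cur.execute("INSERT INTO changed_today (k) SELECT k FROM stg_daily")

conn.commit()
conn.close()

The later 'select the subset' step is then just a join between main_tbl and changed_today on k.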

I was trying to think if there was a way around using tables. Sadly, as datasets are not keyed, there is no update option available. Now, if I were using a Server job, I believe a Hashed File could achieve the same result... I considered using Lookup File Sets, but the Lookup stage does not allow you to use two lookup file sets as input.

Any suggestions appreciated.

Thanks
StuM
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

StuM wrote:Thanks Ray, I will give that a go. Strangely enough, our retrieval from the tables is now displaying similar levels of poor performance.

Cheers,
Stu
My reply was to your problem stated above. If the select on a table is horribly slow, it's more than likely due to back pressure caused by your target database. If your select job is different from your load job, then my debugging technique of replacing the target with a dataset won't apply. We need to look elsewhere.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
StuM
Participant
Posts: 17
Joined: Thu Mar 23, 2006 10:45 am
Location: London

Post by StuM »

Thanks, that is a useful technique to bear in mind for the future. In this case I do indeed have a separate job to extract from the tables, so I can eliminate the idea of 'back pressure'.

Back to the drawing board, I need more coffee... :lol:

Cheers,
StuM
Last edited by StuM on Fri Jun 01, 2007 10:27 am, edited 1 time in total.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Well, your record size seems to be large, so this might be the best performance you will ever get. It also depends on your network traffic.
The same goes for inserts and updates. What database are you accessing with the ODBC stage?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
StuM
Participant
Posts: 17
Joined: Thu Mar 23, 2006 10:45 am
Location: London

Post by StuM »

It is an SQL Server database. I spoke to our DBA about it and they suggested that the tables were not efficiently designed.

The odd thing is that it was running fine until recently, when performance went bad - use of the server has increased 8-fold lately, which must have had a detrimental impact.

I might think about using hashed files and server jobs if possible, although this will involve major reworking :(

Cheers,
Stu