Fluctuating Performance from Enterprise ODBC stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

StuM
Participant
Posts: 17
Joined: Thu Mar 23, 2006 10:45 am
Location: London

Fluctuating Performance from Enterprise ODBC stage

Post by StuM »

Hi All,

I am suffering performance issues with the EE ODBC stage that I hope you might be able to help me out with. I have had a search, but couldn't find the solution.

Basically, we have several (Parallel) jobs that run sequentially, each loading either sequential files or datasets into MS-SQL tables via the EE ODBC stage. The files run to anything up to 50k rows, and the 'widest' one is 413 columns :shock:

Now, I have been running tests using cut-down files of 500 rows, and have seen some odd performance degradation. Taking a single job as an example: if I drop the table, then run a job to create and populate it, I get 500 rows/second. If I then amend the job to run as Upsert, performance falls away dramatically - three runs saw it fall to 167, then 19 and finally 25 rows per second.
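As a way of taking DataStage out of the picture, a minimal pyodbc sketch can run the same comparison through the driver directly. The DSN (my_sqlserver_dsn) and table test_tbl(k, v) below are made-up placeholders, and the per-row update-then-insert loop is only a rough analogue of Upsert, not the stage's exact mechanism:

Code:

import time
import pyodbc

rows = [(i, "value-%d" % i) for i in range(500)]

conn = pyodbc.connect("DSN=my_sqlserver_dsn", autocommit=False)
cur = conn.cursor()

def timed(label, fn):
    start = time.time()
    fn()
    conn.commit()
    print("%s: %.0f rows/s" % (label, len(rows) / (time.time() - start)))

# Plain insert into an empty table.
timed("insert", lambda: cur.executemany(
    "INSERT INTO test_tbl (k, v) VALUES (?, ?)", rows))

# Per-row upsert: try the UPDATE first, fall back to INSERT when no row
# matched - roughly what an Upsert does for every single row.
def upsert():
    for k, v in rows:
        cur.execute("UPDATE test_tbl SET v = ? WHERE k = ?", v, k)
        if cur.rowcount == 0:
            cur.execute("INSERT INTO test_tbl (k, v) VALUES (?, ?)", k, v)

timed("upsert", upsert)
conn.close()

If the second timing collapses the same way outside DataStage, the problem is on the database/driver side rather than in the job.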

Can anyone think what might cause this? Some of the jobs now seem to be running consistently at 250 rows/second (although the functionality is identical), though earlier runs (prior to dropping the tables) got down to single figures in rows per second.

Thanks in advance for any help that you might be able to offer.

Kind regards,
Stu
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

UPDATE is always much slower than INSERT. Split the stream into insert only and update only.
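A rough sketch of that idea outside a job design, again with a hypothetical DSN and table target_tbl(k, v): fetch the existing keys once, route each incoming row to an insert set or an update set, and batch each set separately:

Code:

import pyodbc

conn = pyodbc.connect("DSN=my_sqlserver_dsn", autocommit=False)
cur = conn.cursor()

incoming = [(1, "a"), (2, "b"), (3, "c")]   # (key, value) rows to load

# Fetch the existing keys once, then route each incoming row.
existing = {k for (k,) in cur.execute("SELECT k FROM target_tbl")}
inserts = [(k, v) for k, v in incoming if k not in existing]
updates = [(v, k) for k, v in incoming if k in existing]

# Batched statements avoid the per-row try-update-then-insert round trips.
if inserts:
    cur.executemany("INSERT INTO target_tbl (k, v) VALUES (?, ?)", inserts)
if updates:
    cur.executemany("UPDATE target_tbl SET v = ? WHERE k = ?", updates)
conn.commit()
conn.close()

In job-design terms, the rough equivalent would be a Lookup against the target's keys feeding two separate ODBC stages, one set to Insert and one to Update.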
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
StuM
Participant
Posts: 17
Joined: Thu Mar 23, 2006 10:45 am
Location: London

Post by StuM »

Thanks Ray, I will give that a go. Strangely enough, our retrieval from the tables is now displaying similar levels of poor performance.

Cheers,
Stu
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The poor select performance might be due to 'back pressure' from your target database stage. If you replace the target with a dataset, you should see the select speed up dramatically.
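For intuition, 'back pressure' is just a bounded pipeline buffer filling up: once it is full, the producer (the select) can only run as fast as the consumer (the write) drains it. A toy Python illustration, not DataStage code:

Code:

import queue
import threading
import time

buf = queue.Queue(maxsize=100)          # bounded, like a pipeline buffer

def producer(n=1000):
    start = time.time()
    for i in range(n):
        buf.put(i)                      # blocks once the buffer is full
    print("producer: %.0f rows/s" % (n / (time.time() - start)))
    buf.put(None)                       # sentinel: end of stream

def consumer():
    while buf.get() is not None:
        time.sleep(0.001)               # a slow target stage

t = threading.Thread(target=consumer)
t.start()
producer()
t.join()

The producer's observed rate settles at the consumer's rate, which is why a slow write stage makes the upstream select look slow.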
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
StuM
Participant
Posts: 17
Joined: Thu Mar 23, 2006 10:45 am
Location: London

Post by StuM »

Thanks DSguru2B - I will replace the ODBC stage with a write to a DataSet and see how we go.

Cheers,
StuM

[edited to correct who it was addressed to!]
Last edited by StuM on Fri Jun 01, 2007 10:28 am, edited 1 time in total.
StuM
Participant
Posts: 17
Joined: Thu Mar 23, 2006 10:45 am
Location: London

Post by StuM »

Hi again,

I think I misunderstood your reply, DSguru2B.

Basically, I have a requirement that at any one time the table holds the latest data for a number of rows. Each day when the suite is run, I update the table where the row already exists, insert new rows, and leave untouched any rows which have not changed. At the same time, I update another table with the key of each row that came in 'today'. Later in the process, we select rows from the main table where the key matches a key on the 'changed today' table, and this subset goes forward for processing. A further complication is that the 'changed today' table is updated by 6 separate jobs. Sorry if this is unclear - pictures are so much easier than words! :-)
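Expressed as plain SQL, the daily pattern described above amounts to three statements run against a staging table. All of the object names here (stg_daily, main_tbl, changed_today, columns k and v) are hypothetical stand-ins for the real tables:

Code:

import pyodbc

conn = pyodbc.connect("DSN=my_sqlserver_dsn", autocommit=False)
cur = conn.cursor()

# 1. Update rows that already exist in the main table.
cur.execute("""
    UPDATE m SET m.v = s.v
    FROM main_tbl m JOIN stg_daily s ON m.k = s.k
""")

# 2. Insert rows that are new today.
cur.execute("""
    INSERT INTO main_tbl (k, v)
    SELECT s.k, s.v FROM stg_daily s
    WHERE NOT EXISTS (SELECT 1 FROM main_tbl m WHERE m.k = s.k)
""")

# 3. Record the keys that arrived today for the downstream jobs.
cur.execute("INSERT INTO changed_today (k) SELECT k FROM stg_daily")

conn.commit()
conn.close()

The later 'select the subset' step is then just a join between main_tbl and changed_today on k.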

I was trying to think if there was a way around using tables. Sadly, as datasets are not keyed, there is no update option available. Now, if I were using a Server job, I believe a Hashed File could achieve the same result... I considered using Lookup File Sets, but the Lookup stage does not allow you to use two lookup file sets as input.

Any suggestions appreciated.

Thanks
StuM
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

StuM wrote:Thanks Ray, I will give that a go. Strangely enough, our retrieval from the tables is now displaying similar levels of poor performance.

Cheers,
Stu
My reply was to your problem stated above. If the select on a table is horribly slow, it's more than likely due to back pressure caused by your target database. If your select job is different from your load job, then my debugging technique of replacing the target with a dataset won't apply. We need to look elsewhere.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
StuM
Participant
Posts: 17
Joined: Thu Mar 23, 2006 10:45 am
Location: London

Post by StuM »

Thanks, that is a useful technique to bear in mind for the future. In this case I do indeed have a separate job to extract from the tables, so I can eliminate the idea of 'back pressure'.

Back to the drawing board, I need more coffee... :lol:

Cheers,
StuM
Last edited by StuM on Fri Jun 01, 2007 10:27 am, edited 1 time in total.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Well, your record size seems to be large, so this might be the best performance you will ever get. It also depends on your network traffic.
The same goes for inserts and updates. What database are you accessing with the ODBC stage?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
StuM
Participant
Posts: 17
Joined: Thu Mar 23, 2006 10:45 am
Location: London

Post by StuM »

It is an SQL Server database. I spoke to our DBA about it and they suggested that the tables were not efficiently designed.

The odd thing is that it was running fine until recently, when performance went bad - use of the server has increased 8-fold lately, which must have had a detrimental impact.

I might think about using hashed files and server jobs if possible, although this will involve major reworking :(

Cheers,
Stu