No rows process through job


tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

No rows process through job

Post by tonystark622 »

We ran into a problem when running a job: no rows go through or get processed. It appears to happen when we use certain database tables as lookups. While the job is running, the job monitor shows no rows being read and none processed on the lookup link. If I enable all the tracing options when I run the job, two rows are written to the job log on every refresh, indicating some database activity (I can't remember the exact message that's logged, but it's NOT an error or warning).

We made a simple job (from scratch) to test the problem and it fails too. The job reads from a flat file, does a lookup against a table in the database (it doesn't matter whether we use an Oracle OCI stage or an ODBC stage; the results are the same), and writes to a flat file. There are NO constraints or routines of any kind in the job. The data is simply copied from input to output, with a "join" on the lookup table in the transformer.

We have deleted these table definitions from the repository and "reimported" them, but it didn't help. Curiously enough, if I step through the simple test job in the debugger in Designer, it appears to work.

Does anyone have any suggestions?

Thanks for your help,
tony
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Double-check your lookup keys. If I recall correctly, only *successful* lookups show in the monitor, so if they all fail it would look like nothing was processed... but that, in and of itself, shouldn't prevent the rows from moving through the job, unless a constraint is involved, and you've already said one isn't.

If you are saying that the lookups do not fail and the job processes rows when run in the debugger, but *not* outside of it, then I'd report that as a bug to Ascential support.

-craig
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Craig,
I believe you're correct that only successful lookup rows are shown in the monitor. However, it doesn't show any rows read from the input flat file or written to the output flat file either. I'll have to go look and see whether any rows were actually written to the output file. As dumb as it sounds, I didn't check whether anything had been written; I just didn't see any rows going through the monitor and assumed nothing was happening. Duh on me :)

Thanks for your reply.
Tony
inter5566
Premium Member
Posts: 57
Joined: Tue Jun 10, 2003 1:51 pm
Location: US - Midwest

Post by inter5566 »

Tony,

How about trimming your key fields? If you press the View Data button, do both sets of keys appear to be in the same format?

Steve
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

That might be the answer to speeding it up, Steve. I just looked at the job a minute ago. It has been running for almost 24 hours and has processed 12,xxx rows. I think the problem is that the lookups are just taking a loooooooonnnnnngggggg time.

So, it actually is processing rows, just very, very slowly.

Sorry to bother you folks with this. I just need to go figure out what's holding things up.

Thanks again,
Tony
Creo
Participant
Posts: 34
Joined: Wed Mar 19, 2003 1:12 pm
Location: Canada

Post by Creo »

Hi Tony,

You might want to check whether an index can be added to the actual table in the database that you use for your lookup. The index must contain all the fields you need to read from that table, and in the same order. I know it has resolved performance issues for me in the past, so you might want to talk to your DBA about it.
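
Just to illustrate (the table and column names below are made up; your DBA will know the real ones), the DDL is usually a one-liner along these lines:

    CREATE INDEX cust_lkp_idx ON cust_lookup (cust_no, plan_code);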

Hope it helps.

Creo
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Thanks, Creo. That was exactly my recommendation to the co-worker who brought this problem to my attention. Glad to know I'm not completely off base.

Best regards,
Tony
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

Also, you may want to consider building a hashed file to use as the lookup, especially if the lookup table is large. The hashed file will give much faster response.
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

I think that's what he ended up doing.

He started the job yesterday, with the database lookup in it, and just let it run (accidentally). It took 23 hours and 40 minutes to process 12,xxx rows. In the meantime, I think he rewrote the job the way you mentioned.

Tony
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Are you doing the lookups over a network?
If so, load the lookup table into a local table (or, even better, a hashed file) before processing, and stand by to be amazed at the gains.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Yes, we were doing a lookup across the network (to the Oracle server). I think the guy that wrote that job did end up loading that table into a hash file and doing the lookup that way.

It was pretty weird that we couldn't see any data going through the job, though.

Tony
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not at all. If you can get the database server to reduce the number of rows to be processed by DataStage, then so much the better. And if you can reduce the total network traffic then that's a bonus too.
Loading a local table or hashed file is decidedly recommended when the source of the reference keys is not the same database as the source of the stream data.
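
For example, when pre-loading the hashed file, use user-defined SQL in the OCI stage so that only the columns and rows you actually need cross the network (the names below are purely illustrative):

    SELECT cust_no, cust_name
    FROM ref_customers
    WHERE active_flag = 'Y'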
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Just wanted to send an update along...

I ran into this problem in another job. I was doing a lookup against an Oracle table. Because I needed to return multiple rows from my lookup link for every input row, I couldn't just use a hash file.

I did end up copying the Oracle table into a hash file and doing the lookup against the hash file through a UniVerse stage. This approach was faster than the Oracle lookup, but it still only returned 3 or 4 rows every 5 seconds. I then put a BUILD.INDEX statement in the before-stage routine so that it would build the index on the column I needed (the Before Stage Routine was "ExecTCL BUILD.INDEX hg_hash LSACONXB"). This fixed it... it's now running at > 100 rows/second through that lookup link.
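
In case it helps anyone later, the whole sequence at TCL looks roughly like this (the file and column names are from my job; if I remember right, the index has to be defined once with CREATE.INDEX before BUILD.INDEX can populate it):

    CREATE.INDEX hg_hash LSACONXB
    BUILD.INDEX hg_hash LSACONXB

The BUILD.INDEX line is what the before-stage routine runs via ExecTCL, so the index gets rebuilt on each run.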

I would like to thank everyone for their help. Ray, what you mentioned was just the hint I needed to help me solve this problem.

Thanks everyone!
Tony
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Just wanted to ask a question about another, ummm, "feature" that I found using this technique. When I have multiple columns that I need to join on in my hash file/UniVerse stage, it runs slowly. It is slow even if I have an index on each of the columns in the join.

To fix this, when I built the hash/UniVerse table I added a column that contains all the fields I need to join on, concatenated together and separated by a tilde ('~'). Then I built an index on this concatenated key field and joined to it (again concatenating the fields from the source file together, separated by a tilde). This worked wonderfully.
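
To make it concrete (the link and column names here are just examples, not the real ones), the extra column in the hash/UniVerse table is built with a derivation like

    LkpIn.ACCT_NO : "~" : LkpIn.PLAN_CODE

and the key expression on the reference link in the main job concatenates the source fields the same way:

    SrcIn.ACCT_NO : "~" : SrcIn.PLAN_CODE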

Why didn't this work when I had separate indexes on all the join columns? In other words, when I had separate indexes on each of the fields that I needed to join on, why was the lookup still very slow?

Thanks for your help,
Tony
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As you've found, you need a composite index to do a lookup on a composite key speedily. Having *individual* indexes on all three fields doesn't help; having one index that encompasses all three fields does, just like in the relational world. You did it in an artificial manner by creating a composite field and then indexing that.

Sorry I don't have the syntax off the top of my head, but you can build composite indexes through TCL or the Administrator... or your way works fine (and used to be the only way to do it in the "old days").

-craig