understanding SQL*Loader ctl file in OE stage...

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

understanding SQL*Loader ctl file in OE stage...

Post by prabu »

Hi DS folks,
Can some kind soul point out to me the fundamentals of how the control file is generated? Is there any way I can control the definitions for some of the columns?

Scenario: I am loading the data in Load-Append mode in an Oracle Enterprise stage (ver 7.5).


Scenario 1:
The idea is that I want the column definition in the control file to be "my_sequence.nextval" for a column, instead of the auto-generated one.
:roll:

anything helps

regards,
Prabu
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

Post by prabu »

I understand that I can use Write Method Upsert and edit the generated SQL. From the definition of Upsert:

Insert if update fails
OR
Update if insert fails


whereas I want to do only an insert.


I can set Upsert Order to "Insert then update" and change the generated update statement to UPDATE ... WHERE 1!=1, which internally prevents any updates. I also need to change the insert statement to use seq.nextval.
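The workaround described above can be sketched as follows. The table and column names are hypothetical, and ORCHESTRATE.name stands for the link-column placeholder that DataStage uses in its generated SQL:

```sql
-- Insert SQL: take the key from the Oracle sequence instead of the input link.
-- (my_table, id, name are illustrative names, not from any real job.)
INSERT INTO my_table (id, name)
VALUES (my_sequence.NEXTVAL, ORCHESTRATE.name);

-- Update SQL: the predicate 1 != 1 can never be true, so no row is ever
-- updated and the Upsert effectively performs inserts only.
UPDATE my_table
SET    name = ORCHESTRATE.name
WHERE  1 != 1;
```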

Any ideas?

Please throw some light on this.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Prabu,

If I want to force inserts and not use the bulk load process, I will choose the options 'update then insert' or 'insert then update' and afterwards choose 'custom sql' for both; then I'll cut-and-paste one to the other, so that both actually do the update or insert.

But if you only have inserts to do then I would recommend using the faster option!
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

Write Method - something is missing (standalone Insert/Update)

Post by prabu »

ArndW wrote:Prabu,

If I want to force inserts and not use the bulk load process, I will choose the options 'update then insert' or 'insert then update' and afterwards choose 'custom sql' for both; then I'll cut-and-paste one to the other, so that both actually do the update or insert.

But if you only have inserts to do then I would recommend using the faster option!
Arnd, I think giving an insert statement as an Update SQL statement is not logically correct.

I guess there is an option missing from Write Method (please see 2. below), and that's why we choose 3. instead as a workaround :idea:
1. Load
2. Insert/Update
3. Upsert
4. Delete Rows


Please let me know if my understanding is correct.
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

Ray, are you listening

Post by prabu »

Ray, are you listening :)
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

If your whole issue is that you want to sequentially assign a column the next available unique number, why don't you manage the process yourself inside the tool?

There are many useful techniques for doing this. All rely on "seeding" the jobstream with the starting value from which to assign. Then, use one of the parallel-friendly techniques for assigning unique values. The method for seeding is either to do a SELECT MAX on the key prior to running the job and pass that in as a parameter, or at the conclusion of processing to store the max value used in a control table that is read at the beginning of the next run via a SELECT. The only issue is that gaps will occur in the sequencing, as each partition is independently assigned a unique set of numbers.
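The two seeding options above can be sketched as follows; the table names (my_table, etl_key_control) and the :max_key_used bind variable are illustrative only:

```sql
-- Option 1: seed from the target table itself before the run, and pass
-- the result in as a job parameter.
SELECT NVL(MAX(id), 0) AS start_key
FROM   my_table;

-- Option 2: keep the high-water mark in a dedicated control table.
SELECT last_key
FROM   etl_key_control
WHERE  table_name = 'MY_TABLE';

-- At the conclusion of processing, store the highest key assigned
-- so the next run can seed from it.
UPDATE etl_key_control
SET    last_key   = :max_key_used
WHERE  table_name = 'MY_TABLE';
```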

If you persist in using a database sequence, doesn't that introduce a choke point in your loading paradigm, as direct partition loading would require a semaphore to manage doling out numbers to the parallel threads? Wouldn't you prefer to stay with an optimized loading method that allows all partitions to load the data simultaneously and without delay, using the fast insert-only load method (DIRECT path)?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: Ray, are you listening

Post by ray.wurlod »

prabu wrote:Ray, are you listening :)
Not while I'm working. While I'm working I give my client's tasks my full attention. I only visit DSXchange during breaks, or before or after work.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

Re: Ray, are you listening

Post by prabu »

ray.wurlod wrote:
prabu wrote:Ray, are you listening :)
Not while I'm working. While I'm working I give my client's tasks my full attention. I only visit DSXchange during breaks, or before or after work.
Hey Ray, take it easy. I did not mean to treat this forum like a chat window. :oops: I was just hoping to get your views, that's all.
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

Post by prabu »

kcbland wrote: If you persist in using a database sequence, doesn't that introduce a choke point in your loading paradigm, as direct partition loading would require a semaphore to manage doling out numbers to the parallel threads? Wouldn't you prefer to stay with an optimized loading method of allow all partitions to simultaneously and without delay load the data using the fast insert only load method (DIRECT path)?
Let me try to understand this. If I understand correctly, the parallel threads are responsible for running the SQL*Loader control files using the data files generated on the fly.

Do you mean to suggest that Oracle's sequence.nextVal() will serialize the parallel DataStage threads?

Example: thread t2 will wait until t1's nextVal() operation has completed. I presume nextVal() is serialized.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Consider this thread:
http://asktom.oracle.com/pls/ask/f?p=49 ... 5961912937

Go about two-thirds of the way down to get to the discussion about bulk loading hundreds of thousands of rows simultaneously. Transactional inserts with sequences aren't a big deal; I agree with Tom 100%. (Buy Kyte's Expert One-on-One, it's an excellent book.)

I guess my point is that you have to look at the volumes you're loading. Since parallel streams of data will be accessing partitions simultaneously, and you're using a sequence for a unique key, any unique index on that key must be GLOBAL, which is anathema to the things you're supposed to be doing. That index update will cause more issues than the use of sequences.

You'll just have to experiment. You can always generate a file per table partition and manually invoke sqlldr for each, naming the partition instead of the table. You can see whether DIRECT path performance drops, and you can mitigate it by playing with the sequence cache size.
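The cache-size mitigation can be sketched like this; the sequence name is taken from the earlier posts, and the cache size is an illustrative value, not a recommendation:

```sql
-- Each session that calls my_sequence.NEXTVAL pulls a whole block of
-- cached numbers at once, so parallel loading sessions hit the data
-- dictionary far less often. Unused cached numbers are discarded on
-- shutdown, so expect wider gaps in the key values.
ALTER SEQUENCE my_sequence CACHE 1000;

-- Per-partition loads would then be invoked one sqlldr per data file,
-- e.g. (hypothetical control file names):
--   sqlldr userid=... control=load_p1.ctl direct=true
--   sqlldr userid=... control=load_p2.ctl direct=true
-- with each control file naming its partition via
--   INTO TABLE my_table PARTITION (p1)
```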

See this post, it's covered there:
http://asktom.oracle.com/pls/ask/f?p=49 ... 5886242221

Have fun.
Kenneth Bland
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Forgot to answer directly. Yes, I was trying to say that your partitioned loading sessions will have to cooperate with the sequence, and that some amount of serialization could take place. Tom Kyte doesn't seem to worry much about it, but I couldn't find where the topic of parallel partition loading with a sequence was specifically addressed.
Kenneth Bland
Post Reply