understanding SQL*Loader ctl file in OE stage...

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

understanding SQL*Loader ctl file in OE stage...

Post by prabu »

Hi DS folks,
Can some kind soul point out to me the fundamentals of how the control file is generated? Is there any way I can control the definitions for some of the columns?

Scenario: I am loading the data in Load-Append mode in an Oracle Enterprise stage (ver 7.5).


Scenario 1:
The idea is that I want the column definition in the control file to be "my_sequence.nextval" for a column, instead of the auto-generated one.
:roll:

anything helps

regards,
Prabu
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

Post by prabu »

I understand that I can use Write Method Upsert and edit the generated SQL. From the definition of Upsert:

Insert if update fails
OR
Update if insert fails


whereas I want to do only an insert.


I can set Upsert Order to "Insert then update" and change the generated update statement to UPDATE ... WHERE 1!=1, which internally prevents any updates. I also need to change the insert statement to use seq.nextval.
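The workaround described above can be sketched as follows. The table and column names are hypothetical, and ORCHESTRATE.name stands for the link-column placeholder that DataStage uses in its generated SQL:

```sql
-- Insert SQL: take the key from the Oracle sequence instead of the input link.
-- (my_table, id, name are illustrative names, not from any real job.)
INSERT INTO my_table (id, name)
VALUES (my_sequence.NEXTVAL, ORCHESTRATE.name);

-- Update SQL: the predicate 1 != 1 can never be true, so no row is ever
-- updated and the Upsert effectively performs inserts only.
UPDATE my_table
SET    name = ORCHESTRATE.name
WHERE  1 != 1;
```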

Any ideas?

Please throw some light on this.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Prabu,

If I want to force inserts and not use the bulk load process, I will choose the options 'update then insert' or 'insert then update' and afterwards choose 'custom sql' for both; then I'll cut-and-paste one to the other, so that both actually do the update or insert.

But if you only have inserts to do then I would recommend using the faster option!
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

Write Method - something is missing (standalone Insert/Update)

Post by prabu »

ArndW wrote:Prabu,

If I want to force inserts and not use the bulk load process, I will choose the options 'update then insert' or 'insert then update' and afterwards choose 'custom sql' for both; then I'll cut-and-paste one to the other, so that both actually do the update or insert.

But if you only have inserts to do then I would recommend using the faster option!
Arnd, I think giving an insert statement as an Update SQL statement is not logically correct.

I guess there is an option missing from Write Method (please see 2. below), and that's why we choose 3. instead as a workaround :idea:
1. Load
2. Insert/Update
3. Upsert
4. Delete Rows


Please let me know if my understanding is correct.
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

Ray, are you listening

Post by prabu »

Ray, are you listening :)
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

If your whole issue is that you want to sequentially assign a column the next available unique number, why don't you manage the process yourself inside the tool?

There are many useful techniques for doing this. All rely on "seeding" the jobstream with the starting value from which to assign. Then, use one of the parallel-friendly techniques for assigning unique values. The method for seeding is either to do a SELECT MAX on the key prior to running the job and pass that in as a parameter, or at the conclusion of processing to store the max value used in a control table that is read at the beginning of the next run via a SELECT. The only issue is that gaps will occur in the sequencing, as each partition is independently assigned a unique set of numbers.
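The two seeding options above can be sketched as follows; the table names (my_table, etl_key_control) and the :max_key_used bind variable are illustrative only:

```sql
-- Option 1: seed from the target table itself before the run, and pass
-- the result in as a job parameter.
SELECT NVL(MAX(id), 0) AS start_key
FROM   my_table;

-- Option 2: keep the high-water mark in a dedicated control table.
SELECT last_key
FROM   etl_key_control
WHERE  table_name = 'MY_TABLE';

-- At the conclusion of processing, store the highest key assigned
-- so the next run can seed from it.
UPDATE etl_key_control
SET    last_key   = :max_key_used
WHERE  table_name = 'MY_TABLE';
```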

If you persist in using a database sequence, doesn't that introduce a choke point in your loading paradigm, as direct partition loading would require a semaphore to manage doling out numbers to the parallel threads? Wouldn't you prefer to stay with an optimized loading method that allows all partitions to load the data simultaneously and without delay, using the fast insert-only load method (DIRECT path)?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: Ray, are you listening

Post by ray.wurlod »

prabu wrote:Ray, are you listening :)
Not while I'm working. While I'm working I give my client's tasks my full attention. I only visit DSXchange during breaks, or before or after work.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

Re: Ray, are you listening

Post by prabu »

ray.wurlod wrote:
prabu wrote:Ray, are you listening :)
Not while I'm working. While I'm working I give my client's tasks my full attention. I only visit DSXchange during breaks, or before or after work.
Hey Ray, take it easy. I did not mean to treat this forum like a chat window. :oops: I was just hoping to get your views, that's all.
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

Post by prabu »

kcbland wrote: If you persist in using a database sequence, doesn't that introduce a choke point in your loading paradigm, as direct partition loading would require a semaphore to manage doling out numbers to the parallel threads? Wouldn't you prefer to stay with an optimized loading method of allow all partitions to simultaneously and without delay load the data using the fast insert only load method (DIRECT path)?
Let me try to understand this. If I understand correctly, the parallel threads are responsible for running the SQL*Loader control files using the data files generated on the fly.

Do you mean to suggest that Oracle's sequence.nextVal() will serialize the parallel DataStage threads?

Example: thread t2 will wait until t1's nextVal() operation has completed. I presume nextVal() is serialized.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Consider this thread:
http://asktom.oracle.com/pls/ask/f?p=49 ... 5961912937

Go about two-thirds of the way down to get to the discussion about bulk loading hundreds of thousands of rows simultaneously. Transactional inserts with sequences aren't a big deal; I agree with Tom 100%. (Buy Kyte's Expert One-on-One, it's an excellent book.)

I guess my point is that you have to look at the volumes you're loading. Since parallel streams of data will be accessing partitions simultaneously, and you're using a sequence for a unique key, any unique index on that key must be GLOBAL, which is anathema to the things you're supposed to be doing. That index update will cause more issues than the use of sequences.

You'll just have to experiment. You can always generate a file per table partition and manually invoke sqlldr for each, naming the partition instead of the table. You can see whether DIRECT path performance drops, and you can mitigate it by playing with the sequence cache size.
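The cache-size mitigation can be sketched like this; the sequence name is taken from the earlier posts, and the cache size is an illustrative value, not a recommendation:

```sql
-- Each session that calls my_sequence.NEXTVAL pulls a whole block of
-- cached numbers at once, so parallel loading sessions hit the data
-- dictionary far less often. Unused cached numbers are discarded on
-- shutdown, so expect wider gaps in the key values.
ALTER SEQUENCE my_sequence CACHE 1000;

-- Per-partition loads would then be invoked one sqlldr per data file,
-- e.g. (hypothetical control file names):
--   sqlldr userid=... control=load_p1.ctl direct=true
--   sqlldr userid=... control=load_p2.ctl direct=true
-- with each control file naming its partition via
--   INTO TABLE my_table PARTITION (p1)
```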

See this post, it's covered there:
http://asktom.oracle.com/pls/ask/f?p=49 ... 5886242221

Have fun.
Kenneth Bland
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Forgot to answer directly. Yes, I was trying to say that your partitioned loading sessions will have to cooperate with the sequence, and that some amount of serialization could take place. Tom Kyte doesn't seem to worry much about it, but I couldn't find where the topic of parallel partition loading with a sequence was specifically addressed.
Kenneth Bland
Post Reply