How do I read and write to an Oracle table at the same time?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sivatallapaneni
Participant
Posts: 53
Joined: Wed Nov 05, 2003 8:36 am

How do I read and write to an Oracle table at the same time?

Post by sivatallapaneni »

Hi All,
I have a requirement from the client. He wants me to read from and write to an Oracle table in the same job, and I don't have the option of using a hashed file or a table in the job. Any suggestions would be appreciated...

Thanks in advance.
Siva. :roll:
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

This is easy to do in DataStage.

You create a server job with an Oracle OCI input stage, a transformer stage and an Oracle OCI output stage. Set the input and output stages to the same table name. Set the output stage to an UPDATE action. On your input stage you define the primary key and the fields to be transformed; you carry these fields through to the output stage with whatever transformation you require in the middle.

When the job runs, the input stage reads a consistent snapshot of the table, so the rows you read won't be affected by your update statements. If you are using the primary key of the Oracle table in your update statements, you shouldn't run into any problems.
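(If it helps to see the same read-then-update-by-primary-key pattern outside DataStage, here is a rough sketch. It assumes the python-oracledb driver and a made-up table MY_TABLE with primary key ID; it is an illustration only, not the OCI stages themselves.)

Code: Select all

# Rough sketch of the read/update-same-table pattern described above,
# outside DataStage. Assumes the python-oracledb driver and a made-up
# table MY_TABLE(ID NUMBER PRIMARY KEY, AMOUNT NUMBER).
import oracledb

conn = oracledb.connect(user="scott", password="tiger", dsn="dbhost/orclpdb")
read_cur = conn.cursor()
write_cur = conn.cursor()

# The SELECT is read-consistent: it sees the table as of the moment the
# query started, so the UPDATEs issued below do not feed back into it.
read_cur.execute("SELECT id, amount FROM my_table")

batch = []
for row_id, amount in read_cur:
    new_amount = amount * 1.1              # stand-in for the transformer logic
    batch.append((new_amount, row_id))     # update is keyed on the primary key
    if len(batch) >= 1000:
        write_cur.executemany(
            "UPDATE my_table SET amount = :1 WHERE id = :2", batch)
        batch = []

if batch:
    write_cur.executemany(
        "UPDATE my_table SET amount = :1 WHERE id = :2", batch)

conn.commit()
conn.close()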
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Your requirement to read from and write to the same table is not a sound one. It severely limits your restart capability, audit trail, performance, and troubleshooting. This is not a good design.

You should at least write your output to a sequential text file and then turn around and load that. Your current design does not allow you to restart unless you commit only at the end of the job, and for large volumes you may not be able to hold the commit until the end. From a performance perspective, you cannot easily instantiate multiple copies of the job, because you would have to partition your data stream, and that has an impact on your updates as well as on how the database handles the extra work. Furthermore, if one source row updates many target rows, you may be forced to commit more often because of the large transaction set on the receiving end.


At least prepare a sequential file and then load that. You'll be able to inspect the source data after it has been transformed, and it's a heck of a lot easier to debug a file than to keep recovering the target table.
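To illustrate the land-then-load idea outside of any particular tool, here is a rough sketch. The staging file, table, and column names are made up, and python-oracledb stands in for the load step; a real job would more likely hand the staged file to SQL*Loader.

Code: Select all

# Rough sketch of the stage-to-a-sequential-file-then-load approach.
# File, table, and column names are made up; a real job might hand the
# staged file to SQL*Loader instead of loading it with DML.
import csv
import oracledb

STAGE_FILE = "customer_load.csv"   # hypothetical staging file

def extract_from_source():
    # Stand-in for the real source extract; these sample rows are made up.
    return [(101, "  alice "), (102, " bob  ")]

# Pass 1: transform and land the data in a flat file that can be
# inspected, archived, and reloaded without touching the source again.
with open(STAGE_FILE, "w", newline="") as f:
    writer = csv.writer(f)
    for cust_id, name in extract_from_source():
        writer.writerow([cust_id, name.strip().upper()])

# Pass 2: load the staged file in a separate, restartable step.
conn = oracledb.connect(user="scott", password="tiger", dsn="dbhost/orclpdb")
cur = conn.cursor()
with open(STAGE_FILE, newline="") as f:
    rows = [(name, int(cust_id)) for cust_id, name in csv.reader(f)]
cur.executemany("UPDATE customers SET name = :1 WHERE cust_id = :2", rows)
conn.commit()
conn.close()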
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ken - you've expounded on your "flat file" methodology numerous times in this forum and I am in total agreement with you. However, I have a colleague who seems to find the whole concept ridiculous (or too much work) and is fighting me on this. :?

Looking for some help... I vaguely remember one of your longer (wham! here you go kid!) posts that rather eloquently discussed this whole concept in detail, but I've been unable to turn it up by searching the forums. Can you do me a favor and provide a link to the post, or perhaps just paste up a new one? Your choice if you think this is on-topic here or if we need a new post. :)
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Craig, check your messages, I sent a specific reply to you.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis
Contact:

Post by trobinson »

Two comments:
1. ETL is no longer synonymous with the word "BATCH". Staging data is meaningless in a real-time environment. Transaction control rules here; it is paramount for restartability and recoverability.
2. I wonder how segment processing will evolve with the introduction of 7.0, or with more reliance on PX-type processing (that is, if I understand 7.0 correctly). In other words, vertical banding allows parallel processing, but what if you want your "bands" or segments to make use of PX? They can't if each process is landing its data to a single staging file simultaneously. I think the best-practice ETL paradigm is shifting away from landing data at all.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Read what Kimball has to say about real-time data warehousing. As for the PX paradigm of never landing data, how does that approach address the bullet items at the top of my commentary? What about reloading data as opposed to reprocessing? It's kind of hard to reprocess if you're outside your processing windows. What about slowly changing dimensions? They're kind of hard to handle if you don't stall your source data long enough to look at your target and take the appropriate action, without yet updating the target.
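For anyone who hasn't wrestled with slowly changing dimensions, here is a tiny sketch (plain Python, made-up names, nothing to do with any particular tool) of why you need the staged source row sitting next to the current target row before you act:

Code: Select all

# Tiny illustration (made-up names, no particular tool) of why slowly
# changing dimension handling needs the staged source row held next to
# the current target row before acting: compare, then expire-and-insert
# (Type 2) or do nothing.
from datetime import date

current_dim = {   # current rows in the target dimension
    "C001": {"city": "Denver", "effective": date(2003, 1, 1), "current": True},
}
staged_source = {"C001": {"city": "Lutz"}}   # today's staged source rows

inserts, expirations = [], []
for key, src in staged_source.items():
    tgt = current_dim.get(key)
    if tgt is None or tgt["city"] != src["city"]:
        if tgt is not None:
            expirations.append((key, date.today()))       # close the old version
        inserts.append((key, src["city"], date.today()))  # open a new version

print("expire:", expirations)
print("insert:", inserts)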

ETL may no longer be synonymous with "BATCH", but "BATCH" is still the bread and butter of most data warehouses. Most data warehouses require static periods in marts and EDWs, just to give time for analysis on unshifting data.

Not trying to start a religious discussion. :wink:
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Kenneth

I am so sick of Kimball and Inmon. Who cares? Their books are just guidelines to help us do our jobs. With the new real-time technology coming, all the rules are about to change. Customers have always wanted information ASAP. All those ideas or guidelines will change with the technology. I know and appreciate your knowledge and wisdom, and theirs. Sometimes we have to outgrow our mentors.

Kim.
Mamu Kim
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

There's a significant number of companies who build data warehouses where the primary source of data is file feeds. This is batch processing on the sending and receiving end.

Even with a trickle-fed EDW, data marts traditionally are static for a period of time (daily, weekly, monthly, etc) so that analysis can be done on static data. Again, the refresh would be a batch process.

For those who haven't read Kimball, please read this before you discard him as yesterday's news with respect to real time:
http://www.rkimball.com/html/designtips ... tip31.html

So, is batch processing dead or dying? Anyone else want to jump in here, or do I start looking for another line of work? Real time has its time and place (sorry for the pun). Are you going to do real time to your ODS, EDW, and marts? Kimball comments on it better than I do. I'll let everyone else out there do their own research.

Kim, respectfully, I do a lot of reading. I meet enough "data warehouse architects" onsite who keep repeating the very problems Kimball became famous for describing solutions to. These "architects" don't even understand slowly changing dimensions or how to model warehouses. They've never built aggregates or handled volumes in the billions of rows. I'm not sick of Kimball; I wish more people understood what his books were trying to say. I've written a ton of posts and white papers, attended conferences, and given speeches and demonstrations. A tool is not going to replace due diligence on architecture, requirements gathering, and programming techniques. What I like about Kimball is his emphasis on structured, traditional approaches. Call me an old fart if you want. :wink:

If I'm sick of something, it's the lack of understanding that the ETL tool is not a silver bullet. RTI and PX don't change anything. Technology is an enabler, not the solution. A framework should define the right way to do things; then you figure out how to make the technology work within that framework. You don't pick a tool or technology and then decide what the right things to do are.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Please let me clarify before people start calling Ascential and saying I'm an RTI hater. What I'm saying is that RTI has its role in data warehousing. However, IMO, RTI is not the future replacement of ETL but a component of it.

I absolutely agree that staging data for a single transaction seems silly and wasteful. But, IMO and from what I read, most data warehouses being built today are primarily batch oriented, with daily processing. As real time comes into play, there's a lot to learn from what Kimball is saying. Also, it should be apparent to everyone that real time cannot scale to massive volumes. Side by side, 10 million rows in real time compared to 10 million rows in batch, batch should win every time. This is obvious if batch is using bulk loading and real time is doing DML.

As for PX versus Server methodologies, they should be the same. Landing data is a conscious choice, made for the previously mentioned reasons. If you go from source table to target table without landing the data, run through three ETL cycles, then lose the database and have to recover, what happens if the last verified backup is three days old? You incur three days of reprocessing. Had you made the choice to stage your load-ready files, you'd be reloading three days of data instead of reprocessing it, so your return to availability is shorter. If your processing was attempting to capture slowly changing dimension source data from the production system, you cannot recreate those lost days at all. In addition, reprocessing requires a live connection to the source systems, and their availability may delay your return to uptime on the warehouse. Too bad you didn't land your load files and archive them.

On another day I'll continue about creating a backup copy of the row you're going to update in the target. That way you can manually "roll back" data post-commit, or use that same data to calculate a delta for an aggregate table, because you need the before and after images of a row to compute its effect on an aggregate it has already been applied against.
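To make the before-image idea concrete, here is a rough sketch (python-oracledb, made-up table and column names, an illustration only): copy the rows you are about to update into an audit table inside the same transaction, then apply the update.

Code: Select all

# Sketch of keeping a before-image of target rows prior to updating them,
# so a manual "rollback" or an aggregate delta can be computed later.
# Assumes python-oracledb; the table and column names are made up.
import oracledb

conn = oracledb.connect(user="scott", password="tiger", dsn="dbhost/orclpdb")
cur = conn.cursor()

keys_to_update = [(101,), (102,)]   # hypothetical keys arriving in this load

# 1. Copy the current (before) image of each target row to an audit table.
cur.executemany(
    """INSERT INTO fact_sales_before (cust_id, amount, captured_at)
       SELECT cust_id, amount, SYSDATE FROM fact_sales WHERE cust_id = :1""",
    keys_to_update)

# 2. Apply the update; the delta to any aggregate is (new - before) per row.
cur.executemany(
    "UPDATE fact_sales SET amount = amount * 1.1 WHERE cust_id = :1",
    keys_to_update)

conn.commit()
conn.close()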
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
sivatallapaneni
Participant
Posts: 53
Joined: Wed Nov 05, 2003 8:36 am

Thanks for all the info...

Post by sivatallapaneni »

Thanks a lot, you guys gave me a lot of info. I appreciate it all.
Siva
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Kenneth

I agree with most of what you said. I think their work is absolutely required reading, but their answers are evolving along with the technology. That is my point. If a data architect does not understand these theories, then they are pretty much worthless at building good data warehouse solutions. When I was younger, all these theories were important to me and I would argue the importance of doing it textbook perfect. The older I get, the less I am concerned about being perfect. If a customer wants something done textbook wrong, I will point it out, but I no longer get offended if they insist on doing it their way. I did my job in pointing out a better way; it is their money, and they can build poorly designed systems if they want. Hopefully, when someone follows me on a job, they will understand that the work produced is not always what I wanted or even what I suggested. In the past I might have walked away from a situation like that because I did not want it to reflect on the quality of my work.

I must admit that it is frustrating to deal with architects who do not understand how to do it correctly while the customer has no clue. I have yet to figure out how to correct that situation without upsetting someone, and the stress it creates within me is not worth the battle either. I try to take work less and less seriously, and my personal life has become more and more important. I think I do excellent work, but my friends and family are far more important now. I really doubt that Kimball or Inmon could find many data warehouses built correctly. What is worse, most DataStage sites use DataStage poorly or blame DataStage for poor designs. The core of a data warehouse is not the tools that build it or read from it; the core is the table structures. A poor design cannot be overcome by ETL tools or OLAP tools.

My gut feeling is that RTI is going to make data warehouse design even uglier, and customers just want it to work. They do not realize that they are building systems that are very fragile because they have not used knowledgeable professionals.

Kim.
Mamu Kim
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis
Contact:

Post by trobinson »

Amen to that, brother.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Here are some musings while I'm home from the hospital. My wife had another baby boy this morning, 8 lbs 4 oz: Remington Suguru Bland.

1. An ETL tool CANNOT take the place of a proper software development lifecycle process. If your warehouse architect has not read Kimball and Inmon, or does not at least understand the terminology and technology, then your warehouse project deserves to fail. The ETL tool is not at fault; it's a victim of the failure of whoever is in charge to enforce an SDLC.
2. An ETL tool CANNOT take the place of a proper programming architecture. Lack of sound principles in modularity, restart, documentation, and design techniques inevitably produces an ETL application that fails once it goes live. The ETL tool is not at fault; it cannot overcome design obstacles introduced by a poor programming architecture. (Think of a screen door on a submarine. Who do you blame: the builder for building it, the designer for designing it, the military for requesting a screen door, or the captain of the submarine for trying to drive it?) Consulting companies sometimes build whatever the client wants, even if they know it won't work.
3. The ETL tool CANNOT overcome the limitations of the team using it. The best tool can be defeated by the worst developers. The best developers can overcome limitations of the worst tool. Give me the best engineered race car in the world and I still can't best the best driver in a bad car. I don't have the expertise to drive the best car at its upper limits.
4. All tools require a disciplined approach to developing an ETL application, a discipline usually not found in a team that didn't recognize its own limitations when it built the warehouse by hand the first time. Hoping the tool solves all problems is wishful thinking.
5. Data warehouses are extremely complex and take EXPERTS to deploy, at least at the highest level. You can staff a data warehouse team internally; however, you need an architect who has done it before, successfully and with a good reputation. In my opinion, any company that outsources its data warehouse project to a vendor is nuts. Vendors usually blow smoke and say the tool does it all. A superior architect will lay out a 12-month plan with phased, incremental rollouts of aspects of the warehouse. The necessary components may include a persistent staging/induction layer database, an enterprise database, and several marts including aggregates, with complete integration to a version control tool and an enterprise scheduler. Optional components would include a publish-and-subscribe framework for deploying retrievable data extracts, an intranet site for external monitoring as well as enterprise visibility into database currency, and a metadata framework whereby all business and process metadata is published on said intranet. A tool doesn't do any of this; it takes experts. If you were going to build a house, having never done it before, wouldn't you at least hire an experienced foreman to assist with, guide, and review your designs, methods, and work?

Naptime, been up since 3:00am.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Kenneth

Congrats. God bless.

Kim.
Mamu Kim