Doubt in - Sequence of records processed by DataStage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
vij
Participant
Posts: 131
Joined: Fri Nov 17, 2006 12:43 am

Doubt in - Sequence of records processed by DataStage

Post by vij »

Hi all,

I have a doubt in the way the records are processed in DataStage. I have about 100 records coming from the source and there is no filter or other such operations, so I want to have all the 100 records, which I am getting now, but I want to have the same order, how it is present in the source, ie., the first record from the source should be stored in the target table as the first record and so on, till the 100th record.

Can anyone please advice me how to get this in DataStage?

Thanks!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What you want is exactly what DataStage does.

The target database table, however, stores the records in whatever order it wants to. This is a fundamental property of relational databases; that storage order is irrelevant.

It's not a DataStage issue.

To get sorted data out of a table, the rules state that you specify an ORDER BY clause in your query.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jhmckeever
Premium Member
Premium Member
Posts: 301
Joined: Thu Jul 14, 2005 10:27 am
Location: Melbourne, Australia
Contact:

Post by jhmckeever »

... and if you're writing a sequential file as output (or some other stage where output order might be important) then use a sort Stage to sort by a key value.

If you need to preserve the input order in your output you can generate an appropriate key for the input data. Do a search on @INROWNUM.

J.
<b>John McKeever</b>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
vij
Participant
Posts: 131
Joined: Fri Nov 17, 2006 12:43 am

Post by vij »

I have SQL server table as the target, but even then, the same order is not preserved. Should I enable any maintain order option?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

vij - why would you want to maintain any particular order when writing to this SQL server table? Sometimes the order of keys can make a performance difference when loading, but that doesn't sound like your issue.

Basically no matter what the order is of the records you are inserting to SQL Server the same query to the database will return the same results.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No database will allow you to control the order in which rows are stored in a table.

What you are asking for is simply not permitted.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Since you are using PX edition, the order is not guaranteed. Unless you are processing in sequential mode. Since the records are processed in different CPU's, based on the resource of each CPU the records will be processed, and the order will be changed.
If you strictly need the order to be maintained, you can load based on some sequence key generated during select. It can be even the rowid/rownum.
Interestingly, there is no guaranty that two simple select query will return in the rows in same order.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Cant he stick a transformer before the database stage and then choose combinability mode as "combinable" from the transforme stage properties ?
But as noted by others, this will still not ensure how the record is stored in the database. But if you are loading to a file then by setting the combinability mode, it should work. Still need confirmation on that from others.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

vij wrote:I have SQL server table as the target, but even then, the same order is not preserved. Should I enable any maintain order option?
The issue here is simply a lack of understanding on how relational databases work. As noted, there is no 'order' to be 'preserved'. If you want to get records back out of a table in the same order you inserted them, you'll need to add additional columns to the table to allow that.

Something like a batch number + record number in the batch would work. Then you can ORDER BY both fields and get things out like you put them in. Or a surrogate key - anything that allows you to ORDER the output properly.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Though the combinable option is enabled, its not guaranteed that the operators will be combined. Unless Dump score is overviewed. And more over this option will not affect the order of the row processing across nodes.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
vij
Participant
Posts: 131
Joined: Fri Nov 17, 2006 12:43 am

Post by vij »

Hi all,

Let me explain the query again: It is related to loading data into a SQL SERVER DB table from the sequential file, not retriving the data from the table.

Actually I use a Sequential file to load the table and this sequential file was inturn an ouput from a job, which runs in parallel and here is where the sequence got missed out.

In the job where I am loading the table, the same sequence what I am looking for is maintained.

And now the problem is resolved. Thanks for all your help.
Post Reply