Page 1 of 1

Doubt in - Sequence of records processed by DataStage

Posted: Wed Feb 07, 2007 12:56 am
by vij
Hi all,

I have a doubt in the way the records are processed in DataStage. I have about 100 records coming from the source and there is no filter or other such operations, so I want to have all the 100 records, which I am getting now, but I want to have the same order, how it is present in the source, ie., the first record from the source should be stored in the target table as the first record and so on, till the 100th record.

Can anyone please advice me how to get this in DataStage?

Thanks!

Posted: Wed Feb 07, 2007 2:37 am
by ray.wurlod
What you want is exactly what DataStage does.

The target database table, however, stores the records in whatever order it wants to. This is a fundamental property of relational databases; that storage order is irrelevant.

It's not a DataStage issue.

To get sorted data out of a table, the rules state that you specify an ORDER BY clause in your query.

Posted: Wed Feb 07, 2007 2:53 am
by jhmckeever
... and if you're writing a sequential file as output (or some other stage where output order might be important) then use a sort Stage to sort by a key value.

If you need to preserve the input order in your output you can generate an appropriate key for the input data. Do a search on @INROWNUM.

J.

Posted: Wed Feb 07, 2007 3:20 am
by vij
I have SQL server table as the target, but even then, the same order is not preserved. Should I enable any maintain order option?

Posted: Wed Feb 07, 2007 4:22 am
by ArndW
vij - why would you want to maintain any particular order when writing to this SQL server table? Sometimes the order of keys can make a performance difference when loading, but that doesn't sound like your issue.

Basically no matter what the order is of the records you are inserting to SQL Server the same query to the database will return the same results.

Posted: Wed Feb 07, 2007 4:47 am
by ray.wurlod
No database will allow you to control the order in which rows are stored in a table.

What you are asking for is simply not permitted.

Posted: Wed Feb 07, 2007 5:09 am
by kumar_s
Since you are using PX edition, the order is not guaranteed. Unless you are processing in sequential mode. Since the records are processed in different CPU's, based on the resource of each CPU the records will be processed, and the order will be changed.
If you strictly need the order to be maintained, you can load based on some sequence key generated during select. It can be even the rowid/rownum.
Interestingly, there is no guaranty that two simple select query will return in the rows in same order.

Posted: Wed Feb 07, 2007 8:16 am
by DSguru2B
Cant he stick a transformer before the database stage and then choose combinability mode as "combinable" from the transforme stage properties ?
But as noted by others, this will still not ensure how the record is stored in the database. But if you are loading to a file then by setting the combinability mode, it should work. Still need confirmation on that from others.

Posted: Wed Feb 07, 2007 8:30 am
by chulett
vij wrote:I have SQL server table as the target, but even then, the same order is not preserved. Should I enable any maintain order option?
The issue here is simply a lack of understanding on how relational databases work. As noted, there is no 'order' to be 'preserved'. If you want to get records back out of a table in the same order you inserted them, you'll need to add additional columns to the table to allow that.

Something like a batch number + record number in the batch would work. Then you can ORDER BY both fields and get things out like you put them in. Or a surrogate key - anything that allows you to ORDER the output properly.

Posted: Thu Feb 08, 2007 2:25 am
by kumar_s
Though the combinable option is enabled, its not guaranteed that the operators will be combined. Unless Dump score is overviewed. And more over this option will not affect the order of the row processing across nodes.

Posted: Thu Feb 08, 2007 4:53 am
by vij
Hi all,

Let me explain the query again: It is related to loading data into a SQL SERVER DB table from the sequential file, not retriving the data from the table.

Actually I use a Sequential file to load the table and this sequential file was inturn an ouput from a job, which runs in parallel and here is where the sequence got missed out.

In the job where I am loading the table, the same sequence what I am looking for is maintained.

And now the problem is resolved. Thanks for all your help.