Server Job or Parallel

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
vzmz
Participant
Posts: 36
Joined: Sun Nov 23, 2003 12:10 pm
Location: Dallas

Server Job or Parallel

Post by vzmz »

Hi friends,
I'm a little confused between server and parallel jobs.
Project:
1. I have a set of validations on a flat file (record size)
2. Removing duplicates
3. Finding the new, modified and old records in that flat file by comparing it with a Teradata table.

If I select a server job I have easy interaction with Teradata.
If I take a parallel job, removing duplicates is easy.

The OS is UNIX.

Or should I use a combination of both job types for my project, and can I sequence a parallel job and then a server job?
Thanks in advance
Peytot
Participant
Posts: 145
Joined: Wed Jun 04, 2003 7:56 am
Location: France

Post by Peytot »

Hi,
Yes, you can combine PX and Server, and I think that is the best solution.

1. I have a set of validations on a flat file (record size)
If the file is not big, go with Server and execute the validations in parallel.

2. Removing duplicates
Use PX with the Remove Duplicates stage.

3. Finding the new, modified and old records in that flat file by comparing it with a Teradata table.
Use PX with the Change Capture stage.
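
Not DataStage code, but here is a rough plain-Python sketch of what steps 2 and 3 boil down to, assuming both sides are keyed on a made-up cust_id column and already sorted on that key (the stage names above are the actual recommendation; the code is only to illustrate the logic):

Code: Select all

# Plain-Python illustration of what the two PX stages do conceptually.
# Field names (cust_id, name) and the sample rows are made up.

before = [                              # the Teradata side, sorted on the key
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
]
after = [                               # the flat-file side, sorted on the key
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bobby"},
    {"cust_id": 2, "name": "Bobby"},    # duplicate
    {"cust_id": 3, "name": "Cal"},
]

# Step 2 - "Remove Duplicates": keep the first row per key (needs sorted input).
deduped = []
for row in after:
    if not deduped or deduped[-1]["cust_id"] != row["cust_id"]:
        deduped.append(row)

# Step 3 - "Change Capture": compare the keyed before/after sets and tag each row.
before_by_key = {row["cust_id"]: row for row in before}
for row in deduped:
    old = before_by_key.get(row["cust_id"])
    if old is None:
        print("insert:", row)           # new record
    elif old != row:
        print("edit:  ", row)           # modified record
    else:
        print("copy:  ", row)           # unchanged record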

Good luck,

Pey
vzmz
Participant
Posts: 36
Joined: Sun Nov 23, 2003 12:10 pm
Location: Dallas

Post by vzmz »

Peytot wrote:Hi,
Yes, you can combine PX and Server, and I think that is the best solution.
1. I have a set of validations on a flat file (record size)
If the file is not big, go with Server and execute the validations in parallel.
2. Removing duplicates
Use PX with the Remove Duplicates stage.
3. Finding the new, modified and old records in that flat file by comparing it with a Teradata table.
Use PX with the Change Capture stage.

Good luck,

Pey
Pey, as you told me, I am to use Change Capture for finding the old and new records. When I was reading the doc for the Change Capture stage, it says the stage takes its input from a dataset which has been sorted first.
My question is: can I give Change Capture a Teradata table and a sequential file? If yes, do I have to sort them first?

Thanks
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

The term "dataset" here is really an internal term. In PX, each stage reads in datasets and writes out datasets as defined.

So when you see the word "dataset", think "the stream of data going in/out."

Also, yes, the input needs to be sorted. However, the sort is done automatically by the stage if you leave the setting as "Auto".
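
To see why the sort matters: with both streams sorted on the key, the comparison can run as a single pass, merge-style, without holding either side in memory. A toy illustration in plain Python (the key lists are made up, and this is not the engine's actual code):

Code: Select all

# Merge-style walk over two key-sorted streams - the reason Change Capture
# wants sorted input. The key lists are made up for illustration.

before_keys = [1, 2, 4, 7]   # keys already in the table, sorted
after_keys  = [1, 3, 4, 8]   # keys arriving in the file, sorted

i = j = 0
while i < len(before_keys) or j < len(after_keys):
    if j == len(after_keys) or (i < len(before_keys) and before_keys[i] < after_keys[j]):
        print("only in before (deleted):", before_keys[i]); i += 1
    elif i == len(before_keys) or after_keys[j] < before_keys[i]:
        print("only in after (new):", after_keys[j]); j += 1
    else:
        print("in both (compare the columns):", after_keys[j]); i += 1; j += 1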

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Peytot
Participant
Posts: 145
Joined: Wed Jun 04, 2003 7:56 am
Location: France

Post by Peytot »

Sort all your input.
First step: extract your data into a sequential file (sort your data in your SQL).
Second step: use Change Capture to recover all your new data and the updated data. After the Change Capture you can use a Transformer to filter your data.
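
If I remember right, Change Capture tags each output row with a change code column, and the Transformer filter above then just keeps the codes you care about (the column name and the code values are configurable, so check them on your stage). Roughly, in plain Python:

Code: Select all

# Rough equivalent of the post-Change-Capture filter described above.
# The change_code values used here (1 = insert, 3 = edit) are only an
# example - check the actual codes configured on your Change Capture stage.

captured = [
    {"cust_id": 3, "change_code": 1},   # new row
    {"cust_id": 2, "change_code": 3},   # updated row
    {"cust_id": 1, "change_code": 0},   # unchanged row
]

wanted = [row for row in captured if row["change_code"] in (1, 3)]
print(wanted)   # only the new and updated rows go on to the load step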

But do not forget: you cannot use a dataset in a Server job (and you cannot use a hashed file in PX).

Pey
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: getting the reject ones

Post by Teej »

To keep the records that were dropped by the Change Capture, one method is to use a Copy stage before and after the Change Capture stage. Split two streams off both Copy stages; the extra link from the "before" Copy stage becomes the input link of a Lookup stage, and the extra link from the "after" Copy stage becomes its lookup link.

Use the keys from the Change Capture in this Lookup stage to identify the records that were dropped. Performance is not so great when the lookup link carries a very large dataset, so be forewarned.

A diagram:

Code: Select all

copy stage --------> change capture -------> copy stage ------ ... (Records changed)
    |                                            |
    |                                            | Lookup Link
    |                Input link                  V
    +-------------------------------------> lookup stage -----> copy stage (dead end)
                                                 |
                                                 | Reject Link
                                                 +------------ ... (Records not changed)
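
Written out as a toy check in plain Python (the key values are made up), the Lookup stage in the diagram is effectively asking "which input keys did not come out of the Change Capture?":

Code: Select all

# Toy version of the reject trick in the diagram above: whatever fails the
# lookup against the Change Capture output is a record that was not changed.

input_keys   = [1, 2, 3, 4]   # stream from the "before" Copy stage
changed_keys = {2, 4}         # keys coming out of the Change Capture

not_changed = [k for k in input_keys if k not in changed_keys]
print(not_changed)            # these go down the reject link
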
Another possibility is that the Change Capture can drop the records to a reject link. See if there is such an option on that stage.
Developer of DataStage Parallel Engine (Orchestrate).
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Teej wrote:Also, yes, the input needs to be sorted. However, the sort is done automatically by the stage if you leave the setting as "Auto".

-T.J.
A query regarding the sort: the documentation states "If the stage is partitioning incoming data the sort occurs after the partitioning".

Now to me this says that if the incoming data isn't sorted, it will be split first and then sorted, thus potentially leaving the data unsorted across partitions. Is it just me, or am I reading that correctly?

I can fully understand it sorting within partitions if the data is already partitioned, but if the stage is actually responsible for the partitioning, then surely you would want it appropriately sorted first? I guess this means it is advisable to always sort beforehand to ensure the data is suitably aligned.
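
To make concrete the order of operations I mean ("split first and then sorted"), here is a plain-Python toy of it - the keys and the two-way hash are made up:

Code: Select all

# Toy illustration of "partition first, sort afterwards": rows are split by a
# hash of the key, then each partition is sorted independently of the others.

keys = [7, 3, 8, 3, 1, 8, 2]
partitions = {0: [], 1: []}

for k in keys:
    partitions[k % 2].append(k)   # key-based (hash-style) partitioning

for p, rows in partitions.items():
    rows.sort()                   # the sort happens within each partition only
    print(p, rows)

# Result: order within each partition, but no overall order across partitions.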

To keep along this thread: within Stage -> Properties of Change Capture, the Key option has a Sort Order sub-option. The tip says "Sort in ascending or descending order", which implies this is a request to sort the data. Is that right, or is it actually a directive stating that the incoming data is already sorted in this manner?

Thanks
Post Reply