Hi friends,
I'm a little confused between a server job and a parallel job.
Project:
1. I have a set of validations on a flat file (record size).
2. Removing duplicates.
3. Finding the new, modified, and old records in that flat file by comparing it with a Teradata table.
If I select a server job, I have easy interaction with Teradata.
If I take a parallel job, removing duplicates is easy.
The OS is Unix.
Or should I use a combination of both job types for my project, and can I sequence a parallel job and then a server job?
Thanks in advance
Server Job or Parallel
Hi,
Yes you can combine PX and Server and I think that is the best solution.
Use PX with the Remove Duplicates stage.
Good luck,
Pey
Yes you can combine PX and Server, and I think that is the best solution.

1. I have a set of validations on a flat file (record size)
If the file is not big, go with Server and execute them in parallel.

2. Removing duplicates
Use PX with the Remove Duplicates stage.

3. Finding the new and modified records in that flat file by comparing with a Teradata table
Use PX with the Change Capture stage.

Good luck,
Pey
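Outside DataStage, the three steps amount to roughly the following. This is a minimal Python sketch for illustration only, not the stage logic itself; the fixed record size, the key position, and the reference-key set are all assumptions.

```python
# Sketch of: 1) record-size validation, 2) duplicate removal,
# 3) change capture (new rows only) against a set of reference keys.
# RECORD_SIZE and the key slice are made-up values for this example.
RECORD_SIZE = 80  # assumed fixed-width record length


def process(lines, reference_keys):
    # 1. Validation: keep only records of the expected size.
    valid = [l for l in lines if len(l) == RECORD_SIZE]

    # 2. Remove duplicates, preserving the first occurrence.
    seen, deduped = set(), []
    for l in valid:
        if l not in seen:
            seen.add(l)
            deduped.append(l)

    # 3. Change capture (simplified): records whose key is not in the
    #    reference table are new. Key assumed to be the first 10 chars.
    key = lambda l: l[:10].strip()
    return [l for l in deduped if key(l) not in reference_keys]
```

In the actual job, steps 1 and 2 would map to a Transformer/Filter and the Remove Duplicates stage, and step 3 to Change Capture against the Teradata extract.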
Peytot wrote:
Use PX with the Change Capture stage.

Pey, as you told me to use Change Capture for finding the old and new records: when I was reading the documentation for the Change Capture stage, it said the stage takes input from a dataset that has been sorted first.

My question is: can I give Change Capture a Teradata table and a sequential file as inputs? If yes, do I have to sort them first?

Thanks
The term "dataset" here is really an internal term: in PX, each stage reads in datasets and outputs datasets, as defined.
So when you see the word "dataset", think "the stream of data going in/out."
Also, yes - it needs to be sorted. However, it is automatically done by the stage if you leave the setting as "Auto".
-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Sort all your input.
First step: extract your data into a sequential file (sort your data in your SQL).
Second step: use Change Capture to recover all your new data and your updated data. After the Change Capture you can use a Transformer to filter your data.
But do not forget: you cannot use a Dataset in Server (and you cannot use a Hashed File in PX).
Pey
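What Change Capture does over two key-sorted inputs can be sketched as a merge-style comparison of a "before" stream (the sorted Teradata extract) against an "after" stream (the sorted flat file). This is a hedged Python sketch, not the stage's implementation; the (key, value) tuple layout and the "edit"/"insert" labels are assumptions that loosely mirror the stage's change codes.

```python
def change_capture(before, after):
    """before/after: lists of (key, value) tuples, each sorted by key."""
    out, i, j = [], 0, 0
    while i < len(before) and j < len(after):
        bk, bv = before[i]
        ak, av = after[j]
        if bk == ak:
            if bv != av:
                out.append(("edit", ak, av))   # same key, value changed
            i += 1
            j += 1
        elif ak < bk:
            out.append(("insert", ak, av))     # key only in "after": new record
            j += 1
        else:
            i += 1                             # key only in "before": ignored here
    # Any remaining "after" rows are new records.
    out.extend(("insert", k, v) for k, v in after[j:])
    return out
```

The single forward pass is exactly why both inputs must be sorted on the key first: an unsorted stream would make the key comparison meaningless.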
Re: getting the reject ones
To keep the records that were dropped by the Change Capture, one method is to use a Copy stage before and after the Change Capture stage. Split two streams on both stages: the link from the before-Copy stage is the input link, while the link from the after-Copy stage is the lookup link.
Use your keys from the Change Capture in this Lookup stage to identify records that have been dropped. Performance is not great on very large datasets on the lookup link, so be forewarned.
Another possibility is for Change Capture to drop the records to a reject link; see if there is such an option for that stage.

A diagram:
copy stage --------> change capture -------> copy stage ------ ... (Records changed)
     |                                            |
     |                                            | Lookup Link
     |  Input link                                V
     +------------------------------------> lookup stage -----> copy stage (dead end)
                                                  |
                                                  | Reject Link
                                                  +------------ ... (Records not changed)
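The lookup idea in the diagram boils down to a set difference on the keys: anything that entered the Change Capture but did not come out was dropped as unchanged. A minimal sketch, with illustrative key lists:

```python
def find_unchanged(input_keys, changed_keys):
    """Keys on the input link that never appear on the changed link
    were dropped by Change Capture as unchanged (the reject set)."""
    changed = set(changed_keys)  # set lookup mirrors the Lookup stage probe
    return [k for k in input_keys if k not in changed]
```

The set build is the analogue of loading the lookup link, which is why performance degrades when that link carries a very large dataset.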
Developer of DataStage Parallel Engine (Orchestrate).
Teej wrote:
Also, yes - it needs to be sorted. However, it is automatically done by the stage if you leave the setting as "Auto".

A query regarding the sort: the documentation states "If the stage is partitioning incoming data the sort occurs after the partitioning".

Now to me this says that if the incoming data isn't sorted, it will be split first and then sorted, thus potentially leaving data unsorted across partitions. Is it me, or am I reading it correctly?

I can fully understand it sorting within partitions if the data is already partitioned, but if the stage is actually responsible for the partitioning, then surely you would want the data appropriately sorted first? I guess this means it is advisable to always sort beforehand, to ensure the data is suitably aligned.
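The concern can be illustrated outside DataStage: if you partition first and sort second, each partition ends up sorted internally, but the keys are not ordered across partitions. A small Python sketch, where modulo stands in for the hash partitioner and the partition count of 2 is arbitrary:

```python
def partition_then_sort(keys, nparts=2):
    """Partition first, then sort within each partition, mirroring
    'the sort occurs after the partitioning'."""
    parts = [[] for _ in range(nparts)]
    for k in keys:
        parts[k % nparts].append(k)   # modulo as a stand-in hash partitioner
    return [sorted(p) for p in parts]  # per-partition sort only
```

For example, `partition_then_sort([5, 1, 4, 2, 3])` yields two internally sorted partitions, yet concatenating them does not give a globally sorted stream, which is exactly why key-based partitioning (so matching keys land in the same partition) matters before a per-partition sort.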
To keep along this thread: within Stage -> Properties of Change Capture, the Key option has a Sort Order sub-option. The tip says "Sort in ascending or descending order", which implies it is a request to sort the data. Is that right, or is it actually a directive stating that the incoming data arrives sorted in this manner?

Thanks