Hi friends,
I'm a little confused between a server job and a parallel job.
Project:
1. I have a set of validations on a flat file (record size).
2. Removing duplicates.
3. Finding the new, modified, and old records in that flat file by comparing it with a Teradata table.
If I select a server job, I have easy interaction with Teradata.
If I take a parallel job, removing duplicates is easy.
The OS is Unix.
Or should I use a combination of both job types for my project, and can I sequence a parallel job and then a server job?
Thanks in advance
Server Job or Parallel
Hi,
Yes you can combine PX and Server and I think that is the best solution.
Use PX with the Remove Duplicates stage.
Good luck,
Pey
Yes you can combine PX and Server, and I think that is the best solution.

1. I have a set of validations on a flat file (record size)
If the file is not big, go with Server and execute them in parallel.

2. Removing duplicates
Use PX with the Remove Duplicates stage.

3. Finding the new and modified records in that flat file by comparing with a Teradata table
Use PX with the Change Capture stage.

Good luck,
Pey
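Outside DataStage, the three steps amount to roughly the following. This is a minimal Python sketch for illustration only, not the stage logic itself; the fixed record size, the key position, and the reference-key set are all assumptions.

```python
# Sketch of: 1) record-size validation, 2) duplicate removal,
# 3) change capture (new rows only) against a set of reference keys.
# RECORD_SIZE and the key slice are made-up values for this example.
RECORD_SIZE = 80  # assumed fixed-width record length


def process(lines, reference_keys):
    # 1. Validation: keep only records of the expected size.
    valid = [l for l in lines if len(l) == RECORD_SIZE]

    # 2. Remove duplicates, preserving the first occurrence.
    seen, deduped = set(), []
    for l in valid:
        if l not in seen:
            seen.add(l)
            deduped.append(l)

    # 3. Change capture (simplified): records whose key is not in the
    #    reference table are new. Key assumed to be the first 10 chars.
    key = lambda l: l[:10].strip()
    return [l for l in deduped if key(l) not in reference_keys]
```

In the actual job, steps 1 and 2 would map to a Transformer/Filter and the Remove Duplicates stage, and step 3 to Change Capture against the Teradata extract.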
Peytot wrote:
Use PX with the Change Capture stage.

Pey, as you told me to use Change Capture for finding the old and new records: when I was reading the documentation for the Change Capture stage, it said the stage takes input from a dataset that has been sorted first.

My question is: can I give Change Capture a Teradata table and a sequential file as inputs? If yes, do I have to sort them first?

Thanks
The term "dataset" here is really an internal term: in PX, each stage reads in datasets and outputs datasets, as defined.
So when you see the word "dataset", think "the stream of data going in/out."
Also, yes - it needs to be sorted. However, it is automatically done by the stage if you leave the setting as "Auto".
-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Sort all your input.
First step: extract your data into a sequential file (sort your data in your SQL).
Second step: use Change Capture to recover all your new data and your updated data. After the Change Capture you can use a Transformer to filter your data.
But do not forget: you cannot use a Dataset in Server (and you cannot use a Hashed File in PX).
Pey
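What Change Capture does over two key-sorted inputs can be sketched as a merge-style comparison of a "before" stream (the sorted Teradata extract) against an "after" stream (the sorted flat file). This is a hedged Python sketch, not the stage's implementation; the (key, value) tuple layout and the "edit"/"insert" labels are assumptions that loosely mirror the stage's change codes.

```python
def change_capture(before, after):
    """before/after: lists of (key, value) tuples, each sorted by key."""
    out, i, j = [], 0, 0
    while i < len(before) and j < len(after):
        bk, bv = before[i]
        ak, av = after[j]
        if bk == ak:
            if bv != av:
                out.append(("edit", ak, av))   # same key, value changed
            i += 1
            j += 1
        elif ak < bk:
            out.append(("insert", ak, av))     # key only in "after": new record
            j += 1
        else:
            i += 1                             # key only in "before": ignored here
    # Any remaining "after" rows are new records.
    out.extend(("insert", k, v) for k, v in after[j:])
    return out
```

The single forward pass is exactly why both inputs must be sorted on the key first: an unsorted stream would make the key comparison meaningless.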
Re: getting the reject ones
To keep the records that were dropped by the Change Capture, one method is to use a Copy stage before and after the Change Capture stage. Split two streams on both stages: the link from the before-Copy stage is the input link, while the link from the after-Copy stage is the lookup link.
Use your keys from the Change Capture in this Lookup stage to identify records that have been dropped. Performance is not great on very large datasets on the lookup link, so be forewarned.
Another possibility is for Change Capture to drop the records to a reject link; see if there is such an option for that stage.

A diagram:
copy stage --------> change capture -------> copy stage ------ ... (Records changed)
     |                                            |
     |                                            | Lookup Link
     |  Input link                                V
     +------------------------------------> lookup stage -----> copy stage (dead end)
                                                  |
                                                  | Reject Link
                                                  +------------ ... (Records not changed)
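The lookup idea in the diagram boils down to a set difference on the keys: anything that entered the Change Capture but did not come out was dropped as unchanged. A minimal sketch, with illustrative key lists:

```python
def find_unchanged(input_keys, changed_keys):
    """Keys on the input link that never appear on the changed link
    were dropped by Change Capture as unchanged (the reject set)."""
    changed = set(changed_keys)  # set lookup mirrors the Lookup stage probe
    return [k for k in input_keys if k not in changed]
```

The set build is the analogue of loading the lookup link, which is why performance degrades when that link carries a very large dataset.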
Developer of DataStage Parallel Engine (Orchestrate).
Teej wrote:
Also, yes - it needs to be sorted. However, it is automatically done by the stage if you leave the setting as "Auto".

A query regarding the sort: the documentation states "If the stage is partitioning incoming data the sort occurs after the partitioning".

Now to me this says that if the incoming data isn't sorted, it will be split first and then sorted, thus potentially leaving data unsorted across partitions. Is it me, or am I reading it correctly?

I can fully understand it sorting within partitions if the data is already partitioned, but if the stage is actually responsible for the partitioning, then surely you would want the data appropriately sorted first? I guess this means it is advisable to always sort beforehand, to ensure the data is suitably aligned.
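The concern can be illustrated outside DataStage: if you partition first and sort second, each partition ends up sorted internally, but the keys are not ordered across partitions. A small Python sketch, where modulo stands in for the hash partitioner and the partition count of 2 is arbitrary:

```python
def partition_then_sort(keys, nparts=2):
    """Partition first, then sort within each partition, mirroring
    'the sort occurs after the partitioning'."""
    parts = [[] for _ in range(nparts)]
    for k in keys:
        parts[k % nparts].append(k)   # modulo as a stand-in hash partitioner
    return [sorted(p) for p in parts]  # per-partition sort only
```

For example, `partition_then_sort([5, 1, 4, 2, 3])` yields two internally sorted partitions, yet concatenating them does not give a globally sorted stream, which is exactly why key-based partitioning (so matching keys land in the same partition) matters before a per-partition sort.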
To keep along this thread: within Stage -> Properties of Change Capture, the Key option has a Sort Order sub-option. The tip says "Sort in ascending or descending order", which implies it is a request to sort the data. Is that right, or is it actually a directive stating that the incoming data arrives sorted in this manner?

Thanks