configuration file, paging standards?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

configuration file, paging standards?

Post by peep »

OS: AIX
DS: 8.7
Question 1:
In the configuration file, are there any standard paging spaces for datasets? While running a job it shows, on 3 nodes:

node1 5001216
node2 0
node3 5001216

This is very consistent, so do you know where I can find these paging standards?

Question 2:
How can I archive dataset files on node1, node2 and node3?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

1) There are no standards for paging. What determines how much data goes to each node is the partitioning algorithm. Your scenario (populating only two nodes out of three) might be seen if the data were partitioned using a key-based partitioning algorithm on a two-valued field: what would be classed as an Indicator if you were profiling it with Information Analyzer.
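The effect of a two-valued partitioning key can be sketched in plain Python (an illustration only, not DataStage code): hash-partitioning 1000 rows on a field that only ever holds "Y" or "N" can populate at most two of three nodes, leaving the third empty.

```python
# Sketch: key-based (hash) partitioning on a two-valued field.
# With only two distinct key values, at most two of the three
# "nodes" can ever receive data.
records = [{"indicator": v} for v in ("Y", "N")] * 500  # 1000 rows, 2 key values

nodes = {0: [], 1: [], 2: []}
for rec in records:
    node = hash(rec["indicator"]) % len(nodes)  # hash-partition on the key
    nodes[node].append(rec)

sizes = [len(rows) for rows in nodes.values()]
print(sizes)  # at most two non-zero counts, e.g. [500, 0, 500]
```

The exact split depends on the hash function, but a third populated node is impossible: two key values can map to at most two partitions.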

2) The best way to archive Data Sets is to use the orchadmin command. This guarantees that every defined data file is picked up. The data, or "segment", files reside in directories identified as disk resources in the configuration file, and the descriptor file (the one with the ".ds" suffix to its name) records the location of these segment files (along with other information, such as the record schema).
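As an illustration of the disk-resource layout described above, a three-node configuration file might look roughly like this (the host name and paths are made up for the example):

```
{
    node "node1"
    {
        fastname "aixhost"
        pools ""
        resource disk "/data/datasets/node1" {pools ""}
        resource scratchdisk "/scratch/node1" {pools ""}
    }
    node "node2"
    {
        fastname "aixhost"
        pools ""
        resource disk "/data/datasets/node2" {pools ""}
        resource scratchdisk "/scratch/node2" {pools ""}
    }
    node "node3"
    {
        fastname "aixhost"
        pools ""
        resource disk "/data/datasets/node3" {pools ""}
        resource scratchdisk "/scratch/node3" {pools ""}
    }
}
```

The segment files for a Data Set land under the "resource disk" directories, one set per node, which is why orchadmin (which reads this file) is the safe way to copy or delete them.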
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

Post by peep »

Can the partitioning algorithm be seen in the config file?

If not, where can I see it?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The partitioning algorithm cannot be seen in the configuration file.

It can be seen in the job design, on the input link of any stage executing in parallel mode.

It can also be seen in the job score.
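If memory serves, the job score is only written to the job log when you ask for it, by setting this environment variable (at the project level in the Administrator client, or as a job parameter):

```
APT_DUMP_SCORE=True
```

With that set, the score in the log shows, among other things, the partitioner inserted on each link and how the operators were mapped to the nodes of the configuration file.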
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

Post by peep »

It can also be seen in the...?
In the job design on the input link, and where else?
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

Post by peep »

Is it something related to the transport block size environment variable?
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Are you perhaps inquiring about operating system paging rather than dataset partitioning?
- james wiles


All generalizations are false, including this one - Mark Twain.
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

Post by peep »

Dataset ... not OS.
But it was just a question.

So where can I find those values?

I see the dataset .ds file sizes as 0, 5001216 and 131072. Are these values set in environment variables via the Administrator client, or are they defaults?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The file sizes are a result of the data that is put into them. This means that your file 0 has no records, which means that your selected partitioning algorithm isn't distributing the records evenly; you should rethink the algorithm used or the key used.
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

Post by peep »

wat ever ur reply is can u pls paste it in my inbox..
im nt able to read here
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Dear peep,

1. This is a forum, not an SMS telephone exchange. Please use real words and vowels since many here, including myself and yourself, are not native English speakers and communication is difficult enough with the technical nature of DataStage.

2. While most of the subject matter is freely visible, some is only visible to those who have opted to support the forum by becoming premium members.

The initial part of my earlier message is visible and provides enough information to suggest the path to a solution.
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

Post by peep »

OK, I am sorry for using short forms.
Where can I see the input link data?
In my research I found that this block value can be defined in job properties, and that it can be determined by finding the largest record on a link of the job and setting the value higher than that largest record size.

We can set this value in the job parameter APT_DEFAULT_TRANSPORT_BLOCK_SIZE.

Here is more info; please correct me if I am wrong.

APT_AUTO_TRANSPORT_BLOCK_SIZE: does this take only the required amount of space to pass the records, rather than using the defined amount of space? By using this variable we can allow the job to use only the required amount of space.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

APT_TRANSPORT_BLOCK_SIZE does not affect disk use; it affects the block size of data transport between stages at runtime.
What are the partitioning algorithm and key for your data set?
Try setting the partitioning to "round robin" and your three partition files will have almost identical sizes.
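The round-robin suggestion can be illustrated in plain Python (again, not DataStage code): rows are dealt out to the partitions in turn, so the three partitions stay within one row of each other regardless of the key values.

```python
# Sketch: round-robin partitioning deals row i to node (i mod 3),
# so partition sizes are balanced no matter what the data contains.
records = list(range(1000))  # 1000 dummy rows

partitions = [[] for _ in range(3)]
for i, rec in enumerate(records):
    partitions[i % 3].append(rec)  # deal row i to node i mod 3

sizes = [len(p) for p in partitions]
print(sizes)  # [334, 333, 333]
```

This is why switching the write to round robin produces three segment files of almost identical size, where a skewed hash key can leave one file empty.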
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

Post by peep »

Auto partitioning.

Where can I find the key of the datasets?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Go to the job where you write to the dataset and see what the partitioning algorithm and key were set to in the job; the dataset inherits these settings.