Suggestion: Separate input and output path variables

PhilHibbs · Post by **PhilHibbs** » Wed May 17, 2006 3:47 am

All of our jobs have a "ROOT" parameter, which specifies the root directory where all files are to be located. This parameter differs between development, QA, and production runs; the three values are "/home/migration/Dev", "/home/migration/QA", and "/home/migration/Prod".

My suggestion is that if you are going to use a system like this, you should instead implement two parameters, one for input and one for output (or source and destination): "IROOT" and "OROOT", or "SROOT" and "DROOT", depending on your preference.
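A minimal sketch of the two-root idea, written in Python rather than as DataStage job parameters. `IROOT` and `OROOT` follow the post; the `resolve_roots` helper, the `Roots` class and the file names are hypothetical illustrations. Both roots default to the same environment, and only the input side is overridden for the mixed run described in the next paragraph.

```python
from dataclasses import dataclass
from typing import Optional

# Root directories per environment, as listed in the post above.
ENV_ROOTS = {
    "Dev": "/home/migration/Dev",
    "QA": "/home/migration/QA",
    "Prod": "/home/migration/Prod",
}

@dataclass
class Roots:
    iroot: str  # IROOT: where input (source) files are read from
    oroot: str  # OROOT: where output (destination) files are written to

def resolve_roots(env: str, input_env: Optional[str] = None) -> Roots:
    """Hypothetical helper: both roots default to the same environment;
    input_env, if given, overrides only the input side."""
    return Roots(iroot=ENV_ROOTS[input_env or env], oroot=ENV_ROOTS[env])

# Normal development run: read and write in Dev.
normal = resolve_roots("Dev")
# Mixed run: the Dev job reads QA data but still writes to Dev.
mixed = resolve_roots("Dev", input_env="QA")
# "input.txt" and "output.txt" are hypothetical file names.
print(f"{mixed.iroot}/input.txt -> {mixed.oroot}/output.txt")
# prints /home/migration/QA/input.txt -> /home/migration/Dev/output.txt
```

With a single ROOT, the mixed run needs edited copies of the job; with two roots it is just a different value for one parameter at run time.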

Recently I wanted to run the development version of a job but read the source data from the QA environment, while still sending output to the development environment. I did this by making a CopyOf the job and editing all the source file paths to replace the parameter with a hard-coded value. If I had thought of this earlier and implemented two parameters instead of one, this kind of run would have been much easier.

I am interested to hear how file locations are parameterised in other people's projects.

ashwin141 · Post by **ashwin141** » Wed May 17, 2006 4:53 am

Hi Phil

We did this in a slightly different way. We were supposed to use three different directories: Development (dev), UAT (int) and Production (prod).

The way we did it was to keep the entire directory structure and file naming convention the same on all three servers. When parametrising for UAT and Production we simply changed 'dev' to 'int' and 'prod' respectively. This was a one-time process, and it gave us a smooth run in UAT and Production.
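This approach can be sketched as a single path template in which only the environment token changes. The `/data/{env}/extract/customers.txt` template and the `path_for` helper are hypothetical; the tokens 'dev', 'int' and 'prod' come from the post.

```python
# Hypothetical path template: the structure and file names are identical on
# every server; only the environment token differs.
PATH_TEMPLATE = "/data/{env}/extract/customers.txt"
VALID_ENVS = ("dev", "int", "prod")

def path_for(env: str) -> str:
    """Build the path for one environment, rejecting unknown tokens."""
    if env not in VALID_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    return PATH_TEMPLATE.format(env=env)

for env in VALID_ENVS:
    print(path_for(env))
```

Because nothing but the token varies, promoting from dev to int or prod is a one-time change of that value, as described above.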

Regards
Ashwin

PhilHibbs · Post by **PhilHibbs** » Wed May 17, 2006 5:07 am

ashwin141 wrote: Hi Phil

We did this in a bit different way. ... While parametrising it for UAT and Production we simply changed 'dev' to 'int' and 'prod' respectively. This was a one time process and it helped us to have a smooth run for UAT and Production.

Seems to me that that is pretty much the same. All I am saying is that it might be useful to have two parameters, one for input and one for output, so you can run code in the development environment that takes its inputs from the QA environment.

Did you use project environment variables? We are on DataStage 7.0.1, so we don't have those available (or rather, we do, but they don't work properly). Instead I have a script that makes these changes (and a few others, and also produces some reports) between export and import.

ashwin141 · Post by **ashwin141** » Wed May 17, 2006 5:30 am

No, Phil.

We didn't use any project environment variables.

Ashwin

chulett · Post by **chulett** » Wed May 17, 2006 7:07 am

While it's nice to have the directory structures identical across dev, test and production, it isn't really practical - or what you'll find in the real world much of the time. For us, the 'dev' and 'test' projects and file areas are all on one machine, so they cannot be the same.

I've used both methodologies on different projects - do you use two sets of parameters for the 'same thing' to logically differentiate source from target or do you stick with a single set? This can apply to more than just file locations, database parameters need the same consideration. And whether you use Project, Environment or 'regular' job parameters doesn't affect this discussion one lick.

One 'set' is the easiest for people to understand and follow, but gives rise to the problem that Phil mentioned - when you are reading and writing in the same job using a single parameter set, how can you disassociate them, read from 'dev' and write to 'test' for instance? You can't.

When you define two parameter sets, let's say one for the 'Source' and one for the 'Target' side of the house, that can solve the above problem nicely. While they default to the same value, one or the other can be overridden at runtime for the specialized case. One 'problem', or at least area of confusion, that I've found with this is when you string a series of jobs together: it can be interesting to decide which set to use when. Job A reads using the source parameters and then writes using the target parameters. Job B needs to read the data created by Job A - does it do this via the source or the target parameters? How does this decision affect your job control? I know ours can't handle a job stream in which the same named parameter needs different values in different jobs over the course of the run.
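A hypothetical job-control sketch (Python, not DataStage job control) of one answer to the Job A / Job B question: each job gets its own source and target values, and Job B's source is taken from Job A's target. It assumes a controller that can pass different values for the same named parameter to different jobs in one stream, which, as noted above, not every job control setup can do. The `/data/dev` and `/data/test` paths are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JobParams:
    source_dir: str  # where the job reads its input
    target_dir: str  # where the job writes its output

def plan_stream(source_default: str, target_default: str) -> dict:
    """Job A reads from the source side and writes to the target side.
    Job B reads what A wrote, so its source is explicitly set to A's
    target rather than to the source default."""
    job_a = JobParams(source_dir=source_default, target_dir=target_default)
    job_b = JobParams(source_dir=job_a.target_dir, target_dir=target_default)
    return {"JobA": job_a, "JobB": job_b}

# Read from 'dev' and write to 'test', as in the example above.
stream = plan_stream("/data/dev", "/data/test")
for name, params in stream.items():
    print(name, params)
```

The design choice here is to make the hand-off explicit in job control, so that each downstream job's "source" is whatever its upstream job used as "target".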

There have been a couple of recent posts on this subject by Ray Wurlod and Kim Duke, I do believe. People would be well served to hunt them down and read those as well. Most questions here have more than one answer, and there may not be one that is more 'correct' than any other. I'd be curious what others' experiences have been.

Phil, I'm curious about your "root" parameter where "all files are to be located"... do you have logical sub-directories under there for different sources or subject areas or whatever you'd prefer to call them? If so, does that portion of the pathname get 'hard coded' into your jobs? I've never gone that route because it just doesn't ever seem to work out that way in Production. We have specific parameters for specific locations, meaning different parameters for let's say... extract, load, hash, xml input, xml output, etc. While they may share a common base path or 'root' value, that more than likely isn't true for all of them.
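A sketch of the per-location approach in Python, assuming hypothetical parameter names (`EXTRACT_DIR`, `HASH_DIR`, and so on) and a hypothetical `/data/etl` base: each location defaults to a path under a shared base, but any one of them can be overridden without touching the others.

```python
# Hypothetical location parameters, one per purpose. By default they share a
# common base, but each can be pointed somewhere else on its own.
DEFAULT_BASE = "/data/etl"

def location_params(base: str = DEFAULT_BASE, **overrides: str) -> dict:
    params = {
        "EXTRACT_DIR": f"{base}/extract",
        "LOAD_DIR": f"{base}/load",
        "HASH_DIR": f"{base}/hash",
        "XML_IN_DIR": f"{base}/xml/in",
        "XML_OUT_DIR": f"{base}/xml/out",
    }
    # Catch misspelt parameter names instead of silently ignoring them.
    unknown = set(overrides) - set(params)
    if unknown:
        raise KeyError(f"unknown location parameter(s): {sorted(unknown)}")
    params.update(overrides)
    return params

# Moving only the hashed files to a dedicated filesystem is one override.
params = location_params(HASH_DIR="/fastfs/hash")
```

This is the property described in the next paragraph: moving one class of files is a single parameter change.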

We recently went through a performance enhancement exercise with our Sys Admins in which specific filesystems were set up for different activities, so they could try out different striping factors, cache values, etc. for reading versus writing. It took one parameter change, for example, to move where all of the 'X' files were read from, or where the 'Y' files were written to. I'm not sure I could have done that if all locations were assumed to always be under one root directory. Now, perhaps I am totally misunderstanding your situation; if so then (in those immortal words)... "never mind".

chulett · Post by **chulett** » Wed May 17, 2006 7:11 am

Ah... [sigh] I didn't mark this as a Premium post initially, but will change it to one 'shortly' - whatever that means.

PhilHibbs · Post by **PhilHibbs** » Wed May 17, 2006 7:28 am

chulett wrote: Phil, I'm curious about your "root" parameter where "all files are to be located"... do you have logical sub-directories under there for different sources or subject areas or whatever you'd prefer to call them? If so, does that portion of the pathname get 'hard coded' into your jobs? I've never gone that route because it just doesn't ever seem to work out that way in Production. We have specific parameters for specific locations, meaning different parameters for let's say... extract, load, hash, xml input, xml output, etc. While they may share a common base path or 'root' value, that more than likely isn't true for all of them.

The "ROOT" parameter is just the first - file names and hash paths are made up of a whole series of parameters that determine the shape of the directory hierarchy. A typical sequential file name would be "#ROOT#/#SITE#/#AOW#/#DMR#/FileName.#SITE#".
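Expanding a name built from `#PARAM#` references like the one above amounts to a regular-expression substitution. A Python sketch, with hypothetical values for `SITE`, `AOW` and `DMR` (only the template and the Dev root come from the thread); the `expand` helper is an illustration, not part of DataStage.

```python
import re

# A job parameter reference: a name between two '#' characters.
PARAM_REF = re.compile(r"#([A-Za-z_][A-Za-z0-9_]*)#")

def expand(template: str, params: dict) -> str:
    """Replace each #NAME# with params[NAME]; fail loudly on unknown names."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value for parameter {name}")
        return params[name]
    return PARAM_REF.sub(lookup, template)

# Hypothetical parameter values.
values = {"ROOT": "/home/migration/Dev", "SITE": "S01", "AOW": "FIN", "DMR": "DMR1"}
print(expand("#ROOT#/#SITE#/#AOW#/#DMR#/FileName.#SITE#", values))
# prints /home/migration/Dev/S01/FIN/DMR1/FileName.S01
```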

I have a nice little mechanism that expands all these parameters for me: I feed a project export through a Perl script, which fills in all the job parameter default values and spits out a complete list of all files and hashes referenced in the export, as well as a series of "mkdir -m 771" commands. The script also scans for file names with spaces in them, sequential files without column headings, external SQL files, and a whole bunch of other common problem areas.
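Two of the checks described above, deriving the `mkdir` commands and flagging file names with spaces, can be sketched in Python on a list of already-expanded paths. This is not the Perl script from the post; the paths are hypothetical, and the `-p` flag is an addition so that missing parent directories are created too.

```python
import posixpath
import shlex

def mkdir_commands(paths):
    """One mkdir per distinct parent directory, in sorted order.
    Note: with -p, the -m 771 mode applies only to the final directory;
    intermediate directories created by -p get default permissions."""
    dirs = sorted({posixpath.dirname(p) for p in paths})
    return [f"mkdir -p -m 771 {shlex.quote(d)}" for d in dirs]

def names_with_spaces(paths):
    """Flag paths containing spaces, one of the common problem areas."""
    return [p for p in paths if " " in p]

# Hypothetical expanded paths.
files = [
    "/home/migration/Dev/S01/FIN/DMR1/FileName.S01",
    "/home/migration/Dev/S01/FIN/DMR2/Other File.S01",
]
for cmd in mkdir_commands(files):
    print(cmd)
print(names_with_spaces(files))
```

`shlex.quote` is used so that a directory name with spaces or shell metacharacters still yields a safe command line.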

I appreciate that most projects have a mix of database, sequential file, SAP, etc. data sources and targets, but all of ours fit into a nice pattern of "extract", "transform" or "load", and all transforms just read and write sequential files and hashes.

Phil Hibbs.

chulett · Post by **chulett** » Wed May 17, 2006 7:48 am

Figured other parameters made up the entire filename. But are you still saying that the ROOT parameter only has a single value in any given environment? If so, I'm glad it's working for you as it wouldn't in our environment.

PhilHibbs · Post by **PhilHibbs** » Wed May 17, 2006 8:06 am

chulett wrote: ... are you still saying that the ROOT parameter only has a single value in any given environment? If so, I'm glad it's working for you as it wouldn't in our environment.

I could easily run the entire project export through sed to replace #ROOT# with #VOL1#/#ROOT#, and then manually (or programmatically, if a regex can identify the appropriate files) change #VOL1# to #VOL2# for the files that need to be on a different volume. The reason it is simply one parameter is that we have not needed anything more complex.
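The rewrite described above can be sketched in Python, with string and regex substitution standing in for the command-line filter. The `/hash/` pattern used to pick which files move to `#VOL2#` is a hypothetical example of the "appropriate files" regex, and the file references are made up.

```python
import re

def add_volume_prefix(text: str) -> str:
    """Put a #VOL1# level in front of every #ROOT# reference.
    Intended as a one-off rewrite: running it twice would add #VOL1# twice."""
    return text.replace("#ROOT#", "#VOL1#/#ROOT#")

def move_to_volume(path: str, pattern: str, volume: str = "VOL2") -> str:
    """For paths matching the selection regex, swap #VOL1# for another
    volume parameter; leave all other paths unchanged."""
    if re.search(pattern, path):
        return path.replace("#VOL1#", f"#{volume}#", 1)
    return path

# Hypothetical file references taken from an export.
refs = ["#ROOT#/#SITE#/extract/a.txt", "#ROOT#/#SITE#/hash/HashA"]
rewritten = [move_to_volume(add_volume_prefix(r), r"/hash/") for r in refs]
print(rewritten)
# prints ['#VOL1#/#ROOT#/#SITE#/extract/a.txt', '#VOL2#/#ROOT#/#SITE#/hash/HashA']
```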

DSXchange