Restrict Access to Variables Using Four Categories and the Magical Number

When you write a program without any modularity, the difficulty seems to go up with the square of the number of statements. One way to understand why this might be is to note that any two statements might "interact". There are many such "interactions". For example, one statement might set the value of a variable that is used by another. Such interactions are usually ordered, i.e. what happens depends on which statement comes first. Counting potential interactions is therefore the same as counting ordered pairs of statements. This number is N(N-1), where N is the total number of statements.

In other words, without any modularity the potential difficulty of understanding potential interactions between statements goes up with the square of the number of statements.

With modularity, potential interactions are limited to two statements within one module. For example, consider a program that can be written in a hundred lines without modularity, or in three modules of 30, 40, and 45 lines (115 lines in total).

The number of potential interactions between statements in the non-modular version is 100 × 99 = 9900. In the modular version, the largest number of potential interactions for you to worry about at one time is in the largest module, i.e. 45 × 44 = 1980 interactions.
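The arithmetic behind these figures can be checked with a short sketch. Python is used here only for illustration, and the name ordered_pairs is hypothetical; it is simply the N(N-1) count described above.

```python
def ordered_pairs(n: int) -> int:
    """Count ordered pairs of distinct statements among n statements."""
    return n * (n - 1)


# Non-modular version: any of the 100 statements may interact with any other.
non_modular = ordered_pairs(100)

# Modular version: only statements inside the same module may interact.
module_sizes = [30, 40, 45]
per_module = [ordered_pairs(size) for size in module_sizes]
largest = max(per_module)

print(non_modular)  # 9900
print(per_module)   # [870, 1560, 1980]
print(largest)      # 1980
```

Even the sum over all three modules, 4410, is well under half the non-modular count.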

The advantage to the modular version is clear.

The advantage to modularity can be had with any kind of module: package, class, subprogram, separately compiled files, etc. For the advantage to be real there must be no interactions between statements in different modules. This means, among other things, that your variables cannot be global.

In practice, some global variables are almost always wanted. Deciding which ones and how you will ensure their correct use is part of the same plan that drives your creation of modules. Creating and evaluating that plan will be the subject of other tips. This tip is concerned with how we restrict access to variables. You need it because without restricted access, you don't get the advantage of modularity discussed above.

I propose to give you a way of looking at variable accessibility that can be used with an older procedural language as well as with a more modern object-oriented language. You can use it when you have language support to enforce your decisions and you can use it when the only enforcement of your decisions is a convention. Conventions are more readily followed when they can be used in all your programming.

My way of categorizing a variable's accessibility isn't as flexible as the scope rules of many languages, but it can be remembered at 2 am and it can be used to communicate with programmers who are writing in very different languages than you are. Besides, many language nuances just enable you to do things you shouldn't.

Here are my categories:

local
The variable is declared locally within a procedure or function. It is not a parameter.

regional
The variable is declared within a separately compiled file, a module, or a class. It is known to a specific, and hopefully small, set of procedures or functions.

global
The variable is known to all procedures and functions -- usually without being passed as a parameter. Note, variables declared in a "main" program that are not known in procedures and function subprograms are not considered to be global -- they are local.

shifty
The variable is a parameter: it is declared locally within a procedure or function but it represents some other variable. Which other variable depends on the invocation of the procedure or function.
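As a rough illustration, the four categories might look like this in Python. All names here (VERBOSE, Tally, _total, amount, new_total) are hypothetical. Python does not enforce the regional boundary, so the leading underscore is only a convention, which is one of the situations these categories are meant to cover.

```python
VERBOSE = False  # global: known to every function in the program


class Tally:
    """A region: only the methods of this class should touch _total."""

    def __init__(self) -> None:
        self._total = 0  # regional: shared by the methods of this class

    def add(self, amount: int) -> None:
        # amount is shifty: it stands for whatever the caller passes in.
        new_total = self._total + amount  # local: scratch space for this call
        if VERBOSE:
            print("total is now", new_total)
        self._total = new_total

    def total(self) -> int:
        return self._total


tally = Tally()
tally.add(2)
tally.add(3)
print(tally.total())  # 5
```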

I think we can agree that the easiest category of variable to deal with is the local variable. This is mainly because the context in which you must think about a local variable is relatively short.

Beginning programmers instinctively think of shifty variables as the most difficult to deal with -- even when this category is not introduced with such a loaded name. I have come to believe beginning programmers are almost right on this one -- hence the name "shifty".

However, shiftiness is valuable -- without it, to print the number 2 you would have to write something like

  print_value := 2
  print

instead of

  print(2)  

The variable print_value would be global and would exist only to tell the procedure print what to print. Clearly, the shifty variable which is the parameter to the print procedure is quite useful!
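The same contrast can be sketched in Python. The function names print_global and print_param are hypothetical, chosen only to keep the two styles apart in one file.

```python
print_value = 0  # global that exists only to carry a value to print_global


def print_global() -> None:
    """Print whatever happens to be in the global print_value."""
    print(print_value)


def print_param(value: int) -> None:
    """Print the value the caller passes in; value is a shifty variable."""
    print(value)


# Without a parameter: two steps, and the call itself says nothing about
# what will be printed.
print_value = 2
print_global()

# With a parameter: one step, and the call states what is printed.
print_param(2)
```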

Moreover, shifty variables can be better for program clarity than global ones. Look at the difference between these two procedure invocations


proc(A,B,C,D,E,F,G,H,I)

and


proc

Assume that the code in the first version of proc has no access to global variables and that the code in the second has access to all global variables. That means, when you read the first procedure invocation, you know exactly which variables proc may diddle with, but when you read the second procedure invocation you haven't a clue. The first procedure invocation can give the programmer a big advantage, but not as big as the use of regional variables can.

Regional variables are known to a specific, hopefully small, number of procedures. These procedures may all exist in one separately compiled file, in one package, in one class, etc. The programmer makes use of the modularity features in the underlying programming language (or simply makes use of comments and conventions) to enforce a rule that only specifically approved procedures access regional variables in any way at all.

Suppose the variables C, D, E, F, G, H, and I are regional and that proc is one of the procedures which can access them. When proc is invoked, these regional variables are known always to be part of its environment. The variables A and B, however, must be specified in the code that invokes proc to say how proc should do its job.

Under these conditions, neither of the two previous procedure invocations conveys a sense of what's going on. What does convey that sense is this procedure invocation,


proc(A,B)

along with documentation of how proc and its brethren affect their regional variables C, D, E, F, G, H, and I. Regional variables are not passed as parameters -- they are accessed directly by the procedures in their regions. Regional variables do not tie all statements together with potential interactions the way global variables do -- only the procedures in their region are tied together.

The invocation of proc just shown says "do your thing using A and B".

These days a common way of implementing regional variables is to put them in an object with the procedures that can play with them. For example, if O is an object containing proc's regional variables, the procedure invocation just shown becomes


O.proc(A,B)

which says to O "adjust yourself with proc using A and B".
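A minimal sketch of such an object in Python follows. The class name Region, the fields _c and _d, and the summary method are all hypothetical; what proc does with A and B is invented purely so the example runs.

```python
class Region:
    """Hypothetical object holding proc's regional variables."""

    def __init__(self) -> None:
        # Regional variables: by convention, touched only by this class.
        self._c = 0
        self._d = 0

    def proc(self, a: int, b: int) -> None:
        """Adjust the regional state using the caller's a and b."""
        self._c += a
        self._d += b

    def summary(self) -> tuple:
        """Report the regional state without exposing the variables."""
        return (self._c, self._d)


o = Region()
o.proc(1, 2)
o.proc(10, 20)
print(o.summary())  # (11, 22)
```

The invoking code names only the two values it supplies; everything else proc touches lives inside o.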

Whether object oriented or not, the use of regional variables helps the programmers compartmentalize their thinking. Without such compartmentalizing, it is difficult to keep intellectual control over a complicated system.

A research article in psychology that was written forty years ago has become a part of the education of a software engineer. The author makes a believable claim that we humans can juggle up to (about) seven things at a time. More than that and we start losing it. (Have you seen anybody juggle 12 things? Juggling, by the way, is not one of the author's examples -- perhaps, like me, he couldn't juggle at all.)

When we must deal with more than seven things, we must put them in compartments. Then we can deal at some times with the compartments and at other times with the contents of individual compartments.

My categories of variable accessibility give you four categories of variable to juggle at any one time. Of these categories, only the global variables are the same in all parts of your code. The other variables change with the context. Because they change, you don't have to think about all your variables at once.

Most of the time you should be able to keep the variables in any one category to fewer than seven. And, when you cannot, I have another way of categorizing variables to help you.

I'll end this tip with another view of the four categories of variable accessibility. This view emphasizes how to use variables in each category:

local
These variables are scratch pads for temporary storage or for local calculations.

regional
These variables work together to make one part of your program function correctly. That part can be viewed as a class or as a virtual machine. If a virtual machine, it is implemented as a module, package, unit, or simply as a separately compiled file. Either way you can think of each region as being the implementation of an object.

global
These variables help define the environment in which the program is running. Changes to them reflect changes in that environment. You have considerable freedom to choose what the word "environment" means, but your choice must make sense to your successors and should not generate many global variables.

shifty
These variables are the primary way data is passed between a procedure and its invoking code. They are also the primary way data is passed to a function from its invoking code -- the return value being the primary way data flows in the other direction.

A shifty variable in a procedure proc refers to data that must be imported to guide one specific execution of proc or must be exported by proc back to the environment that called upon proc.

A shifty variable in a function should only be used to provide data that guides one specific execution of the function.
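The difference between the two uses of shifty variables can be sketched in Python, which has no separate procedure construct. Here a function returning None stands in for a procedure, and a mutable list stands in for an exported parameter; both choices, and the names scale_in_place and scaled, are assumptions made for illustration.

```python
def scale_in_place(values: list, factor: float) -> None:
    """Procedure: factor is imported; values is imported and exported."""
    for i in range(len(values)):
        values[i] = values[i] * factor


def scaled(values: list, factor: float) -> list:
    """Function: parameters only bring data in; the result is returned."""
    return [v * factor for v in values]


readings = [1.0, 2.0, 3.0]
scale_in_place(readings, 2.0)          # the procedure changes readings
doubled_again = scaled(readings, 2.0)  # the function leaves readings alone

print(readings)       # [2.0, 4.0, 6.0]
print(doubled_again)  # [4.0, 8.0, 12.0]
```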

These categories and the magical number are not the only things to consider when determining accessibility to variables, but they provide a framework in which you can think and talk about your decisions.

Copyright and Permissions

Copyright 1995,1996 by J Adrian Zimmer

This tip is distributed to individuals free of charge from the Software Build and Fix web site. All other distribution (including but not limited to internal distribution within an organization and mirroring of any kind) is forbidden without written consent of the copyright holder.


Context: Some Tips for Programmers. Author: J Adrian Zimmer.
Dated: August 23, 1995; Revised: Oct 07 1998