Var-Scoping, Private, and Public Data in CFCs
This post by Steve Nelson and this follow-up by Nat Papovich over at the Webapper blog have generated a lot of comments about the method-local (var) scope, the THIS scope, and the VARIABLES scope within a CFC. The result? It appears there's still a lot of confusion on these three issues.
I fought the long and difficult battle of getting these straight a while ago, and in my complacency I assumed that everyone else had as well.
Nope.
So I thought I'd post my own thoughts since I've already written a small book in the comments over at Webapper. I don't want to debate the fact that we have to declare var-scoped variables. Adobe is aware of this issue and if they can fix it or find an easier way to handle it without breaking too much existing code then I'm sure they will. I want to go more basic here and just make sure everyone who reads this understands these three absolutely critical elements of developing with CFCs.
First, the THIS scope. Inside a CFC, the THIS scope is public. That means any external code can access and modify any data in the THIS scope. Which is why the THIS scope is so dangerous. One of the most important jobs of an object is to encapsulate its internal implementation and state. The THIS scope breaks encapsulation in a terrible way. You're basically opening the guts of your object to the world, not only to see, but to change. Bad, bad, bad! For these reasons, most CF folks avoid the THIS scope. And unless you are very sure you know what you are doing and have a very good reason, you should too. Honestly, the only time I've seen a good use of the THIS scope is in a Transfer Object, which is nothing but a typed structure anyway.
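Here is a minimal sketch of the problem. The component name (User.cfc), the calling page, and the userName property are made up for illustration; the two halves of the block would live in separate files.

```cfml
<!--- User.cfc (hypothetical component, for illustration only) --->
<cfcomponent output="false">
    <cffunction name="init" access="public" returntype="any" output="false">
        <!--- THIS-scoped data is public: any caller can read or overwrite it --->
        <cfset this.userName = "Brian" />
        <cfreturn this />
    </cffunction>
</cfcomponent>

<!--- caller.cfm (hypothetical calling page) --->
<cfset user = createObject("component", "User").init() />
<!--- Nothing stops external code from changing the object's state directly --->
<cfset user.userName = "" />
```

The object never gets a chance to validate or even notice the change, because no method of the CFC was involved.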
Next, the VARIABLES scope. The VARIABLES scope is very similar to the THIS scope, except it is private. This means external code can't access or modify this data. This is a Good Thing. By hiding your data in a private scope, you force external code to access your data by using the public methods of your CFC. This is also called the API for the CFC. Forcing client code to use your API leverages encapsulation, and gives you a lot more freedom to make changes to how your CFC does things and stores internal data. As long as your API methods remain the same, you are free to change the internal implementation of your CFC as often as necessary.
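As a rough sketch of what that looks like (again, User.cfc, userName, and getUserName() are hypothetical names chosen for the example):

```cfml
<!--- User.cfc (hypothetical component, for illustration only) --->
<cfcomponent output="false">
    <cffunction name="init" access="public" returntype="any" output="false">
        <cfargument name="userName" type="string" required="true" />
        <!--- VARIABLES-scoped data is private to this instance --->
        <cfset variables.userName = arguments.userName />
        <cfreturn this />
    </cffunction>

    <!--- The public API: the only way external code can read the name --->
    <cffunction name="getUserName" access="public" returntype="string" output="false">
        <cfreturn variables.userName />
    </cffunction>
</cfcomponent>

<!--- caller.cfm (hypothetical calling page) --->
<cfset user = createObject("component", "User").init("Brian") />
<cfset name = user.getUserName() />
```

External code that writes to user.userName only creates a separate THIS-scoped value; it does not touch variables.userName, so getUserName() keeps returning what the object itself stored.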
Finally, the VAR scope. Within a method, if you want to declare a variable that should exist only for the duration of the method call, you must var scope it at the top of the method body like this:
<cfset var user = "" />
I think most people would agree that it's annoying but that's irrelevant. You HAVE to do this. And this is probably one of the most critical things anyone can know about using CFCs, so I'm putting it in bold: If you don't var scope a variable, that variable is set into the VARIABLES scope of the CFC. Let this become a mantra.
To put it into code, this:
<cffunction name="myMethod">
<cfset userName = "Brian" />
</cffunction>
Is the same as this:
<cffunction name="myMethod">
<cfset variables.userName = "Brian" />
</cffunction>
Depending on the situation, this can just be bad, or it can be horrible.
The bad: if the CFC instance only exists for one request (called a "transient" or "per-request" CFC), a non-var-scoped variable can cause unexpected behavior if the variable is var scoped in some methods and not in others. It's also very possible for your CFC to be transient in one context or application and persistent in another. So you should assume the worst and var scope it even if you think your CFC will only be used in a transient way.
The horrible: if the CFC instance is cached in the session or especially the application scope, non-var-scoped variables are almost guaranteed to cause crazy bugs that will seem impossible to track down. This is because multiple server threads can be using the CFC instance at the same time.
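To make the threading problem concrete, here is a small sketch. The component name (Greeter.cfc) and its methods are invented for the example, and the scenario assumes one instance stored in the application scope and shared by all requests.

```cfml
<!--- Greeter.cfc (hypothetical), created once and cached in the application scope --->
<cfcomponent output="false">

    <!--- BROKEN: "message" is not var scoped, so it is really variables.message
          and is shared by every request using this cached instance --->
    <cffunction name="greetUnsafe" access="public" returntype="string" output="false">
        <cfargument name="name" type="string" required="true" />
        <cfset message = "Hello, " & arguments.name />
        <!--- Another request can overwrite variables.message right here --->
        <cfreturn message />
    </cffunction>

    <!--- FIXED: "message" is local to this one method call --->
    <cffunction name="greetSafe" access="public" returntype="string" output="false">
        <cfargument name="name" type="string" required="true" />
        <cfset var message = "" />
        <cfset message = "Hello, " & arguments.name />
        <cfreturn message />
    </cffunction>

</cfcomponent>
```

Under load, two requests calling greetUnsafe() at the same moment can end up returning each other's greeting. It only happens when the timing lines up, which is exactly why these bugs are so hard to reproduce.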
So to summarize: DON'T use the THIS scope unless you have a very good reason to. DO use the VARIABLES scope to keep your CFC's data private. And DO VAR-SCOPE all method-local variables no matter what.




THANK YOU!!!! I also thought a post like this was unnecessarily obvious - until I saw Steve's post and the associated comments.
Required reading. If you know it, no harm, no foul, 30 seconds wasted. If not and you've been *wondering* why you were getting those strange occasional bugs in your singletons, probably the most valuable post of 2007 :->
Granted, I am not a huge OOP guy, so I do not speak from experience (maybe that is an advantage, maybe not)... but one of the HUGE benefits of the THIS scope is that it can be accessed without a method call. If you have an object that is heavily trafficked and you have control over the whole coding environment, then accessing the THIS scope can actually save a lot of processing time (I have seen this happen, it is not a myth).
So anyway, I don't want to argue about it... I just want to tell people that THIS can be cool, she's just as nice and pretty as the rest of the scopes, don't write her off.
Though I must admit, I've built many high-traffic sites using OO techniques and never found the fact that I'm calling methods to manipulate an object to be a performance issue. There is a simple rule that states that there can only be one bottleneck in an application at a time, and in CF (or any web app in general) this bottleneck is almost always the database structure/SQL code being used. Worrying about using the THIS scope instead of calling a method on a CFC would be way, way down on my list of probable bottlenecks in an application.
Sort of off topic at this point from the direction of the latest comments, but one thing that I find less than appealing with var scoped variables (aside from having to explicitly define them) is that you must make the declarations before any other processing or variable declaration occurs in your method. When you have a large method with lots of looping or such, you could end up having quite a few var scoped variable sets, which imo is ugly. I've found that creating just 1 var scoped structure, to which I add keys, negates the need for multiple declarations, for example:
<cfset var local = structNew() />
<!--- ... do a bunch of stuff ... --->
<cfloop from="1" to="10" index="local.i">
    <!--- ... use local.i here ... --->
</cfloop>
It’s not a new concept but I thought I would share it anyway, in case other developers feel the same way I do about having multiple var declarations.
All true, but IMHO what this really means is your method is doing too much and should be refactored. Lots of small, cohesive methods always wins over fewer big, complex, procedural methods.
I'm mostly inclined to agree with you about the THIS scope, but I think that if one treats it as a "write once" property of the object it's OK. What I mean by this is that I have no problem exposing some stuff to the calling code via the THIS scope (for expediency's sake), but THIS-scoped variables only EVER appear on the left-hand-side of an expression within the CFC itself, so there's no(*) chance that the calling code can bugger up the CFC instance.
That said:
1) I hardly ever have had call to do this;
2) It'd be so much nicer if they could be set as read-only / final / something.
--
Adam
(*) Well: "hardly ever". I'd be thinking long and hard about using this technique if the CFC was ever destined to be cached in some persistent scope.
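If I'm reading the "write once" idea correctly, it might look something like the sketch below. The component (MailingList.cfc) and its listName property are hypothetical; the point is only that the CFC assigns to this.listName but never reads it back, relying on its own private copy instead.

```cfml
<!--- MailingList.cfc (hypothetical component, for illustration only) --->
<cfcomponent output="false">
    <cffunction name="init" access="public" returntype="any" output="false">
        <cfargument name="listName" type="string" required="true" />
        <!--- Private copy: the only value the CFC itself ever reads --->
        <cfset variables.listName = arguments.listName />
        <!--- Public copy: only ever assigned to, never read, inside the CFC --->
        <cfset this.listName = arguments.listName />
        <cfreturn this />
    </cffunction>

    <cffunction name="describe" access="public" returntype="string" output="false">
        <!--- Reads the private copy, so outside changes to this.listName cannot affect it --->
        <cfreturn "List: " & variables.listName />
    </cffunction>
</cfcomponent>
```

Calling code can still overwrite the public copy and confuse other callers that read it, which is presumably why a read-only option would be so much nicer.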
1) var variable1 variable2 variable3 = ""; // or something like that
2) getting rid of the restriction of declaring var scope variables at the top of the method.
Excellent and timely post which will surely help people to understand the true nature/limitations/features of the various scopes within a CFC. Just to reinforce what you have already said, here is a PDF document by Ray Camden listing all the CFC scopes and their purpose.
http://ray.camdenfamily.com/downloads/cfcscopes.pd...
I can (and will) go on about this but the bottom line is the same. I disagree with the chant of "no this" and think people should really think about it first before joining in.
I think people from a more procedural mindset often focus on the data an object contains (and the related bean-pattern-mandated getter and setter methods) at the expense of the BEHAVIOR of an object.
Say I have a Product object. Within the object is an instance variable named price. What is the harm in letting people directly access (not even change though that is even worse) this instance variable? I would argue there is great harm, because the client code is bypassing the API (public methods) of the object and looking directly at internal state and internal implementation.
Right now my Product may only have a simple instance.price variable. So using THIS and bypassing my getPrice() public method might not seem so bad. But in the future, when my getPrice() method does all sorts of crazy tiered discounts and bulk discounts based on the price variable, very possibly using other objects to make these determinations at runtime, if my client code isn't going through the getPrice() method, I'm going to be in trouble.
And this doesn't even get into the arguably much more dangerous issue of letting external code CHANGE the state of my object directly. At least using private setters I can make instance data read-only.
Basically there are a whole lot of reasons to not use the THIS scope and exceedingly few reasons to do so. I'll never say "NEVER" use the THIS scope, but I will say I think in almost all situations it is a really bad idea.
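A quick sketch of the Product example, to show where the change would land. Only getPrice() and instance.price come from the discussion above; the quantity argument and the 10% bulk discount are made-up stand-ins for whatever the real pricing rules turn out to be.

```cfml
<!--- Product.cfc (sketch; the discount rule is hypothetical) --->
<cfcomponent output="false">
    <cffunction name="init" access="public" returntype="any" output="false">
        <cfargument name="price" type="numeric" required="true" />
        <cfset variables.instance = structNew() />
        <cfset variables.instance.price = arguments.price />
        <cfreturn this />
    </cffunction>

    <cffunction name="getPrice" access="public" returntype="numeric" output="false">
        <cfargument name="quantity" type="numeric" required="false" default="1" />
        <cfset var unitPrice = variables.instance.price />
        <!--- Added later: a bulk discount. Callers of getPrice() pick it up
              automatically; code reading a raw price variable would not. --->
        <cfif arguments.quantity gte 10>
            <cfset unitPrice = unitPrice * 0.9 />
        </cfif>
        <cfreturn unitPrice />
    </cffunction>
</cfcomponent>
```

Every caller that already used getPrice() keeps working unchanged after the discount logic is added, which is the whole argument for going through the method in the first place.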
We're talking code here. And again, the question is what code is changing what state, when, where and how is this different than most other changes to a state?
I'm writing a blog entry as we speak on this. If we step away from the 'someone' and look instead at code the question becomes what code can touch the object, who can alter the code and does all this really matter.
The bottom line is that the effort and overhead to protect the state of an object and force client code to go through the object's API is minuscule. If my previous example of the perils of directly accessing instance data, especially in terms of future change, doesn't make you reconsider your position, I'm not sure what will. You're essentially arguing against encapsulation, which is the foundation of every OO language in existence and has been lauded by virtually every great programmer you can find.
I suppose that until you experience the pain (and I have on too many occasions) it seems rather nebulous. Using my previous example, if you don't use myObject.getPrice() and instead use myObject.price and later you need a lot more complex behavior when asking an object for its price (a very likely possibility), that is when the cost of directly accessing instance data will become apparent.
So again, the cost of forcing external code to use the object API is virtually nothing, but the benefit is very great.
Like a lot of best practise stuff, the need for it isn't seen on the first pass of writing code. It's when you're making changes, doing maintenance or adding outside calls (3rd party apps) to it that you find the need.
One example is that if you have a shopping cart object, and various parts of your shop muck about with the data directly in the this scope, then if later on you need to make that cart much more complex (perhaps adding tax, promotion calculations etc) then you may well have to edit all the places in the shop that talk to the cart.
If instead you'd had the shop calling the getters and setters within the cart, you can simply make the changes within the cart object, and the rest of the shop would get the results you want it to.
-- "Someone" is any external client code or system. ---
If you have a component which forms part of an API which is to be used by other client code on other projects, by other people on other systems etc. then it becomes even more critical to shield the client code from the internal implementation - they *should* be using the published public api - developing code which is strongly tied to the internal implementation of another component can lead to maintenance nightmares when that code/library/api is updated - particularly if it's an open source library.
Your examples make the case for setting and getting based on 'if we want to change later' but what of cases where we only expect the data to be exposed in a simple fashion, such as when the data is about the object itself? The application.cfc uses the this scope to contain data about the application itself, such as the application name. If I have a cf-talk instance of a list object, what's wrong with having the listid or listname exposed in the this scope? There is a need and the information describes attributes of the object itself. Would that not fall under the oo paradigm?
And to follow up on your statement about it being easier to break encapsulation later, it doesn't seem like that would ever be done based on the same rules that make us want to use encapsulation in the first place.
Let me bring this discussion over to Javascript as I feel that I am highly proficient in Javascript, at least as opposed to OOP in ColdFusion. Now, let's look at the document object model. Each DOM object is an object, right? It has properties and methods that are available for it.... so how come these things available to the DOM are not all method calls? Why is there:
DOM.parentNode
DOM.previousSibling
DOM.nextSibling
DOM.nodeType
DOM.childNodes
... but, then on the flip side, there ARE methods like:
DOM.getElementsByTagName()
DOM.appendChild()
... Is the belief that Javascript is just a poorly thought out OOP language? Maybe my problem with the whole bashing on the THIS scope is because I feel I work with a lot of really successful situations that do not use it.
Now, you might argue that things like parentNode are read-only and that javascript will throw an error if you try to modify it. I would argue that that is a moot point. If you are not supposed to modify a THIS-scope variable, and then someone does it anyway, well.... that's not a programming problem, that's a problem of incompetence of the programmer???
And what about constants in Java objects. I use Java objects that always have constants for reference OBJECT.CONSTANT_VALUE. How come these are not accessed as static class methods?
(I am NOT attacking with these questions... I am very curious as to why the inconsistency across languages and situations).
When I am using <cfcomponent> to create objects, as opposed to a library of functions, and I use the THIS scope, it stops feeling like an object, and more like a structure which just happens to have some methods you can access. And, to me, this just doesn't feel right.
That said, you will note that I have been saying "almost never" with regard to using the THIS scope. There are times (look at a Reactor Transfer Object) where it is probably OK. I just think those times are very few and far between.
Application.cfc uses the THIS scope and it was a bad decision on Adobe's part, IMO.
Finally, there may indeed be times when you want to break encapsulation later. It is possible. But the same care should be taken in deciding to do it as was put into making things encapsulated in the first place.
@Ben: Yes, JavaScript is a horribly thought out OOP language. More to the point, it is a non-OOP language that had OOP features bolted onto it over a very long period of time.
Regarding Java constants, these are more acceptable because they are almost always marked as FINAL. That means once they are set, nothing else can change them.
You said your "page gets slowed down by the number methods calls that takes place", which is true, but only very slightly. Unless you have debugging with Report Execution Times turned on. I'd argue that the vast majority of the time, this tiny performance hit is worth the flexibility and ease of maintenance that using API methods provides.
And it is interesting and timely to mention consistency. Another reason to use methods as the norm is consistency. If you have some data that you want to expose directly (the THIS scope) and some that you need to require client code to access via a method (anything subject to change in the future), you now have an inconsistent interface to your object. Again, there is very little overhead in just having methods that client code can use.
Thanks for the great post - I'm just getting started in Mach-ii and OO so all these discussions are very informative!
I have a hard time believing that the overhead of 363 method calls added up to more than a few milliseconds, but if it did, and performance took precedence over all other design elements, including future maintainability, then you might have a perfectly valid case. I shy away from premature optimization like the plague. And when optimization is required, I focus the majority of my effort on the most common performance hog: the database design and the SQL.
Not trying to pounce on you Michael. We all know you're a sharp guy and love all that you do for the community. I just feel strongly that encouraging the use of the THIS scope is a dangerous recommendation, especially to new and intermediate developers. If you really know what you're doing and truly have a valid need to use it, then go for it. I'm just saying those cases are going to be extremely rare.
I think everyone respects you and the work you do, so I wouldn't take this stuff personally. It is like Mixins - they can be a great tool but it is important to teach the dangers as well as to teach the possibilities. Please keep up the great work and the great postings - I always learn something from your postings and would hate to miss that just because people are clarifying the downside of using your specific tools in more general situations.
I think the biggest issue is that you are in the business of doing something that most programmers should not be focusing on. You help people to optimize performance of systems and as a general approach to developing that is a horrible starting point, but when you have a creaking server and bring in Mr. D to do his stuff, it is *exactly* the right thing to focus on.
That said, I'd drop down to Java before you'd take my method calls from my cold dead fingers - but that's just me :->
I figured as much.... I still love it though :)
@Brian [Another reason to use methods as the norm is consistency]
I think this is one of the biggest selling points (at least in my young-OOP mind). I am a fan of consistency and use it everywhere from my naming conventions to my white-space usage. So why not with variable access??? So yes, I agree, consistency is a HUGE selling point.
E.g.
<cfargument name="foo" type="struct" />
<cfset var foo = structNew() />
is invalid....one would think that these would be completely separate in terms of accessing them...