Now I remember why PHP is so easy to hate…

(aka “why do my include/require/include_once/require_once files not work / seem NOT to be included, even though they are?”)

PHP has a mechanism for including files inside each other. The architects of PHP didn’t really think much about what they were doing with a lot of the core language features (witness the foolishness over Register Globals), and file import/include/require is a classic example.

This is one of the most fundamental features of the language, and it’s screwed up. It “seems” to work, so long as you write simplistic enough / small enough apps. The bigger your app, the more likely it is you’ll discover how poor this part of the language is.

In most languages, there is a distinction between:

  1. importing (IMP)
  2. including (INC)
  3. internally-evaluating (INT)
  4. externally-evaluating (EXT)

(not all languages have all of the above – most have 2 or 3 of them)

The first one runs at the language level, and is surrounded by all sorts of careful compile-time/runtime code to manage exactly what happens. The second one runs at the source-file level, and simply “dumps” the contents of another file inside the main file. The last two are like the second, except that they “execute” the other file, and whatever that “execution” process outputs is what gets embedded, rather than the contents of the file. They differ from each other in that INT executes using all the currently-in-scope info, while EXT executes without any access to currently scoped data.

PHP says “**** it, I can’t be bothered with writing code properly, I’ll just pretend all four of those are identical, and I’ll name the functions as if I meant INC, even though I don’t”.

Net result: the include/require/include_once/require_once statements in PHP don’t really work properly, because they have to do the work BOTH of importing AND of including AND of evaluating … all in one go.

(by the way … this is subtly documented in the fourth paragraph of the docs for include(), but there’s no warning in the other statement/function docs)

Here are the “compromises” that have been made (with the use-cases in brackets):

  • Files are treated as plain HTML (INC)
  • …but also, simultaneously, as “executable, may contain PHP tags” (EXT)
  • When a file is brought in, it is only executed when that line of code is executed (INC, INT)
  • The scope of anything executed is the scope of the currently-executing function (INT)
  • Any (de facto) closures (in the form of defined functions) found inside the brought-in file … are shunted into the current environment (INC, IMP)
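A minimal sketch of the last two bullets, assuming PHP 7.4+ and a writable sys_get_temp_dir(); the file and function names are made up for illustration. The included file's top-level variable lands in the calling function's local scope, while the function it declares becomes global:

```php
<?php
// Write a small file to include; its name and contents are illustrative only.
$dir = sys_get_temp_dir() . '/inc_scope_demo_' . uniqid();
mkdir($dir);
$lib = $dir . '/lib.php';
file_put_contents($lib, '<?php
$greeting = "hello";
function lib_helper() { return "from lib"; }
');

function load_inside_function(string $path): array
{
    include $path;                   // runs lib.php in THIS function's local scope
    return ['local_greeting' => $greeting ?? null];
}

$result = load_inside_function($lib);

// $greeting was a local of load_inside_function(), so it is not visible here...
$globalGreeting = $greeting ?? null;
// ...but lib_helper() was declared globally and is callable from anywhere.
$helperExists = function_exists('lib_helper');

echo "inside function: ", var_export($result['local_greeting'], true), "\n";
echo "at global scope: ", var_export($globalGreeting, true), "\n";
echo "lib_helper() defined globally: ", var_export($helperExists, true), "\n";

unlink($lib);
rmdir($dir);
```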

Several of those are clearly mutually incompatible already. It’s simply not possible to put all three of those features onto a single language-statement. Unfortunately … PHP has.

That’s just irritating: it means that basic PHP apps suddenly break when you add a line of code, because e.g. you’ve been using include() to do EXT, or IMP, and that was fine … but you just added some code that exposes its incompatible behaviour by showing off some of its INT functionality (or vice versa).

But far worse is what happens when you throw “_once()” into the mix. In practical terms, it’s surprisingly easy to write code that doesn’t run correctly at runtime (and this is undocumented, too), because “_once()” is, essentially, an undefined language feature.

Don’t believe me? Go read the docs. Find for me the point where they state the definition of:

“if the code from a file has already been included”

(or … don’t bother. Take my word for it – it’s not defined).

If IMP, INC, and EVAL (INT/EXT) were *not* squashed into some ugly mess like they are in PHP, this *would not be a problem*, because people would just work with the “obvious” interpretation of “has already been included”, i.e.:

  1. you called “include*()” or “require*()” on this filename already
  2. you called “include*()” or “require*()” on this filename AT THIS SCOPE already

What? TWO definitions? Well, yes, dear reader, because most humans will pick the first definition. It appears correct. And for IMP and EXT – by definition – it is correct, since they ignore scope. However, sadly, INC and INT explicitly require scope to be obeyed, and they *require* the second definition to be used.

Guess which definition PHP uses? Bearing in mind that PHP *treats the include() file differently* depending upon which scope it’s imported at…

Did you guess the second option? Ha! Wrong!

And so, in PHP, it’s easy to write something like this:

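file 1 (file1.php):
<?php
// Top-level code in an included file runs in whatever scope includes it.
if (!isset($A))
    $A = "not blank";

file 2 (file2.php, the script you actually run):
<?php
run();

function run()
{
    require_once 'file1.php';   // first inclusion: $A becomes a local of run()
    require_once 'file3.php';
}

file 3 (file3.php):
<?php
crashout();

function crashout()
{
    require_once 'file1.php';   // already included once, so PHP skips it entirely
    if (isset($A))
        echo "this will never happen!";
    else
        echo "PHP will claim that A is not set, even though"
            . " it is explicitly set in the file that is explicitly included above!";
}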
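The same effect can be reproduced in one self-contained script. This is a sketch assuming PHP 7.4+ and a writable temp directory; first_caller() and second_caller() are hypothetical stand-ins for run() and crashout():

```php
<?php
// Recreate "file 1" in a temp directory (the path is illustrative).
$dir = sys_get_temp_dir() . '/once_demo_' . uniqid();
mkdir($dir);
$file1 = $dir . '/file1.php';
file_put_contents($file1, '<?php
if (!isset($A)) {
    $A = "not blank";
}
');

function first_caller(string $path): ?string
{
    require_once $path;              // first time: file runs, $A becomes local here
    return $A ?? null;
}

function second_caller(string $path): array
{
    $ret = require_once $path;       // already included: returns true, runs nothing
    return [$ret, $A ?? null];
}

$seenByFirst = first_caller($file1);
[$onceReturn, $seenBySecond] = second_caller($file1);

echo "first caller sees A = ", var_export($seenByFirst, true), "\n";    // 'not blank'
echo "second caller sees A = ", var_export($seenBySecond, true), "\n";  // NULL
echo "second require_once returned ", var_export($onceReturn, true), "\n"; // true

unlink($file1);
rmdir($dir);
```

Note that the second require_once does not fail: it returns true and silently runs nothing, which is why the missing variable is so hard to spot.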

I bashed my head against a wall with this (kind of) problem until I realised what was going on: PHP has a very poor definition of “a file has already been included”.

And what the heck can you do to work around this? Actually, I’m not sure yet. I’ve only recently worked out how and why the runtime does what it does. I’ve not yet worked out a simple approach to PHP programming that avoids the above problems, beyond “never use an include* statement from within a function – never ever, under any circumstances”. At least that forces the behaviour to be predictable (NB: as soon as you do an include* from within a function, each and every one of your PHP scripts will potentially cease to work). But it’s very annoying to be actively prevented from ever doing an INC/INT/EXT.

16 thoughts on “Now I remember why PHP is so easy to hate…”

  1. Ted Howard

    Free and open source is like politicians wrapping themselves in the American flag and kissing babies. Always evaluate the tech when evaluating cost.
    It sounds like PHP is not well-designed and that causes significant problems, at least in this case. How’s the debugging model for PHP? Does it cover the basics: step in/over/out, break on thrown exceptions, conditional breakpoints, easy/automatic watch variables?

  2. adam Post author

    Debugging PHP is generally “good, but not trivial to get working” – all the usual stuff is there, but you need a specialised IDE and have to set up the hooks into a runtime interpreter (IIRC not too bad to set up, but not easy for a novice, and kind of counter-intuitive for a language that mostly “works out of the box”) … or you can use the easy-to-hook local interpreter – but then you probably aren’t even using the same vendor as your live site uses, let alone the same version, which causes obvious problems of its own.

    Personally, I had so much hassle just getting very, very basic execution/builds to run in PDT (the Eclipse IDE for PHP, which sadly is very poor at doing basic exec/build/debug) that I gave up trying to get the live-server hooks to work – so I’m currently using PHP without a debugger.

    Most of the time, not much loss – because it’s a stateless language, so generally easy to debug. There are just a few edge cases like this one where a debugger would be handy. But, generally, if your code needs a debugger, you shouldn’t be writing in PHP anyway – your application has got too big for the language. IMHO!

  3. sidereal

    First: PHP is terrible. There is no doubt.

    But…

    I’ve written a few hundred thousand lines of php in fairly complicated architectures and never run into this problem, because

    “never use an include* statement from within a function – never ever, under any circumstances”.

    I’ve never seen the need. Each architecture has its own requirements, of course, so I’m not going to say that you can do whatever you’re doing better some other way, but it seems to me you’re trying to stretch include and require beyond their reasonable utility. I’ve always seen them as a mechanism to get all of your classes declared concurrently. And since I started using __autoload I rarely even use them for that.
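For readers on current PHP: __autoload() was deprecated in PHP 7.2 and removed in 8.0, and spl_autoload_register() is the replacement. A minimal sketch, assuming PHP 7.4+ and a hypothetical one-class-per-file layout where each file is named after its class:

```php
<?php
// A throwaway class directory; the one-class-per-file naming is an assumed convention.
$classDir = sys_get_temp_dir() . '/autoload_demo_' . uniqid();
mkdir($classDir);
file_put_contents($classDir . '/ButtonGuy.php', '<?php
class ButtonGuy
{
    public static function label(): string { return "admin button"; }
}
');

// Register a loader: PHP calls it with the class name the first time it is needed.
spl_autoload_register(function (string $class) use ($classDir): void {
    $path = $classDir . '/' . $class . '.php';
    if (is_file($path)) {
        require $path;               // class files only declare, so scope is irrelevant
    }
});

// No explicit require: the first use of ButtonGuy triggers the loader.
$label = ButtonGuy::label();
echo $label, "\n";                   // prints "admin button"

unlink($classDir . '/ButtonGuy.php');
rmdir($classDir);
```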

    I guess this is the core question: is there a substantive difference between importing a file that contains php statements X (on the assumption that importing it will execute them) and importing a file that declares a function that executes php statements X? The latter is easy to setup, easy to debug, and works sanely, and I can’t think of any case where it doesn’t preserve the functionality of the former.
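sidereal's alternative can be sketched like this (PHP 7.4+; the temp file and demo_settings() are hypothetical). Because the included file only declares a function, it does not matter that require_once skips it the second time: the function is already global and returns the same data in any scope.

```php
<?php
$dir = sys_get_temp_dir() . '/wrap_demo_' . uniqid();
mkdir($dir);
$settingsFile = $dir . '/settings.php';
// The included file only *declares* a function; it creates no variables.
file_put_contents($settingsFile, '<?php
function demo_settings(): array
{
    return ["A" => "not blank"];
}
');

function first_user(string $path): string
{
    require_once $path;              // declares demo_settings() globally
    return demo_settings()["A"];
}

function second_user(string $path): string
{
    require_once $path;              // no-op now, but the function still exists
    return demo_settings()["A"];
}

$a1 = first_user($settingsFile);
$a2 = second_user($settingsFile);
echo $a1, " / ", $a2, "\n";          // prints "not blank / not blank"

unlink($settingsFile);
rmdir($dir);
```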

  4. adam Post author

    @sidereal

    AFAICS, you’re saying you’ll *force* every include* into an “import”. Which works fine … until the day you want to perform any kind of “embed”?

    e.g. what I suspect is the base case that PHP added this feature for in the first place – including a piece of HTML in all pages (e.g. a navbar).

    Going back to the problem cases I’ve seen – because I’m still mulling over how they came about – FYI most of the situations where I’ve used – or seen – an include* from within a fn are, I think, one of two cases:
    1. dynamic authorization to put “extra” buttons/text/html at a particular point inside another file – e.g. “if( user has admin rights ) … then: insert_additional_button_here” – which is extremely heavily used in any kind of dynamic content system – from CMS to Wiki to Forums to Blogs etc.

    In the simple case, you start by just INC’ing a file – but quickly you start to add logic to that file, and the INC becomes one of an INT or an IMP or some combination of all three.

    It’s that “evolution of the code breaks the whole thing” that I’ve seen a few times now and seen people tearing their hair out trying to understand.

    2. assembling all the dynamic content for a page first, then blitting/unifying them with a page template “at the last moment” to create the actual page.

    With simple templates, and simple content, and simple logic to choose amongst them … that works fine.

    But again, as soon as you convert from “there’s only one template” to “choosing one of multiple templates” or “complex logic for assembling the data”, you rapidly end up needing to invoke the include* from within a function.

    For 2. above, it seems to me that you can escape the problems by introducing a complex templating system. But then again, the moment you need to switch to a template system, you should quit PHP, because your app is now – by definition – too complex for PHP … since PHP itself is a first-class templating language.

    i.e. if you find PHP can’t template for you well enough, you’re effectively rejecting the whole language for your problem domain (although many people don’t realise this implication, and go on to make horrendously large, unwieldy, massive systems in PHP which are a nightmare to maintain and grow, IME).
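Case 2 tends to end up as something like this sketch, where including from inside a function is deliberate rather than accidental. It assumes PHP 7.4+ and made-up template files, and uses plain include rather than include_once, so every call re-runs the template with only the variables explicitly passed in:

```php
<?php
// Two interchangeable templates in a temp directory (names and markup are made up).
$dir = sys_get_temp_dir() . '/template_demo_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/plain.php', '<h1><?= htmlspecialchars($title) ?></h1>' . "\n");
file_put_contents($dir . '/fancy.php', '<h1 class="fancy"><?= htmlspecialchars($title) ?></h1>' . "\n");

// Render a template inside a function on purpose: plain include (not *_once)
// re-runs the file on every call, and only the variables passed in are visible.
function render(string $templateFile, array $vars): string
{
    extract($vars, EXTR_SKIP);       // ['title' => ...] becomes a local $title
    ob_start();
    include $templateFile;
    return ob_get_clean();
}

$useFancy = true;                    // stands in for "complex logic" choosing a template
$page  = render($dir . '/' . ($useFancy ? 'fancy' : 'plain') . '.php', ['title' => 'Home & Away']);
$other = render($dir . '/plain.php', ['title' => 'News']);
echo $page, $other;

unlink($dir . '/plain.php');
unlink($dir . '/fancy.php');
rmdir($dir);
```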

  5. adam Post author

    Also, FWIW … I’m more than happy for someone to shout “you’re an idiot! why aren’t you just doing X instead?” here – although I’ll then mutter and grumble about how PHP world seems incapable of sharing best-practices widely, and that such info ought to be in the API docs :).

    i.e. I don’t see what I’ve been doing to work around this stuff as “correct”; they’re just the best workarounds I’ve come up with so far.

  6. sidereal

    Well, I’m not going to call anyone an idiot. Like I said originally, every architecture has its own needs and there’s nothing worse than some know-it-all in the comments who can tell you exactly how your system should be written sight unseen.

    But I’ve written and seen some pretty complicated templating systems that don’t require includes anywhere other than the global scope. And as I’ve said, I’ve mostly abandoned even those for __autoload and a well organized file system.

    I think I/we stumbled onto a best practice that probably avoided any of this, which is that we don’t really treat PHP like a traditional templating language in which the outermost scope is uninterpreted text with a computational scope inside of it. When you go from that to importing other templates, you run into the problems you’re running into.

    In most of my systems php is the outermost scope. The first script you hit starts with <?php and ends when the script ends. Output-generation is initiated from within php, either by an echo for short chunks or by the (rarely appreciated) escaping out of php from within a function. Also (and this is only slightly related), I get out of global php scope as soon as humanly possible.

    So to your specific example 1), I’d have something like:

    Page.php:

    ...
    ButtonGuy::streamAdminButton($myUser);
    ...

    ButtonGuy.php:

    ...
    public static function streamAdminButton($myUser) {
        if ($myUser->isAdmin()) {
            echo 'button foo';
        }
    }

    or if you need a lot of html:

    ...
    public static function streamAdminButton($myUser) {
        if ($myUser->isAdmin()) {
    ?>
    foo
    foo
    foo
    bar
    bar
    <?php
        }
    }
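A runnable variant of the second ButtonGuy snippet, assuming PHP 7.4+. DemoUser and capture() are hypothetical helpers added only so the output can be inspected; the point is that the HTML after ?> is emitted only when the function runs and the branch is taken, not when the file is loaded:

```php
<?php
// Hypothetical stand-in for sidereal's $myUser object.
class DemoUser
{
    private $admin;

    public function __construct(bool $admin)
    {
        $this->admin = $admin;
    }

    public function isAdmin(): bool
    {
        return $this->admin;
    }
}

class ButtonGuy
{
    public static function streamAdminButton(DemoUser $myUser): void
    {
        if ($myUser->isAdmin()) {
            // Dropping out of PHP mode inside a function: this HTML is only
            // emitted when the function runs and this branch is taken.
?>
<button class="admin">Edit page</button>
<?php
        }
    }
}

// Hypothetical helper that captures echoed output instead of sending it.
function capture(callable $fn): string
{
    ob_start();
    $fn();
    return ob_get_clean();
}

$adminHtml = capture(function () { ButtonGuy::streamAdminButton(new DemoUser(true)); });
$guestHtml = capture(function () { ButtonGuy::streamAdminButton(new DemoUser(false)); });
echo $adminHtml;                     // the button markup; $guestHtml is an empty string
```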

    Now you might say that this is avoiding problems by avoiding features, but I don’t actually think the features I’m abandoning (like hanging out in global scope) are at all valuable. Explicitly writing out your html makes for much easier debugging, much easier stacktracing, etc, etc.

    (Also, the chance that this comment is going to come out looking correct with all the code and escaping and no preview is tiny. Knock on wood :)

  7. Matthew Weigel

    I agree, the require* and include* functions have evolved to support different aspects of how other languages do include, import, and evaluate. As long as you stick to a few particular widely-used idioms or use cases, everything is hunky-dory (and I think that even applies on a per-file basis, e.g., this file is always imported, that file is always included as a header, etc.).

    However, once you wander outside those idioms (which probably sounds pretty reasonable if PHP is not your main development language, depending on what other languages you’re used to), things go sideways. I’ve written maybe a thousand lines of PHP, and my general rule of thumb is similar to sidereal’s – but it’s borne out of a general mistrust of most of PHP’s features.

    Frankly, I don’t much care for working in a language I don’t feel like I can trust. At least with Perl, there’s the communal expectation that libraries DWIM, and copious documentation for the core language (not so much for “everything on CPAN,” but that’s true of libraries for every language). Not that I’m recommending Perl for web development, just comparing the two languages. :)

  8. adam Post author

    @sidereal

    So … how do you handle global variables? (or do you just not have them?)

    I see globals as one of PHP’s core features (although sadly emasculated by being forced to declare *inside the function* (oh, yeah, like that’s the best place to do so. Sigh) which of them you intend to use in the function body)

  9. adam Post author

    @MatthewW

    I don’t think you’re safe even staying inside an idiom per-file, simply because “scoping” laughs at that precaution, and then proceeds to piss all over it.

    e.g. when doing embeds of chunks of file (e.g. a nav construct slightly bigger than just a navbar – perhaps a navbar with a login-box embedded inside it?) that include sub-parts that you want to only embed once, and then having some pages that try to embed some shared sub-parts, the problems come to the surface again, because the embed-checking was *not* done scope-aware, but the evaluation was.

    Unless you *also* put strict limits on what files that will be the targets of include* are “allowed” to do internally … but of course if you can make arbitrary, language-unsupported rules like that and be 100% sure that every code contributor and every maintainer will always adhere to them, then you probably don’t have any problems with writing everything in BCPL.

    (sorry. Cheap dig; I hate that language. I especially hate the author’s statements about the wonders of non-checked, typeless programming. ARGH!)

    EDIT: clarification: BCPL is typeless. So you can do anything. You can, theoretically, adhere to any arbitrary set of conventions and write nice, maintainable code. You can equally (and a lot more effortlessly) write munged horrors that make obfuscated perl go weak at the knees and start writing bad poetry.

    EDIT2: clarification of clarification: it’s really MR’s praise of the wonders of typelessness, and how good it is for programmers to be “allowed to do anything, without restriction” as the ultimate pinnacle of computer programming, that drives me up the wall. My opinions on BCPL are undoubtedly (in some ways) unfair. I am also bitter that his very first lecture on programming contained non-compilable C code because he got his pointers muddled up *and he didn’t bother explaining C-style pointer syntax to the audience*. ARRRRGHGHHH!!!!

  10. sidereal

    So … how do you handle global variables?

    I pretty much don’t. I attach that data to shared objects with static references instead. In a typical webapp, I’ll have a single $session object, which I’ll get to through a static method, but which has a bunch of data hanging off of it. $session->user, $session->state, etc. This avoids the globals system completely. It’ll get instantiated at the top of an initialization script which all of my individual page scripts require, and which is incidentally my only typical use of require (or in this case, require_once).

    So something like:
    whatever_site_page.php:

    require_once('TheApp.php');
    streamPage();

    function streamPage() {
        $session = Session::getSession();
        if ($session->user->isRegistered()) {
            ...
        }
    }

    TheApp.php:

    Session::initSession();
    Session::initUser();
    ...

    Session.php:

    class Session {
        private static $session;

        public function initSession() {
            $session = new Session();
            ...
            return $session;
        }
        ...

  11. sidereal

    Whoops, that should be $this->session in the initSession method, which should also be static.
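Applying that correction: since initSession() is static there is no $this, so the shared instance has to live in the static property and be reached via self::$session. A minimal sketch assuming PHP 7.4+, with $user simplified to a plain string (the comment above hangs a user object there) and 'alice' as a made-up value:

```php
<?php
class Session
{
    private static ?Session $session = null;   // the one shared instance
    public ?string $user = null;               // simplified stand-in for $session->user

    public static function initSession(): Session
    {
        self::$session = new Session();         // not $this: a static method has no instance
        return self::$session;
    }

    public static function getSession(): Session
    {
        if (self::$session === null) {
            throw new LogicException('initSession() has not been called');
        }
        return self::$session;
    }
}

// What TheApp.php would do once, at global scope:
Session::initSession();
Session::getSession()->user = 'alice';

// Any function can now reach the shared state without the global keyword.
function streamPage(): string
{
    $session = Session::getSession();
    return "Hello, " . $session->user;
}

echo streamPage(), "\n";             // prints "Hello, alice"
```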

  12. adam Post author

    OK. In that case – and I’m not being facetious, promise! – why do you use PHP at all?

    (instead of, say, .NET or Java, both of which would appear to be almost zero extra typing/design work with that architecture, and yet of course both have tonnes of framework/runtime, performance, and library advantages)

  13. Matthew Weigel

    @adam: different language preference? I know *I* prefer a weakly-typed language for web applications in particular, and a language with a lot of built-in and easy-to-use text-processing facilities is a big win as well, which has led me to generally stick with PHP and Perl for web applications even when I had other options (and even though I really don’t like working in PHP at all).

  14. Jon

    It’s been a long time since I’ve done much serious PHP (way back in PHP3), but back then require and include seemed to have much better definitions.

    Require was almost textbook INC (using your terms). The require statement was replaced with the text of the given file and compiled on its own merits. This occurred _whether or not that line of code was executed_.

    Include was either INC or INT (depending on the details of your definitions). Essentially it did the same thing however if you executed that (original) line again it would re-evaluate the file to be included (thus allowing you to include any number of dynamically different files).

    To be honest, I don’t think I ever built php files that would attempt to close their parents; however, based on the manual in php-3.0.18, I’d say it’d work with require but not with include.

    In both cases _once had the same basic meaning as the #ifndef/#define idiom in C. If I’ve already imported it once, it silently does nothing.

    For most practical purposes I used require_once for things I’d use #include for in C. Config and library files get “require_once”d. Otherwise it got “include”d.

    After having a spark of intuition, I think I know what’s going on.

    When you run file 2, the first require_once of file 1 (in run()) executes and the second one (in crashout()) does not (as makes sense… you’ve already required the file once). However, because file 1 is being run in the scope of “global->run” and you’re checking $A in the scope of “global->run->crashout”, naturally $A doesn’t exist in file 3.

    There are two workarounds that I can see working (and that indeed do work on 5.2.8):
    – use Global. (bleah… global variables are the devil)
    – change the call to crashout in file 3 to take $A as a parameter in both declaration and call.
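Both of Jon's workarounds, sketched against self-contained copies of file 1 (PHP 7.4+; the temp files and function names are hypothetical). The global route only works if $A is declared global *before* the first include, so the included assignment writes to the global variable rather than to a local:

```php
<?php
// Two identical copies of "file 1" so each workaround gets a fresh require_once.
$dir = sys_get_temp_dir() . '/workaround_demo_' . uniqid();
mkdir($dir);
$code = '<?php
if (!isset($A)) {
    $A = "not blank";
}
';
$forGlobal = $dir . '/file1_global.php';
$forParam  = $dir . '/file1_param.php';
file_put_contents($forGlobal, $code);
file_put_contents($forParam, $code);

// Workaround 1: declare $A global in every function that touches it, before the include.
function run_with_global(string $path): ?string
{
    global $A;
    require_once $path;              // assignment inside the file now hits the global
    return crashout_with_global($path);
}

function crashout_with_global(string $path): ?string
{
    global $A;
    require_once $path;              // still a no-op, but $A is shared now
    return $A ?? null;
}

// Workaround 2: pass the value down explicitly as a parameter.
function run_with_param(string $path): ?string
{
    require_once $path;              // $A is a local of run_with_param()
    return crashout_with_param($A);
}

function crashout_with_param(?string $A): ?string
{
    return $A;
}

$viaGlobal = run_with_global($forGlobal);
$viaParam  = run_with_param($forParam);
echo var_export($viaGlobal, true), ' ', var_export($viaParam, true), "\n";

unlink($forGlobal);
unlink($forParam);
rmdir($dir);
```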

    I don’t know how/if this applies to what you’re really doing.

    (GOD, I can’t believe I’ve ended up posting technically in support of PHP… hand me my rifle, there’s some aerial bacon flying overhead)

  15. Jon

    So to summarise… your problem with *_once is occurring because *_once is the natural “import this file once globally”, whereas the results of importing the file are applied locally.

    That would imply a third workaround: use include and your own hand-rolled version of _once that works locally. (Actually, in your example you already have one: your if (!isset($A)). Using include will work fine.)
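Jon's third workaround, sketched under the same assumptions (PHP 7.4+, hypothetical temp file and function names): with plain include, file 1's own isset() check acts as a per-scope once-guard, so every function that includes it gets its own $A:

```php
<?php
$dir = sys_get_temp_dir() . '/plain_include_demo_' . uniqid();
mkdir($dir);
$file1 = $dir . '/file1.php';
file_put_contents($file1, '<?php
if (!isset($A)) {
    $A = "not blank";
}
');

function run_plain(string $path): array
{
    include $path;                   // defines $A locally
    include $path;                   // the guard inside the file makes this harmless
    return [$A, crashout_plain($path)];
}

function crashout_plain(string $path): ?string
{
    include $path;                   // plain include runs again, in THIS scope
    return $A ?? null;
}

[$inRun, $inCrashout] = run_plain($file1);
echo $inRun, ' / ', $inCrashout, "\n";   // prints "not blank / not blank"

unlink($file1);
rmdir($dir);
```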

  16. sidereal

    OK. In that case – and I’m not being facetious, promise! – why do you use PHP at all?

    It’s easy to run natively in apache, it doesn’t have a compilation step, and it’s pretty fault tolerant (meaning most problems are only warnings and it makes a good go of continuing execution rather than just crapping the page, which is important for most web apps). Also, the differences between running a script through the webserver and running a script from the command line are pretty minor (as opposed to, say, Java, where most web frameworks have an intrinsic and profound assumption about it being run through a webserver), which lets you do some cool things with generating static pages and so on. There are some other minor reasons.

    To go almost completely off-topic, I think most of the purported benefits of more ‘agile-friendly’ languages are mistaken. PHP’s weak typing is of almost no benefit to us but has forced 4-hour debugging bouts trying to figure out why true == “true” and true == 1 but “true” != 1, and it is almost the entire foundation of the SQL Injection industry. Same with closures. Developers get all starry-eyed but don’t mind that they tend to radically complicate debugging, which reduces developer productivity, which is sort of the opposite of agile.
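The comparisons sidereal mentions can be checked directly. A small sketch; the results hold in both PHP 7 and PHP 8, although the rule behind the last one changed in PHP 8 (a non-numeric string compared with an int is now compared as strings rather than cast to 0):

```php
<?php
// Loose (==) comparisons from the comment above.
$checks = [
    'true == "true"' => (true == "true"),   // non-empty string converts to bool true
    'true == 1'      => (true == 1),        // 1 converts to bool true
    '"true" == 1'    => ("true" == 1),      // non-numeric string vs int: not equal
];
foreach ($checks as $expr => $result) {
    echo str_pad($expr, 16), var_export($result, true), "\n";
}

// Strict comparison (===) also compares types, so none of these pairs match.
$strict = [true === "true", true === 1, "true" === 1];
```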

    Rant off!
