Alloca For New And Delete - Things You Probably Should Not Do In C++

Chew'n on a stick - in a C++ kind'a way!
Don't do this at home - replacing _all_ new and delete calls with stack based allocation?


Yep - sounds stupid and it probably is. However, I cannot help but think it might have some very specialist applications. Malloc is so very, very, horribly slow! Wouldn't it be great if we did not have to be forced to call it just because we want to use the stl or some other library code? Yes - it very much would be nice and it is very much possible as long as you do not expect your code to work without masses and masses of very detailed testing.

#import "../taskParser.h"
#import "taskAllTests.h"
#import "alloca.h"

volatile static bool localAlloc=false;
void* operator new(size_t size)
{
  cout << "Asking for " << size << " bytes " << endl;
  return localAlloc?alloca(size):malloc(size);
}

void operator delete(void* what)
{
  if(localAlloc)
  {
    cout << "Wanted to delete " << what << " but did not" << endl;
    return;
  }
  cout << "Deleting " << what << endl;
  free(what);
}

namespace taskUnitTest
{
  using namespace std;

  class testNewADeleteA : taskUnitTest
  {

  public:
    testNewADeleteA()
    {
      addUnitTest(this);
    }

    const taskUnitTestResult run()
    {
      puts("Entering NewADeleteA");
      localAlloc=true;
      string* test = new string("12345678901234567890");
      size_t l=test->size();
      delete(test);
      localAlloc=false;
      taskUnitTestResult ret = taskUnitTestResult(toString(),l==20,"Dummy test which should pass!");
      puts("Leaving NewADeleteA");
      return ret;
    }
    virtual const std::string toString()
    {
      return "Test NewA and DeleteA";
    }
  };
  static testNewADeleteA test;
}


In the above code I have replaced new and delete with custom operators. This replaces them throughout the program. I compiled this on my Mac using gcc 4.2. I am running the test code using the unit test framework from Task. This code is not multi-threaded because of the single global switching on stack based new/delete. The program just fails to run at all with stack based new/delete turned on for all the program. This is easy to explain in that the guts of the unit test code uses new to create objects which last even after their allocating stack frame as been reclaimed!

However, if we localise the use of the stack based allocation to just that string in the  taskUnitTestResult then we find it works just fine. This might be a bit of a shock. There is no guarantee that it will work. If this technique were to be used in production code, every single code path through the stl code would need to be unit tested including exceptions. Please note that now only do we call new directly in the example but std::string then calls it to allocate space for the chars which are stored inside the string. Thus, but replacing new and delete we are replacing two calls to malloc and two calls to free for each string instance.

Is it worth it?
YES - OH MY GOODNESS ME - YES!

One thing a lot of C and C++ programmers do not realise is that malloc and free are stupidly slow. They are very, very slow when running single threaded and even worse - as in mind blowingly slow - when the system is running multi-threaded. If there is one way to make C++ run faster it is not to use malloc, which normally means don't use new. 

Don't believe me? Well - below are some timings using Boost auto_timer. The test code (at the bottom of this post) is very very favourable to malloc and still the alloca solution runs 4.8 times faster!

Test using default new/delete:
About to run 4 unit tests. PASS: Lexer stage 1 simple (Simple white space splitting and character agregation) PASS: Lexer stage 1 special (Simple white space splitting and character agregation for special tokens) Entering NewADeleteA 1.632592s wall, 1.630000s user + 0.000000s system = 1.630000s CPU (99.8%) Leaving NewADeleteA PASS: Test NewA and DeleteA (Dummy test which should pass!) PASS: Test test (Dummy test which should pass!)
Test using default alloca new/delete:
About to run 4 unit tests. PASS: Lexer stage 1 simple (Simple white space splitting and character agregation) PASS: Lexer stage 1 special (Simple white space splitting and character agregation for special tokens) Entering NewADeleteA 0.346338s wall, 0.340000s user + 0.000000s system = 0.340000s CPU (98.2%) Leaving NewADeleteA PASS: Test NewA and DeleteA (Dummy test which should pass!) PASS: Test test (Dummy test which should pass!)

Conclusion - maybe this sort of trick is worth looking at for very specialist code. A better solution would be to use a stl allocator rather than just replace new and delete - but this way was more fun!

The timing test code:
#import "../taskParser.h"
#import "taskAllTests.h"
#import "alloca.h"
#include <boost/timer/timer.hpp>

volatile static bool localAlloc=false;
void* operator new(size_t size)
{
  return localAlloc?alloca(size):malloc(size);
}

void operator delete(void* what)
{
  if(localAlloc)
  {
    return;
  }
  free(what);
}

namespace taskUnitTest
{
  using namespace std;

  class testNewADeleteA : taskUnitTest
  {

    void doIt()
    {
      boost::timer::auto_cpu_timer t;
      localAlloc=true;
      for(int i=0;i<10000000;++i)
      {
        string* test = new string("12345678901234567890");
        delete(test);
      }
      localAlloc=false;
    }
  public:
    testNewADeleteA()
    {
      addUnitTest(this);
    }

    const taskUnitTestResult run()
    {
      puts("Entering NewADeleteA");
      doIt();
      taskUnitTestResult ret = taskUnitTestResult(toString(),true,"Dummy test which should pass!");
      puts("Leaving NewADeleteA");
      return ret;
    }
    virtual const std::string toString()
    {
      return "Test NewA and DeleteA";
    }
  };
  static testNewADeleteA test;
}