Runtime Detection Inside Constexpr Functions

If you're just looking for the solution/implementation you can skip the background and motivation by clicking here.


Background

Note, before I begin, there's a proposal by Daveed Vandevoorde that talks about a constexpr() operator to tell between constexpr and runtime context. He provides additional background and motivations.

constexpr is a really powerful tool since it allows us to move computation to compile time rather than having to compute it at runtime.

The advantage is that the same code can be used at both runtime and compile time. However, as a result of this, we have more restrictions on constexpr code. The biggest restriction being that we can't modify the global state (ie a pure function). On the other hand, this restriction goes away when a function is called in a non-constexpr (ie at runtime) context, we can still call runtime functions assuming a constexpr branch never enters it.

Consider the following example of a factorial function:

constexpr int factorial(int n){
  if ( n < 0 ) {
    std::cout << "Got an invalid N: " << n << std::endl; 
    throw std::invalid_argument("N must be >= 0");
  }
  else if ( n == 0 ) return 1;
  else return n * factorial(n-1);
}

constexpr int a = factorial(5);

// This will be a compiler error
// constexpr int b = factorial(-5); 

// This will be a runtime error but message is printed
int c = factorial(-5);

We can still call std::cout in the factorial function since if n < 0, we will get a compiler error. If we dont know the value (ie at runtime), the runtime version of factorial will be implemented, which is just like a regular function that allows for std::cout.

The problem arises when we want to do something at runtime in a constexpr context.

constexpr int factorial(int n){  
  std::cout << "Computing for N: " << n << std::endl;

  if ( n < 0 ) { throw std::invalid_argument("N must be >= 0");
  }
  else if ( n == 0 ) return 1;
  else return n * factorial(n-1);
}

This will be a compiler error since we can't call std::cout in the constexpr context.


Motivation

Now some might say that this is the advantage of a constexpr function since no state can be modified and that it should be implemented as two separate functions. There are good reasons for why you would want this, but this does have some cons. For example, you'll have to maintain pretty much identical code in two implementations, which can be a maintenance nightmare. For this post I want to focus on use cases where having a constexpr function with two different code paths is advantageous:

smart_asserts in constexpr functions:

Having the ability to assert in constexpr functions that fail at compile time or runtime is a really strong advantage. If we have a constexpr/compile time value, we can error at compile time, or if we have a runtime value we can assert at runtime.

Use more efficient methods at runtime than at compile time:

Runtime methods can make use of intrinsics or caching results leading to better runtime performance, but these features can not be used at compile time. As a result, using only constexpr-safe code is slower.
A better way to look at this case is to consider mutable members inside a class. Since mutable variables only affect internal state it is safe to modify from the outside point of view. Similarly, as long as the run-time path only affects the internal state it should be considered constexpr from the outside point of view. Consider our factorial example, we can cache the result of factorial every time it's called and save a lot of compute.

int factorial_cache[30] = {0};
constexpr int factorial(int n){
  if ( n < 0 ) { 
    throw std::invalid_argument("N must be >= 0");
  }
  else if ( n == 0 ) return 1;
  else {
    if ( in_constexpr() ) { 
      return n * factorial(n-1);
    }
    else {
      // Since we're in runtime, we can cache results.
      if ( factorial_cache[n] == 0 ) {
        factorial_cache[n] = n * factorial(n-1);
      }
      return factorial_cache[n];
    }
  }
}

Daveed also has a good example of this with his power example.

Print or log something in constexpr functions:

Printing messages is really helpful when debugging and so being able to print in constexpr functions is another advantage. Furthermore, we can then implement some sort of function call tracing, which can be helpful if you want to sample profiling information.


Implementation

To preface, this is not meant to be a proper solution (this should be trivial to implement on the compiler level) but a potential one with what we've got right now. So far I've been able to get this to work with C++14/17 with GCC 5+ and Clang 3.9+ and on x86 only.

The solution is actually quite short but lets go through the building blocks. The solution works by relying on having a constexpr function that returns a flag (lets call it IS_CONSTEXPR_FLAG), which is implicitly converted to a boolean. At compile time, this function is evaluated to true and at runtime, a function call is made to get a result. This is better showed with an example.

#include <iostream>

#if defined (__GNUC__) && !defined(__clang__)
  #define no_opt optimize("O0")
#elif defined (__clang__)
  #define no_opt optnone
#endif

constexpr uint32_t IS_CONSTEXPR_FLAG = 0x5EEEEEFF;

template<typename T>
__attribute__((no_opt))
constexpr auto in_constexpr_impl(T) {
  return IS_CONSTEXPR_FLAG;
}

template<typename T>
constexpr auto some_transform(T &&v) {
  // This unused_temp is necessary so that the compiler
  // actually makes a a function call to in_constexpr_impl
  int unused_temp = 0;
  // In the case of a runtime call, in_constexpr_impl is 
  // a function call
  if ( in_constexpr_impl(unused_temp) )
  {
    return v + 10;
  }
  else {
    return v * -1;
  }
}

// This is noinline just for making the assembly easier to read
__attribute((noinline)) 
void print(int a)
{
  std::cout << a << std::endl;
}

int main()
{
  constexpr int a = 11;
  volatile int b = 2;

  constexpr auto x = some_transform(a);
  auto y = some_transform(b);

  print(x);
  print(y);
}

To confirm the code is behaving as expected, we need to look at the assembly (Side note, godbolt is amazing!). I'm going to be looking at the assembly generated from gcc but clang effectively generates the same thing. The interesting part of the assembly to look at is in the following section:

auto in_constexpr_impl<int>(int):   # We can see this is a function
  mov DWORD PTR [rsp-4], edi
  mov eax, 1592717055               # This is our IS_CONSTEXPR_FLAG
  ret

main:
  push rbx
  xor edi, edi                      # Initialize the unused_temp
  sub rsp, 16
  mov DWORD PTR [rsp+12], 2         # Initialize b

  call auto in_constexpr_impl(int)  # Call the test
  test eax, eax                     # Compare eax (returned from above)

  mov ebx, DWORD PTR [rsp+12]       # Load b into ebx (what end up as y)
  jne .L17                          # If in_constexpr_impl returned true
                                    # go to the code that returns v + 10
  neg ebx                           # Otherwise if false negate the value

.L15:
  mov edi, 21     # x has been resolved and is pushed here to be printed
  call print(int) 
  mov edi, ebx    # b is pushed here to be printed. 
  call print(int)
  add rsp, 16

.L17:             # This is the true branch of if in_constexpr_impl
  add ebx, 10     # Add 10 to b
  jmp .L15        # Go back to main

We can see from the assembly that x has been resolved at compile time and print is called directly with 21, while the value of y depends on the result of in_constexpr_impl().

Note this is at O3 optimization level since without any optimizations, the function call to in_constexpr_impl might not get inlined even though it can happen at a higher optimization level. Since O3 is the highest, we guarantee that the function has had maximum likelihood to get inlined.

Running the program, we get the following output:

> g++ -O3 -std=gnu++14 source.cpp 
> ./a.out
21
12

So why do we want to do this? We want this since we can change the result of the function after the binary is compiled! (I did say I've only tested this on x86 architecture didn't I?) After compilation we can replace all instances of our IS_CONSTEXPR_FLAG (of which there should only be 1) with 0. As a result when we run our application, the function actually returns 0 not IS_CONSTEXPR_FLAG. >:)

Now let's see what happens when we do a little post-processing on the binary.

> g++ -O3 -std=gnu++14 source.cpp 
> sed -i 's/\xFF\xEE\xEE\x5E/\x00\x00\x00\x00/g' a.out
> ./a.out
21
-2

There we go! Now that we've replaced our flag with 0, in_const_expr_impl now returns 0 and we call the second branch for evaluating y and so we get -2. Though this works, in the code we end up declaring a meaningless variable and taking an extra branch. We can make use of the pre-processor and the compiler __builtin_expect to come up with slightly nicer syntatic sugar version:

#define CAT(x, y) x##y
#define CAT2(x, y) CAT(x,y)
#if 0
// A nicer C++17 approach
#define in_constexpr() \
  int CAT2(__unused, __LINE__) = 0; \
  __builtin_expect(in_constexpr_impl(CAT2(__unused, __LINE__)), 0)
#else
// A C++14 approach
#define in_constexpr()  bool CAT2(canary, __LINE__) = true) {} \
  int CAT2(__unused, __LINE__) = 0;                            \
  if ( __builtin_expect(in_constexpr_impl(CAT2(__unused, __LINE__)), 0)
#endif

template<typename T>
constexpr auto some_transform(T &&v) {
  if ( in_constexpr() )
  {
    return v + 10;
  }
  else {
    return v * -1;
  }
}

Note that we can also do the flag replacing at runtime by modifying the text segment of the program to replace the flag in memory - but more on that in a different post.


Examples

Implementing a smart_assert

Since we can now detect if we're in a constexpr, we can now write a cleaner assert where we don't have to throw an exception. (Note that we still throw an exception so that in the compile time case we get a compiler time error).

template<int LINE> struct assert_line {};
template<typename...Args > 
constexpr inline bool assertion_failed(Args ... args) { return true; }

#define smart_assert(expr, message)\
  if ( in_constexpr() ) {\
    if ( !(expr) ) { \
      throw assertion_failed(assert_line<__LINE__>()); \
    }\
  } else { \
    assert((expr) && message);\
  }

Updating factorial

We can now write our cleaned up factorial function making use of a smart_assert, caching our results and printing!

int factorial_cache[30] = {0};
constexpr int factorial(int n){
  smart_assert(n >= 0, "N >= 0");

  if ( n == 0 ) return 1;
  else {
    if ( in_constexpr() ) { 
      return n * factorial(n-1);
    }
    else {
      std::cout << "Calling factorial " << n << std::endl;
      // Since we're in runtime, we can cache results.
      if ( factorial_cache[n] == 0 ) {
        factorial_cache[n] = n * factorial(n-1);
      }
      return factorial_cache[n];
    }
  }
}


int main() {
  volatile int a = 5;
  volatile int b = 6;
  std::cout << factorial(a) << std::endl;
  std::cout << factorial(b) << std::endl;

  constexpr int c = factorial(3);
  // Compiler error!
  // constexpr int d = factorial(-5);

  std::cout << c << std::endl;
  // std::cout << factorial(d) << std::endl;
  return 0;
}

Running this we get the following output:

# Without 
> g++ -O3 -std=gnu++14 factorial_final.cpp
> ./a.out
120
720
6

# With 
> g++ -O3 -std=gnu++14 factorial_final.cpp
> sed -i 's/\xFF\xEE\xEE\x5E/\x00\x00\x00\x00/g' a.out 
> ./a.out
Calling factorial 5
Calling factorial 4
Calling factorial 3
Calling factorial 2
Calling factorial 1
120
Calling factorial 6
Calling factorial 5
720
6

Caveats

Don't call it with an if constexpr

It is important that you don't call this with an if constexpr()

if constexpr (in_constexpr())

Calling it like this will mean the in_constexpr is forced to evaluate at compile time and thus will always be true and can't be switched to false.

Compiler can call a function in a constexpr context for the const values it knows at compile time

This is immediately obvious when you look at the assembly generated. Revisiting our some_transform example,

int main() {
  int a = 11;
  const int b = 2;

  auto x = some_transform(a);
  // This is evaluated to 12 at compile time!
  auto y = some_transform(b);

  print(x);
  print(y);
}
main:
  ... # Do stuff to calculate x and push           
  call print(int)
  mov edi, 12 # y has been compiled to 12!
  call print(int)
> g++ -O3 -std=gnu++14 caveat.cpp
> sed -i 's/\xFF\xEE\xEE\x5E/\x00\x00\x00\x00/g' a.out
> ./a.out
-11
12       # We wanted -2!

Running this with clang

> clang++ -O3 -std=gnu++14 caveat.cpp
> sed -i 's/\xFF\xEE\xEE\x5E/\x00\x00\x00\x00/g' a.out
> ./a.out
-11
-2

Clang actually doesn't call the constexpr version of the some_transform function and so we get the expected output, even though on gcc we get a different behavior! This is because gcc knows at compile time that b = 2 and so it calls the constexpr context. This is why you need to make sure you aren't relying on the side effects 100%. If you have a faster algorithm at runtime, you need to make sure it returns the same result as it would at compile time and vice versa, otherwise you're going to have unexpected behaviors.

Not very portable

This approach is probably not going to work on other architectures where if the flag is stored as something different. x86 has a nice assembly mov instruction that allows us to load the entire flag in one go. (Ie the flag is part of the machine code) As a result, we can just search for the flag in the binary and replace it. Some architectures might need to load the upper and lower halves separately and as a result, searching and replacing is not trivial but is still doable. A potential solution is to compile down to some form of assembly, replace the flag (or call instruction), and then convert the assembly to machine code.


Conclusion

To finish off, this was just one way to achieve an in_constexpr() functionality and is probably not the best way to go about it, but it is one approach. By making a constexpr in_constexpr_implfunction that is not inline-able, returns true at compile time and is a function call at runtime, we can get an in_constexpr() functionality. This method does have some overhead as there will be an additional branching needed; but since it is going to be the same branch, branch prediction can mitigate most of these effects.

You can find the final in_constexpr, smart_assert and example code here. Feel free to contribute any other functions that might be helpful. I feel like people will have really nice use cases and it'll be interesting to see!

ScanMe

More information on implementation to come later. This project was our final year design project to create a 3D scanner. Realizing a lot of these systems exist for small objects, the idea was extended it to 3D bodies because why not? I would love to try and buy clothes online. So pretty much the goal was to come up with a scanner that would create an animatable 3D model of a person. Did it work well? Not amazingly but it worked to prove a concept.

More description on the scanner from the old project page for school here

Example on one of my friends.

Generated Point Cloud:

Reconstructed 3D Model:

3D Model with Weights

Using PWM on the Beaglebone Black

Problem

After searching the internet I found that most PWM and pin muxing code had been rendered obsolete due to the new 3.8 kernel used in the BBB. I found two sources of information that were helpful in order to get PWM working (here and here). From this I played around until I got the EHRPWM pins working. I have written up a C++ library so that you guys can use it from within your applications.

The main problem I found with the steps was a similar problem that other people were having.

  1. I couldnt change the period of the pin
  2. If I changed the period for one channel of the PWM module, I didnt get period/duty/polarity files that I could modify for the other channel

The reason for not being able to set the period is that both channels need to share the same period. As a result if both channels are enabled (such as P8 Pin 13 and 19) then setting either period will conflict with the other. If the period was initially changed for one of the channels then the other period (which is initialized from the device tree module) has a conflicting period and thus does not actually allow the pin driver to be setup.

Solution

To fix these issues I modified the pwm_test kernel driver so that it would not initialize the channel if it has a period of 0. (Line 273 in kernel/drivers/pwm/pwm_test.c


/* Only set the config if the period is > 0. Otherwise the period will be set later dynamically */
if ( pwm_test->period > 0 )
{
/* polarity is already set */
rc = pwm_config(pwm_test->pwm, pwm_test->duty, pwm_test->period);
if (rc) {
dev_err(dev, "pwm_config() failed\n");
return rc;
}
} else {
pwm_test->period = 0;
}

The next thing that needed be changed was the device overlays for the bone_pwm_P* pins were changed so that they had a period of 0 and were disabled at the start.

pwms = <&ehrpwm2 1 500000 1>;
enabled = ;

became (for each bone_pwm_P module)

pwms = <&ehrpwm2 1 0 1>;
enabled = ;

Once I made those changes, I recompiled the kernel (following the instructions here or here). Copied over pwm_test.ko to /lib/drivers/, copy the compiled dtbo files into /lib/firmware and then manipulated the period as described in the previous links. The only thing to make sure now is that you set the periods for both channels at the start to whatever you desire and then enable it by echo'ing 1 to the run fille.

Code/Files

I've uploaded the compiled files and the library on a git-hub repo (https://github.com/SaadAhmad/beaglebone-black-cpp-PWM)  and installation instructions can be found there

 

Hope you guys find this useful!