MSVC C++ ABI Member Function Pointers

This is a detailed discussion of MSVC C++ ABI pointers to member functions, including some of the trade-offs they make. The format of pointers to member functions in C++ is implementation-defined, and a lot of incomplete and misleading information about this topic can be found around the web. There’s a popular article at Code Project that provides a reasonable overview of several implementations of pointers to member functions, but not all the details are accurate. Raymond Chen has a very incomplete and somewhat misleading blog post. As far as I know, there is no publicly available formal specification for the MSVC C++ ABI.

In this article, MSVC C++ ABI with default options is assumed unless stated otherwise. Some of the values assumed to be int may actually be long int or int32_t, but without access to an LP32 or LP64 target using the MSVC C++ ABI it’s impossible to confirm one way or the other. This description is mainly based on the behaviour of MSVC 19.29 for x86-64 and AArch64. The usual disclaimers about implementation-defined behaviour apply: depending on this behaviour produces code that is not portable, details may change at any time, and the accuracy is limited by my understanding.

Casting pointers to member functions

According to the C++ standard, pointers to base class member functions may be cast to pointers to derived class member functions with the same signature for non-virtual bases only. Casting member function pointers across virtual inheritance relationships is forbidden. This rule simplifies implementation of member function pointers by avoiding the need to obtain this pointer offsets to virtual bases from the virtual table at the time a pointer to a member function is called. (You can still call a pointer to a member function of a virtual base class by casting the object to a reference to an instance of the virtual base class. The virtual base class offset is obtained from the virtual table as part of the cast to the base class, not as part of the pointer to member function invocation.)

By way of example, assume the following declarations:

class a { };

class b : public a { };

class c : public virtual a { };

using afunc = void (a::*)(int);
using bfunc = void (b::*)(int);
using cfunc = void (c::*)(int);

In standard C++ it is legal to cast a value of type afunc to type bfunc because the class a is a non-virtual base of the class b. On the other hand, it is not legal to cast a value of type afunc to cfunc because the class a is a virtual base of the class c.

As an extension to the C++ standard, MSVC does allow casting a pointer to a base class member function to a pointer to a derived class member function for virtual base classes. With the declarations from the example above, MSVC allows casting a value of type afunc to cfunc. This is the cause of some of the complications in the implementation of pointers to member functions in the MSVC C++ ABI.
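Portable code can get the same effect without relying on the extension by converting the object rather than the member pointer, as described in the note on virtual bases above. The sketch below adds hypothetical `set` and `value` members to the example classes (the declarations above have none) and should compile with any conforming compiler:

```cpp
#include <cassert>

// The example classes, with hypothetical members added for illustration.
class a {
public:
    void set(int v) { value = v; }
    int value = 0;
};

class c : public virtual a { };

using afunc = void (a::*)(int);

// Standard C++: convert the object to the virtual base, then call the unchanged
// pointer to member. The virtual base offset is looked up by the derived-to-base
// conversion, not by the pointer to member call.
void call_on_virtual_base(c &obj, afunc f, int arg) {
    a &base = obj;
    (base.*f)(arg);
}
```

Usage: `call_on_virtual_base(obj, &a::set, 42)` sets `obj.value` to 42 without ever forming a `cfunc`.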

Pointer to member function representations

As an optimisation, there are four different pointer to member representations used in different situations. I call them “single inheritance”, “multiple inheritance”, “virtual inheritance” and “unknown inheritance”. There are options to change the way the compiler selects a member pointer representation; for example, MSVC has the /vmb, /vmg, /vmm, /vms and /vmv command-line options and a #pragma pointers_to_members directive. Unless otherwise noted, the rules described here assume /vmb or #pragma pointers_to_members(best_case) is in effect.

Single inheritance

Pointers to member functions of classes with single inheritance are equivalent to this structure:

struct {
    uintptr_t   ptr;    // function pointer
};

This representation is the same size as a non-member function pointer. This makes it efficient to store, copy or pass as a function parameter, as it can usually fit in a single address register or general-purpose register.

This representation is used when either:

The class definition is available, neither the class nor any of its direct or indirect base classes has any virtual base classes, neither the class nor any of its direct or indirect base classes has more than one base class, and neither the class nor any of its direct or indirect base classes declares any virtual member functions while deriving from a base class that has no virtual member functions.
The class definition is not available and a forward declaration of the class with the __single_inheritance qualifier is available.

MSVC will use this representation for all pointers to member functions with the /vmg and /vms options or the #pragma pointers_to_members(full_generality, single_inheritance) directive in effect. In this situation, declaring a pointer to a member of a class with multiple direct base classes, a class with virtual base classes or a class with virtual member functions that derives from a class with no virtual member functions results in “error C2287: ‘c’: inheritance representation: ‘single_inheritance’ is less general than the required ‘multiple_inheritance’” where “c” is the name of the class.

This minimal representation can be used because two assumptions can be made:

With non-virtual single inheritance, the base class (if any) always appears at the start of the class. A pointer to an instance of the class will not require adjustment when cast to or from a base class. Therefore, a pointer to a base class member function will not require this pointer adjustment when called.
For virtual member functions, the compiler will generate an out-of-line stub that fetches the appropriate virtual table entry and jumps to it.

It is possible to invoke a member pointer using this representation without access to the class definition. Performance for calling this representation is similar to calling a non-member function pointer for non-virtual member functions. For virtual member functions, there is an additional fetch and indirect branch. However, there are no conditional branches involved, which avoids performance penalties on deeply pipelined and/or highly parallel processors.
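The out-of-line stub can be approximated in portable C++ by an ordinary function that performs the virtual dispatch itself, so a single code address is enough to reach the right override. This is only an analogy: `shape`, `square`, `sides_stub` and `single_rep` are hypothetical names, and the real compiler-generated stub receives `this` as a hidden parameter and jumps through the virtual table entry rather than calling it.

```cpp
#include <cassert>

struct shape {
    virtual ~shape() = default;
    virtual int sides() const { return 0; }
};

struct square : shape {
    int sides() const override { return 4; }
};

// Analogue of the compiler-generated stub: one fixed address for every object,
// with the virtual table lookup happening inside the stub, not at the call site.
int sides_stub(const shape *self) { return self->sides(); }

// A one-word "member pointer": just a code address, like the single inheritance form.
using single_rep = int (*)(const shape *);
```

Calling through `single_rep` costs one indirect call to the stub plus the virtual dispatch inside it, and neither step involves a conditional branch.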

Multiple inheritance

Pointers to member functions of classes with multiple inheritance are equivalent to this structure:

struct {
    uintptr_t   ptr;    // function pointer
    int         adj;    // this pointer displacement in bytes
};

Note that on typical architectures, pointers and pointer-sized integers have natural alignment and int is no larger than a pointer, so the overall size is twice the size of a pointer. On typical LLP64 targets (including Windows on x86-64 and AArch64), the structure has four padding bytes for a total size of sixteen bytes.

This representation is used when either:

The class definition is available, neither the class nor any of its direct or indirect base classes has any virtual base classes, and the class or one of its direct or indirect base classes has at least two base classes.
The class definition is available, neither the class nor any of its direct or indirect base classes has any virtual base classes, and the class or one of its direct or indirect base classes declares at least one virtual member function while deriving from a base class that has no virtual member functions.
The class definition is not available and a forward declaration of the class with the __multiple_inheritance qualifier is available.

MSVC will use this representation for all pointers to member functions with the /vmg and /vmm options or the #pragma pointers_to_members(full_generality, multiple_inheritance) directive in effect. In this situation, declaring a pointer to a member of a class with at least one direct or indirect virtual base class results in “error C2287: ‘c’: inheritance representation: ‘multiple_inheritance’ is less general than the required ‘virtual_inheritance’” where “c” is the name of the class.

The this pointer displacement is necessary for the purpose of casting a pointer to a non-virtual base class member function to a pointer to a derived class member function with the same signature. The offset of the base class within the derived class is calculated when the pointer is cast, and applied (added to the this pointer) when it is invoked.
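The displacement can be modelled in portable C++. In this sketch the hypothetical `mi_memfn` structure mirrors the two-field layout, an ordinary function taking `void *` stands in for a compiled member function (a real one receives `this` implicitly), and the displacement is computed by comparing addresses, which is roughly what the compiler does when the conversion happens:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classes: `second_base` follows a non-empty base, so it is not at offset 0.
struct first_base  { int l = 0; };
struct second_base { int r = 0; };
struct derived : first_base, second_base { };

// Stand-in for a compiled second_base member function: takes the adjusted `this`.
void second_bump(void *self) { static_cast<second_base *>(self)->r += 1; }

// Model of the "multiple inheritance" representation.
struct mi_memfn {
    void (*ptr)(void *);  // function pointer (uintptr_t in the structure above)
    int adj;              // `this` pointer displacement in bytes
};

// Done once, when a second_base member pointer is converted to a derived one.
int second_base_offset() {
    derived d;
    return static_cast<int>(reinterpret_cast<char *>(static_cast<second_base *>(&d)) -
                            reinterpret_cast<char *>(&d));
}

// Invocation: one extra fetch and addition, then the call.
void invoke(const mi_memfn &f, derived &obj) {
    f.ptr(reinterpret_cast<char *>(&obj) + f.adj);
}
```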

It is possible to invoke a member pointer using this representation without access to the class definition. This representation has twice the space cost of the single inheritance representation, but minimal additional performance cost to invoke – just one additional integer fetch and addition.

Virtual inheritance

Pointers to member functions of classes with virtual inheritance are equivalent to this structure:

struct {
    uintptr_t   ptr;    // function pointer
    int         adj;    // this pointer displacement in bytes
    int         vindex; // byte offset to base class offset in virtual table
};

Note that in the LLP64 data model, the two int members fit into the size of a pointer, so this representation has the same size as the multiple inheritance representation on typical LLP64 targets (including Windows on x86-64 and AArch64).

This representation is used when either:

The class definition is available, and either the class or at least one of its direct or indirect base classes has at least one virtual base class.
The class definition is not available and a forward declaration of the class with the __virtual_inheritance qualifier is available.

There is no combination of options or directives that will cause MSVC to use this representation for all pointers to member functions.

The virtual table index is necessary for the purpose of casting a pointer to a member function of a virtual base class to a pointer to a derived class member function with the same signature. The virtual table for the derived class contains offsets to all virtual base classes. The location of the offset to the virtual base class in the virtual table is populated when the pointer is cast; the offset is fetched from the instance’s virtual table and applied when the pointer is invoked, in addition to applying the this pointer displacement stored in the member function pointer directly.

It is not possible to invoke this representation of a pointer to a member function without access to the class definition – attempting to do so results in “error C2027: use of undefined type ‘c’” where “c” is the name of the class that was forward declared with the __virtual_inheritance qualifier. This requirement comes from a combination of two factors:

Structure layout rules mean that the virtual table pointer is not necessarily at the location the this pointer points to. (The virtual table pointer may not be at the location the this pointer points to in some situations where the first base class has no virtual member functions or virtual bases, but a virtual table pointer is inherited from another base class. It’s very rare to actually encounter such a case in practice.)
Invoking this representation requires access to the virtual table pointer, and hence knowledge of the offset to the virtual table pointer from the location the this pointer points to. This requires the base classes to be known.

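The offset to the virtual base class and the this pointer displacement must be interpreted relative to the location of the virtual table pointer, which is not necessarily the location the this pointer points to. The offset to the virtual table pointer (vadj below) is not stored in the member pointer: it is a constant the compiler takes from the class definition. In pseudocode, the sequence for invoking this representation looks like this: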

vptr = this[vadj]
this += vadj + vptr[vindex] + adj
CALL ptr

The offset to the virtual base class found in the virtual table will always be zero if the pointer does not represent a pointer to a member function of a virtual base class. In standard-conforming code, this will always be the case, as casting across virtual inheritance relationships is not permitted.

Compared to the multiple inheritance representation, this representation requires two additional fetches (the virtual table pointer and offset to the base class), at least two additional integer additions (the offset into the virtual table and the offset to the base class), and possibly a third addition of a constant (the offset to the virtual table pointer from the location the this pointer points to). On most architectures, some of these additions are implicit in addressing modes for the fetches. This representation still avoids the need for conditional branches: because the class is known to have a virtual table and the location of the virtual table pointer within the object is known, the offset to the virtual base can be fetched and added unconditionally even if it will be zero in most cases.
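The invocation sequence can be simulated with a hand-built object whose layout is fully known. Everything here is hypothetical: `fake_object` stands in for a class with a virtual base, its `vbptr` member plays the role of the virtual table pointer, `base_value` stands in for the virtual base subobject, and `kVadj` is the compile-time offset that the real compiler takes from the class definition. The point is the shape of `invoke`: two fetches and a few additions, with no branch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical simulated object: a pointer to a table of virtual base offsets,
// some derived-class data, and an int standing in for the virtual base subobject.
struct fake_object {
    const int *vbptr;
    int derived_data;
    int base_value;
};

// Known from the class definition at compile time; not stored in the member pointer.
constexpr std::ptrdiff_t kVadj = offsetof(fake_object, vbptr);

// Model of the "virtual inheritance" representation.
struct vi_memfn {
    void (*ptr)(void *);  // function pointer (stand-in for a member function)
    int adj;              // `this` pointer displacement in bytes
    int vindex;           // byte offset of the base class offset within the table
};

// vptr = this[vadj]; this += vadj + vptr[vindex] + adj; CALL ptr
// A pointer to a member that is not in a virtual base uses an entry holding zero,
// so the fetch is unconditional but adds nothing.
void invoke(const vi_memfn &f, void *self) {
    char *p = static_cast<char *>(self);
    const int *vbptr = *reinterpret_cast<const int *const *>(p + kVadj);
    int vbase_offset =
        *reinterpret_cast<const int *>(reinterpret_cast<const char *>(vbptr) + f.vindex);
    f.ptr(p + kVadj + vbase_offset + f.adj);
}

// Stand-in member function: increments the int that the adjusted `this` points at.
void bump(void *self) { *static_cast<int *>(self) += 1; }
```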

Unknown inheritance

Pointers to member functions of classes with unknown inheritance are equivalent to this structure:

struct {
    uintptr_t   ptr;    // function pointer
    int         adj;    // this pointer displacement in bytes
    int         vadj;   // offset to vptr or undefined
    int         vindex; // byte offset to base class offset in vtable or zero
};

Note that on typical LLP64 targets (including Windows on x86-64 and AArch64), the structure has four padding bytes for a total size of twenty-four bytes.

This representation is used when the class definition is not available, and the forward declaration of the class has no __single_inheritance, __multiple_inheritance or __virtual_inheritance qualifier.
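Taken together, the selection rules for the four representations under /vmb can be summarised as a small decision function. This is a sketch, assuming a hypothetical `class_facts` summary that the compiler would derive from the class definition; it is not an MSVC API.

```cpp
#include <cassert>

enum class representation {
    single_inheritance, multiple_inheritance, virtual_inheritance, unknown_inheritance
};

// Qualifier on a forward declaration (only consulted when no definition is visible).
enum class forward_qualifier {
    none, single_inheritance, multiple_inheritance, virtual_inheritance
};

// Hypothetical summary of the facts the rules depend on. Each flag covers the
// class itself and all of its direct and indirect base classes.
struct class_facts {
    bool definition_available;
    bool has_virtual_base;                        // at least one virtual base class
    bool has_multiple_bases;                      // two or more direct bases somewhere
    bool adds_virtuals_to_base_without_virtuals;  // declares virtual functions while
                                                  // deriving from a base with none
    forward_qualifier qualifier;                  // used only without a definition
};

// Selection with /vmb (best case) in effect, following the rules in the text.
representation select_representation(const class_facts &f) {
    if (!f.definition_available) {
        switch (f.qualifier) {
        case forward_qualifier::single_inheritance:   return representation::single_inheritance;
        case forward_qualifier::multiple_inheritance: return representation::multiple_inheritance;
        case forward_qualifier::virtual_inheritance:  return representation::virtual_inheritance;
        case forward_qualifier::none:                 return representation::unknown_inheritance;
        }
    }
    if (f.has_virtual_base) return representation::virtual_inheritance;
    if (f.has_multiple_bases || f.adds_virtuals_to_base_without_virtuals)
        return representation::multiple_inheritance;
    return representation::single_inheritance;
}
```

The `adds_virtuals_to_base_without_virtuals` flag covers the case pointed out in the comments below, where a single inheritance hierarchy still needs a this pointer displacement.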

MSVC will use this representation for all pointers to member functions with the /vmg and /vmv options, the /vmg option without the /vms or /vmm options, or the #pragma pointers_to_members(full_generality, virtual_inheritance) directive in effect.

If the virtual table index is non-zero, the offset to the virtual table pointer is added to the this pointer, and the offset to the base class is obtained from the virtual table and added to the this pointer. After this, the this pointer displacement is added to the (possibly already adjusted) this pointer. In pseudocode, the sequence for invoking this representation looks like this:

IF 0 != vindex:
    vptr = this[vadj]
    this += vadj + vptr[vindex]
ENDIF
this += adj
CALL ptr

It is possible to invoke this representation of a pointer to a member function without access to the class definition. Invoking this representation of a pointer to a member function requires a conditional branch and the associated performance penalties on deeply pipelined and/or highly parallel processors. This is necessary because without the class definition, it is not possible to know whether the class has a virtual table at all, and hence it may not be possible to provide an offset to a zero value in the virtual table when an offset to a virtual base class is not required.
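The conditional version can be simulated the same way. In this hypothetical sketch, `ui_memfn` mirrors the four-field structure, `plain_object` has no table pointer at all, and `fake_object` has a simulated table pointer that is deliberately not at offset 0. When `vindex` is zero, `vadj` is never read, which is what allows this representation to be used for classes that may have no virtual table:

```cpp
#include <cassert>
#include <cstddef>

// Model of the "unknown inheritance" representation.
struct ui_memfn {
    void (*ptr)(void *);  // function pointer (stand-in for a member function)
    int adj;              // `this` pointer displacement in bytes
    int vadj;             // offset to the table pointer; meaningless when vindex == 0
    int vindex;           // byte offset into the table, or zero
};

// IF vindex != 0: this += vadj + vptr[vindex]; then this += adj; CALL ptr
void invoke(const ui_memfn &f, void *self) {
    char *p = static_cast<char *>(self);
    if (f.vindex != 0) {  // the branch the other representations avoid
        const int *vbptr = *reinterpret_cast<const int *const *>(p + f.vadj);
        p += f.vadj + *reinterpret_cast<const int *>(
                          reinterpret_cast<const char *>(vbptr) + f.vindex);
    }
    f.ptr(p + f.adj);
}

// Hypothetical objects: one with no table pointer, one with a simulated table
// pointer that is not at the location `this` points to.
struct plain_object { int first; int second; };
struct fake_object { int leading; const int *vbptr; int base_value; };

// Stand-in member function: increments the int that the adjusted `this` points at.
void bump(void *self) { *static_cast<int *>(self) += 1; }
```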

Comparison to Itanium C++ ABI

The Itanium C++ ABI is currently one of the most popular C++ ABIs, despite the market failure of the Itanium CPU architecture. The Itanium C++ ABI has been widely adopted on UNIX-like systems and by Open Source/Free Software development tools. Exact details vary by architecture, but conceptually the Itanium C++ ABI always represents pointers to member functions as a tuple containing three values:

A union containing a function pointer or virtual table index
A displacement to apply to the this pointer
A flag to discriminate between a function pointer or a virtual table index
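A minimal sketch of that conceptual tuple, with a simulated object whose virtual table pointer is at offset 0 as the Itanium layout rules guarantee. The names are hypothetical, the flag is stored as a separate `bool` to match the three values listed above (real implementations pack it into a spare bit of one of the other fields), and a function taking `void *` stands in for a member function. Note that `invoke` has to test the flag on every call:

```cpp
#include <cassert>
#include <cstddef>

using raw_fn = int (*)(void *);

// Conceptual Itanium-style member pointer: union, displacement and flag.
struct itanium_memfn {
    union {
        raw_fn fn;           // non-virtual: function pointer
        std::size_t vindex;  // virtual: byte offset into the virtual table
    };
    std::ptrdiff_t adj;      // displacement applied to `this`
    bool is_virtual;         // discriminates the union
};

itanium_memfn make_nonvirtual(raw_fn fn, std::ptrdiff_t adj) {
    itanium_memfn m{};
    m.fn = fn; m.adj = adj; m.is_virtual = false;
    return m;
}

itanium_memfn make_virtual(std::size_t vindex, std::ptrdiff_t adj) {
    itanium_memfn m{};
    m.vindex = vindex; m.adj = adj; m.is_virtual = true;
    return m;
}

// Adjust `this` first, then branch on the flag to find the target.
int invoke(const itanium_memfn &f, void *self) {
    char *p = static_cast<char *>(self) + f.adj;
    if (f.is_virtual) {
        // The virtual table pointer is always at offset 0 of the adjusted object.
        const raw_fn *vtable = *reinterpret_cast<const raw_fn *const *>(p);
        raw_fn target = *reinterpret_cast<const raw_fn *>(
            reinterpret_cast<const char *>(vtable) + f.vindex);
        return target(p);
    }
    return f.fn(p);
}

// Hypothetical simulated object and "member functions".
struct fake_object {
    const raw_fn *vtable;  // simulated virtual table pointer at offset 0
    int value;
};
int get_value(void *self) { return static_cast<fake_object *>(self)->value; }
int get_double(void *self) { return 2 * static_cast<fake_object *>(self)->value; }
```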

Disadvantages compared to the MSVC C++ ABI include:

No provision for obtaining an offset to a virtual base class, so casting pointers to members of virtual base classes to pointers to members of derived classes cannot be supported
Pointers to member functions are always larger than non-member function pointers, even in the simplest cases
A conditional branch is required to invoke any kind of member function pointer in order to handle either a function pointer or a virtual table index

Advantages over the MSVC C++ ABI include:

All member function pointer types are the same size and can be invoked in the same way
Layout rules mean the virtual table pointer will always be at the location the this pointer points to if present, so there is no need to account for the offset to the virtual table pointer
If a pointer to a virtual member function is to be called repeatedly, it is simple to resolve the function address and avoid repeated virtual table fetches and additional indirect branches

Calling conventions

The proliferation of incompatible calling conventions for 32-bit i386 or i686 targets is well-known. With MSVC’s default convention for member functions (__thiscall), explicit arguments are pushed onto the stack in right-to-left order, the this pointer is passed in register ECX, and the called function removes the arguments from the stack on return. However, it is widely assumed that on x86-64, member functions are equivalent to non-member functions with the this pointer as an implicit first parameter. This is not true. The MSVC C++ ABI for Windows uses a subtly different calling convention for member functions on both x86-64 and AArch64.

This is not a comprehensive discussion of Windows calling conventions on x86-64 and AArch64. It’s intended to be just detailed enough to highlight the differences between non-member functions and member functions.

Non-member functions

Both x86-64 and AArch64 pass parameters and return results in registers. However, only scalar types (integers, floating point types, pointers and enumerated types), references, and small aggregate structures and unions (trivially constructible, destructible, copyable and assignable) may be returned in registers. In cases where the return type may not be returned in a register, the caller allocates space for the return value (typically on the stack) and passes a pointer to the area for the return value as an implicit parameter.

On x86-64, register RCX is usually used for the first integer or pointer argument. However, if the return type cannot be returned in a register, the pointer to the area for the return value is passed in register RCX and explicit parameters are shifted by one position:

Register	Return value in register	Pointer to return value area
RCX	first integer/pointer argument	pointer to return value area
RDX	second integer/pointer argument	first integer/pointer argument
R8	third integer/pointer argument	second integer/pointer argument
…	…	…

On AArch64, registers X0 to X7 are used for integer or pointer arguments. If the return type cannot be returned in a register, the pointer to the area for the return value is passed in register X8, which would otherwise be a volatile register with no special significance. Explicit parameters do not need to be shifted:

Register	Return value in register	Pointer to return value area
X0	first integer/pointer argument	first integer/pointer argument
X1	second integer/pointer argument	second integer/pointer argument
X2	third integer/pointer argument	third integer/pointer argument
…	…	…
X8	n/a	pointer to return value area

Member functions

There are three key differences in the calling convention for member functions:

The this pointer is passed as an implicit first parameter.
Structure and union type values are never returned in registers.
The pointer to the return value area for structure and union types is passed as an implicit second parameter after the this pointer.

Note that for scalar types that may not be returned in registers, the pointer to the result area is passed in the same way it would be for a non-member function. An example of a type returned this way is a pointer to a member function of a class with unknown inheritance: it is a pointer, and hence a scalar type, but with a size of twenty-four bytes it is too large to return in registers.

For x86-64, these are the three possible situations on entry to a member function – note that when a scalar value cannot be returned in a register, the this pointer is shifted by one position:

Register	Return value in register	Pointer to return value area (scalar)	Pointer to return value area (structure/union)
RCX	`this` pointer	pointer to return value area	`this` pointer
RDX	first integer/pointer argument	`this` pointer	pointer to return value area
R8	second integer/pointer argument	first integer/pointer argument	first integer/pointer argument
…	…	…	…

For AArch64, these are the three possible situations on entry to a member function – note that the this pointer is always in X0:

Register	Return value in register	Pointer to return value area (scalar)	Pointer to return value area (structure/union)
X0	`this` pointer	`this` pointer	`this` pointer
X1	first integer/pointer argument	first integer/pointer argument	pointer to return value area
X2	second integer/pointer argument	second integer/pointer argument	first integer/pointer argument
…	…	…	…
X8	n/a	pointer to return value area	n/a

Why the difference?

The different calling convention for member functions on x86-64 has been in place since MSVC added support for the architecture. AArch64 seems to follow x86-64 by analogy.

Initially I thought the different calling convention for member functions was to ensure the this pointer would always be in the same register for convenience. That was before I realised there are situations where this is not the case, and there are different rules for which types may be returned in registers.

I can only speculate as to what the reasoning behind the decision to use a different calling convention was. It may simplify interoperability with some other language, or it may simplify COM implementation in some way.

The problems for delegates

Unless you’re writing assembly language code or a compiler that generates it (or debugging a low-level issue), the real motivation for getting into the gory details of member function pointer implementations almost always comes back to the desire to implement fast delegates. Invoking pointers to member functions can be slower than invoking pointers to non-member functions, and mitigating that is a common goal.

The MSVC ABI presents three major problems for the purpose of implementing fast delegates without limiting developers:

It is not practical to distinguish between the multiple inheritance and virtual inheritance representations of pointers to member functions in a template on LLP64 platforms. It’s simple to distinguish between the single inheritance, multiple inheritance and unknown inheritance representations using the result of the sizeof operator. However, the multiple inheritance and virtual inheritance representations have the same size on LLP64 platforms due to alignment and padding requirements. There’s no standard type trait for determining whether a class has at least one direct or indirect virtual base, and as far as I know there’s no MSVC extension for doing so either.
There’s no standard way to obtain the location of the virtual table pointer within an object. In certain situations, the virtual table pointer will not be at the location the this pointer points to. There’s no standard way to obtain the offset to the virtual table pointer, and as far as I know there’s no MSVC extension for obtaining it, either. This makes it impossible to safely support the virtual inheritance representation even on platforms where it can be detected reliably.
The subtle difference in calling conventions for non-member and member functions means that it is not possible to convert a pointer to a member function to an equivalent pointer to a non-member function if it returns a structure or union value that is not trivially default constructible and destructible. There is no equivalent non-member function signature that will cause the result value to be constructed in the correct location. For trivially constructible and destructible types, the area for the return value can be treated as a reference parameter following the this pointer. Using this approach requires a temporary variable that the compiler might not elide, and if your delegate implementation supports non-member functions as well as member functions, a conditional branch is required to select the correct equivalent non-member function signature before the call. This causes a performance penalty for all calls, working against the original goals of designing a fast delegate type.
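The first of these problems can be illustrated with arithmetic. Assuming a 4-byte int and a structure alignment equal to the pointer size (an assumption about Windows targets; the descriptions above are based on x86-64 and AArch64 only), the sizes of the four structures can be computed and compared. With 8-byte pointers the multiple and virtual inheritance forms both come out at 16 bytes, so a classifier based on `sizeof` can only report that it is one or the other. The `classify` function is hypothetical; in MSVC-only code it would be called as `classify(sizeof(F), sizeof(void *))`:

```cpp
#include <cassert>
#include <cstddef>

enum class guess {
    single_inheritance, multiple_inheritance, virtual_inheritance,
    multiple_or_virtual, unknown_inheritance, unrecognised
};

// Round `n` up to a multiple of `align`.
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// Assumes a 4-byte int and alignment equal to the pointer size `ptr_size`.
constexpr guess classify(std::size_t size, std::size_t ptr_size) {
    const std::size_t single = ptr_size;                          // ptr
    const std::size_t multiple = round_up(ptr_size + 4, ptr_size); // ptr, adj
    const std::size_t virt = round_up(ptr_size + 8, ptr_size);     // ptr, adj, vindex
    const std::size_t unknown = round_up(ptr_size + 12, ptr_size); // ptr, adj, vadj, vindex
    if (size == single) return guess::single_inheritance;
    if (size == multiple && size == virt) return guess::multiple_or_virtual;  // ambiguous
    if (size == multiple) return guess::multiple_inheritance;
    if (size == virt) return guess::virtual_inheritance;
    if (size == unknown) return guess::unknown_inheritance;
    return guess::unrecognised;
}
```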

In practice, many developers just naïvely assume the virtual table pointer can be found at the location the this pointer points to, even though this isn’t guaranteed by the layout rules. It’s possible to work around the differing calling conventions by instantiating an adapter function when binding a delegate to a member function that returns a structure or union by value, but there are real-world delegate implementations that don’t do this. There are also real-world delegate implementations that ignore the differences between the multiple inheritance and virtual inheritance representations of member function pointers.
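A minimal sketch of the adapter approach, assuming a hypothetical `delegate` class template (this is not any particular library’s implementation). Instead of converting the member function pointer, `bind` takes it as a template argument and stores a pointer to an ordinary function, `adapter`, that makes the member call. Because the compiler generates that call itself, the member function calling convention, including where a returned structure is constructed, is always handled correctly:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical minimal delegate: an object pointer plus a plain function pointer
// to an adapter instantiated for one specific member function.
template <typename R, typename... Args>
class delegate {
public:
    template <typename T, R (T::*Member)(Args...)>
    static delegate bind(T &obj) {
        return delegate(&obj, &adapter<T, Member>);
    }

    R operator()(Args... args) const { return stub_(object_, std::forward<Args>(args)...); }

private:
    using stub_type = R (*)(void *, Args...);

    delegate(void *object, stub_type stub) : object_(object), stub_(stub) {}

    // A static member has no hidden `this`; the member call happens inside it,
    // so the compiler applies the member function calling convention itself.
    template <typename T, R (T::*Member)(Args...)>
    static R adapter(void *object, Args... args) {
        return (static_cast<T *>(object)->*Member)(std::forward<Args>(args)...);
    }

    void *object_;
    stub_type stub_;
};

// Hypothetical class with a member function returning a class type by value.
struct greeter {
    std::string name;
    std::string greet(int times) {
        std::string out;
        for (int i = 0; i < times; ++i) out += name;
        return out;
    }
};
```

The trade-off is that the member function must be known at compile time, and each bound member function instantiates one extra adapter.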

How do they get away with it?

So how do delegate implementations that don’t account for these seemingly insurmountable issues work at all? Well, it turns out that the situations that trigger the issues don’t come up as frequently as you might expect. Even if you aren’t being careful to avoid the problematic code, a combination of several factors means you may not ever encounter the issues:

Virtual inheritance is used sparingly: Several of the issues only come up when virtual inheritance is involved. Virtual inheritance is one of the less-frequently used C++ features. It has a space penalty, it adds indirection to base class member accesses, and it complicates base class construction. It’s only used when it’s really necessary. You may never need to write a class with any virtual bases, and even if you do, you may not need to use it with delegates. If you don’t use classes with virtual bases, you won’t get a situation where you need to find the virtual table pointer and fetch a value from the virtual table in order to invoke a pointer to a member function.
Classes where the this pointer doesn’t point to the virtual table pointer are rare: To make this happen, you need specific conditions involving a class with multiple base classes where the first non-empty base class has no virtual member functions and no direct or indirect virtual base classes, but the class inherits a virtual table pointer from another base class. It’s very rare to write a class that meets the requirements and has at least one virtual base class by coincidence. For example in many real-world cases, classes inherit a virtual destructor or a virtual base class via their first base class. This means assuming the virtual table pointer can be found at the location the this pointer points to rarely causes issues in practice.
Casting pointers to member functions across virtual inheritance relationships is non-standard: The offset to the virtual base class obtained from the virtual table when invoking a pointer to a member function will only be non-zero if the pointer represents a pointer to a function member of a virtual base class that has been cast to a pointer to a member of a derived class. Since this is not permitted by the C++ standard, it will never happen in portable code. Conveniently, this extension to the standard cannot be supported with the Itanium C++ ABI, so any code that uses it will fail to compile in many configurations (e.g. MinGW GCC on Windows x86-64, pretty much any Linux configuration, or macOS). The extension is unlikely to be useful in conjunction with delegates because the instance can be cast to a reference of the virtual base class when setting the delegate rather than casting the member function pointer. This means a situation where the offset to the virtual base class must be obtained from the virtual table when invoking a pointer to a member function is impossible in most portable code, and highly unlikely in code only built with the MSVC C++ ABI, especially considering the sparing use of virtual inheritance.
Functions return scalars more frequently than structures: The majority of functions return void or some kind of scalar value. Functions returning references (especially const references) are quite common, too. The different calling convention for member functions doesn’t affect functions that return void, scalars or references. Simply not using delegates with member functions that return structures or unions can be used as a workaround to avoid having to deal with the different calling convention used for member functions. You can also use a trait to prevent a delegate from being instantiated for pointers to member functions returning structure and union types.
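One way to write such a trait, as a sketch: `returns_struct_or_union`, `bind_member`, `point` and `widget` are hypothetical names, and only unqualified and `const` member functions are handled (volatile, ref-qualified and `noexcept` member functions would need further specialisations in C++17 and later):

```cpp
#include <cassert>
#include <type_traits>

// True when a pointer to member function returns a class or union by value.
template <typename F>
struct returns_struct_or_union : std::false_type { };

template <typename R, typename C, typename... A>
struct returns_struct_or_union<R (C::*)(A...)>
    : std::integral_constant<bool, std::is_class<R>::value || std::is_union<R>::value> { };

template <typename R, typename C, typename... A>
struct returns_struct_or_union<R (C::*)(A...) const>
    : returns_struct_or_union<R (C::*)(A...)> { };

// Hypothetical delegate factory that rejects such member functions at compile time.
template <typename F>
void bind_member(F) {
    static_assert(!returns_struct_or_union<F>::value,
                  "member functions returning structures or unions are not supported");
    // ... store the pointer in the delegate (omitted in this sketch) ...
}

// Hypothetical types used to exercise the trait.
struct point { int x, y; };
struct widget {
    point p{};
    int size() const { return 0; }
    point where() { return {0, 0}; }
    const point &where_ref() const { return p; }
};
```

Returning a reference, as `where_ref` does, is unaffected by the difference in calling conventions, so the trait deliberately lets it through.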

These factors combine to allow code to work most of the time when various implementation difficulties are ignored.

Updates

Updated on 6 June 2023 to correctly cover the case where a class declares at least one virtual member function while deriving from a single base class with no virtual member functions (thanks to ykiko for pointing out this error).

This entry was posted on Tuesday, 21 September, 2021 at 9:40 pm and is filed under C, Development, Technology.

6 responses to “MSVC C++ ABI Member Function Pointers”

23 May, 2023 at 9:03 pm
ykiko says:

Hi, I am a student from China. I have read the whole article and try all the description of member function pointer on my computer. The result is same as what you say above. The article is so nice that I want to translate it to Chinese and publish in our community. I would like to request your permission. I will mark the original source and include the your name. Thank you very much.

23 May, 2023 at 9:25 pm
ykiko says:

Apologies for the grammar errors in my previous email. Here is the improved version:

Hi, I am a student from China. I have read the entire article and tried all the member function pointer examples on my computer. The results are consistent with what you mentioned above. The article is excellent, and I would like to translate it into Chinese and publish it in our community. I kindly request your permission, and I will properly attribute the original source and include your name.

24 May, 2023 at 1:19 am
vastheman says:

Hi ykiko. Thanks for your kind words. I’d be happy for you to publish a Chinese translation as long as you provide proper attribution. If you make your translated version publicly available, please link to it in a comment here.

5 June, 2023 at 5:21 pm
ykiki says:

Thank you for your reply. The article is long and fantastic, but there may be something wrong. In fact, in the part about single inheritance, you mentioned:

This minimal representation can be used because two assumptions can be made:

With non-virtual single inheritance, the base class (if any) always appears at the start of the class. A pointer to an instance of the class will not require adjustment when cast to or from a base class. Therefore, a pointer to a base class member function will not require this pointer adjustment when called.
For virtual member functions, the compiler will generate an out-of-line stub that fetches the appropriate virtual table entry and jumps to it.

I have found that for virtual member functions, things are not always the same:

When the base class has a virtual pointer:

#include <iostream>

struct A
{
    void Test_A() {}
    virtual void Test() {}
};

struct B : public A
{
    void Test_B() {}
};

int main()
{
    auto a = &A::Test_A;
    auto b = &B::Test_B;
    std::cout << sizeof(a) << std::endl;
    std::cout << sizeof(b) << std::endl;
}

Both A and B have a virtual pointer at the start of their memory layout, so there is no need to store extra information.

However, if A does not have a virtual pointer but B does, B’s virtual pointer is placed at the start of its memory layout, so the A subobject is at an offset of 8 bytes (the size of the virtual pointer).

#include <iostream>

struct A
{
    void Test_A() {}
    char data[32];
};

struct B : public A
{
    virtual void Test_B() {}
};

struct fp
{
    void* address;
    int offset;
};

int main()
{
    auto a = &A::Test_A;
    decltype(&B::Test_B) b = &B::Test_A;
    std::cout << sizeof(a) << std::endl;
    std::cout << sizeof(b) << std::endl;
    std::cout << reinterpret_cast<fp*>(&b)->offset << std::endl;
}

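sizeof(b) is 16 and the offset is 8. Although this is single inheritance, the reason is easy to see: B's virtual table pointer occupies the first 8 bytes, so calling A::Test_A through a pointer to a member of B requires adjusting the this pointer by 8 bytes to reach the A subobject.
So in this situation, the representation is similar to the "multiple inheritance" case.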

Also, I am not sure what the following statement means: "It is possible to invoke a member pointer using this representation without access to the class definition. Performance for calling this representation is similar to calling a non-member function pointer for non-virtual member functions. For virtual member functions, there is an additional fetch and indirect branch. However, there are no conditional branches involved, which avoids performance penalties on deeply pipelined and/or highly parallel processors."

Could you please provide me with some examples to help me understand better? Thank you very much. Best wishes.

6 June, 2023 at 1:41 am
vastheman says:

Hi ykiki, thanks for this analysis.

You are correct, I did not consider the case of a class with virtual member functions with a base class that has no virtual member functions. The “multiple inheritance” implementation is used in that case. I will update the content to reflect this. Thank you for thinking of this and pointing out the error.

I will try to explain the other statements in more detail.

“It is possible to invoke a member pointer using this representation without access to the class definition,” means it is possible to call a “single inheritance” or “multiple inheritance” pointer to member function when only a forward declaration of the class is available. Consider the following code fragment:

struct __single_inheritance   A; // forward declaration of class with single inheritance
struct __multiple_inheritance B; // forward declaration of class with multiple inheritance
struct __virtual_inheritance  C; // forward declaration of class with virtual inheritance

typedef void (A::*a_func)(int); // pointer to member function of class with single inheritance
typedef void (B::*b_func)(int); // pointer to member function of class with multiple inheritance
typedef void (C::*c_func)(int); // pointer to member function of class with virtual inheritance

void invoke_a_func(a_func f, A &obj, int i)
{
    // can call "single inheritance" pointer to member function without access to class definition
    (obj.*f)(i);
}

void invoke_b_func(b_func f, B &obj, int i)
{
    // can call "multiple inheritance" pointer to member function without access to class definition
    (obj.*f)(i);
}

void invoke_c_func(c_func f, C &obj, int i)
{
    // cannot call "virtual inheritance" pointer to member function without access to class definition
    // error C2027: use of undefined type 'C'
    (obj.*f)(i);
}

Definitions for classes A, B and C are not available (only forward declarations are available). In this situation, it is possible for the compiler to generate code to invoke a pointer to a member function of “single inheritance” class A or “multiple inheritance” class B. However, it is not possible for the compiler to generate code to invoke a pointer to a member function of “virtual inheritance” class C, so calling the member function pointer in invoke_c_func results in an error.

To understand the performance analysis, it’s necessary to look at how code is generated to invoke the function pointers. Consider this fragment of code:

struct A
{
    void test_direct() {}
    virtual void test_vtable() {}
};

typedef void (A::*a_func)();

void invoke(A &obj, a_func f)
{
    (obj.*f)();
}

void test()
{
    A obj;
    invoke(obj, &A::test_direct);
    invoke(obj, &A::test_vtable);
}

We have a “single inheritance” class called A, and we use pointers to member functions to invoke both a non-virtual member function and a virtual member function.

Here are the important parts of the intermediate assembly language output when compiled with MSVC 19.29 for x86-64, with explanatory comments added (remember this is Intel assembler syntax, so destination operands come before source operands):

this$ = 8
void A::test_direct(void) PROC
    mov     QWORD PTR [rsp+8], rcx              ; save rcx ('this' pointer) in home space on stack
    ret     0                                   ; return
void A::test_direct(void) ENDP

this$ = 8
virtual void A::test_vtable(void) PROC
    mov     QWORD PTR [rsp+8], rcx              ; save rcx ('this' pointer) in home space on stack
    ret     0                                   ; return
virtual void A::test_vtable(void) ENDP

[thunk]:A::`vcall'{0,{flat}}' }' PROC
    mov     rax, QWORD PTR [rcx]                ; load address of vtable into rax
    jmp     QWORD PTR [rax]                     ; jump to first function in vtable
[thunk]:A::`vcall'{0,{flat}}' }' ENDP

obj$ = 48
f$ = 56
void invoke(A &,void (__cdecl A::*)(void)) PROC
$LN3:
    mov     QWORD PTR [rsp+16], rdx             ; save rdx ('f') in home space on stack
    mov     QWORD PTR [rsp+8], rcx              ; save rcx ('obj') in home space on stack
    sub     rsp, 40                             ; allocate minimal stack frame

    ; (obj.*f)();
    mov     rcx, QWORD PTR obj$[rsp]            ; load 'obj' into rcx from stack ('this' argument)
    call    QWORD PTR f$[rsp]                   ; call function pointer 'f' on stack

    add     rsp, 40                             ; deallocate stack frame
    ret     0                                   ; return
void invoke(A &,void (__cdecl A::*)(void)) ENDP

obj$ = 32
void test(void) PROC
$LN3:
    sub     rsp, 56                                             ; allocate stack frame

    ; A obj;
    lea     rcx, QWORD PTR obj$[rsp]                            ; construct instance of A on stack ('obj')
    call    A::A(void)

    ; invoke(obj, &A::test_direct);
    lea     rdx, OFFSET FLAT:void A::test_direct(void)          ; put address of 'A::test_direct' in rdx ('f' argument)
    lea     rcx, QWORD PTR obj$[rsp]                            ; put pointer to 'obj' in rcx ('obj' argument)
    call    void invoke(A &,void (__cdecl A::*)(void))          ; call 'invoke'

    ; invoke(obj, &A::test_vtable);
    lea     rdx, OFFSET FLAT:[thunk]:A::`vcall'{0,{flat}}' }'   ; put address of virtual function call stub in rdx ('f' argument)
    lea     rcx, QWORD PTR obj$[rsp]                            ; put pointer to 'obj' in rcx ('obj' argument)
    call    void invoke(A &,void (__cdecl A::*)(void))          ; call 'invoke'

    add     rsp, 56                                             ; deallocate stack frame
    ret     0                                                   ; return
void test(void) ENDP

Let’s analyse the code one piece at a time:

The content of A::test_direct and A::test_vtable is not important. They do nothing useful anyway. They’re just included to show that they’re regular functions.
The compiler has generated a stub function for calling the first member function from the vtable of an instance of class A. Note that it consists of two instructions: an integer load to get the vtable address (one fetch), and an indirect jump to the first member function in the vtable (one fetch, with address dependent on previous fetch).
The function invoke calls the member function pointer in the same way it would call a pointer to a non-member function, using a single indirect call (after setting up arguments).
To create the pointer to the non-virtual member function in test, the address of the function A::test_direct is used directly.
To create the pointer to the virtual member function in test, the address of the generated stub function for calling the member function from vtable is used.

Note that there are no conditional jump or conditional move instructions. This is beneficial for performance on CPUs that can execute large numbers of instructions in parallel (e.g. with a large number of execution units and/or a long multi-stage instruction execution pipeline). Conditional branches can harm performance in multiple ways. They consume limited branch prediction resources, and if a conditional branch is predicted incorrectly any speculatively executed instructions following the branch need to be discarded.

Considering what happens when the member function pointers are called:

When the pointer to the non-virtual member function is called, the indirect call instruction will jump directly to A::test_direct. There is no additional overhead compared to calling a non-member function pointer.
When the pointer to the virtual member function is called, the indirect call instruction will jump to the stub function for calling the first member function from the vtable. This function consists of two instructions (a load and an indirect jump), and performs two memory fetches (the vtable address and the virtual member function address) in order to jump to the correct implementation of the virtual member function.

The additional overhead for calling a pointer to a virtual member function comes from the generated stub function used to call the member function at the appropriate position in the vtable.

Now consider what will happen when this code is compiled for the Itanium C++ ABI. Here are the important parts of the intermediate assembly language output when compiled with MinGW GCC 11.3 for x86-64, with explanatory comments added (remember this is Intel assembler syntax, so destination operands come before source operands):

A::test_direct():
    ret                                 ; return

A::test_vtable():
    ret                                 ; return

invoke(A&, void (A::*)()):
    sub     rsp, 40                     ; allocate minimal stack frame

    ; (obj.*f)();
    mov     rax, QWORD PTR [rdx]        ; load function address/vtable index union into rax
    mov     r8, QWORD PTR 8[rdx]        ; load 'this' pointer offset into r8
    mov     rdx, rax                    ; copy function address/vtable index union into rdx
    test    al, 1                       ; test virtual member function flag
    je      .L4                         ; if non-virtual member function, jump to local label .L4
    mov     rdx, QWORD PTR [rcx+r8]     ; load address of vtable into rdx
    mov     rdx, QWORD PTR -1[rdx+rax]  ; load address of appropriate member function from vtable into rdx
.L4:
    add     rcx, r8                     ; add offset to 'this' pointer
    call    rdx                         ; call member function

    nop                                 ; padding
    add     rsp, 40                     ; deallocate stack frame
    ret                                 ; return

test():
    push    rsi                         ; save callee-preserved register rsi on stack
    push    rbx                         ; save callee-preserved register rbx on stack
    sub     rsp, 72                     ; allocate stack frame

    ; A obj;
    lea     rax, vtable for A[rip+16]   ; get pointer to vtable for A in rax
    mov     QWORD PTR 56[rsp], rax      ; set vtable pointer in instance of A 'obj' on stack

    ; invoke(obj, &A::test_direct);
    lea     rax, A::test_direct()[rip]  ; put address of 'A::test_direct' in rax
    mov     QWORD PTR 32[rsp], rax      ; store address on stack for function address/virtual table index union
    mov     QWORD PTR 40[rsp], 0        ; store zero on stack for 'this' pointer offset
    lea     rsi, 32[rsp]                ; put address of member function pointer into rsi so it will be saved across call
    lea     rbx, 56[rsp]                ; put address of 'obj' into rbx so it will be saved across call
    mov     rdx, rsi                    ; put address of member function pointer into rdx ('f' argument)
    mov     rcx, rbx                    ; put address of 'obj' into rcx ('obj' argument)
    call    invoke(A&, void (A::*)())   ; call 'invoke'

    ; invoke(obj, &A::test_vtable);
    mov     QWORD PTR 32[rsp], 1        ; store virtual member function index and flag on stack for function address/virtual table index union
    mov     QWORD PTR 40[rsp], 0        ; store zero on stack for 'this' pointer offset
    mov     rdx, rsi                    ; put address of member function pointer into rdx ('f' argument)
    mov     rcx, rbx                    ; put address of 'obj' into rcx ('obj' argument)
    call    invoke(A&, void (A::*)())   ; call 'invoke'

    nop                                 ; padding
    add     rsp, 72                     ; deallocate stack frame
    pop     rbx                         ; restore callee-preserved register rbx from stack
    pop     rsi                         ; restore callee-preserved register rsi from stack
    ret                                 ; return

Let’s analyse this version of the generated code:

Once again, the content of A::test_direct and A::test_vtable is not important. They’re included to show that they’re regular functions.
The function invoke must test the flag indicating whether the member function pointer f represents a virtual member function or a non-virtual member function, and fetch the address of the member function from the vtable if necessary. This involves a conditional jump (je instruction).
Even though it can be determined that class A does not use multiple inheritance, the member function pointer implementation still provides support for adjusting the this pointer.
To create the pointer to the non-virtual member function in test, the address of the function A::test_direct is used directly.
To create the pointer to the virtual member function in test, the byte offset of the A::test_vtable entry within the vtable is used, with the least significant bit set to indicate that it represents a virtual member function.

The following conclusions can be made:

With the Itanium ABI, there is always additional overhead for calling a pointer to a member function compared to calling a pointer to a non-member function. The caller must determine whether the pointer represents a virtual member function before the address to call can be determined. This involves a conditional jump, which can have a detrimental effect on performance.
With the Itanium ABI, provisions are always made for multiple inheritance. This causes additional overhead even in the simplest cases.
If the same member function pointer is to be called multiple times, the function address and adjusted this pointer can be cached. This amortises the overhead across multiple calls.

I hope this adequately explains the statements. Please let me know if you require further clarification.

6 June, 2023 at 4:09 pm
vastheman says:

I have updated the text to cover the case where a class declares at least one virtual member function while deriving from a single base class that has no virtual member functions. Thanks again for bringing this to my attention, ykiko.

Rants from Vas