That one time with MSVC and ASAN
In September I was hard at work on Brainroll, the first game I am about to release with my newly formed company Nullsson. I was refactoring some of the internal state for the deterministic movement within the game and accidentally made an error that looks like the following:
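struct x
{
    int a;
};
struct state
{
    x *A;
    x B[1024];
};
int main ()
{
    state State = {0};
    x Thing = { 1 };      // local variable, lives only in this scope
    State.B[0] = Thing;   // a copy of Thing is stored inside State
    State.A = &Thing;     // points at the local Thing, not at the copy
    x *One = State.A;
    x *Two = &State.B[0];
    return 0;
}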
Here we have our internal state within the state struct. B[0] receives a copy of a newly created Thing of type x, while A is set to point at the local Thing itself. This becomes a clear case of UB in C and C++ as soon as Thing goes out of scope while the state is still in use: the pointer A then refers to an object whose lifetime has ended, and we are not allowed to access it. In this case, “not allowed” means that the language standard forbids it, so the compiler will most likely assume that you never do this.
If you do, many different things can happen. For example:
The compiler can produce really bad code
The compiler can produce code that reads the memory at that location (which now potentially contains garbage)
The compiler can produce a warning/error saying that you’re doing something that is not allowed.
In my case, the second point happened. Everything “worked” as expected but this was out of “pure luck”.
I tracked this bug down after it caused a crash once or twice. It wasn’t a particularly difficult bug to solve in my case, but I still wanted to take some sort of precaution so that it wouldn’t happen again.
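As a minimal sketch of one possible fix, assuming the intent was for A to refer to the element stored in B: point A at the copy that lives inside the state rather than at the local variable. The helper name StoreThing is hypothetical and not taken from Brainroll’s code.

```cpp
struct x
{
    int a;
};

struct state
{
    x *A;
    x B[1024];
};

// Hypothetical helper: stores a copy of Thing in State and makes A
// refer to that stored copy instead of to the caller's local variable.
void StoreThing(state *State, x Thing)
{
    State->B[0] = Thing;
    State->A = &State->B[0]; // valid for as long as *State is alive
}
```

With this, A stays valid for as long as the state itself is alive. Copying the whole state struct would still leave the copy’s A pointing into the original, so storing an index into B instead of a pointer may be the more robust choice.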
I asked around and got a tip to check out something called “Address Sanitizer”, which on most compilers is now just a flag you pass, such as -fsanitize=address for clang or /fsanitize=address for MSVC.
This will tell you which “bad” memory you are accessing, at which location it happens and at which location the memory was allocated.
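Since the flag is mostly useful in debug builds, it can be handy for the code to know whether it was compiled with ASAN. A minimal sketch, assuming MSVC or GCC (which define __SANITIZE_ADDRESS__ when the flag is set) or clang (which exposes __has_feature(address_sanitizer)); the BUILD_HAS_ASAN macro and the BuildHasAsan function are names made up for this example:

```cpp
// Detect ASAN instrumentation at compile time.
#if defined(__SANITIZE_ADDRESS__)
    // Defined by MSVC (/fsanitize=address) and GCC (-fsanitize=address).
    #define BUILD_HAS_ASAN 1
#elif defined(__has_feature)
    // clang reports sanitizers through __has_feature.
    #if __has_feature(address_sanitizer)
        #define BUILD_HAS_ASAN 1
    #endif
#endif

#ifndef BUILD_HAS_ASAN
    #define BUILD_HAS_ASAN 0
#endif

// Hypothetical helper: true when this translation unit is instrumented.
constexpr bool BuildHasAsan()
{
    return BUILD_HAS_ASAN != 0;
}
```

A debug build could, for instance, log the result of BuildHasAsan() at startup so it is obvious which binaries are instrumented.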
Here is an example of the output the Address Sanitizer (ASAN) would give you upon an error:
I thought, “this is awesome for a debug build. This way, I can always be safe during development that I’m not making any mistakes!” So I added the flag to my Brainroll build and compiled. After that, I was no longer able to start Brainroll.
I opened the VS Debugger and found the following messages within the debug output:
Exception thrown at 0x00007FF7176296EF in Maraton.exe: 0xE0736171: Access violation reading location 0x000004AB0F340000.
The 0xE0736171 exception is quite normal it turns out.
These silent access violations in the debugger’s output window are normal under x64. They are related to the way the ASan runtime is requesting virtual address space to hold the AddressSanitizer shadow bytes. You can ignore these access violations; the debugger will correctly break if AddressSanitizer detects memory access issues.
No luck there. All I had to go on was that my application crashed unexpectedly when compiled with /fsanitize=address. Brainroll consists of two compilation units, one executable and one DLL. So far I had been compiling both units with this flag. I tried compiling only the DLL with ASAN enabled, which gave me another error message:
Exception thrown at 0x00007FF8BE173416 (ntdll.dll) in Maraton.exe: 0xC0000005: Access violation writing location 0x0000000000000024.
After some reading I found out that this error might be caused by my VS version, which was 16.11.1. I made a desperate attempt and updated Visual Studio to 16.11.3 and hey! It actually worked. Now I was able to at least run the application again without crashes.
I made an attempt with the first example code I posted here to trigger an ASAN error but nothing really happened. I tried just starting the application in VS. It started without exceptions but nothing happened. I let it run for 5 minutes and nothing happened. I used the VS debugger function to pause execution and it would always pause its execution in “External Code”… Strange to say the least.
I took a step back, put a breakpoint at the first line of main and started stepping. I stepped through the entire application all the way until the first glBindVertexArray call, where the application froze.
Another strange thing I noticed here was that when stepping, the debugger sometimes seemed to jump over lines of code and sometimes made big jumps back in the code. This is the kind of behaviour you get when debugging an optimized build. And this is where I understood that something really strange was going on with my code.
I continued investigating and also consulted the Handmade Network discord for help. I got help from one of the best programmers I know, Martins, who I want to thank once again for taking the time to help me investigate this.
I tried adding /MDd to my build, which causes the application to use the debug, multithread-specific and DLL-specific version of the run-time library. This also causes the compiler to place the library name MSVCRTD.lib into the .obj file, which is basically the debug version of the Microsoft C runtime library.
Now the build complained about some missing .dll files for ASAN. I grabbed them from the bin directory of my Visual Studio tools directory and put them next to my build. The build worked again, but now my problems were even stranger. It now seemed like the application would hang on every other glBindVertexArray call. It no longer hung on the same one every time.
I tried compiling another OpenGL snippet with the ASAN flag and it worked fine, so the problem was not just OpenGL on my machine; it had to be something else. I was completely out of ideas.
Clang has MSVC compatibility, so you can compile your MSVC application with clang, using the same flags you use for MSVC, through clang-cl. I tried it and the application worked consistently with clang-cl. So I got the application working again by switching to clang, awesome! Or is it?
I still have to know why the MSVC build all of a sudden doesn’t work, otherwise the problem may not be solved.
I created a minimal repro and sent it over to Martins, who found out something interesting. XInput, which I am using, is doing something strange in the background, like loading DLLs in a background thread. Basically, the problem could be solved if I put a Sleep(100); after initializing XInput.
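A minimal sketch of that workaround, assuming XInput is loaded with LoadLibraryA as in the repro. The helper name LoadXInputAndWait is hypothetical, and the non-Windows branch exists only so the snippet compiles elsewhere:

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: loads XInput, then gives its background work
// some time to finish before the caller continues.
// Returns true if the DLL was loaded.
bool LoadXInputAndWait(unsigned DelayMs)
{
#ifdef _WIN32
    HMODULE Library = LoadLibraryA("xinput1_4.dll");
    Sleep(DelayMs); // e.g. 100, the delay that made the hang go away
    return Library != nullptr;
#else
    (void)DelayMs;  // nothing to load on other platforms
    return false;
#endif
}
```

This only hides the race by waiting; it does not explain it, which is why the investigation continued.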
The background loading can be spotted by opening the Threads window in the VS debugger and looking at the call stack of the thread that is loading XInput.
Double-click thread 25212, since that is the one running XInput.
Look at its call stack, right-click, select “Show External Code”, and then select Load Symbols. Then lo and behold:
With this we could trim down my six-file repro to this small snippet of code:
#include <windows.h>
#include <GL/gl.h>
#pragma comment (lib, "gdi32.lib")
#pragma comment (lib, "user32.lib")
#pragma comment (lib, "opengl32.lib")
int main()
{
    // Create a dummy window so we can get a device context for OpenGL
    HWND window = CreateWindowA("STATIC", "name", WS_OVERLAPPEDWINDOW, 10, 10, 10, 10, 0, 0, 0, 0);
    HDC dc = GetDC(window);
    PIXELFORMATDESCRIPTOR pfd =
    {
        .nSize = sizeof(pfd),
        .nVersion = 1,
        .dwFlags = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER,
        .iPixelType = PFD_TYPE_RGBA,
        .cColorBits = 32,
    };
    int pf = ChoosePixelFormat(dc, &pfd);
    SetPixelFormat(dc, pf, &pfd);
    // Create an OpenGL context and make it current on this thread
    HGLRC rc = wglCreateContext(dc);
    wglMakeCurrent(dc, rc);
    // Load the two OpenGL functions needed for the repro
    typedef void (APIENTRY* PFNGLBINDVERTEXARRAYPROC) (GLuint array);
    typedef void (APIENTRY* PFNGLGENVERTEXARRAYSPROC) (GLsizei n, GLuint* arrays);
    PFNGLGENVERTEXARRAYSPROC glGenVertexArrays = (void*)wglGetProcAddress("glGenVertexArrays");
    PFNGLBINDVERTEXARRAYPROC glBindVertexArray = (void*)wglGetProcAddress("glBindVertexArray");
    // Loading XInput is what makes the calls below hang under ASAN
    LoadLibraryA("xinput1_4.dll");
    for (int i = 0; i < 10; i++)
    {
        GLuint vao;
        glGenVertexArrays(1, &vao);
        glBindVertexArray(vao);
    }
    return 0;
}
So I basically triggered the most unlucky situation ever, where ASAN, XInput and Nvidia were all doing something together that caused my application to hang. Looking back at the problems we saw previously, our main thread hung while it was waiting for something from the Nvidia DLL.
It seems like Nvidia is allocating memory in other threads, and the best guess is that the allocation interferes with the XInput DLL loading.
With this, Martins trimmed the repro down even further:
#include <windows.h>
int main()
{
    LoadLibraryA("xinput1_4.dll");
    for (int i = 0; i < 100; i++)
    {
        HGLOBAL g = GlobalAlloc(0, 32);
        GlobalSize(g); // Hangs here !!!
        GlobalFree(g);
    }
}
So to sum everything up, something is weird with GlobalSize in ASAN when used together with XInput 1.4. The Nvidia DLL is using GlobalSize for its work, which is why my application hung on the glBindVertexArray calls.
I wrote up the findings and reported the bug to the MSVC team here: https://developercommunity.visualstudio.com/t/AddressSanitizer:-Execution-hangs-at-glB/1533793?entry=myfeedback&space=62
Learn more about ASAN here: https://docs.microsoft.com/en-us/cpp/sanitizers/asan?view=msvc-160