
Is there any particular utility to this trick, or is it just a neat side-effect of the linker and compiler being very permissive and treating something that most languages would call a compilation error as merely a warning?


In C and C++, "main" is special. Too special. For historical reasons, its argument and return types are not checked.

I once argued on the C standard forum that a C compiler should not know about "main". "#include <unix.h>" should contain the usual Unix declaration for "main", and "#include <windows.h>" should contain the Windows declaration, which at the time was, roughly:

    int WINAPI wWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, PWSTR pCmdLine, int nCmdShow);

It's then up to the user to define their startup function to match, with normal type checking.

This gets the compiler out of handling "main" as a special case.

This was generally considered to be the right answer, but would break too much existing code.
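As a rough sketch of what that proposal would look like in practice: the prototype below stands in for what a hypothetical <unix.h> might declare. The name program_entry and its "--fail" behavior are made up for illustration and come from no real header. Because it is an ordinary prototype, a definition with a different return type or parameter list is rejected by the normal conflicting-types check.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: what a system header could declare instead of the
 * compiler special-casing "main". An ordinary prototype, nothing more. */
int program_entry(int argc, char **argv);

/* The user's definition has to agree with the prototype above.
 * Writing "void program_entry(void)" here instead would be a compile
 * error (conflicting types), not a silently accepted variant. */
int program_entry(int argc, char **argv) {
    assert(argv != NULL);
    /* Illustrative behavior only: fail when the first argument is --fail. */
    if (argc > 1 && strcmp(argv[1], "--fail") == 0) {
        return 1;
    }
    return 0;
}
```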


In Visual Studio, main() is not the entry point of the program.

The real entry point is generated automatically by the compiler. It calls a few functions, depending on what the program does, and then calls main; I think it has to do with initializing the standard library. You can see the stub using a debugger or a disassembler.

It's possible to set the entry point to any function name. See the advanced project settings.

Now, about the arguments and return type. With main, the caller is responsible for pushing arguments onto the stack before the call and popping them after the call. The return code is in the EAX register, if I remember correctly.

Because of that, the signature of main doesn't matter: the invocation will work irrespective of the arguments.

People may ask what's the point of knowing any of this? One major use case is to write executable compressors like UPX. Another use case is to make a custom entry point written in assembler.


Not just in Visual Studio. main is usually (always?) not the entry point on unixoid systems either, that's much more likely to be _start, which calls main() down the line.

Nevertheless, main is treated specially by the compiler for the aforementioned historical reasons, for example to not warn/error out if it does not return a value despite the type clearly telling so.

Observe:

% echo 'int main(void) { }' > foo.c; clang -c foo.c

<no output>

% echo 'int foo(void) { }' > foo.c; clang -c foo.c

foo.c:1:17: warning: non-void function does not return a value [-Wreturn-type]
int foo(void) { }
                ^
1 warning generated.

As you can see, clang is happy to ignore the missing return value for main(), but not for foo().


Also, just as execution doesn't start in main, it also doesn't end with main, either. In C, you can register `atexit` handlers. You can do that in C++, too. In C++, you can also have "user" code executed before & after main by virtue of static initialization & destruction.
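A minimal sketch of the atexit part, assuming a POSIX system. To make the ordering observable from outside, it runs a stand-in for main in a forked child and has the handlers write to a pipe. The names pretend_main, trace_exit_order, and log_event are invented for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int log_fd = -1; /* write end of the pipe; only set in the child */

static void log_event(const char *s) {
    if (write(log_fd, s, strlen(s)) < 0) {
        /* ignored: this is only a demonstration */
    }
}

static void first_handler(void)  { log_event("first;"); }
static void second_handler(void) { log_event("second;"); }

/* Stand-in for a program's main(): register handlers, do work, return. */
static int pretend_main(void) {
    atexit(first_handler);
    atexit(second_handler);
    log_event("main;");
    return 0;
}

/* Runs pretend_main in a child process and copies everything the child
 * logged into out (NUL-terminated). Returns 0 on success, -1 on failure. */
static int trace_exit_order(char *out, size_t cap) {
    int fds[2];
    assert(cap > 0);
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        log_fd = fds[1];
        /* Roughly what startup code does with main's return value:
         * exit() then runs the atexit handlers. */
        exit(pretend_main());
    }

    close(fds[1]);
    size_t used = 0;
    ssize_t n;
    while (used + 1 < cap &&
           (n = read(fds[0], out + used, cap - 1 - used)) > 0) {
        used += (size_t)n;
    }
    out[used] = '\0';
    close(fds[0]);

    int status;
    waitpid(pid, &status, 0);
    return 0;
}
```

The handlers run after the stand-in main has returned, in reverse order of registration, so the captured trace reads "main;second;first;".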


There is no special treatment of main in the major C compilers. The only "magic" thing the compiler does is include the CRT startup object file in the link, which defines _start as a function that ultimately calls main, and have the default linker script set the address of "_start" as the executable entry point.

You can pass -nostdlib to gcc to disable linking the CRT startup object (or use ld directly) and you can pass --default-script /dev/null to ld to disable the linker script.

There is no need to declare main or to check its argument or return types: in C, arguments are both pushed and popped by the caller, and the language provides no typing guarantees, so there is no problem in calling functions with mismatched argument or return type declarations.


Yes there is. I've demonstrated this in a sibling (or rather, cousin) comment, but in short, you can happily not return a value in main even if its type is "int main(void)". Try that with another function, and the compiler should at least warn. This might not be a special case of code generation, but it is a special case of error handling at least.


Not quite true: there’s the weird thing where gcc on i*86 will align the stack on entry to a function called main but not any other.

  $ gcc -m32 -O2 -fno-pie -fno-asynchronous-unwind-tables -fomit-frame-pointer -S -masm=intel -xc -o - -
  int foo(void); int main(void) { return foo(); }
  ^D
   .file "<stdin>"
   .intel_syntax noprefix
   .text
   .section .text.startup,"ax",@progbits
   .p2align 4
   .globl main
   .type main, @function
  main:
   push ebp
   mov ebp, esp
   and esp, -16
   call foo
   leave
   ret
   .size main, .-main
   .ident "GCC: (GNU) 11.1.0"
   .section .note.GNU-stack,"",@progbits

It doesn’t do that if you set the historical stack alignment, though (-mpreferred-stack-boundary=2), or if you name the function anything else but main (it even does a tail call). Presumably it’s trying to (somewhat) recover from the time when the GCC authors accidentally the SysV i386 ABI[1,2].

[1]: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[2]: https://stackoverflow.com/a/49397524


To nitpick: Windows also needs to provide a main function (why shouldn't it?).

But I agree that the fact that the standard allows two different declarations of main, in a language without polymorphism, doesn't help.

Not to mention the whole "implementations are free to define extra entry points" part.


Ken Thompson wrote a regex engine which compiled (at runtime) regexes into data structures containing executable machine code, and invoked them (from C source) by jumping into the data i.e. treating its location as a function pointer. That's what's happening here except it's the start code inserted by the linker which is jumping into main.

So there's the utility, if you're hardcore enough to build machine code at runtime.

If you wanted to abuse main() particularly, I guess you've got argc and argv in registers, and your hand-compiled main 'function' could maybe have some self-modifying code?


I don't know that that would work, since code generated at runtime would live in .data and not .text. At least for the architecture being targeted, you aren't allowed to create executable code at runtime like that (note that the original poster had to declare his main array as const to be able to have it in the .text segment).


Coercing a data pointer into a function pointer is undefined behavior in standard C (they don't even need to be the same size), but at least on POSIX platforms the compiler must do the right thing because `dlsym` depends on it working. Generating and executing native code at runtime is not that special, mind; after all JIT compilers are ubiquitous these days!


Popularity of the no-execute NX bit significantly postdates Unix. As I recall, Microsoft only started flipping it on by default for the 64-bit Windows NT kernel, since so many preexisting 32-bit applications relied on self-modifying code.


Request: URL pointing to the code for this regex engine.


Neat side effect



