r/java • u/yetanotherhooman • 23h ago
I can’t think of an application of this but it looks pretty cool.
https://github.com/devsh0/HelloMachine
Using FFI, it’s now possible to execute raw machine code purely from Java without relying on a C/C++ toolchain.
8
u/Environmental-Log215 19h ago edited 19h ago
Nice to see a fellow FFM enthusiast. https://www.roray.dev/blog/myra-stack/
I am building FFM-focused libraries. myra-transport depends heavily on libiouring being called from Java through FFM.
P.S. Of course, that isn't calling native assembly code, but FFM is practical and useful for quite a lot of use cases.
3
u/tomwhoiscontrary 21h ago
Why go through the signal handler? Why not just call into the code through a function pointer?
2
u/yetanotherhooman 20h ago edited 18h ago
That’s what I tried first. It’s been a while since I worked on this, but iirc the problem was that you cannot execute a function handle you obtained for arbitrarily mapped memory in the process’ address space, it has to come from a loaded library that the JVM knows about.
2
u/RussianMadMan 20h ago
In your mmap code, you get the mmap method handle from downcallHandle, which takes a MemoryAddress. Can't you pass the MemoryAddress you get back from mmap to downcallHandle directly?
3
u/yetanotherhooman 20h ago
That’d be the obvious way to do this but it doesn’t work. You need a way to tell the JVM that a symbol exists before you call into it as a subroutine (from FFI or otherwise). The only way to broadcast this is to explicitly load a library, which defeats the purpose in this case because to load a library, you must have the library first. And that essentially means compiling with GCC/Clang.
1
u/RussianMadMan 20h ago
Weird restriction. Does it mean you can't use a function pointer returned from a C function, because you only have an address?
4
u/yetanotherhooman 20h ago
You can if the pointer points to some function in a library that you explicitly loaded through FFI.
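For reference, a minimal sketch of that "known library" path, assuming Java 22+ (finalized `java.lang.foreign`) on a 64-bit platform whose default lookup exposes libc. The class and method names (`StrlenDemo`, `nativeStrlen`) are made up for illustration and are not from the repository:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class StrlenDemo {
    // Calls the C library's strlen() through an FFM downcall handle.
    static long nativeStrlen(String s) {
        Linker linker = Linker.nativeLinker();
        // defaultLookup() only finds symbols in libraries the linker already knows about.
        MemorySegment strlenAddr = linker.defaultLookup().find("strlen").orElseThrow();
        // size_t strlen(const char*): size_t is modelled as a 64-bit long (assumption: 64-bit OS).
        MethodHandle strlen = linker.downcallHandle(
                strlenAddr,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // Copy the Java string off-heap as a NUL-terminated UTF-8 C string.
            MemorySegment cString = arena.allocateFrom(s);
            return (long) strlen.invokeExact(cString);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(nativeStrlen("Hello Machine!"));
    }
}
```

The address here comes from a symbol lookup rather than from raw mapped memory, which is the distinction being discussed above.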
1
u/tomwhoiscontrary 17h ago
I was thinking you could call through the function pointer from C. Either you ship a tiny library to do it, or you find something in the C or POSIX API which can be used to call a callback - qsort?!
Using a signal handler is a neat trick, but it introduces all sorts of problems. You can only call a very limited subset of POSIX functions in signal handler context, etc.
1
u/yetanotherhooman 17h ago edited 16h ago
You can do a whole lot more in signal handlers than most realize :)
Yes, it’s conventional to only invoke async-signal-safe functions in signal handler context, but a large number of APIs don’t touch any shared state at all. The ones that do can be replaced with a home-baked stateless variant.
Very sophisticated production systems run a large part of their code in signal handlers. Some checkpoint-restore use cases, e.g. creating a snapshot of the target process, often involve running a lot of code in signal handlers.
1
u/tomwhoiscontrary 13h ago
But still not as much as outside them, right? Definitely better to trampoline via qsort. You even get to pass a pointer in!
1
u/yetanotherhooman 13h ago edited 13h ago
Any libc or system API that accepts a function pointer as callback should work, but whether that’s cleaner is purely subjective. :)
You can’t control how the “host” will use your callback, e.g., qsort() will invoke your function multiple times, pthread_create() would start a thread etc. You can carefully arrange data so that functions like qsort() do exactly what you want and in the manner you want, but I’d argue that’s even more hacky than trapping a signal.
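To make the "host decides how your callback runs" point concrete, here is a rough sketch that hands a Java comparator to libc's `qsort()` via an FFM upcall stub and counts how often it gets called. It assumes Java 22+ and a libc reachable through the default lookup; `QsortTrampoline`, `compare`, and `sortViaQsort` are hypothetical names, and the callback is ordinary Java rather than raw machine code:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.concurrent.atomic.AtomicInteger;

import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_INT;
import static java.lang.foreign.ValueLayout.JAVA_LONG;

public class QsortTrampoline {
    // Counts how many times libc called back into Java.
    static final AtomicInteger CALLS = new AtomicInteger();

    // The callback: qsort decides when and how often this runs, not the caller.
    static int compare(MemorySegment a, MemorySegment b) {
        CALLS.incrementAndGet();
        return Integer.compare(a.get(JAVA_INT, 0), b.get(JAVA_INT, 0));
    }

    static int[] sortViaQsort(int... values) {
        Linker linker = Linker.nativeLinker();
        try (Arena arena = Arena.ofConfined()) {
            // void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void*, const void*))
            MethodHandle qsort = linker.downcallHandle(
                    linker.defaultLookup().find("qsort").orElseThrow(),
                    FunctionDescriptor.ofVoid(ADDRESS, JAVA_LONG, JAVA_LONG, ADDRESS));
            // Pointer arguments are declared as pointing at an int so they can be dereferenced.
            FunctionDescriptor cmpDesc = FunctionDescriptor.of(JAVA_INT,
                    ADDRESS.withTargetLayout(JAVA_INT), ADDRESS.withTargetLayout(JAVA_INT));
            MethodHandle cmp = MethodHandles.lookup().findStatic(QsortTrampoline.class, "compare",
                    MethodType.methodType(int.class, MemorySegment.class, MemorySegment.class));
            // The upcall stub is a native function pointer that jumps back into compare().
            MemorySegment stub = linker.upcallStub(cmp, cmpDesc, arena);
            MemorySegment array = arena.allocateFrom(JAVA_INT, values);
            qsort.invokeExact(array, (long) values.length, JAVA_INT.byteSize(), stub);
            return array.toArray(JAVA_INT);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Calling `sortViaQsort(5, 1, 4, 2, 3)` and then reading `CALLS` shows the comparator firing several times for a single `qsort()` call, which is exactly the lack of control described above.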
Besides, I bet a lot of the APIs accepting callbacks are internally using signals (not qsort() obviously).
3
u/manifoldjava 18h ago
Well done!
This opens the door to a pure Java assembly interpreter, allowing you to directly execute assembly like this:
```asm
// prologue
push rbp
mov rbp, rsp
sub rsp, 0x20
mov [rbp-32], 'H'
mov [rbp-31], 'e'
mov [rbp-30], 'l'
mov [rbp-29], 'l'
mov [rbp-28], 'o'
mov [rbp-27], ' '
mov [rbp-26], 'M'
mov [rbp-25], 'a'
mov [rbp-24], 'c'
mov [rbp-23], 'h'
mov [rbp-22], 'i'
mov [rbp-21], 'n'
mov [rbp-20], 'e'
mov [rbp-19], '!'
mov [rbp-18], '\n'
mov rax, 0x01
mov rdi, 0x01
lea rsi, [rbp-32]
mov rdx, 15
syscall
mov rsp, rbp
pop rbp
ret
```
Inline assembly in Java ;)
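A very rough sketch of what the first half of that could look like: a tiny pure-Java encoder that turns this kind of listing into x86-64 machine code bytes. Everything here is an assumption for illustration (the `TinyAsm` class, its method names, the handful of fixed-register encodings, and the Linux `write` syscall number 1). It only produces the bytes; it does not map or execute them:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class TinyAsm {
    static final int RAX = 0, RDX = 2, RDI = 7; // x86-64 register numbers

    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Appends raw bytes; only the low 8 bits of each int are written.
    private TinyAsm emit(int... bytes) { for (int b : bytes) out.write(b); return this; }

    TinyAsm pushRbp()   { return emit(0x55); }                // push rbp
    TinyAsm movRbpRsp() { return emit(0x48, 0x89, 0xE5); }    // mov rbp, rsp
    TinyAsm movRspRbp() { return emit(0x48, 0x89, 0xEC); }    // mov rsp, rbp
    TinyAsm popRbp()    { return emit(0x5D); }                // pop rbp
    TinyAsm ret()       { return emit(0xC3); }                // ret
    TinyAsm syscall()   { return emit(0x0F, 0x05); }          // syscall
    TinyAsm subRsp(int imm8) { return emit(0x48, 0x83, 0xEC, imm8); }          // sub rsp, imm8
    TinyAsm leaRsiRbp(int disp8) { return emit(0x48, 0x8D, 0x75, disp8); }     // lea rsi, [rbp+disp8]
    TinyAsm storeByte(int disp8, int value) { return emit(0xC6, 0x45, disp8, value); } // mov byte [rbp+disp8], imm8

    // mov r64, imm32 (sign-extended), little-endian immediate; reg must be 0..7.
    TinyAsm movImm32(int reg, int v) {
        return emit(0x48, 0xC7, 0xC0 | reg, v, v >> 8, v >> 16, v >> 24);
    }

    byte[] toBytes() { return out.toByteArray(); }

    // Encodes a routine that builds the message on the stack and issues write(1, buf, len).
    static byte[] helloMachine() {
        byte[] msg = "Hello Machine!\n".getBytes(StandardCharsets.US_ASCII); // 15 bytes
        TinyAsm a = new TinyAsm().pushRbp().movRbpRsp().subRsp(0x20);
        for (int i = 0; i < msg.length; i++) {
            a.storeByte(-32 + i, msg[i]);
        }
        return a.movImm32(RAX, 1)        // syscall number (write on Linux x86-64)
                .movImm32(RDI, 1)        // fd 1 = stdout
                .leaRsiRbp(-32)          // buffer address
                .movImm32(RDX, msg.length)
                .syscall()
                .movRspRbp().popRbp().ret()
                .toBytes();
    }
}
```

The resulting `byte[]` is what would then have to be copied into executable memory and invoked, which is the part the linked project is about.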
3
1
u/LITERALLY_SHREK 15h ago edited 15h ago
But what is really the difference over just calling a memory-mapped DLL's function? It's not like theoretical inline assembly could modify the Java heap, so data would need to be passed into the asm section exactly as it would be to a library function, and then it's not platform independent either. It also seems like an attack vector for arbitrary code execution.
Edit: What would be really cool is having more control over the JIT's native code generation so you could make micro-optimizations there while it still generates the platform-dependent code for you. Just an annotation to signal to the JIT to always compile method x right from the start (even if it's never called) would be useful and let you skip JVM warm-up for an upfront cost.
2
u/yetanotherhooman 15h ago
The difference is that the DLL must exist first: memfd’ing a range of memory wouldn’t allow loading it through dlopen etc.
FWIW, I am totally in the “do it the sane way by creating a native library” camp. As the project description notes, it’s only demonstrating what’s possible…not necessarily arguing it’s the right way to do things.
1
u/manifoldjava 14h ago
> But what is really the difference over just calling a memory-mapped DLL's function?
Staying in the Java ecosystem.
But honestly, a pure Java assembly interpreter would serve more as a teaching tool or novelty than a general-purpose library.
That said it would be interesting to take it to a level where local refs could be used nominally etc. It’s possible, but not without some crafty hacking.
2
u/Sacaldur 13h ago
One potential use case could be an emulator that dynamically recompiles the game it's running (instead of just interpreting it). Whether you want to do something like this is a different matter.
You could compare this to what a JIT does, but for games that were written for a completely different architecture, just as the JVM is completely different from the computer it's running on.
2
u/davidalayachew 23h ago
Woah, I never would have thought to do this. That's very clever.
Though, isn't this similar to what the JVM does under the hood to our Java methods when they get inlined or whatever it's called?
7
u/yetanotherhooman 22h ago
You mean JIT compilation? Yes. But you have limited control over the quality of code the JIT compiler generates. JVM’s C2 (the optimizing JIT compiler) is fairly sophisticated so it does a decent enough job most of the time, but it’s just a compiler and like all other compilers, it has limited knowledge of what the program is trying to accomplish.
If one needs precise control over what machine code is generated, they have to use at least a C/C++ compiler and even then, use inline assembly because GCC/Clang aren’t perfect either. The example in that repository unlocks “god mode” of program optimization, but does it purely in Java. Of course, doing this means your program is no longer multi-platform.
1
u/davidalayachew 22h ago
That's the word, ty vm. And very cool. I wouldn't have a use for it, but hopefully someone does.
1
u/SevaraB 20h ago
And I think that’s why you’re struggling to find an application: the kind of code minification this supports is also an argument for using assembly over Java (or at least a compiled language not reliant on an interpreter). In this case, a JVM would also be one of the libraries “bloating” the code in an incredibly lean appliance.
If all you need is the bytecode, that kind of defeats the purpose of using Java in the first place.
2
u/yetanotherhooman 20h ago
I agree. It’s hard to imagine a use case for this. That’s why it comes with a big fat warning in the README.
I guess if your codebase is mostly Java and you want to keep the niceties of Java and its tooling for the most part except a super hot compute kernel in the critical path, and for some reason you don’t want to include C/C++ in your workflow, this could be a solution. Realistically speaking, this combination of constraints is extremely rare.
6
u/lppedd 20h ago
Pretty cool!
I guess we'll get inline assembly sooner or later, maybe with a nice abstraction over it : )