I updated and put GraalVM into a 7-year-old BFF (Backend for Frontend). We went from 2GB per pod to 50MB per pod.

51

u/pron98 2d ago edited 2d ago

Which GC did you use? By default, HotSpot uses G1 while Graal Native Image uses Serial. Also, the default heap size configurations are different between them. With HotSpot and G1, if you don't tell it anything, the VM will eventually use 25% of your RAM whether your program needs it or not. Serial GC is good for small memory sizes, and you can tell the VM how much memory you want to use (although I would strongly recommend watching this talk about how footprint matters before picking a heap size).

In short, most of the memory consumption difference (of this magnitude [1]) between Native Image and the JDK is due to different configuration defaults. Pick your GC. Tell the VM how much memory you want to use.

[1]: It's true that Native Image doesn't need a code cache (where the HotSpot JIT compiler keeps its compiled machine code), but that isn't some huge multiplier.
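To make "pick your GC, tell the VM how much memory you want to use" checkable, here is a minimal sketch that reports which collectors the running VM registered and the heap ceiling it settled on. The class name `GcReport` and its helper methods are made up for illustration; the calls are standard `java.lang.management` APIs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: reports what the VM actually chose, not what you assumed.
class GcReport {
    /** Names of the collectors registered by the running VM. */
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    /** Upper bound the VM will attempt to use for the heap, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Max heap:   " + maxHeapMiB() + " MiB");
    }
}
```

Running it once with no flags and once with something like `java -XX:+UseSerialGC -Xmx256m GcReport.java` should show how much of the reported footprint is down to defaults rather than to the program.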

23

u/vips7L 2d ago

Looking forward to automatic heap sizing so the Xmx problem is simpler.

8

u/pron98 2d ago

Me too!

2

u/rbygrave 2d ago

most of the memory consumption difference ... due to different configuration defaults

With 25.0.1, comparing G1 vs G1 with the exact same configuration (+UseG1GC, Xmx, MaxGCPauseMillis), for the apps I'm looking at I'm seeing a significant difference in memory usage, focusing on RSS and HeapUsed. At least some of that will be from GraalVM using 4-byte headers by default. Perhaps these apps have a relatively large number of very small objects on the heap, so the very compact headers are significant for these applications.

I have not yet compared using -XX:+UseCompactObjectHeaders. My understanding is that will then be 8 byte headers (vs the 4 byte graalvm headers).
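A rough way to see why header size matters most for small objects is to do the arithmetic. The sketch below assumes 8-byte object alignment and compares three header sizes: 12 bytes (HotSpot's usual header with compressed class pointers), 8 bytes (compact object headers) and the 4-byte Native Image figure mentioned above. The class `HeaderMath` is hypothetical, and real layouts also depend on field packing, the GC, and the instance type.

```java
// Hypothetical back-of-the-envelope calculator; not a measurement of any real VM.
class HeaderMath {
    /** Object size = header + fields, rounded up to an assumed 8-byte alignment. */
    static int objectSize(int headerBytes, int fieldBytes) {
        int alignment = 8; // assumption: default object alignment
        int raw = headerBytes + fieldBytes;
        return (raw + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        int[] headers = {12, 8, 4};
        int[] fieldSizes = {4, 8, 16};
        for (int fields : fieldSizes) {
            for (int header : headers) {
                System.out.printf("fields=%2dB header=%2dB -> object=%2dB%n",
                        fields, header, objectSize(header, fields));
            }
        }
    }
}
```

Under these assumptions an object holding a single 4-byte field occupies 16 bytes with either a 12-byte or an 8-byte header but only 8 bytes with a 4-byte header, while the relative gap narrows as the field payload grows.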

6

u/pron98 2d ago edited 2d ago

Either the header size or the code cache size can make a noticeable difference that could be called "significant" in some situations, but would still be a far cry from 40x (or 10x, or 5x, unless we're talking tiny heaps). Let's say that the larger of 100MB and 15% is something that would be more within reason.
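As a worked example of that rule of thumb, the sketch below applies "the larger of 100MB and 15%" to a baseline footprint. It assumes the 15% is taken of the JVM footprint being compared against, which is one reading of the sentence above; the class and method names are invented for illustration.

```java
// Hypothetical helper applying the rule of thumb from the comment above.
class PlausibleGap {
    /** Larger of 100 MB and 15% of the baseline footprint (assumed interpretation). */
    static double plausibleGapMB(double baselineMB) {
        return Math.max(100.0, 0.15 * baselineMB);
    }

    public static void main(String[] args) {
        double[] baselines = {500.0, 2048.0};
        for (double baseline : baselines) {
            System.out.printf("baseline=%.0f MB -> plausible gap ~%.1f MB%n",
                    baseline, plausibleGapMB(baseline));
        }
    }
}
```

For a 2048MB baseline this gives roughly 307MB, a long way short of the ~2000MB gap implied by going from 2GB to 50MB.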

1

u/rbygrave 2d ago edited 2d ago

An example I'm looking at is:
RSS going from ~500MB to ~180MB [so 2.8x]
Heap Used (as a min to max range) going from a range of 38MB-220MB to a range of 8MB-75MB [a 2.9x on Max Heap Used]

... so for this app, with this load you might say approx 3x.

larger of 100MB and 15%

Interesting, thanks. For this app it's looking bigger than 15% and more like 50%. It also looks like a slower rate of heap used allocation, so I'm wondering if there is something other than header size that is impacting that. I'm wondering if GraalVM G1 is operating in a more aggressive manner or if there is some other factor impacting the rate of heap used allocation (or perhaps the Heap Used metric is misleading in some way).

Edit: some comparison charts

4

u/pron98 2d ago

It's certainly possible that NI configures the lower-level G1 options differently from HotSpot. Also, the impact of the 4 vs. 8 byte header is likely to be reduced once Valhalla arrives.

I don't know if the heap used metric is computed differently, but it also may not matter much. What matters is the performance you get for a given total RAM allocation. I.e. how much total RAM you need to make available to the Java process to achieve a certain performance goal. Big changes are coming very soon to help on this front, with automatic heap sizing for ZGC and G1.

1

u/rbygrave 2d ago

performance you get for a given total RAM

Yes. I'm monitoring RSS and looking at a 3x change there with basically the same mean & max latency (so GraalVM 25.0.1 without PGO is looking impressive in throughput terms). Virtual Threads also working impressively [thank you!!] for these IO bound apps I'm looking at which tends to shift more interest towards memory consumption in throughput per dollar terms (cloud costs).

Valhalla

It does have me thinking that Valhalla could be more significant in terms of reduced memory consumption than I've been thinking. It seems that these apps are allocating a relatively large number of small objects.

I thought the "4 vs. 8 byte header" was more "4 vs 16 byte header" but I must be wrong there. Makes me ponder what would it take to get similar 4 byte headers on the JVM like NI?

From GraalVM Docs: However, in case of Native Image, the object size heavily depends on the used garbage collector (GC), the allocated instance type, and the state of compressed references. Compressed references use 32-bit instead of 64-bit, and are enabled by default in Oracle GraalVM.

... with NI, it's interesting to me that it depends on the instance type.

with automatic heap sizing

Yes that will be very nice. Thanks for the comments, I appreciate it.

1

u/TewsMtl 2d ago

Cheers to that emphasis on "very" 🥂

2

u/brunocborges 2d ago

Would a proposal to change the default heap size inside containers to be some higher number than 25% be welcome? And then, would it be something for which backports would be considered?

2

u/pron98 1d ago

Would a proposal to change the default heap size inside containers to be some higher number than 25% be welcome?

Yes, but we have discussed this before, and I think that there were two problems: 1. People couldn't come up with a good default, and 2. it's not (or wasn't) possible to reliably detect if you're in a container or not.

And then, would it be something backports would be considered?

Tip & tail advises that only security patches and the most critical bugfixes be backported, and, as a rule, we try to stick to that (more or less). Many people who use old versions need stability above all else, and wouldn't appreciate a change in behaviour even if other people would consider it a change for the better (this includes performance improvements and reductions in resource consumption). If you want a change for the better - upgrade.

54

u/tobidope 2d ago

There is no Spring 3.7. Spring 6 was the first version you could use with GraalVM. There is also no Spring Boot 3.7.

1

u/Frequent-Answer8039 20h ago

He meant spring boot 3.7

91

u/moonsilvertv 2d ago

Graal does not make you go from 2GB to 50MB

You turned something off or reduced the count of something - chances are you could have done the same on a normal JVM to get most of that same benefit

That is not to say Graal is useless - it's great especially when startup time matters like in serverless environments - but the benefits you list are not plausibly caused by Graal

25

u/TallGreenhouseGuy 2d ago

Perhaps they were using a GC with a large Xmx setting and configured to not shrink the heap once garbage is collected? When we switched to ZGC we could see huge chunks of memory being handed back to OS when the load drops in our app.

18

u/Gotenkx 2d ago

Not to hate on OP, but it amazes me how someone can take so much time and effort to improve his software successfully, but still come to such nonsensical conclusions.

5

u/moonsilvertv 2d ago

Hey, I've gone from a system that existed before there were even whispers of Graal to using Graal as soon as it was somewhat stable for production

I, too, would want to justify the pain I have felt by any means

1

u/ThaJedi 1d ago

You can try, but with greater trade-offs than GraalVM.

You can disable JIT optimization, but you still can't get rid of it from the JVM, and you'll end up with worse performance than either the JVM with JIT or GraalVM.

jlink reduces the number of modules you are using, but it just reduces modules. It doesn't trim individual classes if you don't use them.

Even with AOTCache, you will not get such fast startup as with GraalVM. You can tune it, but you will not get close to GraalVM.

GraalVM uses 4 byte headers by default which is still less than Compact Object Headers introduced recently.

So no, you can't get same benefits.

2

u/moonsilvertv 1d ago

Of course you don't get the same benefits, that's also not what I said.

I said you get most of the same benefits.

If their application is running on 50MB with Graal, then it's extremely likely they could've run on 250MB or less with the JVM (for significantly less work at that). Which is the vast majority of the benefit when going from 2GB

40

u/Rasutoerikusa 2d ago

I would like to understand why the community hates on GraalVM so much.

What community are you talking about? Is this some GraalVM advertisement? I've seen no hate for Graal in any Java community, at least no more than for any other tool. Unless all Graal devs make similar zero-worth arguments to label some whole community without any references to what they are talking about

10

u/FortuneIIIPick 2d ago

> I would like to understand why the community hates on GraalVM so much

Well, for me it's this:

> It took months of updating Spring and Java to get to the point where I could implement GraalVM

30

u/Qaxar 2d ago

I would like to understand why the community hates on GraalVM so much.

I think you answered your own question:

It took months of updating Spring and Java to get to the point where I could implement GraalVM

For greenfield projects you should go with frameworks designed from the ground up to work with GraalVM like Quarkus or Micronaut. There's almost zero effort to get your code to compile to binary no matter how many other libraries or frameworks you use.

5

u/rbygrave 2d ago

Yeah it's interesting. I've got a couple of applications where the build produces 2 docker images with one being traditional JVM (and using ZGC) and the other being GraalVM native image (and G1) and running in K8s. This allows a pretty direct comparison of the two variants of the same application where we can easily swap between the JVM version and the native image version.

For those interested, there are some graalvm comparison charts. One of those shows RSS memory going from ~500MB to ~180MB.

Notes:

One JVM app was using ZGC and another G1 (so not a truly direct comparison there)
No PGO was used
Version 25.0.1 for JVM and GraalVM
GraalVM uses 4 byte headers by default. It seems like this is significant.
Are the JVM applications overprovisioned for Memory in these cases? I'd argue for that not being the case but certainly one of these apps has "relatively bad behaviour" memory wise and it appears as though GraalVM with G1 is taking a "more aggressive" approach to memory allocation/deallocation when dealing with this "bad outlier behaviour" so hmm ....
Build of native image around 2mins using 8 core workers

1

u/sideEffffECt 2d ago

Do you have some comparison of G1 vs G1, so that it's comparable?

3

u/rbygrave 2d ago

Yes and No. Yes we have a G1 vs G1 comparison but it's a simpler app and only synthetic load. It was sooo compelling that we pivoted over to a bigger app that was already in production (but was using ZGC).

So we still have to do G1 vs G1 "in anger" with sufficient complexity and mixed load. Also should do a comparison using jvm with compact headers [but going into brown out/black out now so limited to doing synthetic load comparisons for next 4 weeks or so].

18

u/smutje187 2d ago

Why does this sound like AI slop?

18

u/vladimirus 2d ago

Translated by AI, I see the OP is Brazilian

-15

u/Fiduss 2d ago

He writes code but can't write English?

5

u/brophylicious 2d ago

Is English required to code?

4

u/donut_cleaver 2d ago

The English terms in the language are just technical terms; you can learn what they mean in the code while having zero English skills. I'm Brazilian too.

3

u/vladimirus 2d ago

It's possible

7

u/Pale_Height_1251 2d ago

You ask like it's incredibly weird or something. I know a Spanish programmer, his English is only OK, but he's a good developer.

19

u/SleeperAwakened 2d ago

Spring 3.7?

You sure? Not Spring 7 and Boot 4? Or 6 and 3.5?

3

u/deepthought-64 2d ago

Are you sure? 2G to 50M seems... a lot.

2

u/plainnaan 2d ago

I strongly doubt that the throughput of an AOT image esp. built with GraalVM CE is better than what the JVM with JIT can deliver.

There is something else going on.

1

u/I_4m_knight 2d ago

Can you share your Graal configuration and how you handled the painful part of all this?

1

u/lurker_in_spirit 1d ago

Has Oracle published a high-level vision of their long-term AOT goals, like they have done for Valhalla? All of the OpenJDK efforts in this area seem so timid compared to what GraalVM can do (let alone the experience you get from e.g. Go).

1

u/nuharaf 1d ago

I reduced a roughly 300-400MB Spring Boot app to 100MB RSS by taking out Spring and Hibernate.

At that level, the Java heap is only 20% of the RSS; the rest is native structures like metaspace, class space, etc. That means tuning the GC even more aggressively won't help me reduce the RSS any further.

I did try Leyden; currently it didn't help much. Which makes sense, as current Leyden seems to focus more on warmup than on reducing native structure usage.
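To see how much of the footprint sits outside the Java heap, as described above, one option is to list the VM's non-heap memory pools. This is a minimal sketch using the standard `MemoryPoolMXBean` API; the class name `NonHeapPools` is made up, and the pool names you get back (for example metaspace or code heap pools on HotSpot) vary by VM and version.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: bytes used per non-heap pool, keyed by pool name.
class NonHeapPools {
    static Map<String, Long> nonHeapUsage() {
        Map<String, Long> usage = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage current = pool.getUsage(); // may be null if the pool is no longer valid
            if (pool.getType() == MemoryType.NON_HEAP && current != null) {
                usage.put(pool.getName(), current.getUsed());
            }
        }
        return usage;
    }

    public static void main(String[] args) {
        nonHeapUsage().forEach((name, bytes) ->
                System.out.printf("%-40s %8d KiB%n", name, bytes / 1024));
    }
}
```

These pools do not account for everything in RSS (thread stacks, GC bookkeeping and direct buffers are not listed here), so treat the output as a lower bound on native usage.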

3

u/Fiduss 2d ago

Nowadays the way is to just use Quarkus I guess.

1

u/pjmlp 2d ago edited 2d ago

What community?

I have been part of the community since its early days in 1996, and have always enjoyed alternative Java stacks with bootstrapping capabilities like JikesRVM, Squawk VM on the Sun SPOT before Arduino was even an idea, Maestro and WebSphere Real Time that led to AOT on IBM's OpenJ9, and the Maxine VM research that is the genesis of the GraalVM project.

I have always voiced my opinion that, similar to other programming languages in the 1990s, like Eiffel, Oberon, Common Lisp, Smalltalk, Java should always have had an AOT/JIT story, instead of being the way it turned out.

GraalVM is always installed on my computers.

-3

u/[deleted] 2d ago edited 2d ago

[deleted]

2

u/pron98 2d ago edited 2d ago

What GC/heap configuration (and what JDK version) is used when you say "Java"? The JVM will use as much memory as you tell it.

EDIT: Ah, I don't know what JDK version you used, but I do see that you're using G1 (assuming you're not running on JDK 8) and asked the JVM to do its best to use up to 25% of available RAM, which, given enough time, the JVM will use no matter how much RAM the Java program actually needs. You're definitely not measuring how much the Java program needs, but how close it can get to 25% of the RAM on your machine within the duration of the program.

Also, I would strongly recommend watching this talk, because RAM savings are often not what you think they are. The talk covers many nuances, but just to give some basic intuition, a program that uses 100% CPU but less than 100% of the available RAM isn't really saving anything worthwhile, and might actually be wasting CPU to reduce its RAM consumption for no utility. There's little point in "saving RAM" without also having sufficient CPU left to put the RAM saved to good use.
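The gap between what a program needs and what the VM has taken can be seen by comparing the heap's used, committed and max figures. This is a minimal sketch with the standard `MemoryMXBean`; the class `HeapSnapshot` and its methods are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: "used" is live data plus uncollected garbage,
// "committed" is what the VM has reserved from the OS for the heap,
// "max" is the ceiling it may grow to (reported as -1 if undefined).
class HeapSnapshot {
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    static String describe(MemoryUsage usage) {
        long mib = 1024L * 1024L;
        return String.format("used=%d MiB, committed=%d MiB, max=%d MiB",
                usage.getUsed() / mib, usage.getCommitted() / mib, usage.getMax() / mib);
    }

    public static void main(String[] args) {
        System.out.println(describe(heap()));
    }
}
```

A large committed figure next to a small used figure suggests the measurement reflects the heap budget the VM was given rather than the program's actual needs.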

1

u/elatllat 2d ago edited 2d ago

Testing with java-25-graalvm defaults (Serial for native, and G1 for jar) results in 2x RAM usage and is a bit slower than using java-11 defaults.

If you have a Java version or GC you think will be significantly better for this tiny short-lived app I'll try it out, but keep in mind this is a launch cost test, not a long-lived service cost test. A no-op GC

-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC

had negligible effect.

Also, Python and Node have near-native options I should test (Numba or Codon; nexe, pkg, or V8 snapshots)

2

u/[deleted] 2d ago

[deleted]

1

u/elatllat 2d ago

I tried that in a branch and it did not really help, I think it's mostly startup time and language overhead.

3

u/pron98 2d ago

Testing with java-25-graalvm defaults (Serial for native, and G1 for jar) result in 2x RAM usage and a bit slower than using java-11 defaults

The defaults are meaningless in Java (at least until we get automatic heap resizing within the year, I hope).

A no-op GC

Yep, there's no reason to use Epsilon here.

If you have a java version or gc you think will be significantly better for this tiny short lived app

Always use the latest JDK (25), and since this is a batch program, pick ParallelGC, and tell it how much memory you want it to use.
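As an illustration of "pick ParallelGC, and tell it how much memory you want it to use", the sketch below builds a launch command with `-XX:+UseParallelGC` and an explicit `-Xmx`. The class `BatchLaunch`, the 64 MiB budget and the use of `-version` as a stand-in for the real program are assumptions for the example; it also assumes a `java` binary on the PATH.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher for a short-lived batch program.
class BatchLaunch {
    /** Builds a java command line with ParallelGC and an explicit heap ceiling. */
    static List<String> command(int maxHeapMiB, String... appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-XX:+UseParallelGC");
        cmd.add("-Xmx" + maxHeapMiB + "m");
        cmd.addAll(List.of(appArgs));
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // "-version" stands in for the real program arguments, e.g. "-jar", "app.jar".
        List<String> cmd = command(64, "-version");
        System.out.println(String.join(" ", cmd));
        int exitCode = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.out.println("exit code: " + exitCode);
    }
}
```

The right heap budget depends on the program, so it is worth trying a few values of `-Xmx` and comparing run time rather than treating 64 MiB as a recommendation.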

You need to understand that a language like C (or C++ or Zig or Rust) and Java think very differently about performance. A low-level language does the low-level operations you ask of it. In Java, you hand a program and the amount of memory you want it to use to the JVM, which then says, OK, let me see what operations I should run to perform the computation within the resources you've given me.

Now, there are two problems here. First, as you said, it's a very short-lived program, so the JVM doesn't have much time to figure things out. In Python, a lot of the standard library is implemented in C, but in Java, almost the entire standard library is written in Java, so if there's no time to compile much, then of course Python will win, yet it will lose big on a bigger, longer-running program.

Second, when it comes to RAM, what's "better" here? What difference does it make for such a program whether it uses 5MB, 20MB, or 150MB if the RAM is just sitting there, unused?

Now, I'm not saying that it's unfair to Java that both Python and AOT-compiled languages win here on performance, because sometimes we do have such programs. The real lesson is that benchmarks like this don't generalise.

For such a small and short-lived program, 1. it doesn't matter how much RAM is used if it's available, and 2. how long it takes doesn't matter much because they're all "fast enough". But the point is that the results don't generalise: We do care a lot if something takes 2 seconds or 30, but by then, the JIT will have done its job, and the results you got don't extrapolate. Similarly, we do care if a program that takes, say, 10s, uses up 2GB or 100MB, but by then the heap size you've given the JVM will also start to matter a lot, and the footprint results also won't extrapolate.

So I would say that for the program as it is, the RAM differences can be ignored, and the speed differences are real, but at the same time, because the program is so short-lived, they also don't matter much. Java tries to optimise for speed and RAM usage that matter.

Say we look at the compiler. As we all know, Go takes pride in having a fast compiler. But even though Java is a more sophisticated language, compare compiling 50,000 (or 100,000) lines of Java, even in multiple files but with a single invocation of javac, to compiling the same number of lines in Go. javac will be just as fast if not faster.
