Readit News
zeruch · 2 years ago
Early on (15+ years ago) I spent a few weeks there on contract and I noticed they used Java EVERYWHERE, and not always well. They had a CS app named after a key Star Wars character that was in all likelihood a breach of the Geneva Convention. A code atrocity with the performance of a sloth on its 8th bong rip with a UX from hell.
jedberg · 2 years ago
If it helps that CS app was rewritten about 10 years ago (when I worked there, but not on that app) in part due to the complaints you mention. It's totally true that most resources were spent on customer facing apps. Internal apps were definitely not of the same quality, because they didn't need to be.
zeruch · 2 years ago
Good to hear. The fact that in 2005 you had an app that required seemingly petabytes of memory to operate, and was put on machines barely powerful enough to play Minesweeper, was in and of itself a series of bad decisions... but the app itself, and its layout, were just maddening. It's like MC Escher was the UX lead.
sillywalk · 2 years ago
"A code atrocity with the performance of a sloth on its 8th bong rip with a UX from hell."

Sounds like Apple Music.

hbn · 2 years ago
For some reason whenever I'm on my work's VPN, Apple Music lets me play 1 album and then the next time I try to start a song it will tell me I'm not logged in and I'll have to force quit and relaunch (frequently a few times) before it will let me play another album.

Apple Music is the only app that has this problem.

fmntf · 2 years ago
I was thinking of Jira
Arrath · 2 years ago
Evocative description there, bravo
civilitty · 2 years ago
It's so evocative that I don't even care about Netflix or Java anymore.

I just want to know where I can buy a bong ripping sloth [1] and whether they're legal in California.

[1] https://imgur.com/a/S3NVS16

myvoiceismypass · 2 years ago
When I was there a decade ago, it started becoming more polyglot friendly (node apps had to use a jvm sidecar to do internal communications originally!)
jedberg · 2 years ago
My team wrote some of the Python libraries for internal services just so we could avoid that Java sidecar! It took 10 times longer to boot the sidecar than the Python app.
dt3ft · 2 years ago
Netflix should make a documentary about this.
Tim25659 · 2 years ago
Ha..Ha.Ha
bruh2 · 2 years ago
What does CS stand for here? I guess it's not computer science?

Also, that description made me lmao, thanks

jedberg · 2 years ago
Customer service.
khalilravanna · 2 years ago
Another reminder that acronyms are pretty terrible for communication. Every time I onboard with a new org there's a whole new set of acronyms to learn, and using them is barely faster than typing out the unabbreviated version. Nice to save a couple of seconds when the cost is a bunch of people unable to follow along when people are communicating.

To be clear: not ragging on OP in particular at all but more at the widespread practice at a company level.

zeruch · 2 years ago
CS = Customer Support/Care in that regard.
jarym · 2 years ago
Interesting that the article jumps straight from REST to GraphQL and forgets Falcor [0], Netflix's alternative vision for federated services. For a while it looked like it might be a contender to GraphQL, but it never really took off despite being simpler to adopt.

[0] https://netflix.github.io/falcor/

paulbakker · 2 years ago
Falcor is actually part of the "old" architecture described in the talk. Because it's mostly unknown and no longer used, I didn't go into the details of it.

Falcor was developed at the time Facebook was developing GraphQL in-house. It has similar concepts, but never took off the way GraphQL did.

parthdesai · 2 years ago
Netflix themselves have moved off Falcor though

https://netflixtechblog.com/migrating-netflix-to-graphql-saf...

lfkdev · 2 years ago
`Sad Prime noises`

dustingetz · 2 years ago
iirc falcor predated graphql
ppseafield · 2 years ago
I was at the React Rally conference where Falcor was publicly announced in August of 2015. I recall that Facebook gave a GraphQL presentation right before.

It seems GraphQL was first announced publicly in February 2015.

baby · 2 years ago
Probably because most people don't want to work with Java
ValtteriL · 2 years ago
>Netflix observed a 20% increase of CPU usage on JDK 17 compared to JDK 8. This was mostly due to the improvements in the G1 garbage collector.

Help me here, why do GC improvements cause CPU increase?

blackoil · 2 years ago
I think this is a 20% improvement in CPU utilization: earlier the app was memory-bound and/or GC was consuming CPU. Now the app has 20% more CPU available, so it should be doing correspondingly more work. This could definitely have been written more clearly.
moffkalast · 2 years ago
> Bakker provided a retrospective of their JDK 17 upgrade that provided performance benefits, especially since they were running JDK 8 as recently as this year. Netflix observed a 20% increase of CPU usage

Seems like it's exactly that, OP cropped out the relevant bit where they list it having an overall performance benefit for that extra CPU time. Otherwise it could be assumed that it just hogs more CPU to get the same result.

bunderbunder · 2 years ago
I haven't dealt with this side of Java in a while, but it reflects my experience poking at Java 8 performance. At some (surprisingly early) point you'd hit a performance wall due to saturating the memory bus.

A new GC could alleviate this by either going easier on the memory itself, or by doing allocations in a way that achieves better locality of reference.

jillesvangurp · 2 years ago
Most modern GCs trade off CPU usage against latency. Less latency means the CPU has to do more work, e.g. on a separate thread, to figure out what can be garbage collected. JDK 8 only had an early version of the G1 collector (Parallel GC was still the default), and they would probably have been using one of the older collectors that collect less often but have a more open-ended stop-the-world phase. It used to be that this would require careful tuning and could get out of hand, with pauses taking seconds.

The new ZGC uses more CPU but it provides some hard guarantees that it won't block for more than a certain amount of milliseconds. And it supports much larger heap sizes. More CPU sounds worse than it is because you wouldn't want to run your application servers anywhere near 100% CPU typically anyway. So, there is a bit of wiggle room. Also, if your garbage collector is struggling, it's probably because you are nearly running out of memory. So, more memory is the solution in that case.
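If you want to see what your own service's collectors are doing, the JDK exposes per-collector counts and cumulative pause time over JMX. A minimal sketch (the collector names printed depend on which GC was selected at startup, e.g. `-XX:+UseG1GC` or `-XX:+UseZGC`):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    public static void main(String[] args) {
        // Churn some allocations so the collector has likely run at least once.
        long sink = 0;
        for (int i = 0; i < 200_000; i++) {
            sink += new byte[1024].length;
        }

        // One MXBean per collector; counts and times are cumulative since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("allocated ~" + sink / (1024 * 1024) + " MiB");
    }
}
```

Comparing the `getCollectionTime()` totals before and after a JDK upgrade gives a rough sense of how much wall-clock time the GC trade-off is actually costing.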

BinaryRage · 2 years ago
The figure is about the overall improvement; not sure why that reads as "increase".

On JDK 8 we were using G1 for our modern application stack, and we saw a reduction in CPU utilisation with the upgrade, with few exceptions (saw what I believe is our first regression today: a busy wait in ForkJoinPool with parallel streams; fixed in 19 and later, it seems).

G1 has seen the greatest improvement from 8 to 17 compared to its counterparts, and you also see reduced allocation rates due to compact strings (20-30%), so that reduces GC total time.

It's a virtuous cycle for the GRPC services doing the heavy lifting: reduced pauses means reduced tail latencies, fewer server cancellations and client hedging and retries. So improvements to application throughput reduce RPS, and further reduce required capacity over and above the CPU utilisation reduction due to efficiency improvements.

JDK 21 is a much more modest improvement upgrading from 17, perhaps 3%. Virtual threads are incredibly impressive work, and despite having an already highly asynchronous/non-blocking stack, we expect to see many benefits. Generational ZGC is fantastic, but losing compressed oops (it requires 64-bit pointers) costs about a 20% memory penalty. We haven't yet done a head-to-head with GenShen. We already have some JDK 21 in production, including a very large DGS service.

ahoka · 2 years ago
I don't think he meant that.
Macha · 2 years ago
A somewhat common problem is to be limited by the throughput of CPU heavy tasks while the OS reports lower than expected CPU usage. A lot of companies/teams just kind of handwave it away as "hyperthreading is weird", and allocate more machines. Actual causes might be poor cache usage causing programs to wait on data to be loaded from memory, which depending on the CPU metrics you use, may not show as CPU busy time.

For companies at much smaller scale than Netflix, where employee time is relatively more costly than computer time, this might even be the right decision. So you might end up with 20 servers at 50% usage, where 10 servers would do the same work with twice the latency while still appearing to be at 50% usage.

If the bottlenecks and overhead are reduced such that it's able to make more full use of the CPU, you might be able to reduce to e.g. 15 machines at 75% CPU usage. Consequently the increased CPU usage represents more efficient use of resources.

CraigJPerry · 2 years ago
>> while the OS reports lower than expected CPU usage

>> which depending on the CPU metrics you use, may not show as CPU busy time

If your userspace process is waiting on memory (be that cache, or RAM) then you’ll show as CPU busy when you look in top or whatever - even though if you look under the covers such as via perf counters, you’ll see a lack of instructions executed.

The CPU is busy in this case and the OS won’t context switch to another task, your stalled process will be treated as running by the OS. At the hardware thread level then it will hopefully use the opportunity to run another thread thanks to hyper threading but at the OS level your process will show user space cpu bound. You’ll have to look at perf counters to see what’s actually happening.

>> you might end up with 20 servers at 50% usage, but using 10 servers will take twice as long but still appear to be at 50% usage.

Queueing theory is fascinating: the latency change when dropping to half the servers may not be just a doubling. It depends on the queue arrival rate and processing time, but the results can be wild, like 10x worse.
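A rough sketch of that sensitivity, assuming each server behaves like an independent M/M/1 queue (the service and arrival rates below are made-up numbers, not anything from the thread):

```java
public class Mm1 {
    // Mean time a request spends in an M/M/1 queue (waiting + service):
    // W = 1 / (mu - lambda), valid only while lambda < mu.
    static double timeInSystem(double arrivalRate, double serviceRate) {
        if (arrivalRate >= serviceRate) {
            throw new IllegalArgumentException("unstable queue: lambda >= mu");
        }
        return 1.0 / (serviceRate - arrivalRate);
    }

    public static void main(String[] args) {
        double mu = 10.0; // each server completes 10 requests/second

        // 50% utilization: 5 req/s arriving at one server.
        System.out.printf("50%% util: %.2fs%n", timeInSystem(5.0, mu));  // 0.20s

        // 95% utilization: 9.5 req/s arriving at the same server.
        System.out.printf("95%% util: %.2fs%n", timeInSystem(9.5, mu));  // 2.00s, 10x worse
    }
}
```

Going from 50% to 95% utilization raises latency 10x even though throughput is less than doubled, which is exactly why "just run fewer servers hotter" can blow up tail latencies.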

xorcist · 2 years ago
When you put it like that, yes. Hardware is cheap and all that. In practice I think that an organization that doesn't understand the software it is developing has a people problem. And people problems generally can't be solved with hardware.

If somebody knows how to make that insight actionable, let me know. No, hiring new people is not the answer. In all likelihood that swaps one hard problem for an even harder one.

edpichler · 2 years ago
To free memory. Also, a 20% increase isn't 20 points in total: it's 20% relative, like going from 10% to 12% CPU usage, or from 50% to 60%, for instance.
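The relative-vs-absolute distinction in code (the utilization numbers are assumed for illustration):

```java
public class RelativeIncrease {
    public static void main(String[] args) {
        double before = 50.0; // CPU utilization before, in percent (assumed)
        double after = 60.0;  // CPU utilization after, in percent (assumed)

        double relativePct = (after - before) / before * 100.0; // 20% relative increase
        double absolutePts = after - before;                    // 10 percentage points

        System.out.printf("relative: %.0f%%, absolute: %.0f points%n",
                relativePct, absolutePts);
    }
}
```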
_the_inflator · 2 years ago
Well done.

I always appreciate numbers and the differentiation between relative and absolute numbers in this case.

"We doubled our workforce in one week!" - CEO's first hire... ;)

matsemann · 2 years ago
The CPU can do more tasks without being limited by memory pressure, perhaps?

I guess it depends on if they mean "we used 20% more CPU for the same output", or "we could utilize the CPUs 20% more".

paulbakker · 2 years ago
It’s a 20% improvement. So less time spent on GC.

znpy · 2 years ago
> Help me here, why do GC improvements cause CPU increase?

In Java 8 (afaik) there were pretty much no generational or concurrent garbage collectors, so garbage collection would happen in a stop-the-world manner: all work is halted, garbage collection happens, then the work resumes.

If you have a better GC, you have shorter and less frequent needs to do a stop the world pause.

Hence the code can run on cpu for more time, getting you higher cpu usage.

Higher cpu usage is often actually good in situations like this: it means you're getting more work done with the same cpu/memory configuration.

dboreham · 2 years ago
Java 8 was at least a decade into generational and concurrent GC. It does STW once in a while, though, which may be what you meant.
tpm · 2 years ago
I read it as a good thing: GC improvements -> more available memory -> more work done by the CPU. But still would be interested in more detail.
groestl · 2 years ago
Because the memory / I/O is not the bottleneck anymore, and the CPU can now run optimally.
jjtheblunt · 2 years ago
I haven’t seen the specific profiling data, but it’s possible that the garbage collector is running a collection thread, concurrently with regular processing threads, and thereby preventing entire world synchronization points which would idle processor cores.
ahoka · 2 years ago
Higher CPU usage can paradoxically mean better performance. When I last did ops we used to watch total CPU usage across all services, and if it wasn't 100%, we started looking for a bottleneck to fix.
radomir_cernoch · 2 years ago
Also interested! We saw basically the exact opposite. :-)
pyeri · 2 years ago
It's like hiring more workers to accomplish the exact same output as before. "See, I achieved 20% growth in my targets!", some recruiter will say!
groestl · 2 years ago
No, it's like improving a form to minimize the need for follow-up questions to the customer, and now seeing your workers (the same you had before) processing 20% more forms instead of waiting for responses.
dewey · 2 years ago
In case you are wondering what LOLOMO stands for, it's "List of Lists of Movies".
dlhavema · 2 years ago
Most of the postings for backend positions at Netflix I've seen call out Node.js. Can I assume they do both? Is one legacy and the other newer stuff, or are they more complementary?

Anyone on in the inside know?

nameless912 · 2 years ago
Things are certainly more of a blend now than what's presented in this presentation, but the presenter is a big Java platform guy here. I would say ~70% of the services I interact with on a day to day basis are Java, another 20% in Node, and then the last 10% is a hodgepodge of Python, Go, and more esoteric stuff.

It varies from team to team; the "Studio" organization that supports creating Netflix content does lots of nodeJS due to the perception that it's faster to iterate on a UI and API together if they're both in the same language. On my team, we're very close to 50/50 due to managing a bunch of backend, business process type systems (Java), and a very complex UI (with a NodeJS backing service to provide a graphql query layer). Regardless, the tooling is really quite good, so interacting with a Node service is roughly identical to interacting with a Java service is roughly identical to interacting with anything else. We lean into code generation for clients pretty heavily, so graphQL is a good fit, but gRPC and Swagger are still used pretty frequently.

dlhavema · 2 years ago
Thanks for responding. That's good insight
agilob · 2 years ago
Is this the talk? Looks like this is it https://www.youtube.com/watch?v=5dpLVvRpPPs
paulbakker · 2 years ago
That's an older version of the same talk. Things have moved on a bit since (Java 21 and such), but it's mostly the same.
edpichler · 2 years ago
Apparently, it is.
yayitswei · 2 years ago
I heard Clojure is fairly popular at Netflix as well.
jvican · 2 years ago
Not true. Clojure use is very rare.
yayitswei · 2 years ago
Good to know, thanks. I don't have insider knowledge but at least from various posts it looks like there's some healthy usage at scale, e.g. https://news.ycombinator.com/item?id=18345341, https://news.ycombinator.com/item?id=18348295

Things may have changed in the last 5 years, though.

technion · 2 years ago
I think this should be assumed for any "x company uses y uncommon language heavily" argument that you read online.
inparen · 2 years ago
Spring Boot and Spring cloud for backend & graphql for the win. ;-)
RamblingCTO · 2 years ago
No, just no. Performance and debugging are just plain horrible. The Spring team loves to force you into their automagic shit, and the bean stuff is so annoying. You get almost no compile-time safety in this stack. It's the bane of my existence. I'd like to know that a compiled program will run. That seems virtually impossible with Java/Spring Boot.
StevePerkins · 2 years ago
I'm not sure what "no compile time safety in this stack" even means in the context of a strongly-typed compiled language.

If you are referring to the dependency injection container making use of reflection, then Spring Native graduated from experimental add-on to part of the core framework some years ago. You can now opt for Quarkus/Micronaut-style static build-time dependency injection, and even AOT compilation to Go-style native executables, if you're willing to trade off the flexibility that comes with avoiding reflection. For example, not being able to use any of the "@ConditionalOnXXX" annotations to make your DI more dynamic.

(Personally, I don't believe that those trade-offs are worth it in most cases. And I believe that all the Spring magic in the universe doesn't amount to 10% of what Python brings to the table in a minimal Django/Flask/FastAPI microservice. But the option is there if your use case truly calls for it.)

Honestly, I've never run into anyone who considers Spring to be "the bane of their existence" where the real issue wasn't simply that the bulk of their experience was in something else: they were thrown into someone else's project, resent working with decisions made by other people, and don't want to either dig in and learn the tech or search for a new job where they get to make the choices on a greenfield project.

pylua · 2 years ago
Spring is basically a standard in itself and it is easier to hire people in it. It also normalizes large pieces of the backend application so even though they are written by different people they are similar.

Once you learn the annotation based configuration it also saves a lot of time.

The performance criticism is valid, but it will only keep improving.

Fabricio20 · 2 years ago
It's funny to see this perspective! I used to work at a few companies locally that had adopted the early Java EE style for their applications, and my experience is exactly the opposite. With Spring I'm usually diagnosing issues in the application layer (i.e. business issues, not framework issues), while in the Java EE applications I was often fixing issues down in the custom persistence layer each company had built. I see where you're coming from having looked at the "old" Spring stack (non-Boot), and I can see people getting mad over the configuration hell and how stuff is hidden behind XML... much like Java EE!
dimgl · 2 years ago
I completely agree with this. Spring was an absolute nightmare during the short period of time where I had the misfortune of using it. It also didn't help that the codebase was a monstrosity... classes following no design patterns and having 40k lines. But still...
nameless912 · 2 years ago
That has not been my experience on the inside - I spend most of my days working on a Spring Boot based service at Netflix and frankly it's one of the most effortless environments I've ever worked in. Granted, there's a lot of ecosystem support from the rest of the company, but things are very low effort, and generally very predictable. I can usually drop a breakpoint in a debugger in exactly the right spot and find a problem immediately.
vmaurin · 2 years ago
The issue with the Spring ecosystem is that people use it without knowing why or which problem it solves, just because almost everyone else is using it. And most of the time they don't need Spring (maybe a company like Netflix did, but in the end it didn't prove to be the right choice).
didntcheck · 2 years ago
It's not quite as good as compile-time or type-based guarantees, but IME configuration errors with Spring are almost always flagged immediately on application startup. As long as you have at least one test that initializes the application context (i.e. @SpringBootTest), this should be caught easily.
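Concretely, that safety net is the stock context-loads test that Spring Initializr generates; shown here as a sketch, since it needs spring-boot-starter-test on the classpath and won't run standalone:

```java
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

// Boots the full application context once; any bean wiring or
// configuration error fails this test at build time instead of at deploy.
@SpringBootTest
class ContextLoadsTest {

    @Test
    void contextLoads() {
        // Passing means every bean in the context could be constructed.
    }
}
```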
krooj · 2 years ago
This is just... ignorance; your argument is basically, "I don't understand/want to learn how X works; therefore, X must be garbage"
smrtinsert · 2 years ago
Performance and debugging are simple, and compile-time safety is Java's core domain. I think you're over-focusing on proxying or enhancement of beans, but if you look at the documentation for a reasonable amount of time there's really nothing to it.
misja111 · 2 years ago
FYI, you can still use XML based configuration in Spring. The choice is yours. See https://docs.spring.io/spring-framework/docs/4.2.x/spring-fr...

I agree it is not common to do it, most teams follow the autoconfiguration madness.

bedobi · 2 years ago
100% agree, Java and Spring are a mess and there's no justifiable reason to use them in 2023 (and no, "that's what we've always used" isn't a good justification)

Like srsly even DropWizard is better than Spring lol, let alone other even simpler frameworks like Ktor which is built on a much improved language over Java

wing-_-nuts · 2 years ago
What do you propose as an alternative? Something like Micronaut trades more compile time for stricter checks and faster runtime. Do you use something like that?
twh270 · 2 years ago
We've adopted Quarkus and it's been a breath of fresh air. Excellent all around, DX, performance, features, it's all been good.

Cthulhu_ · 2 years ago
Spring is a safe and reliable choice I'd say; not the most exciting, but neither code nor frameworks should be exciting, they're used to solve a problem, they shouldn't become the problem itself.

GraphQL is interesting to me, I thought the clients were pretty similar across all platforms, meaning their API usage should also be similar enough to not need the flexible nature of GraphQL. But then, it allows for a lot more flexibility and decoupling - if a client needs an extra field, the API contract does not need to be updated, and not all clients need to be updated at once. Not all clients will be updated either, they will need to support 5-10+ year old clients that haven't updated yet for whichever reason.
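A tiny illustration of that decoupling, with a made-up `Show` type: adding a nullable field is backward compatible, because GraphQL clients only receive the fields they explicitly query.

```graphql
type Show {
  id: ID!
  title: String!
  # Newly added field: existing clients' queries are unaffected,
  # and newer clients can opt in without any API version bump.
  maturityRating: String
}
```

Old clients keep sending `{ show { id title } }` and never see the new field, which is exactly what makes supporting 5-10 year old clients tractable.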

m_0x · 2 years ago
> not the most exciting

It was exciting when J2EE was dominating.

robertlagrant · 2 years ago
Well, if the field is not available then new backend code will need to be written: resolvers, integrations, etc. But it does allow UIs to take less info over the wire, and either fewer joins need to be done or fewer performance-oriented APIs need building, as you say.
krooj · 2 years ago
The stack is tremendously productive, but history has taught me a few things when dealing with Spring:

1. It's always best to start people off with plain old Spring, even with an XML context, such that they understand the concepts at play behind higher-level abstractions like Boot. Hell, I even start with a servlet and singletons to elucidate the shortcomings of rolling your own.

2. Don't fall prey to hype around new projects in the Spring ecosystem, such as their OAuth2 implementation, since they often become abandonware. It's always best to take a wait-and-see approach.

3. Spring Security is/was terrible to read, understand, and extend ;)

inparen · 2 years ago
Ha ha, Spring Security is tricky and has a high chance of surprising someone while "boot"strapping a new project. But once done, it's out of the way.

I did not like the XML much, because it always seemed like a lot of duplication: all you're doing is copying bean definitions and changing the bean id and class/interface most of the time. But it became a non-issue over time. Now Spring Boot has made it really easy with all those annotations.

olavgg · 2 years ago
I am a big fan of Spring Boot; it's one of the few frameworks that just works and lets me focus 100% on solving business problems. I've tried Micronaut, Quarkus, and Dropwizard, but they slow me down too much compared to just using Spring Boot.

For me delivering business value is the most important metric when I am comparing frameworks. Spring Boot wins every time.

ramon156 · 2 years ago
May I recommend Symfony? You get the advantages of Spring but also the nicer things of PHP :-)
baby · 2 years ago
I had to review a Spring application once and that convinced me never to work with Java ever again