palmfacehn · 2 years ago
The rich snippet inspection tool will give you an idea of how Googlebot renders JS.

Although they will happily crawl and render JS-heavy content, I strongly suspect bloat negatively impacts the "crawl budget". In 2024 that part of the metric probably weighs much less than overall request latency, but if Googlebot can process several orders of magnitude more sanely built pages with the same memory required for a single React page, it isn't unreasonable to assume they would economize.

Another consideration: "properly" used, a JS-heavy page would most likely be an application of some kind living at a single URL, whereas purely informative pages, such as blog articles or tables of data, would exist across a larger number of URLs. Of course there are always exceptions.

Overall, bloated pages are a bad practice. If you can produce your content as classic "prerendered" HTML and use JS only for interactive content, both bots and users will appreciate you.

HN has already debated the merits of React and other frameworks. Let's not rehash this classic.

tentacleuno · 2 years ago
> If you can produce your content as classic "prerendered" HTML and use JS only for interactive content, both bots and users will appreciate you.

Definitely -- as someone who's spent quite a lot of time in the JavaScript ecosystem, we tend to subject ourselves to much more complexity than is warranted. This, of course, leads to [mostly valid] complaints about toolchain pain[0], etc.

> HN has already debated the merits of React and other frameworks.

I'll note though that while React isn't the cure-all, we shouldn't be afraid of reaching for it. In larger codebases, it can genuinely make the experience substantially easier than plain HTML+JS (anyone maintain a large jQuery codebase?).

The ecosystem alone has definitely played into React's overall success. That said, in some cases I've found the complexity of hooks to be unwarranted, and have struggled to use them. Perhaps I'm just not clever enough, or perhaps the paradigm does have a few rough edges (useEffect in particular).

[0]: Toolchain pain is definitely a thing. I absolutely hate setting toolchains up. I spent several hours trying to set up an Expo app; curiously, one of the issues I found (which I may be misremembering) is that the .tsx [TypeScript React] extension wasn't actually supported. Definitely found that odd, as you'd assume a React toolkit would support that OOTB.

osrec · 2 years ago
HTML is often not flexible or capable enough. JS exists for a reason, and is an integral part of the web. Without it, you will struggle to express certain things online and lean JS sites can be really quite nice to use (and are generally indexed well by Google).

Bloated JS sites are a horrible thing, but they almost sideline themselves. I rarely visit a bloated site after an initial bad experience, unless I'm forced.

jraph · 2 years ago
For documents, you can absolutely have all the structured content in HTML, and add JS to improve things. This way, you have your feature rich experience, the bot can build its indexing without having to run this extra js, and I have my lightweight experience.

Progressive enhancement :-)
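
A minimal sketch of what that looks like (the page structure and the share button are invented for illustration): the server emits complete, readable HTML, and a small script only upgrades the experience when JS happens to be available.

```javascript
// Hypothetical progressive-enhancement sketch: all indexable content is in
// the markup itself, so bots and no-JS readers get the full document.
function renderArticle(post) {
  return [
    `<article id="post-${post.id}">`,
    `  <h1>${post.title}</h1>`,
    `  <p>${post.body}</p>`,
    `  <button class="share" hidden>Share</button>`, // inert without JS
    `</article>`,
  ].join("\n");
}

// Client-side enhancement (would run only in a browser): reveal the share
// button and wire it up. If this script never loads, nothing is lost.
const enhance = `
  document.querySelectorAll(".share").forEach((btn) => {
    btn.hidden = false;
    btn.addEventListener("click", () => navigator.share?.({ url: location.href }));
  });
`;

const html = renderArticle({ id: 1, title: "Hello", body: "Static first." });
```

The key property is that the crawler can index `html` as-is, without ever executing `enhance`.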

dlevine · 2 years ago
I work for a company that enables businesses to drop eCommerce into their websites. When I started, this was done via a script that embedded an iFrame. This wasn't great for SEO, and some competitors started popping up with SEO-optimized products.

Since our core technology is a React app, I realized that we could just mount the React app directly on any path at the customer's domain. I won't get into the exact implementation, but it worked, and our customers' product pages started being indexed just fine. We even ranked competitively with the upstarts who used server-side rendering. We had a prototype in a few months, and then a few months after that we had the version that scaled to 100s of customers.

We then decided to build a new version of our product on Remix (an SSR framework similar to Next.js). It required us to basically start over from scratch, since most of our technologies weren't compatible with Remix. Two years later, we still aren't quite done. When all is said and done, I'm really curious to see how the new product's SEO compares to the existing one.

chrisabrams · 2 years ago
Given that your Remix version has been ~2 years in development by X number of developers, what are the other expected outcomes? It sounds like potential SEO performance is unknown? Is the development team happy with the choice? I can't recall working somewhere that allowed us to work on a project for two years without releasing to production. How did you get business buy-in?
giraffe_lady · 2 years ago
You don't need buy-in when they tell you to do it!

Not OP but I've definitely seen a "leadership has decided on a rewrite into a new technology" project not ship for a couple years. I doubt it ever shipped, I didn't stay around to find out.

dlevine · 2 years ago
The other outcomes are a redesign and more configurability plus a bunch of new features. It wasn't really an apples to apples comparison. The non-iFrame version was more of a 1.1, where the new thing we are building is a 2.0. Based on some other projects, I do suspect it would have gone faster if we built it on the old stack.

The development team made the choice to go with Remix (well, the tech lead and VP of engineering). No one had used this tech before. We have subsequently talked about whether it would have been better to do the whole thing with Rails + Hotwire. We have been using this approach elsewhere in our stack, and it seems to be a lot conceptually simpler than rendering JS server side and then hydrating it.

sanex · 2 years ago
Currently building something similar and following your path. We render an iframe today and are working on bundling the React app into a custom HTML element and dropping that onto the page. Would be curious to hear more about your experience.
dlevine · 2 years ago
We ended up using a reverse proxy since we wanted each product page to have a separate URL. Basically /shop/* would resolve to our React app, which rendered the correct page based on the URL. You could configure this pretty easily using NGINX or Apache, but our customers were pretty technically unsophisticated so it was too much work for them to do it this way.
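
For the technically inclined customers, the setup described above is roughly this kind of NGINX config (a sketch only; the path and upstream host are invented):

```nginx
# Hypothetical reverse-proxy sketch: serve /shop/* from the hosted React
# app while keeping the URLs on the customer's own domain.
location /shop/ {
    proxy_pass https://app.example-commerce.com/;
    proxy_set_header Host $host;                  # preserve the customer's domain
    proxy_set_header X-Forwarded-Proto $scheme;   # tell the app it's behind TLS
}
```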

In the end, we built a Wordpress plugin since it turned out that most of our customers used Wordpress. This plugin acted as the reverse proxy. We went a step beyond this and did some cool stuff to let them use a shortcode to render our eCommerce menu within their existing Wordpress template.

One wrinkle with ditching the iFrame was getting our CSS to not conflict with their CSS. I ended up putting our stuff within a shadow DOM, which was a bit of work but ended up working pretty well.

38 · 2 years ago
> nextjs

FYI nextjs is notoriously user-hostile and one of the worst pieces of client-side code I've ever seen, second only to Widevine. Who dumps 2 MB of JSON directly into the HTML?

cjblomqvist · 2 years ago
It's been SSR SPA best practice for a decade at least (when keeping your model data client side).
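
That pattern, sketched minimally below, is why the JSON ends up in the HTML: the server renders markup *and* embeds the same model data so the client app can hydrate without refetching. (Names here are invented; Next.js does this via a `<script id="__NEXT_DATA__">` tag.)

```javascript
// Hypothetical SSR + hydration-payload sketch.
function renderPage(props) {
  const markup = `<h1>${props.title}</h1>`;  // what bots and users see immediately
  const payload = JSON.stringify({ props }); // duplicated state: the "2 MB of JSON"
  // Note: real frameworks also escape "</script>" inside the payload.
  return `<!doctype html>
<div id="root">${markup}</div>
<script id="__APP_DATA__" type="application/json">${payload}</script>`;
}

// On the client, the framework reads the payload back out of the document
// and attaches event handlers to the already-rendered markup ("hydration")
// instead of re-rendering from scratch.
const html = renderPage({ title: "Product 42" });
```

The size complaint follows directly: every byte of model data ships twice, once as markup and once as JSON.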
jxi · 2 years ago
I actually worked on this part of the Google Search infrastructure a long time ago. It's just JSC with a bunch of customizations and heuristics tuned for performance to run at a gigantic scale. There's a lot of heuristics to penalize bad sites, and I spent a ton of time debugging engine crashes on ridiculous sites.
esprehn · 2 years ago
This isn't accurate as of a number of years ago. They run headless chrome (as mentioned in the article). No more hacked up engine or JSC.
emptysea · 2 years ago
Do you know why they used JSC rather than V8?
esprehn · 2 years ago
They don't, but at one point you might do that for lower resource consumption.
stonethrowaway · 2 years ago
Could we trouble you for a blog post? Super curious to read more about this.
ta12197231937 · 2 years ago
Care to explain why you sold your soul? Working for an ad agency, that is.
jessyco · 2 years ago
This line of questioning doesn't invite any kind of positive conversation. Why not ask more politely, or even just change your question so it invites thoughtful answers?
orenlindsey · 2 years ago
I really think it would be cool if Google started being more open about their SEO policies. Projects like this use 100,000 sites to try to discover what Google does, when Google could just come right out and say it, and it would save everyone a lot of time and energy.

The same outcome happens either way: either Google says what their policy is, or people spend time and bandwidth figuring it out. Google's policy becomes public regardless.

Google could even come out and publish stuff about how to have good SEO, and end all those scammy SEO help sites. Even better, they could actively try to promote good things like less JS when possible and less ads and junk. It would help their brand image and make things better for end users. Win-win.

capnjngl · 2 years ago
I'm sure there's some bias since it's coming from the horse's mouth, but Google does publish this stuff. Their webmaster guidelines have said for years to make content for users, not robots, and some recent updates have specifically addressed some of the AI SEO spam that's flooding the internet[1]. Their site speed tools and guidelines give very specific recommendations on how to minimize the performance impact of javascript[2].

[1] https://developers.google.com/search/docs/fundamentals/creat...

[2] https://developers.google.com/speed/docs/insights/v5/about

sureIy · 2 years ago
Spam makes transparency impossible. Like you, spammers have to spend months figuring out what works and what doesn't. If Google is clear about the rules, they just get abused. You can see this every day with free services: they either have to make things harder for everyone or just succumb.
dplgk · 2 years ago
Except that spammers have the incentive to spend months figuring out and normal people don't. So the spammers prevail anyway.
mirkonasato · 2 years ago
It's from 2019 so things may have changed since, but there's a great video on YouTube explaining "How Google Search indexes JavaScript sites" straight from the horse's mouth: https://youtu.be/LXF8bM4g-J4
StressedDev · 2 years ago
Google will never tell you how the ranking algorithm works, because if they did, the ranking algorithm would be gamed. Basically, the problem is that a lot of people will try to get less relevant content to rank higher than the best content. If you tell these people how Google's ranker works, they will make Google search worse because they will learn how to deceive the ranker.

A ranker is a piece of software which determines what results should be shown to a user on the search results page.

TZubiri · 2 years ago
duh.

The nerve of parent comment telling Google what to do.

encoderer · 2 years ago
I did experiments like this in 2018 when I worked at Zillow. This tracks with our findings then, with a big caveat: it gets weird at scale. If you have a very large number of pages (hundreds of thousands or millions), Google doesn't just give you limitless crawling and indexing. We had JS content waiting days after scraping to make it into the index.

Also, competition. In a highly competitive SEO environment like US real estate, we were constantly competing with 3 or 4 other well-funded and motivated companies. A couple of times we tried going dynamic-first with a page and we lost rankings. Maybe it's because FCP was later? I don't know, because we ripped it all out and did it server side. We did use Next.js when rebuilding Trulia, but it's self-hosted and only uses SSR.

dheera · 2 years ago
I actually think intentionally downranking sites that require JavaScript to render static content is not a bad idea. Requiring JS also impedes accessibility-related plugins trying to extract the content and present it to the user in whatever way is compatible with their needs.

Please only use JavaScript for dynamic stuff.

dmazzoni · 2 years ago
> It also impedes accessibility-related plugins trying to extract the content and present it to the user in whatever way is compatible to their needs.

I'm not sure I agree that this is relevant advice today. Screen readers and other assistive technology fully support dynamic content in web pages, and have for years.

Yes, it's good for sites to provide content without JavaScript where possible. But don't make the mistake of conflating the "without JavaScript" version with the accessible version.

niutech · 2 years ago
Screen readers aren't the only assistive user agents. There are terminal-based web browsers too, like Links and Lynx, which don't support JS.
dheera · 2 years ago
> Screen readers and other assistive technology

Readers for the blind are not the only form of assistive technology, and unnecessary JS makes it hard to develop new ones.

There is a huge spectrum of needs in between that LLMs will help fulfill. It can be something as simple as adding a paraphrase of each section at the top, removing triggering textual content, translating fancy English into simple English, or answering voice questions about the text like "how many tablespoons of olive oil?".

These are all assistive technologies that would highly benefit from having static text be static.

creesch · 2 years ago
It is also still general overhead, which requires capable devices and a good internet connection, something a lot of developers with very capable computers and fast connections tend to overlook.

Specifically, if you are targeting a global audience, there are entire geographic regions where the internet is much, much slower and less reliable. Not only do these people experience slow load times and packet drops, some JavaScript libraries might not even load at all. That isn't a huge deal if your main content doesn't rely on JavaScript, but of course it is if it does.

In addition to that, in these same regions people often access the internet through much cheaper and slower devices.

sureIy · 2 years ago
> Please only use JavaScript for dynamic stuff.

Pretty sure that ship has sailed in 2015. It’s good to see people focusing on SSR again but that’s just an extra step and it’s hard to mess up. Too many developers don’t think it’s worth it. Just try to visit any top websites without JS, even just to read them.

dheera · 2 years ago
I do this all the time, because a couple of websites display all the text and then run JS that erases it and replaces it with a stupid paywall; if I disable JS I can just read the article.
rvnx · 2 years ago
Strange article, it seems to imply that Google has no problem to index JS-rendered pages, and then the final conclusion is "Client-Side Rendering (CSR), support: Poor / Problematic / Slow"
madeofpalk · 2 years ago
The final recommendation, is to use their semi lock-in product.
meiraleal · 2 years ago
Vercel needs people to believe they deliver value worth the absurd price of their AWS wrapper
mdhb · 2 years ago
Hint: they don’t and their entire business model is actively reliant upon deceiving naive junior developers as far as I can tell.
elorant · 2 years ago
Well, it is slow. You have to render the page through a headless browser, which is resource intensive.
ea016 · 2 years ago
A really great article. However, they only tested on nextjs.org, so it's still possible Google doesn't waste rendering resources on smaller domains.
ryansiddle · 2 years ago
Martin Splitt mentioned in a LinkedIn post[1], as a follow-up to this, that larger sites may have a crawl budget applied.

> That was a pretty defensive stance in 2018 and, to be fair, using server-side rendering still likely gives you a more robust and faster-for-users setup than CSR, but in general our queue times are significantly lower than people assumed and crawl budget only applies to very large (think 1 million pages or more) sites and matter mostly to those, who have large quantities of content they need updated and crawled very frequently (think hourly tops).

We have also tested smaller websites and found that Google consistently renders them all. What was very surprising about this research is how fast the render occurred after crawling the webpage.

[1] https://www.linkedin.com/feed/update/urn:li:activity:7224438...