Rendering JavaScript for Google Search

Search Off the Record podcast
11 Jul 2024 · 25:20

Summary

TL;DR: In this episode of 'Search Off the Record,' Martin and John from Google's Search Relations team, along with guest Zoe Clifford from the rendering team, discuss the intricacies of web rendering for search indexing. They delve into topics like the importance of JavaScript and the Document Object Model (DOM), the challenges of rendering dynamic content, and the impact of browser updates on indexing. The conversation also touches on user agent detection and the handling of JavaScript redirects, providing insights into optimizing web content for search engines.

Takeaways

  • 😀 The podcast 'Search Off The Record' is a fun and informative series from the Google Search team discussing various search-related topics.
  • 👋 Zoe Clifford from the rendering team joined the podcast to discuss her role and experiences with Google's rendering processes.
  • 🌐 Rendering in the context of web search refers to the process of using a browser to view a web page as a user would see it after it has loaded and JavaScript has executed.
  • 💻 Google uses headless browsing to index web pages, simulating a user's view of the page after all elements have loaded.
  • 💰 Rendering web pages for indexing is an expensive process, but it is necessary to capture the full content of pages that rely on JavaScript.
  • 🔄 Googlebot now follows the stable release of Chromium for browser updates, ensuring compatibility with modern web features.
  • 🛠️ The Document Object Model (DOM) is a tree-like structure that represents the browser's view of the page, essential for JavaScript manipulation and indexing.
  • 🤖 Googlebot's rendering process is stateless, meaning each rendering is a fresh browser session without retaining cookies or session data from previous renders.
  • 🔒 Google respects 'robots.txt' files and will not fetch content that is disallowed for Googlebot, which can affect the rendering of dependent resources.
  • 🔄 JavaScript redirects are followed by Googlebot at render time, similar to server-side redirects, but care should be taken to avoid redirect loops.
  • 👻 The podcast shared a 'ghost story' about an unexpected 'iterator Act is not defined' error, highlighting the complexities and occasional quirks of browser rendering.

Q & A

  • What is the role of the rendering team at Google?

    -The rendering team at Google is responsible for headless browsing, which allows Google to index a web page as a user would see it after it has loaded and its JavaScript has executed.

  • What is the Document Object Model (DOM) and why is it important for indexing web pages?

    -The DOM is a programming interface for HTML documents. It represents the structure of a web page in a tree form, which reflects the browser's view of the page at runtime. It's important for indexing because it allows search engines to understand and interact with the content of the page.
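
    As a minimal illustrative sketch (not from the episode; the element names are invented), this is how JavaScript grows the DOM tree beyond the original HTML source, which is exactly the content rendering exists to capture:

        // Original HTML source: <div id="app"></div>
        // After this script runs, the rendered DOM contains a list
        // that never appeared in the HTML the crawler fetched.
        const app = document.getElementById('app');
        const list = document.createElement('ul');
        ['Alpha', 'Beta'].forEach((name) => {
          const item = document.createElement('li');
          item.textContent = name; // each text node becomes a child in the tree
          list.appendChild(item);
        });
        app.appendChild(list);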

  • Why did Google decide to follow the stable release of Chrome for Googlebot?

    -Google decided to follow the stable release of Chrome for Googlebot to ensure that it gets browser updates regularly and to reduce the manual integration work required to maintain a headless browser for the indexing pipeline.

  • What is the significance of continuous integration in the context of Googlebot's browser updates?

    -Continuous integration ensures that Googlebot receives updates from the Chromium project automatically, which helps in maintaining compatibility with the latest web standards and JavaScript features.

  • How does Googlebot handle JavaScript redirects during the rendering process?

    -Googlebot follows JavaScript redirects just like normal server-side redirects, but they have to happen at render time instead of crawl time.
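
    For illustration, a JavaScript redirect is just script that changes the location, so a crawler only discovers it once rendering has executed the page (the URL below is a placeholder):

        // Seen by Googlebot at render time, after JavaScript runs;
        // a server-side 301/302 would already be visible at crawl time.
        window.location.replace('https://example.com/new-url');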

  • What is the impact of rendering on the indexing process, especially considering its computational expense?

    -Rendering is computationally expensive, but it is necessary to capture the final state of a web page after all scripts have executed. This ensures that Google can index the content as users would see it, including dynamic content.

  • What is the 'ghost story of iterator' mentioned in the podcast, and what does it reveal about browser updates?

    -The 'ghost story of iterator' refers to an incident where an unexpected error message 'iterator Act is not defined' appeared during a browser update. It reveals that even with automated updates, there can be unforeseen issues that require careful QA and sometimes manual intervention.

  • How does Googlebot handle cookies during the rendering process?

    -Googlebot has cookies enabled, but it does not interact with cookie dialogs. It will accept cookies set by the website without user interaction, but it does not maintain state between rendering sessions.
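
    A small sketch of what stateless rendering implies for cookies (the cookie name is invented): a cookie set during one render is visible for the rest of that render, but the next render starts with an empty cookie jar, so content should never be gated on a returning visitor's cookie:

        // Works within a single render session:
        document.cookie = 'consent=granted; path=/';
        console.log(document.cookie.includes('consent=granted')); // true
        // But nothing persists to the next render, so logic like
        // "redirect unless the cookie from the last visit exists"
        // can bounce a stateless client forever.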

  • What is meant by 'user agent shenanigans' and why are they problematic for web indexing?

    -User agent shenanigans refer to practices where web developers serve different content based on the user agent string, which can lead to issues when the website is updated and the conditional logic is not maintained properly, potentially resulting in broken or incorrect content being served to crawlers.
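
    A hedged sketch of the anti-pattern in server-side pseudologic (the request shape and markup are invented): the crawler-only branch keeps running long after the rest of the site is rebuilt, and "works for me" testing in a normal browser never exercises it:

        // Anti-pattern: special-casing the crawler.
        function handleRequest(request) {
          const userAgent = request.headers['user-agent'] || '';
          if (userAgent.includes('Googlebot')) {
            // Forgotten code path: only Googlebot ever sees this output,
            // so breakage here goes unnoticed by human testers.
            return '<html><body><!-- stale bot-only markup --></body></html>';
          }
          // What browsers (and manual testing) actually see.
          return '<html><body>Current site markup</body></html>';
        }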

  • How does the rendering process handle websites that are blocked by robots.txt?

    -If a website or resource is disallowed by robots.txt, Googlebot will not fetch it during the rendering process. This means that if the page content relies on resources blocked by robots.txt, the page may appear broken or incomplete to Googlebot.
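
    For example, a robots.txt like the following (the API path is a placeholder) lets Googlebot fetch the HTML shell but blocks the endpoint a client-side-rendered page loads its content from, so the rendered page may look empty or broken:

        User-agent: Googlebot
        Disallow: /api/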

  • What advice does the podcast give for web developers regarding the implementation of structured data using JavaScript?

    -The podcast advises web developers to implement structured data using JavaScript carefully, ensuring that the web page is not too fragile and can handle errors gracefully. It also suggests using tools like Google's Search Console URL Inspection to test if Googlebot can render the page correctly.
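
    A minimal sketch of JavaScript-injected structured data (the values are placeholders): JSON-LD added to the page at runtime is part of the rendered DOM, so it can be picked up at render time:

        const data = {
          '@context': 'https://schema.org',
          '@type': 'Article',
          headline: 'Example headline', // placeholder value
        };
        const script = document.createElement('script');
        script.type = 'application/ld+json';
        script.textContent = JSON.stringify(data);
        document.head.appendChild(script);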

Outlines

00:00

😀 Introduction to the Podcast and Rendering Basics

The podcast begins with a lively introduction to 'Search Off The Record,' a show from the Google Search team. Host Martin is joined by John from the Search Relations team and guest Zoe Clifford from the Rendering team. They discuss the role of the rendering team, which involves understanding how web pages are displayed after JavaScript execution. Zoe introduces herself, referencing a past Google I/O appearance and her work on rendering. The conversation lightheartedly touches on the debate between dogs and cats, highlighting the team's camaraderie. The summary of this paragraph focuses on the podcast's aim to explore search-related topics in a fun and informative manner.

05:02

🛠️ The Technicalities of Web Rendering and Googlebot's Evolution

This paragraph delves into the complexities of web rendering, explaining the importance of JavaScript and the Document Object Model (DOM) in how web pages are presented and indexed. It discusses the shift to 'Evergreen Googlebots' in 2019, which follow stable Chrome or Chromium updates, streamlining the process of integrating browser updates into Google's indexing pipeline. The speakers reminisce about past challenges with browser updates and the deprecated Blink platform API, leading to the adoption of Chromium for easier maintenance. The summary emphasizes the technical advancements and continuous integration practices that have improved Googlebot's efficiency and kept it up-to-date with the latest web standards.

10:03

👻 Ghost in the Machine: Unexpected Bugs in Web Rendering

The conversation takes a turn with a story about an unexpected 'iterator' error that appeared during a browser update. This led to a debugging adventure, uncovering a bug in V8's bundled JavaScript files and a macro substitution issue. The anecdote illustrates the challenges in maintaining a seamless web indexing process and the occasional 'ghosts in the machine' that developers must address. The summary captures the essence of this debugging story, highlighting the unpredictability of software updates and the importance of thorough Quality Assurance (QA) in web rendering.

15:04

🕸️ Navigating JavaScript and Structured Data in Web Development

The discussion moves to the use of JavaScript for implementing structured data on web pages. While acknowledging past issues with ES6 support, the panelists agree that JavaScript is now well-integrated into Googlebot's capabilities, thanks to the continuous Chromium updates. They caution, however, about the potential fragility of web pages reliant on JavaScript, especially when external resources or API calls are involved. The summary underlines the importance of robust web development practices that account for potential errors and ensure content accessibility, even when JavaScript fails.

20:05

🤖 User Agent Shenanigans and the Pitfalls of Conditional Rendering

This paragraph addresses the practice of 'user agent Shenanigans,' where web developers tailor content specifically for Googlebot, potentially leading to issues when the website or its frameworks are updated. The panelists discuss the shift away from 'dynamic rendering' and towards more consistent web experiences for all visitors, including search engines. They also touch on the use of Googlebot's user agent for debugging purposes and the challenges it presents. The summary captures the essence of the conversation, emphasizing best practices in web development that avoid conditional rendering based on user agents and the importance of maintaining updated and accessible content.

25:06

🔁 JavaScript Redirects, Cookies, and the State of Googlebot's Rendering

The final paragraph of the script explores the topic of JavaScript redirects and their treatment by Googlebot, which follows them just like regular HTTP redirects. The conversation also covers Googlebot's handling of cookies, its stateless rendering process, and adherence to robots.txt guidelines. The speakers discuss the implications of resources being disallowed for Googlebot and how it affects the rendering of web pages that rely on external API calls. The summary provides a comprehensive overview of Googlebot's capabilities and limitations in rendering, including its approach to redirects, cookies, and API accessibility.

🎙️ Closing Remarks and Invitation for Listener Engagement

The podcast concludes with a round of thanks to the guests and listeners, highlighting the fun and informative nature of the episode. The hosts express hope that the insights shared on web rendering will be useful to the audience. They also encourage listeners to engage with them on social media and at upcoming events, emphasizing the interactive and community-driven aspect of the podcast. The summary captures the spirit of the closing remarks, inviting continued dialogue and feedback from the audience while celebrating the collaborative exploration of search and rendering topics.

Keywords

💡Rendering

Rendering in the context of web development refers to the browser's process of constructing the visible page from HTML, CSS, and JavaScript. It is central to the video's theme, as the episode discusses how Google's indexing pipeline uses rendering to understand and index web pages as they would appear to a user after JavaScript execution. The term is used to explain headless browsing, which is vital for Google to accurately index dynamic content.

💡JavaScript

JavaScript is a programming language that allows interactive web pages and is a key technology in the video script. It is used to manipulate the Document Object Model (DOM) and is crucial for client-side rendering. The video discusses the importance of JavaScript for Googlebot to properly index websites that rely on it for content delivery, as seen in the discussion about pages that prompt users to enable JavaScript.

💡Document Object Model (DOM)

The DOM is a programming interface for HTML documents and is essential for rendering. It represents the page so that programs can change the document structure, style, and content. In the video, the DOM is mentioned as the browser's mental model of a website, which is manipulated by JavaScript and crucial for Google to index the final state of a web page.

💡Client-side Rendering

Client-side rendering is a technique where content is fetched and rendered by the client's browser using JavaScript. The video touches upon this concept when discussing how some websites require JavaScript to be enabled for content to be displayed, which is important for Google's rendering process to capture the full page content.
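
A sketch of client-side rendering with the graceful error handling the episode recommends (the endpoint and element IDs are invented): if the content fetch fails, the page shows a clear message instead of staying blank or redirecting away:

    async function loadContent() {
      const container = document.getElementById('content');
      try {
        const response = await fetch('/api/articles'); // may return 429, 500, etc.
        if (!response.ok) throw new Error('HTTP ' + response.status);
        const articles = await response.json();
        container.textContent = articles.map((a) => a.title).join(', ');
      } catch (err) {
        // Fail visibly rather than leaving a blank or broken page.
        container.textContent = 'Content could not be loaded right now.';
      }
    }
    loadContent();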

💡Googlebot

Googlebot is the web crawler for the Google search engine, which indexes web pages. In the script, Googlebot is discussed in relation to how it has been updated to follow the stable releases of Chromium, allowing it to render pages more accurately. The term is used to illustrate the evolution of Google's indexing capabilities.

💡Continuous Integration

Continuous integration (CI) is a development practice where code changes are regularly merged to a central repository. In the video, the term is used to describe Google's update process for Googlebot, ensuring it stays current with the latest browser features and rendering capabilities.

💡Structured Data

Structured data is a way of annotating web content in a machine-readable format for search engines to better understand and process it. The video discusses using JavaScript for implementing structured data, emphasizing that while Google is capable of executing JavaScript, webmasters should ensure their pages are not too fragile if certain scripts fail to load.

💡User Agent

A user agent is a software identifier that allows servers to detect the software and hardware capabilities of the client. In the script, 'user agent Shenanigans' refers to the practice of serving different content based on the user agent, which can lead to issues if not maintained correctly, as it might result in content that is not suitable for indexing.

💡Robots.txt

Robots.txt is a file that websites use to instruct web crawlers on which pages or sections of a website should not be processed or scanned. The video mentions that Googlebot follows the rules set in robots.txt, which impacts the content that is available for rendering and indexing.

💡JavaScript Redirect

A JavaScript redirect is a method of redirecting users to another URL using JavaScript. The video clarifies that Googlebot can follow JavaScript redirects, which must occur at render time. This is important for SEO as it ensures that redirects are tracked and indexed correctly.

💡Stateless Rendering

Stateless rendering refers to the process where each rendering request is independent and does not retain any state from previous requests. The video mentions that Google's rendering process is stateless, meaning each page is rendered as a fresh browser session without any retained data from previous sessions.

Highlights

Introduction of the podcast 'Search Off The Record' and the hosts involved in the discussion about search and rendering.

Zoe Clifford's introduction from the rendering team and her experience with Google I/O in 2019.

Explanation of the role of JavaScript in web pages and its necessity for client-side rendering.

Clarification on the Document Object Model (DOM) and its importance in reflecting the browser's view of a page.

Discussion on the challenges and costs associated with rendering web pages for indexing.

Google's approach to rendering all HTML pages and the expenses involved in the process.

Introduction of the evergreen Googlebot and the benefits of following stable Chrome for browser updates.

The history of Googlebot's browser updates and the shift to Chromium for easier integration.

Continuous integration of upstream Chromium and its significance for Googlebot.

The importance of testing and the role of Search Console's URL Inspection tool for webmasters.

Zoe's personal experience with a 'ghost' error message 'iterator Act is not defined' during a browser update.

The impact of JavaScript on structured data and the importance of not making web pages too fragile.

Googlebot's handling of JavaScript redirects and the comparison with server-side redirects.

Discussion on user agent detection and the pitfalls of user agent Shenanigans in web development.

The rendering process of Googlebot, including its stateless nature and handling of cookies.

Googlebot's adherence to robots.txt and the implications for rendering when APIs are disallowed.

Conclusion of the podcast with a reminder of the importance of rendering in search and indexing.

Transcripts

00:03
[Music]

00:11
Hello and welcome to another episode of Search Off the Record, a podcast coming to you from the Google Search team, discussing all things Search and having some fun along the way. My name is Martin, and I'm joined today by John from the Search Relations team, of which I'm also part. Hi John! Hi Martin. And we are joined today by Zoe Clifford from the rendering team. Hi Zoe! Howdy. Hey Zoe, would you like to introduce yourself? Yeah, I'm Zoe Clifford. You may remember me from getting up on stage with Martin at Google I/O around 2019 or so. I work for Google, bike to work, work on rendering, and I like dogs and cats. Fun times. That's it for me. Which one is better, dogs or cats? Well, you're going to make me choose between dogs and cats on a podcast, John? Okay, fine, is "it depends" the answer? So I have a favorite, but I'll never admit which one; it would make the other two sad. That's totally just like Google.
01:15
Okay, so you're in the rendering team, and I'm not sure everyone understands what rendering is about. But we have the web, you make a website, you use HTML and CSS, right? Am I missing something? You are missing something, Martin. It's a scary word that starts with J. GIFs? GIFs? Yes, there can also be GIFs on web pages, as well as JavaScript. JavaScript! No, it's not GIF, it's JavaScript. All right, okay. It's technically GuavaScript. Gua... no, no, it's JavaScript. Is GuavaScript actually useful? Do we need that for something? Yeah, there are many web pages out there that I'm quite fond of where, if you try and load them without JavaScript, you'll just get a short string of text that says "please enable JavaScript to access this web page." Fair. So I know that there are a lot of websites, especially when they use the wonderful term "client-side rendering," that actually fetch their content using JavaScript, and I guess we want to see the content to actually be able to index it, no? Yeah, it is generally useful to have the contents in the DOM to be able to index it. Ooh, now we're using another fancy word: the DOM, the Document Object Model. So what's that? What even is it? All I can tell you, Martin, is that it's kind of like HTML, but unwrapped into a tree form which reflects the browser's view of the page at render time. Yeah, it's like the browser's mental model of a website. Yeah, but I've never actually read the DOM spec, so there could be something else about it that I've never heard of. I'm not sure about that either; now you make me question my worldview. That's something that's interesting. Okay, so we're using the DOM, which is like the representation of all the content inside the browser, and that can be changed and controlled by JavaScript. Is that roughly accurate? Yeah, that's right. Right, and for us to be able to see things that have been manipulated, added, or removed by JavaScript, we have to render, right? Right, right. You can also have a DOM without any JavaScript at all. Fair, that's true; even static websites have a DOM.
03:28
Yeah, but then what is this rendering? What happens inside Google Search when we render a page? Okay, so "render" is a very overloaded term, but in this context it means headless browsing, "headless" being a particularly gory industry term for a browser which is controlled by a computer. And the reason we run a browser in the indexing pipeline is so we can index the view of the web page as a user would see it, after it has loaded and JavaScript has executed. Okay, interesting. So I guess that involving a browser, and having to kind of run pages through a browser, is pretty challenging, no? Oh yeah, it's very expensive. It's so expensive that the exact amount of expensiveness is highly confidential. Ah, but then if it's so expensive, how do we decide which page should get rendered and which one doesn't? Oh, we just render all of them, as long as they're HTML and not other content types like PDFs. What? But that's expensive! Yeah, it is expensive. But we are rendering all the pages that are HTML pages? All of them get rendered? Right, right, and it is expensive, but that expense is required to get at the contents. For the most part, pages which do not require JavaScript to index are cheap to render anyway, so we don't think about it; we just render all of them. Ah, that's really interesting.
05:09
Fantastic. And I remember in 2019, when we were on the stage at I/O, we introduced the evergreen Googlebot, so we are getting browser updates pretty regularly, no? That's correct. We follow stable Chrome, or stable Chromium technically, but that wasn't always the case. Why has that not been the case before 2019? That's a good question. Because before this effort to follow stable Chrome, there was a lot of manual integration work to take a normal browser core like Blink and turn it into a headless browser capable of running in the Google indexing pipeline. And we kind of slacked a bit on browser updates, and eventually the API we were using, the Blink platform API, was deprecated and removed, so we had to switch to something else, and it was like, "I'm tired of all these manual updates, we're just switching to Chromium." So basically, before that we had to install all the updates manually, and now Googlebot gets the updates fresh, more or less? Yeah, we were very careful to make sure we had this continuous integration. I'm going to put that on my resume, by the way: continuous integration of upstream Chromium. Really, really fancy. That's really nice; in this biz you've got to use words like "continuous integration" on your resume. You can't just say "I'm really good at installing updates," you've got to say CI/CD. I still have to do these things manually; I should get a John update that installs Chrome updates automatically. You manually update your Chrome? I thought that kind of happens in the background automatically. No. Well, I mean, every now and then there's this thing that's like, "oh, you have to update your browser," and it's like, "oh gosh, I have to spend 15 seconds restarting my browser." So annoying. But you get all the cool new browser features, and you can build more interesting and amazing websites with it.
07:19
And as far as I understand, that mostly then works with Google Search? Mostly, mostly. All the systems that we've taken care to extract will for sure keep working. If there's some new attribute or something, we might not look at it automatically, but it won't break anything, for sure. Oh, because we have tests to make sure that stuff doesn't break? Oh, it was a terrible time, Martin, before we had all those tests; things would just break and no one could stop them. I mean, I remember being a web developer back before 2019, when there was the big shift to ES6, I think in 2015, and we got so many new features in JavaScript, and we could use none of them because Google Search wouldn't support them. Yeah, at the time we were running an older version of Blink with an older version of V8, so we had a lot of trouble with ES6, and it was a big problem, which was one of the motivations for switching to continuous integration.
08:23
When you mention all these low-level browser parts, like Blink, which is the rendering engine in Chrome, and then V8, which is the JavaScript execution engine, there must have been scary things that you ran into. Uh-huh, yeah. Have I told you the ghost story of iterator? Iterator? There was one day when we were updating our Blink version, and as part of this we had to, you know, do some QA (another thing to put on my resume) to make sure that the new version actually worked for all the websites out there. So you looked at all the pages on the web? Not all the pages; we'd divvy up a bunch of pages with the most diffs, and everyone would get 10,000 pages each to kind of glance over. It was a lot of fun, you know; I just spent hours and hours and hours looking at web page diffs. It was great. But one of these diffs was actually a really subtle difference. There was just something on some wiki article (not Wikipedia, one of the other wikis) about some TV series, and part of the page just looked suddenly wrong to me. So I open up the console and I see a curious error message: "iterator Act is not defined." That is probably not defined; that sounds like ES6.5. Yeah, so I thought maybe this is some kind of weird JavaScript keyword with a bizarre name, so I used a search engine to search for it, and there were zero results. What? And I tried again with all the other search engines I could think of, and there were still zero results. So then you made a page, and now you rank? I searched in the page, and the page didn't reference it anywhere. And I searched in the browser source code, and it wasn't referenced anywhere there either. Whoa, it was a ghost in the machine. A Ghost in the Shell. Where did it come from? In the end, it came from V8. V8, okay. Yeah, so the code has changed since then, but at the time V8 came with some bundled JavaScript files which, as part of compiling the browser, would get pre-processed and shoved into C arrays, C arrays being kind of the C++ equivalent of data URLs. But as part of this pre-processing there was a macro substitution step where it would substitute one string for another string, and this macro substitution tried to substitute two strings at once, only there was some overlap. So if they were substituted in the wrong order (and the order was nondeterministic, because of Python dictionary ordering) then it would produce this bad output from "iterator" and "object". Oh, I couldn't tell you the exact details now, but it was something like that. If you search for my name in the Chromium commit log, it's quite hard to find now, but it's somewhere in there. Oh wow, so your browser was hallucinating before hallucinating was cool. Yeah, that was some gnarly stuff there, and that was my first contribution to the Chromium code base.
12:08
Cool. So one of the questions I sometimes hear from people is whether it makes sense to implement structured data using JavaScript, and the worry sometimes is that it's too fragile, or that Google hates JavaScript. Of course they don't tell Martin that, but they tell me that sometimes. What do you think? Is implementing structured data with JavaScript a problem? Does it work well? How do you see that? We're very good at executing JavaScript, and I think JavaScript's great. We mentioned a lot of problems with ES6, but now that we're following the normal Chromium release schedule, we basically get new JavaScript keywords for free, and for the most part we don't throw weird exceptions that wouldn't also be thrown on the web. That said, it is possible for stuff to go wrong in particularly complicated scenarios. For example, if a web page is loading hundreds and hundreds of resources, it is possible that we won't always be able to fetch all the resources, due to crawl rate or HTTP errors or stuff like that. So JavaScript's great, but I'd also take some care to make sure that the web page isn't too fragile if errors do happen. Okay, so what do you mean by fragile if errors happen? Like, if you have a web page which accesses an API endpoint, and that API endpoint could return a 429 under certain circumstances, then this is one example of where things could go wrong, if that call is critical and the page fails to have good contents without a successful response from it. Okay, and then what happens? Does the page just stop loading, or does everything get deleted? It depends on the web page. I've seen partial page contents, blank pages, pages which redirect to google.com, error messages. If there's going to be an error and you can't load the content, I think it's best to have a clear error message, but ideally it's best to have the contents, of course. Okay, so I guess the error handler is something that should be reasonable and not crash the rest of the page's loading. Yeah, like if there's an uncaught exception because a video fails to load; I've actually seen a case where a video fails to load, so the page redirects to google.com. Wow, that's a popular redirect destination. And this was a case where the page had good contents, but then this tiny little thing went wrong, so it's like, "I'm going to throw this all away." So if there is an error, just try and handle it as gracefully as possible. And this is hard stuff, don't get me wrong; web development is hard stuff. I'm not a web developer, it terrifies me. I guess testing it is hard if it sometimes breaks, but if it always breaks, what would you recommend? How could someone test it, to see if it's generally possible that it could work? There's this webmaster tool, the Search Console URL Inspection tool. That's great stuff. If that works, then generally it's possible that Googlebot could also render it. Yes, generally.
15:45
And rendering in Google is as close to a normal browser as possible, right? But it's not quite the same, is it? Yeah... do you want to hear another ghost story, Martin? Oh please, do tell. It's not quite the same, and one of the ways it's different is that we try and do things as efficiently as possible. So efficiently that there's a certain JavaScript event that we were not firing, called requestIdleCallback, because our browser was never idle. This was all well and good, but there was a certain popular video website, which I won't name to protect the guilty, which deferred loading any of the page contents until after requestIdleCallback was fired. This is actually a very reasonable thing to do; you might want to, you know, get the video playing first and then load all the comments and stuff, for example. But since our browser was never actually idle, this event was never fired, so we couldn't load most of the page contents, which was a problem for this website. Oh, so now we fake being idle every once in a while, just so pages like that work better. That's one of the weird things that can happen when you have a browser that's mostly, but not entirely, like a normal browser. So it has to be like, "oh, I'm so bored," when actually it's busy all the time.
17:12
What kind of things have you noticed that people otherwise get wrong when it comes to rendering? Another common class of issues is called user agent shenanigans, "shenanigans" being a technical industry term. That's what we call it in the biz. What are user agent shenanigans? Enlighten us. So imagine you write a website and you're like, "I really, really want Google in particular to be able to index this web page." So you're like, okay, I'll put in an if statement: if the user agent header equals Googlebot, go down this code path and output this HTML, which I think will be really good for Googlebot, for some reason. And this is all well and good; it's tested, it works. But then years pass by, the website changes, maybe it gets updated to a different framework or whatnot, and there's just this code still lurking deep within it somewhere, and it starts outputting HTML which is broken, or useless, or missing contents, or stuff like that. And this is what I would call user agent shenanigans. We used to call that dynamic rendering, and we're actually discouraging it now, if that makes you a little happy. Ah, so there is an industry term for it besides shenanigans. I think I ran across a case of this recently, now that you mention it. In one of the help forum threads, someone was mentioning that their homepage title was wrong, and I looked into it, and it seemed that we were being redirected to a page that does a 404. But if you look at it in a browser, it redirects to a page that's normal. And in the end, I noticed you could reproduce it by telling Chrome to use Googlebot's user agent. Oh yeah, I love that feature. Probably that is what's happening in the background: someone is like, "oh, I will be smart and do something special for Googlebot," and then the next person who works on the website is like, "I don't know, I don't see anything wrong, works for me." Yeah, I love the DevTools user agent override feature; it's great for debugging stuff like this. Sometimes I'll even be trying to debug a web page, and I change my user agent to Googlebot, and then it's like, "your access to this web page has been denied because you're using a suspicious user agent," and I'm like, no, I wanted to debug these shenanigans gone wrong. That's where they're being good and checking that the Googlebot user agent comes from an official IP address, as recommended in the documentation, but it still makes it harder for me to debug, so I cry a single tear. Okay, that's understandable, I would say.
20:10
How do you feel about JavaScript redirects? Redirects are kind of a topic in the SEO world where everyone has very strong opinions, and JavaScript redirects kind of feel like that. Even normal server-side redirects are this weird SEO myth topic, and with JavaScript redirects it's like, "oh my gosh, what do we even do with them?" What do we even do with them? Well, we follow them, so they work just like normal redirects, for the most part. JavaScript redirects of course have to happen at render time instead of crawl time, but that's pretty much the only thing special about them; I don't think we treat them differently in any way. There have been cases where a web page gets into a JavaScript redirect loop, which is not very fun, but okay. Yeah, well, I guess that happens with normal server-side redirects from time to time as well, where it's like, "oh, you don't have a cookie, here's a cookie," and then it checks again and it's like, "oh, you didn't take my cookie, take another one," and it just keeps going forever. Our cookies do work pretty well, though. We have good cookies. We have fairly good cookies, yeah. And in rendering, do we also accept cookies? How does that work? Do we accept cookies? Cookies are enabled. If there's a cookie dialog that says "do you want to accept or deny these cookies," we won't click either button; we're rogue like that, we just don't make a decision. But on the browser level, cookies are enabled, so if a web page sets a cookie without going through a dialog, then we'll see it. Okay, but we don't keep that for the next time, right? No, no, rendering is stateless; every time it happens, it's a completely fresh browser session, basically.
22:03
Very nice. So if we're in the territory of not clicking on cookie banners, and it's stateless: I think when we fetch things, we're using Googlebot for that, right? So we do follow robots.txt? Yeah, of course we follow robots.txt; that's the whole point of robots.txt. But browsers don't. Yes, but we're a search engine, Martin. Okay, fair enough, that makes sense. Okay, fine, but that means that if your API is roboted, or disallowed for Googlebot, then rendering can't fetch API content, right? That's correct. So we'll get the crawl, which is the HTML, and that could be roboted. But if it's not roboted and it's HTML, it's sent to rendering, and then rendering loads this in a browser, which of course can make HTTP fetches to a bunch of other stuff, and any of those other resources could also be roboted. If a resource is roboted, we just can't fetch it; we continue on with rendering the rest. So if there's an API call, as you said, and we can't fetch it, then maybe that's okay if it wasn't doing anything important, but if it was, say, fetching the page contents, then we have a problem. And I guess that's hard for us on the Google side to recognize, because we don't know what the page is supposed to look like. Yeah, I mean, it is very reasonable for someone to just be like, "I don't want Google seeing my content, I'm just going to block this API call." Fair enough, I'm totally okay with that. But if it looks like a broken page, it can't be indexed the best way.
23:46
Cool! Well, this was super fun. Thanks for joining us, Zoe. Oh yeah, it's always a lovely time to hang out with my good pals John and Martin. Thank you, Zoe, it's always good to talk to you, and rendering is such a fascinating topic, and the WRS, the web rendering service, is such an amazing piece of software. Yeah, the last time I had a talk with Martin, we were up on stage at Google I/O, and that is a blank spot in my memory. I remember nothing of it; I just remember getting up on stage and walking off of stage, and that's it. Having a great time! Hopefully this was a great time as well, and maybe you'll remember this one too. Oh, I hope so. We'll send you a recording to remember. John, this has been Search Off the Record. There's no record! Oh, off the record, of course. Yeah, thank you so much, Zoe, for being here, thank you, John, for joining me as well, and everyone out there, thank you so much for being with us. I hope that this episode was interesting and fun and useful. May your page indexes be contentful. Goodbye, everybody! Goodbye! Bye!

24:55
We've been having fun with these podcast episodes, and we hope that you, the listener, have found them both entertaining and insightful too. Feel free to drop us a note on Twitter at @googlesearchc, or chat with us at one of the next upcoming events that we go to. And, of course, don't forget to like and subscribe.

25:17
[Music]


Related Tags
Web Rendering, Google Search, JavaScript, SEO, Crawl Rate, User Agent, Dynamic Rendering, Content Indexing, API Calls, Web Development