Webcomponents: Is or how can Shadow DOM be 'SEO friendly'?

Created on 11 May 2016 · 15 comments · Source: WICG/webcomponents

So far it seems that slots are all read by Google's search bots. Currently I have no idea what they do with slotted content.
Q: Do we need to optimize the API or implementation to make it easier to discover content (without breaking encapsulation)?
Q: Is this more a search engine issue than a Shadow DOM issue? Or not an issue at all?

Thoughts?

shadow-dom

All 15 comments

It looks like this topic is not getting any attention...

Q: Do we need to optimize the API or implementation to make it easier to discover content (without breaking encapsulation)?

We do not have any plan for such an API, AFAIK.

Q: Is this more a search engine issue than a Shadow DOM issue? Or not an issue at all?

Could you elaborate on the issue? I would appreciate it if you could provide example markup so that we can understand the issue more clearly.

In search engines that do support JavaScript and render shadow DOM, it's really the crawler's job to get the contents inside shadow trees and include them in the search results.

I can see that we could add a non-normative note saying that assistive technology (AT), search engine crawlers, etc. are advised to walk through the flattened tree, but there isn't much we can do beyond that.
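As a rough illustration of what "walking the flattened tree" could mean for a crawler, here is a minimal sketch. The function name `collectFlattenedText` is hypothetical, and the sketch assumes the crawler runs JavaScript and that the shadow roots are open, since `shadowRoot` is `null` for closed ones.

```javascript
// Node-type constants mirror the DOM's Node.ELEMENT_NODE and Node.TEXT_NODE.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

/**
 * Collect text in flattened-tree order, starting at `node`.
 * Hypothetical helper: it only reaches open shadow roots.
 */
function collectFlattenedText(node, out = []) {
  if (node.nodeType === TEXT_NODE) {
    const text = node.textContent.trim();
    if (text) out.push(text);
    return out;
  }
  if (node.nodeType !== ELEMENT_NODE) return out;

  // Skip elements whose text is not page content.
  if (node.localName === "style" || node.localName === "script") return out;

  // A <slot> renders its assigned nodes, or its own children as fallback.
  if (node.localName === "slot" && typeof node.assignedNodes === "function") {
    const assigned = node.assignedNodes();
    const rendered = assigned.length > 0 ? assigned : Array.from(node.childNodes);
    for (const child of rendered) collectFlattenedText(child, out);
    return out;
  }

  // A shadow host renders its shadow tree instead of its light children;
  // light children only show up where a slot picks them up.
  const children = node.shadowRoot ? node.shadowRoot.childNodes : node.childNodes;
  for (const child of children) collectFlattenedText(child, out);
  return out;
}
```

In a browser this would be called as `collectFlattenedText(document.body)`. Text inside closed shadow roots stays invisible to it.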

My general advice here:

  • If components are well designed, crawlers do not need a flattened tree to get text contents.
  • Good search-engine friendly components would enable component users to specify text contents as the component's children, as _crawl-able text contents_, rather than embedding hard-coded text contents in their shadow trees.

e.g.

<good-components>hello world</good-components>
<!-- Its shadow tree uses a slot to pick up the text contents. -->
<bad-components></bad-components>
<!-- "hello world" is hard-coded in its shadow tree. -->

Humans might see "hello world" in both cases, but the former is search-engine friendly, I think.
Thus, it's up to component developers. If all components are well designed, users can put all important text information in the top-level HTML, which can be crawled easily.

For example, HTML's built-in <table>, <tr>, <td> (or <details>, <summary>), etc. enable users to specify text information in their children, rather than hard-coding text contents in their implementations.

The same story can be applied to user-made web components.
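To make the difference concrete, here is a minimal sketch of how the two components might be defined. The class names and the `defineExampleComponents` wrapper are hypothetical; the wrapper takes the global object as a parameter only so the definitions do not depend on a particular environment.

```javascript
// Hypothetical sketch: defines <good-components> and <bad-components>
// on the given window-like object (pass `window` in a browser).
function defineExampleComponents(win) {
  // Text comes from the light DOM children and is projected through a slot,
  // so it stays in the page's own markup where crawlers can read it.
  class GoodComponents extends win.HTMLElement {
    constructor() {
      super();
      this.attachShadow({ mode: "open" }).innerHTML = "<p><slot></slot></p>";
    }
  }

  // Text is hard-coded in the shadow tree, so it never appears in the
  // page's markup and is only visible to something that renders the tree.
  class BadComponents extends win.HTMLElement {
    constructor() {
      super();
      this.attachShadow({ mode: "open" }).innerHTML = "<p>hello world</p>";
    }
  }

  win.customElements.define("good-components", GoodComponents);
  win.customElements.define("bad-components", BadComponents);
  return { GoodComponents, BadComponents };
}
```

In a browser, calling `defineExampleComponents(window)` once is enough for both tags in the markup above to render "hello world".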

@hayatoito how does this affect applications that may use web components to reuse parts of their application that may contain information they want to be crawlable? For example, I may have an <x-app> component that contains information in its shadow root that it wants to reuse everywhere while only allowing certain things to be customised via slots.

<x-app>
  <p slot="description">This page is cool etc.</p>
</x-app>

Which may render out to:

<x-app>
  #shadow-root
    <h1>My rad company</h1>
    <slot name="description">
      <p slot="description">This page is cool etc.</p>
    </slot>
</x-app>

While this is a contrived example, there could be several other things that aren't worth making the consumer of this component aware of. Preventing shadow roots from being crawlable may pigeonhole web components into only being practical when used as leaf nodes - or otherwise force the consumer to specify _all_ content - which wouldn't be intuitive.

I see this as a similar problem to #499. Maybe a solution for this is for search engines to force all shadow roots to be open by overriding attachShadow() so that they can crawl into them via the shadowRoot property. Is this too obtuse for crawlers to have to do?
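A minimal sketch of that override, assuming the crawler can run a script before any page script does. The function name `forceOpenShadowRoots` is hypothetical; `attachShadow()` and its `mode` option are the real API.

```javascript
// Hypothetical crawler-side patch: every shadow root is created as "open",
// whatever mode the component asked for.
function forceOpenShadowRoots(ElementCtor) {
  const original = ElementCtor.prototype.attachShadow;

  ElementCtor.prototype.attachShadow = function patchedAttachShadow(init) {
    // Copy the caller's options and override only the mode.
    return original.call(this, { ...init, mode: "open" });
  };

  // Return a function that undoes the patch.
  return function restore() {
    ElementCtor.prototype.attachShadow = original;
  };
}
```

In a browser this would be `forceOpenShadowRoots(Element)`, after which `element.shadowRoot` is non-null for every host. A component that inspects the mode of a root it created would see "open" instead of "closed", so the page could behave differently under the patch.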

Another solution may be to put the onus on the developers of the components, rather than only on the consumers to put everything they want crawlable into the light DOM: if developers want their components to be crawlable, they make the shadow roots in question "open". It'd be really useful if one could specify "closed" while still having content crawled, though these may be mutually exclusive. Maybe this is an argument for going back to the drawing board here. Closed shadow roots are hardly a deterrent for the determined, and open by default solves several issues that others outside of the working groups have raised.

A solution for crawlers that don't support JavaScript would be to support server-side rendering of shadow roots. Tangential to this, this is something that a _lot_ of people are going to be crying out for if web components are to be used as a building block for applications. I see no reason that they shouldn't be, either, since it's a lower-level API that doesn't guide people in a particular direction and has already worked quite well in practice: a la Polymer, Skate (what we use) etc.
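One way server-side rendering of a shadow root can be expressed today is declarative Shadow DOM, where a `<template shadowrootmode="open">` child is turned into the host's shadow root by the HTML parser. That feature postdates this thread, so the following is only a sketch of the idea, not what was on the table at the time. The function names `renderXApp` and `escapeHtml` are hypothetical, and the markup reuses the <x-app> example above.

```javascript
// Escape text so it can be placed safely inside HTML element content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical server-side renderer for the <x-app> example.
// The shadow tree is serialized into a <template shadowrootmode="open">,
// so both the shared heading and the slotted description are present in
// the HTML response without any script having run.
function renderXApp(description) {
  return [
    "<x-app>",
    '  <template shadowrootmode="open">',
    "    <h1>My rad company</h1>",
    '    <slot name="description"></slot>',
    "  </template>",
    `  <p slot="description">${escapeHtml(description)}</p>`,
    "</x-app>",
  ].join("\n");
}
```

A server would send `renderXApp("This page is cool etc.")` as part of the page. Whether a given crawler indexes the template's content is up to that crawler.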

Saying that users must provide all crawlable content via light DOM severely limits the possibilities for web components and there needs to be a clear path for how things like this will be done - even if it means re-discussing open vs closed.

Thank you. That is a well-known concern which I have heard many times. However, no one is trying to answer this concern with a clear solution, AFAIK.

I think this concern could also be discussed in terms of accessibility, rather than _SEO_.
e.g.: http://marcysutton.github.io/accessibility-of-web-components/
I guess there are other resources that show how the web developer ecosystem can address this concern.

If someone has a concrete proposal for something the web platform itself should provide to help, please let us know.

In addition to @treshugart 's use case, I can imagine another where components have dynamic content – say, for example, an i18n component:

<x-i18n key="my.string"></x-i18n> -> <x-i18n>Value of my string</x-i18n>

It seems to me that search engines should have the same power as a user when it comes to parsing web pages – and a user has the ability to see the contents of the shadow DOM.

At least for now, I guess people who use components that have SEO and a11y concerns should use Shadow DOM with mode "open".

I am concerned that components will be designed with slots that hold entire chunks of SEO content, which would make the component effectively a styling and layout container. One use case I could think of would be a page header with a main navigation. You may want the attributes and innerHTML of the anchors to change. The 'skip nav' may change, you may have other dynamic content, etc.

This is easy to do by feeding the data in via attributes, but it gets cumbersome and very XML-ish. Sending the data via a giant JSON blob is not good for SEO or for general best practices in Web Components. Dozens of slots have the same effect as adding attributes.

The obvious 'elephant in the room' is: why even use web components in this case? Here's my reasoning: if I have several applications for my customers and I want consistent UX/UI across those apps, I want to share large pieces like footers and headers, with customization per app. I think the answer is that this is better suited to a JS framework component UNLESS Shadow DOM is parsed by search engines after the component is flattened.

@hayatoito - I can see your points about good design and I agree. It is an important discussion IMO.

  • If components are well designed, crawlers do not need a flattened tree to get text contents.
  • Good search-engine friendly components would enable component users to specify text contents as the component's children, as crawl-able text contents, rather than embedding hard-coded text contents in their shadow trees.

Ultimately, I guess I created this issue to determine if Shadow DOM needs to expose something to the light DOM for search engines in order to optimize component content discovery (or even web accessibility) before these components are flattened. How do search engines plan on approaching this?

Hi

Well, let me propose a solution. It may be too late in the development of (and agreements on) Shadow DOM, but how about making the closed Shadow DOM tree available, but in read-only mode...

Brona

or how about

DocumentFragment Element.prototype.cloneShadowDOM()

To add to @Nevraeka's point, isn't it the core principle of the web component specs to provide encapsulation and modularity as a set of primitive APIs? It would significantly subtract from their value if you added the stipulation: "unless you want SEO or A11y". This is especially true since JS libraries and frameworks are looking to web components as a foundation for their functionality.

I like @BronislavKlucka's proposal; but maybe it's worth considering making shadowRoot return an immutable tree if mode is "closed".

For reference, #505 was created in relation to this.

how about making closed ShadowDOM tree available, but in read only mode...

Something like https://github.com/w3c/webcomponents/issues/499#issuecomment-225672116 ?

A crawler monkey patching the attachShadow method _might_ be fine for that purpose, as the intent of the crawler is not to modify the page, only to read it, so there's no particular caution that the crawler needs to take when doing that. A problem with this is that some component might do something different when it sees that some inner component is open instead of closed, so a crawler monkey patching everything open could introduce different application behavior (and therefore possibly a different DOM result). A limited (perhaps proxy) interface might be a better approach, so that app functionality is not inadvertently modified...

I am triaging issues. Let me close this issue, tentatively. It looks like there is no proposal that has been clearly demonstrated to be feasible.

Although I am going to close this issue, please feel free to propose a new, concrete, feasible idea if you have one.
Then I'll reopen this.

Components are most effectively designed for functionality, layout, and design. If you have header, footer, site-map, and other content that should be crawlable, it should be directly in the markup of the landing page where it should be crawled. The content really only needs to be flattened for what is needed to complete the initial paint and display of information on the first visit. Landing points can also be flattened for user agents such as the Google search bot, or any other agent to which you wish to deliver content. This means that if you are using "fragments" (full chunks of HTML and content), you should flatten that portion and retain any base component and behaviors you are using. The browser will process and render your custom elements before they are defined in the browser. Write clean custom elements, and use prerendering techniques. Don't nest data; if you do nest data, such as within an app shell, then flatten the app shell. You do not need declarative Shadow DOM when flattening. The browser should detect that this is a custom component, and will work through the life cycle of enhancing that custom element once it is defined. I believe it should display even if no JavaScript has been loaded, and I believe an undefined custom element is treated like a plain element in the browser.

I wrote a server-side render of the markup into the light tree. Later, when the components on the client come to life, they replace this markup with the shadow tree; visually, the swap is invisible.
