Amphtml: Variable substitution in href of <a> tags?

Created on 15 Jul 2016 · 40 comments · Source: ampproject/amphtml

I'd love to see the ability to do at least simple variable substitution in the href of an <a> tag. Specifically, if I could get CLIENT_ID transmitted, it would let me track sessions between the AMP CDN and the regular site. One of the big gripes we're getting from publishers is that they have no idea whether AMP is generating more or longer sessions, or whether users are just swiping in the carousel to another publisher's content.

I realize that some of the variables are real-time, so that would involve intercepting clicks, and that brings its own issues. However, if we could just get the static ones, that would still be a huge win.
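For the static case, substitution could be as simple as rewriting known uppercase tokens in the href at render time. The sketch below is hypothetical: the token pattern, the function name `substituteHref`, and the sample client ID value are illustrative assumptions, not AMP's actual implementation.

```typescript
// Hypothetical sketch: replace static, known placeholders such as CLIENT_ID in an href.
// The token pattern and the sample values are assumptions, not AMP's implementation.
type StaticVars = Record<string, string>;

function substituteHref(href: string, vars: StaticVars): string {
  // Whole uppercase tokens with a known value are replaced; anything else is left as-is.
  return href.replace(/\b[A-Z_]+\b/g, (token: string) =>
    Object.prototype.hasOwnProperty.call(vars, token)
      ? encodeURIComponent(vars[token])
      : token
  );
}

const linked = substituteHref("https://www.example.com/article?cid=CLIENT_ID", {
  CLIENT_ID: "amp-abc123"
});
console.log(linked); // https://www.example.com/article?cid=amp-abc123
```

Real-time values (time on page and the like) would not fit this model, since they change after render and would require intercepting the click.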

Feature Request

All 40 comments

I'm supportive of doing this. I don't think that exposing CLIENT_ID in URLs is a good practice though.

cc @cramforce @avimehta

Exposing it on the URL is definitely problematic. We'd only ever do it for links to the source origin of an article. Not sure about insecure destinations; probably not.
@jpettitt Would putting it on the URL fragment be enough?

The primary risk is that a user shares the destination page and leaks their client id. Destination pages can avoid that, by clearing the ID from the URL, but there is no way for the referrer to enforce that.

POST requests would not have this problem, but come with back button issues.
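The fragment idea could look roughly like this: the link appends the ID to the URL fragment, and the destination reads it and then removes it (for example by passing the cleaned URL to `history.replaceState()`), which reduces, but cannot enforce away, the sharing risk. This is a hypothetical sketch; the `cid` fragment key and both function names are assumptions.

```typescript
// Hypothetical sketch: carry the client ID in the URL fragment and let the destination
// remove it again. The "cid" fragment key is an illustrative assumption.
const FRAGMENT_KEY = "cid";

function withClientIdFragment(url: string, clientId: string): string {
  const u = new URL(url);
  const params = new URLSearchParams(u.hash.slice(1));
  params.set(FRAGMENT_KEY, clientId);
  u.hash = params.toString();
  return u.toString();
}

// Returns the extracted ID plus a cleaned URL that a destination page could pass to
// history.replaceState() so the ID is not shared along with the page URL.
function takeClientIdFromFragment(url: string): { clientId: string | null; cleanUrl: string } {
  const u = new URL(url);
  const params = new URLSearchParams(u.hash.slice(1));
  const clientId = params.get(FRAGMENT_KEY);
  params.delete(FRAGMENT_KEY);
  u.hash = params.toString(); // assigning "" removes the fragment entirely
  return { clientId: clientId, cleanUrl: u.toString() };
}
```

Because fragments are never sent to the server, an AMP destination (which cannot run its own JS) would need the fragment exposed as a substitution variable for amp-pixel or amp-analytics, as discussed next.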

Fragment might work; I'd have to think about how that would work if the page being linked to is AMP, where we can't run JS. If FRAGMENT were also a substitution value we could use in an amp-pixel or amp-analytics tag, I think that would work. As long as we can tie the CDN ID we get on our amp-pixel to the user session on our domain, we're good.

@jpettitt Do you need this at all on browsers that support 3p cookies or if you already have a cookie with the user?

One way would be to do this:

  • We call an endpoint early in page lifecycle that tells us whether the following is necessary
  • If yes, outbound links to the origin of the doc are instead POSTed to a given URL with two post params:

    1. CLIENT_ID

    2. Actual destination URL

  • This HTTP endpoint would then be expected to emit a redirect to the destination.
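The proposed flow has two halves: the link builds a POST body with the client ID and the real destination, and the endpoint records the ID and answers with a redirect. A minimal sketch, assuming hypothetical field names (`clientId`, `destination`), a caller-supplied `record` callback, and a 303 status so the browser follows up with a GET; none of these details were specified in the thread.

```typescript
// Hypothetical sketch of the POST-then-redirect flow described above. The field names
// ("clientId", "destination") and the record callback are assumptions for illustration.
interface RedirectResponse {
  status: number;
  location?: string;
}

function parseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null;
  }
}

// Client side: the form body an outbound link would POST instead of navigating directly.
function buildRedirectBody(clientId: string, destination: string): URLSearchParams {
  return new URLSearchParams({ clientId: clientId, destination: destination });
}

// Endpoint side: record the client ID, then redirect, but only to the document's own origin.
function handleRedirectPost(
  body: URLSearchParams,
  allowedOrigin: string,
  record: (clientId: string) => void
): RedirectResponse {
  const clientId = body.get("clientId");
  const target = parseUrl(body.get("destination") || "");
  if (!clientId || !target || target.origin !== allowedOrigin) {
    return { status: 400 };
  }
  record(clientId);
  // 303 See Other tells the browser to follow up with a plain GET to the destination.
  return { status: 303, location: target.toString() };
}
```

Restricting the redirect to one origin keeps the endpoint from becoming an open redirector.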

That would work too. I was just looking for a simple answer, since we already have variable substitution. If we're going with the POST solution, I'd like to be able to pass other variables (time on page springs to mind, as do the ampdoc and canonical URLs). If this becomes a generalized "redirect via POST" and works for links on both the ampdoc origin and the canonical origin, I think we'd be good.

I'd caution against being too generic, since href attributes are one of the few places in the doc that can be used to run arbitrary JavaScript via javascript: URLs and the like. Typically, as long as you don't allow replacements in the protocol, there's no problem, but it's unclear whether there are ways to abuse the error handling of various user agents' URL parsing.
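One way to apply that caution is a two-step check: refuse templates whose scheme could contain a variable, and re-parse the substituted result to confirm the protocol is still an allowed one. This is a hypothetical sketch; the allowed protocol list and the rule that treats any uppercase letter or underscore in the scheme as a possible variable are assumptions.

```typescript
// Hypothetical guard: no variables inside the URL scheme, and the substituted result
// must still parse with an allowed protocol. Both rules are illustrative assumptions.
const VARIABLE_IN_SCHEME = /[A-Z_]/;
const ALLOWED_PROTOCOLS = ["https:", "http:"];

function isSafeTemplate(template: string): boolean {
  const colon = template.indexOf(":");
  if (colon <= 0) return false; // require an explicit scheme such as "https:"
  // Conservative: any uppercase letter or underscore in the scheme is treated as a
  // possible variable and rejected.
  return !VARIABLE_IN_SCHEME.test(template.slice(0, colon));
}

function isSafeResult(url: string): boolean {
  try {
    return ALLOWED_PROTOCOLS.indexOf(new URL(url).protocol) >= 0;
  } catch {
    return false; // unparseable results are rejected rather than guessed at
  }
}

function substituteSafely(template: string, substitute: (t: string) => string): string | null {
  if (!isSafeTemplate(template)) return null;
  const result = substitute(template);
  return isSafeResult(result) ? result : null;
}
```

Checking the parsed result, not just the template, catches cases like `javascript:CLIENT_ID`, whose scheme is variable-free but whose protocol is still unsafe.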

bump.

We will definitely do this.
@jasti @rudygalfi I pinged you about the same topic on an internal doc recently.

I don't think we should land #5053 as is. It is fine for inside amp-ad with custom rules.

@jpettitt How would you feel about an in-doc whitelist that says something like

<meta name="amp-link-variables" content="www.mysite.com:RANDOM,CLIENT_ID(abc)">

You could freely whitelist them, but you would have to do so explicitly.

Benefits of this:

  • Possibly sensitive data doesn't get sent to potential external links (e.g. a reporter pastes a link containing CLIENT_ID into an article)
  • Adding a new substitution isn't a breaking change, since the mechanism won't rewrite arbitrary URLs that happen to contain uppercase strings.
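A page-level whitelist like the proposed meta tag could be parsed once and consulted per link. This is a hypothetical sketch: it assumes exactly one `host:VAR,VAR(arg)` entry per tag and ignores variable arguments such as the `(abc)` scope, since the syntax was only a proposal at this point.

```typescript
// Hypothetical parser for the proposed <meta name="amp-link-variables"> content. It assumes
// exactly one "host:VAR,VAR(arg)" entry per tag; the syntax was only a proposal here.
interface LinkVariableWhitelist {
  host: string;
  variables: string[]; // bare names; arguments such as the "(abc)" scope are dropped
}

function parseLinkVariables(content: string): LinkVariableWhitelist | null {
  const colon = content.indexOf(":");
  if (colon <= 0) return null;
  const host = content.slice(0, colon).trim().toLowerCase();
  const variables = content
    .slice(colon + 1)
    .split(",")
    .map((entry: string) => entry.trim().replace(/\(.*\)$/, "")) // "CLIENT_ID(abc)" -> "CLIENT_ID"
    .filter((name: string) => name.length > 0);
  return { host: host, variables: variables };
}

// A variable may be substituted only in links whose host is the whitelisted one.
function mayReplace(list: LinkVariableWhitelist, href: string, variable: string): boolean {
  try {
    return new URL(href).hostname === list.host && list.variables.indexOf(variable) >= 0;
  } catch {
    return false;
  }
}
```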

An alternative would be to not do the above and instead do

<a href="some.link" amp-replace>

Or some attribute like that, which opts individual links into replacement.
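With a per-link opt-in, the substitution step simply skips any link that lacks the attribute. In this hypothetical sketch, links are modelled as plain attribute maps rather than DOM elements, and the token pattern is the same illustrative assumption used for uppercase variables elsewhere in this thread.

```typescript
// Hypothetical sketch of per-link opt-in: only links carrying the proposed amp-replace
// attribute are rewritten. Links are modelled as plain attribute maps, not DOM nodes.
type LinkAttributes = Record<string, string>;

function resolveHref(link: LinkAttributes, vars: Record<string, string>): string {
  const href = link["href"];
  if (!Object.prototype.hasOwnProperty.call(link, "amp-replace")) {
    return href; // no opt-in: uppercase text in the URL is left alone
  }
  return href.replace(/\b[A-Z_]+\b/g, (token: string) =>
    Object.prototype.hasOwnProperty.call(vars, token) ? encodeURIComponent(vars[token]) : token
  );
}
```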

> <a href="some.link" amp-replace>

I was gonna suggest something like this as well.

I think I prefer the whitelist approach because of the possible breaking-change issue. I'd prefer it in JSON. Could we also include a "post" flag to have the link POST the variables in question?

How about

{
    "www.somesite.com": {
        "vars": ["RANDOM", "CLIENT_ID(abc)"],
        "post": "https://post.target.example.com/target"
    },
    "some.other.site": {
        "vars": ["RANDOM"]
    }
}

Each site name is either an FQDN or one of CANONICAL, SOURCE, AMPDOM, CDN.

If post is supplied, then all vars for that domain are POSTed along with an href var holding the original link target. Otherwise, it's just a replace.
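Applied to a single link, the JSON proposal reduces to three outcomes: leave the link alone, substitute in place, or POST the variables plus the original href. A hypothetical sketch: the config shape follows the proposal, but the symbolic site names (CANONICAL, SOURCE, AMPDOM, CDN) are not handled, variable values are assumed to be keyed by their exact config string, and replacement is plain string matching.

```typescript
// Hypothetical resolver for the proposed JSON whitelist. Symbolic site names such as
// CANONICAL are not handled, and values are keyed by their exact config string.
interface SiteRule {
  vars: string[];
  post?: string;
}

type LinkAction =
  | { kind: "untouched"; url: string }
  | { kind: "navigate"; url: string }
  | { kind: "post"; action: string; fields: Record<string, string> };

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function resolveLink(
  href: string,
  config: Record<string, SiteRule>,
  values: Record<string, string>
): LinkAction {
  const host = new URL(href).hostname;
  if (!hasOwn(config, host)) return { kind: "untouched", url: href };
  const rule = config[host];
  if (rule.post) {
    // POST every whitelisted variable plus the original link target as "href".
    const fields: Record<string, string> = { href: href };
    for (const name of rule.vars) {
      if (hasOwn(values, name)) fields[name] = values[name];
    }
    return { kind: "post", action: rule.post, fields: fields };
  }
  // Otherwise replace whitelisted variables in place (plain string matching).
  let url = href;
  for (const name of rule.vars) {
    if (hasOwn(values, name)) url = url.split(name).join(encodeURIComponent(values[name]));
  }
  return { kind: "navigate", url: url };
}
```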

Fwiw, the general approach should solve our problem (session attribution when deep-linking), and we would prefer either the JSON "metadata" version or the attribute.

Hmm, that doesn't really allow us to deal with session attribution after a deep link. Those URLs tend to be non-origin, and I'm not sure we can easily bounce through our systems in that case. Any thoughts on how best to handle that? Our primary goal is to evaluate the impact of experiments on downstream behavior, so we're open to other ideas for how to make that happen.

@smeder Could you give an example of the type of URL flow a user would go through in that case.

Also, I didn't really want to close this, so reopening, since the feature hasn't launched.

We're currently using Branch.io for this so a URL might look something like:

https://bnc.lt/a/key_live_eifqPbvhIRnKEb1T0MSFpipgsCenSDRu?%24deeplink_path=https%3A%2F%2Fwww.pinterest.com%2Fpin%2F574631233686186958&%24ios_deeplink_path=pinterest%3A%2F%2Fpin%2F574631233686186958&%24android_deeplink_path=pinterest%3A%2F%2Fpin%2F574631233686186958

I believe that using Firebase would result in something similar with a different base URL. In this case, a click goes through Branch, which records some of the parameters and makes them available to us, and ultimately takes the user to the app or to the App Store, or falls back to a web page. I can find out more about the specific HTTP response if needed.

We could, of course, add a way to add additional origins. Sigh. The real bug here, of course, is that services like branch.io and Firebase have to exist for something that should just be a browser API. AMP has traditionally been very hostile to such unneeded redirects.

Agreed, it would be really nice to have a browser API. Not the world we live in today, though...

@smeder Could you file a separate issue for whitelisting target domains for this feature?

@avimehta had a crazy idea for how to pass the client id in a more secure fashion and it is working.

See this example http://output.jsbin.com/gifejef/quiet?ref=123

This passes the client id via referrer. The main benefit is that this cannot accidentally leak via sharing the destination URL.

How would people feel about this?
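As described, the trick works on the source page's own URL: just before navigating, the page rewrites its address (for example with `history.replaceState()`) to carry `?ref=<client id>`, so the destination sees the ID in `document.referrer`, and the same logic strips it again on load. A hypothetical sketch of the two URL rewrites; only the `ref` name comes from the demo URL, and the function names are assumptions.

```typescript
// Hypothetical sketch of the referrer trick. Only the "ref" param name comes from the
// demo URL; the function names and the surrounding flow are assumptions.
const REF_PARAM = "ref";

// Applied to the source page's own URL just before navigation.
function ownUrlWithRef(currentUrl: string, clientId: string): string {
  const u = new URL(currentUrl);
  u.searchParams.set(REF_PARAM, clientId);
  return u.toString();
}

// Applied again on load (or when the user comes back) so the ID doesn't linger.
function ownUrlWithoutRef(currentUrl: string): string {
  const u = new URL(currentUrl);
  u.searchParams.delete(REF_PARAM);
  if (u.searchParams.toString() === "") u.search = ""; // drop a dangling "?"
  return u.toString();
}
```

Unlike a query param on the destination, the ID never appears in the URL the reader ends up on, so sharing that URL cannot leak it.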

> See this example http://output.jsbin.com/gifejef/quiet?ref=123

You just leaked the client ID 😉. I assume you hit back and then copied the URL?

So if I'm following @cramforce correctly, a link (presumably to a whitelisted origin) could carry a client ID URL param, ref=blah, in the referrer. Since our CDN is on Varnish, I could parse that at the edge and turn it into a Set-Cookie: blah to set the same ID on the next page. When the next page loads, analytics will see the new cookie ID and, presto, we've linked the sessions without leaking the ID in the URL itself or fragmenting our edge cache. Right?

This only really works if the links either point to a redirect or you can do cookie manipulation at the edge.

Edit: It would also only work if the next page or redirect endpoint was https.
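The edge step amounts to: read the `Referer` request header, pull out `ref`, and emit a first-party cookie. The sketch below expresses that logic in TypeScript rather than Varnish VCL; the cookie name `amp_cid`, the one-year lifetime, and the character whitelist for the ID are hypothetical choices.

```typescript
// Hypothetical edge logic, written in TypeScript rather than VCL: read ?ref= from the
// Referer header and turn it into a first-party cookie. Cookie name and lifetime are assumptions.
function setCookieFromReferer(referer: string | undefined, cookieName: string = "amp_cid"): string | null {
  if (!referer) return null;
  let ref: string | null = null;
  try {
    ref = new URL(referer).searchParams.get("ref");
  } catch {
    return null;
  }
  // Reject empty values and anything that could break out of the cookie header.
  if (!ref || !/^[A-Za-z0-9_.-]+$/.test(ref)) return null;
  return cookieName + "=" + ref + "; Path=/; Max-Age=31536000; Secure";
}
```

The returned string would become the `Set-Cookie` response header; because the cookie, not the URL, carries the ID, the cache key stays the same for every visitor.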

@jpettitt you can read the value via document.referrer on the destination just like a query param from location.href. HTTPS is actually not an issue because the AMP cache uses <meta name=referrer content=always> unless you override it.

@jridgewell LOL, yes. The correct URL is http://output.jsbin.com/gifejef/quiet. The leakage is easy to fix, though, because the same logic that adds it on navigation can take it away on load.

@cramforce I was presuming the next page would also be AMP, and I wanted to get the same ID in before the AMP runtime assigned a new one, hence manipulating it at the edge. Could you make the AMP runtime use the ID in the referrer param if it's there, and set it as a cookie on the domain it's running on? That would mean AMP-to-AMP links would have the same ID everywhere...

(and not do any of that if DNT is set)
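The runtime behaviour being requested is a small decision: adopt an ID from the referrer's `ref` param if present, otherwise keep the existing cookie, otherwise mint a new one, and skip the referrer step entirely when DNT is set. This is a hypothetical sketch of that proposal, not how the AMP runtime actually assigns client IDs; the function name, the return shape, and the injected `generate` callback are assumptions.

```typescript
// Hypothetical sketch of the proposal: prefer a client ID passed via the referrer's ?ref=
// param, fall back to the existing cookie, else mint a new one; skip adoption under DNT.
interface ClientIdDecision {
  clientId: string;
  source: "referrer" | "cookie" | "new";
}

function chooseClientId(
  referrer: string,
  existingCookie: string | null,
  doNotTrack: boolean,
  generate: () => string
): ClientIdDecision {
  if (!doNotTrack && referrer) {
    try {
      const ref = new URL(referrer).searchParams.get("ref");
      if (ref) return { clientId: ref, source: "referrer" };
    } catch {
      // An unparseable referrer is simply ignored.
    }
  }
  if (existingCookie) return { clientId: existingCookie, source: "cookie" };
  return { clientId: generate(), source: "new" };
}
```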

@jpettitt Let's track AMP picking these values up separately (I think that is also independent of how we transport them).

Do you still want me to create a separate item for passing the client id?

@smeder I meant for configuring additional "trusted destination hosts".

@cramforce I think the PR is merged, can we close this?

@zhouyx No, because the experiment is still off, pending some research.

@cramforce Is there anything we can help with regarding the research?

This is noted in https://github.com/ampproject/amphtml/blob/master/spec/amp-managing-user-state.md#task-5-using-client-id-in-linking-and-form-submission by @rudygalfi as an upcoming, recommended way to maintain user state without a third-party cookie.

We're just hitting this now, and without some way to push this forward we can't attribute AMP's impact on conversion rates and revenue for eCommerce when rolling out AMP.

We're all good on our side and are looking at turning this on end of this week.

Is this available as an experiment now? (Or does "turning this on end of this week" refer to turning it on as an experiment?)

It's available experimentally. We're basically waiting on #7347 to land (URLs that get substitutions must be secure) before enabling this broadly.

The experiment is link-url-replace for links (and amp-form-var-sub for forms, #5654).
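The secure-URL requirement acts as a gate in front of whatever substitution runs. A hypothetical sketch, assuming "secure" means exactly the `https:` protocol; the thread does not spell out AMP's precise rule (for example, whether localhost is exempt).

```typescript
// Hypothetical gate mirroring the rule above: substitutions apply only to secure targets.
// Treating exactly "https:" as secure is an assumption.
function isSecureUrl(href: string): boolean {
  try {
    return new URL(href).protocol === "https:";
  } catch {
    return false; // malformed URLs are never treated as secure
  }
}

function substituteIfSecure(href: string, substitute: (url: string) => string): string {
  return isSecureUrl(href) ? substitute(href) : href;
}
```

Insecure links are returned unchanged rather than rejected, so the link still works; it just carries no substituted values.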

@rudygalfi Thanks. If anyone's interested, there's a demo here (works on/off Google's AMP cache):

https://childlike-mare.gomix.me/

Looks like #7347 landed last week. We've verified our latest works with the experiments. Good to enable broadly?

We rolled this out late last week, so you should see link substitutions working now without the experiment opt-in. Take note that substitution now requires secure URLs, which was the last bit we were waiting on.

It's good to enable broadly with the following caveat: In the (unlikely) event we ever find an issue with the current production build of AMP, we'd need to roll back to the previous one. This will be true until we get out another stable release near the end of this week. So safest thing would be waiting until end of week when that next build is out, but if you're comfortable with that risk and its impact (the substitutions would just fail), then you can proceed.

Thanks @rudygalfi! We have this behind a feature flag for now, but appreciate the updates. So far it works great for both link and form substitutions.

Reopening, because we need to write docs for this I believe.

Assigning to @rudygalfi for docs.
