Influxdb: Add funnel type query

Created on 19 Jan 2014  路  13Comments  路  Source: influxdata/influxdb

Multiple people have asked for the ability to easily do funnel type queries with InfluxDB. Given users who visited, how many clicked to the pricing page, how many signed up, how many did x...

I think it's important enough to exist as a concept in the query language. Here's a proposal for how that might look.

select funnel(
  (type="visit", type="signup", visitor_id, 3d),
  (type="signup", type="paid", user_id, 7d))
from user_events
where time > "2014-01-20" and time < "2014-01-27"
group by time(1d)

What that query says is to return a time series of funnels. 7 days of them to be exact. Where the first stage of the funnel is a visit that turns into a signup. visitor_id is the field that joins the two events. The visit has 3 days to turn into a signup. So if we're looking at 2014-01-20, we'd look at all the visits from that day and then look for signups for that day and the following 2 days.

The second stage of the funnel is signups that turn into paid accounts. We'd look up to 7 days in the future of the initial visit for those conversions. This query would return something like this:

[
{
  "name": "user_events",
  "columns": ["visit", "signup", "paid", "time"],
  "points": [
    [103, 21, 2, 1388552400],
    [122, 34, 3, 1388638800],
    ...
  ]
}
]

That would have 7 points, one for each day. Another option would be to have that funnel query return all of the actual raw events for the funnel. So you'd be able to see which actual users fell off at each stage.

Feedback welcome!

revisit in the future

Most helpful comment

Hi! I see this post has been closed long time ago with a label "revisit in the future". I'm very interested in this feature, do you know if anyone started to work on it? Thanks!

All 13 comments

@pauldix I don't have a lot of experience with InfluxDB specifically but I've worked on funnel analysis a lot with Sky. Sky takes an inverted approach to selection and I think it's syntax is probably overkill for InfluxDB but hopefully our approach with Sky gives you something to compare/contrast against.

Basic Funnel

Sky nests conditionals (WHEN) and selections (SELECT) so a basic funnel looks like this:

WHEN action == 'visit' THEN
  SELECT count() INTO "step1"
  WHEN action == 'signup' WITHIN 1..10000 STEPS THEN
    SELECT count() INTO "step2"
  END
END

And outputs this JSON:

{"step0": 12, "step1": 7}

Sky executes this query over every event and when it matches a conditional then it'll execute everything within it starting from that event. The WITHIN specifies that the conditional matches the next event or any event up to 10,000 events from the action='visit' event. I need to add a WITHIN 1..* STEPS but I haven't gotten around to it yet.

Sessionization

A big thing I needed early on was grouping funnels by sessions (e.g. periods of activity delimited by idle time). There's a specific construct in Sky to do that:

FOR EACH SESSION DELIMITED BY 2 HOURS
  WHEN action == '/index.html' THEN
    SELECT count() INTO "step1"
    WHEN action == '/signup.html' THEN
      SELECT count() INTO "step1"
    END
  END
END

Filter by Time

Specifying a conditional on the timestamp is tricky since every event has a different timestamp. In your example, are you limiting all events queried by the time range? I find it usually makes more sense to only filter the initial event by the time range since cutting off the signup events outside that range would exclude valid signups. For example, in SkyQL I would do:

WHEN action == 'visit' && timestamp > 1390176000 && timestamp < 1390780800 THEN
  SELECT count() INTO "step1"
  WHEN action == 'signup' WITHIN 1..10000 STEPS THEN
    SELECT count() INTO "step2"
  END
END

...instead of specifying it for every step like this:

WHEN action == 'visit' && timestamp > 1390176000 && timestamp < 1390780800 THEN
  SELECT count() INTO "step1"
  WHEN action == 'signup' && timestamp > 1390176000 && timestamp < 1390780800 WITHIN 1..10000 STEPS THEN
    SELECT count() INTO "step2"
  END
END

Finding the Next Action

Another big feature I use a lot in Sky is knowing what happens immediately after a funnel. For example, if a user goes to /index.html then /signup.html, what do they do next? It helps when interactively drilling through a funnel. In SkyQL it's written like this:

WHEN action == 'visit' THEN
  SELECT count() INTO "step1"
  WHEN action == 'signup' WITHIN 1..10000 STEPS THEN
    SELECT count() INTO "step2"
    WHEN true WITHIN 1..1 STEPS THEN
      SELECT count() GROUP BY action INTO "step3"
    END
  END
END

which will return:

{
  "step1": 12,
  "step2": 7,
  "step3": {
    "action": {
      "/signup_success.html": {"count": 3},     // Some people successfully signed up.
      "/pricing.html": {"count": 2}             // Some people went to the pricing page.
                                                // Anyone else just dropped off
    }
  }
}

Grouping by Time Period

I like what you're doing with the group by time(1d). I find it useful for replaying how long it takes people to move from one action (or state) to another. In Sky it's done with a temporal loop:

WHEN action == 'signup' THEN
  FOR i EVERY 1 DAY WITHIN 30 DAYS
    SELECT avg(friend_count) GROUP BY i
  END
END

This query shows the average number of friends someone has (on a social networking site in this example) every day for the first 30 days after sign up.

Other features

Another cool feature Sky has is using inline variables to track implicit state or making finite state machines based on actions. For example, you can track implicit state such as a user being inactive for 30 days. It's a fun feature to use but I've found most people have a hard time grasping it so it's probably not a great v1 feature. :)

Not sure if this rant has been helpful. Let me know if you have any questions and I can try to clarify. SkyQL will get documented once v0.4.0 releases but currently all the information in pull requests and a handful of people's heads.

Thanks for the thorough writeup @benbjohnson! It's definitely helpful to see how you're doing it in Sky. The where time part of the query is only limiting the starting point and end of the first part of the funnel queries. The last part of the tuple in funnel specifies how much time it has to get to the next stage of the funnel.

Your example of finding the next action is good. Definitely see how that would be useful.

My feedback is that this looks good. I'd try to make it more future-proof. For instance, what if I would accept any (potentially infinite) date? And what if there's more options in the future? I don't know if your query language supports named parameters. You could always use null as the duration parameter to represent an infinite duration.

select funnel(
  (type="visit", type="signup", visitor_id, null),
  (type="signup", type="paid", user_id, null, "some_new_option1", "some_new_option2"))
from user_events
where time > "2014-01-20" and time < "2014-01-27"
group by time(1d)

If you end up adding lots of options, remembering the order of additional options would be tricky.

yeah, named parameters might be better. Will make the query super verbose, but more clear.

+1 on making a funnel query come to fruition with influxdb.

Is this funnel function implemented?

Nope, it's marked as revisit in the future.

hmm.. Is there a known workaround this?

Unfortunately not that I'm aware of :(.

ok.. I am trying to make it work for funnels. Will post if I figure out a way to do this.

Hi! I see this post has been closed long time ago with a label "revisit in the future". I'm very interested in this feature, do you know if anyone started to work on it? Thanks!

Any news?

The future is here! Let's revisit?

Was this page helpful?
0 / 5 - 0 ratings