Example: Running Points to Polygon analysis (Which can at the moment only be aggregated by count), then intersecting that to obtain a count from a points dataset breaks the analysis & viz. Deleting broken analysis and refreshing page breaks builder.
Error: column "count_vals" specified more than once

cc/ @javisantana
@saleiva @xavijam In my opinion, this is a key workflow that should be solved. For instance, I have Madrid neighborhoods and a set of points layers such as hospitals, green areas, libraries and so on. I could intersect one by one with the neighborhoods layer, and then use the autostyle to show the variability both geographically and thematic.
Added next so we can attack this sooner
+1, BCG also reporting this behavior.
I'm looking into this.
The problem is that both the aggregate-intersection/"Insersect second layer" and group-points/"Group points into polygons" (convex-hull, boundix-box, concave-hullnodes) analysis add columns named count_vals when agregating by count.
So, in our case, which chains two such analyses, camshaft tries to create an analysis cache table like this one (with two count_vals columns):
CREATE TABLE analysis_b194a8f896_dcb3c5cac4fbf0062380d58a0b3497ab0d531320 AS
SELECT * FROM (
SELECT
_cdb_analysis_source.cartodb_id,
_cdb_analysis_source.category,
_cdb_analysis_source.the_geom,
_cdb_analysis_source.count_vals, -- << first count_vals column
count(_cdb_analysis_target.cartodb_id) as count_vals, -- << second count_vals column
count(_cdb_analysis_target.cartodb_id) / GREATEST(0.0000026, ST_Area((ST_Transform(_cdb_analysis_source.the_geom, 4326))::geography)) as count_vals_density
FROM (
SELECT
row_number() over() as cartodb_id,
distrito as category,
ST_ConvexHull(ST_Collect(the_geom)) AS the_geom,
count(1) as count_vals
FROM (
SELECT * FROM points_table
) _analysis_source
GROUP BY distrito
) _cdb_analysis_source,
(SELECT * FROM points_table) _cdb_analysis_target
WHERE ST_Intersects(_cdb_analysis_source.the_geom, _cdb_analysis_target.the_geom)
GROUP BY _cdb_analysis_source.cartodb_id, _cdb_analysis_source.category, _cdb_analysis_source.the_geom, _cdb_analysis_source.count_vals
) _analysis_create_table_query
LIMIT 0;
With other aggregates different from count it is also possible this kind of problem, if the second analysis uses the same aggregate function on a columns with the same name (in this case the doplicated column would be named {aggregate_function}_{base_column}. (here's an example; you can see the error activating DevTools and looking at the named map instantiation response)
I see no obvious way of preventing this kind of problem in general without introducing much uglier column names.
I think this is a very important workflow for any client. Better to have ugly column names than useless analysis right?
Yep, let me give it a thought
What about count_vals_1, count_vals_vals_2 and so on?
It might not be trivial to apply such as naming-scheme (notice that each node's query is independently generated, and it should be deterministic based on the parameters used for hashing the node's id), but I think we can detect possible name collisions and do some renaming similar to that, given that we can know the source node's column names (with getColumns() method, thanks to @dgaubert for the info).
I will look for some simple way to handle this.
I've just noticed we've had a ticket for this for some time: https://github.com/CartoDB/camshaft/issues/235
I've also open a new ticket for the solution I'm working on: https://github.com/CartoDB/camshaft/issues/330
I intended to suffix the ambiguous column names with something meaningful to the user (the a0, a1, etc. identifiers that Builder uses), but I've realised that we cannot do that, so we'll have to just use numerical (_2, _3, ...) suffixes. I'm afraid it will be hard for the user to determine which column corresponds to which analysis, but I see no easy way to fix it.
Hm, what about an identifier for what analysis it came from? For example centroid_count_1, centroid_count_2, etc
That's feasible if we don't correlate the _N numbers with different analyses. Can get a little verbose for my taste, but easy to implement.
All in all I think it could be worth, since it would make clear where the column belongs in most cases.
Fixed and deployed, thank you all!
Vamos!!!
Most helpful comment
@saleiva @xavijam In my opinion, this is a key workflow that should be solved. For instance, I have Madrid neighborhoods and a set of points layers such as hospitals, green areas, libraries and so on. I could intersect one by one with the neighborhoods layer, and then use the autostyle to show the variability both geographically and thematic.