What are you trying to achieve?
Spec states that "SQL statement without variable arguments" are a good choice for DB span name, but sometimes SQL queries (especially ones generated by ORM frameworks like Hibernate) get terribly long and may have very high cardinality. E.g.
SELECT t.col1 as column1, t.col2 as column2, t.col3 as column3, t.col4 as column4, t.col5 as column5 from table t where t.id = ?
just for selecting a single entry. Imagine a query that eagerly fetches 3 tables joined together, each one having 15-20 columns, with some complex conditions -- and everything generated completely dynamically using session.createCriteria() (Hibernate again).
Instead, would it be fine to use just SELECT (or INSERT or ...) as span name? Full query should be available as db.statement anyway.
First, how do you get SELECT or INSERT from the query without parsing it? Spec advises strongly against query parsing and I agree.
I very much prefer the John's idea from https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1409#issuecomment-713028264
I very much prefer the John's idea from open-telemetry/opentelemetry-java-instrumentation#1409 (comment)
馃憤
Full query should be available as
db.statementanyway.
馃憤
Alternative proposition: what about statement type + main table name?
SELECT FROM <table>
UPDATE <table>
INSERT INTO <table>
DELETE FROM <table>
MERGE INTO <table>
...
It's possible to implement that using a simple lexer (and the implementation won't be very complicated, I think). The downside of this solution is that it would cover only the most common/simple SELECT queries, tables used in joins & subqueries wouldn't be visible in the span name (of course, there's still the db.statement attribute that contains the original query).
What do you think about that?
CC @bogdandrutu @tigrannajaryan
I think with the current spec, this is all allowed, as it is not very strict:
The span name SHOULD be set to a low cardinality value representing the statement executed on the database. It may be a stored procedure name (without arguments), SQL statement without variable arguments, operation name, etc. When it's otherwise impossible to get any meaningful span name,
db.nameor the tech-specific database name MAY be used.
I think we wanted to avoid requiring or suggesting to parse anything though. If you have a very long statement, I think the db.name suggestion applies.
I think we wanted to avoid requiring or suggesting to parse anything though. If you have a very long statement, I think the db.name suggestion applies.
You still need to do simple parsing if you want to remove (potentially sensitive) arguments from a SQL query.
Anyway, I wanted to find some middle ground between db.name (which provides very little information) and the full query (which provides way too much information), so I'm glad that spec allows it.
Do you think we could mention it in the spec?
Sure, the spec already has a few examples listed with etc. at the end, adding the suggestion to use the SQL keyword seems OK. EDIT: But probably db.name + keyword, otherwise we would get too few distinct span names. I think just "SELECT" does not describe an "interesting class of spans" as the general span name spec puts it.
I think just "SELECT" does not describe an "interesting class of spans" as the general span name spec puts it.
Oh, I meant SELECT FROM <table>, SELECT by itself really is not descriptive enough.
Hmm, what if you have more than one table? or what if you SELECT FROM (SELECT ...)? This might be too complex for a general recommendation. Maybe <keyword> <db.name>.<db.table>? (see #1141 for db.table)
Maybe
<keyword> <db.name>.<db.table>
That actually sounds great. In case of nested queries or multiple tables <db.table> could be omitted, same as what #1141 suggests for that attribute:
If the operation is acting upon an anonymous table, or more than one table, this value may be omitted.
So for SELECT FROM (SELECT ...) the span name would be SELECT db_name without any table. WDYT?
Most helpful comment
That actually sounds great. In case of nested queries or multiple tables
<db.table>could be omitted, same as what #1141 suggests for that attribute:So for
SELECT FROM (SELECT ...)the span name would beSELECT db_namewithout any table. WDYT?