When I do substr to a fixed length CHAR(n) column, it returns white-space-padded string. But I think it should return a truncated string without padding.
I've done the same thing in MySQL and it returned a truncated string which I expected for.
abcd1234)select substr(col1, 1, 4) from T)varchar(n) abcd (truncated text)
char(n) abcd (four white spaces padded)
331
I need to take another look at the SQL specification, but this seems like the correct and expected behavior.
CHAR(n) is a fixed-length type (as opposed to VARCHAR(n). The result of applying substr to a CHAR(n) is, by necessity, of the same type — the type system is not powerful enough to derive a new type based on the actual arguments to the function and infer that because you asked for a substring of length k, the result should be CHAR(k).
I think substr(char(n), ...) should return varchar(n) (instead of char(n)). Let's check the spec though.
substr doesn't actually exist in the spec. There's a SUBSTRING construct that behaves in this way:
<character substring function> ::=
SUBSTRING <left paren> <character value expression> FROM <start position>
[ FOR <string length> ] [ USING <char length units> ] <right paren>
If <character substring function> CSF is specified, then let DTCVE be the declared type of the
<character value expression> immediately contained in CSF. The maximum length, character set,
and collation of the declared type DTCSF of CSF are determined as follows:
a) Case:
i) If the declared type of <character value expression> is fixed-length character string or
variable- length character string, then DTCSF is a variable-length character string type
with maximum length equal to the length or maximum length of DTCVE.
ii) Otherwise, the DTCSF is a large object character string type with maximum length equal to
the maximum length of DTCVE.
To paraphrase, if the argument type is CHAR(n), the result of SUBSTRING(string FROM start FOR length) is supposed to be a variable-length string type such as VARCHAR(n)
On the other hand, substr in other systems behaves like this:
text, a variable length type (https://www.postgresql.org/docs/9.1/functions-string.html)SUBSTRING. The return type is a variable-length type (https://docs.microsoft.com/en-us/sql/t-sql/functions/substring-transact-sql?view=sql-server-ver15)Returning varchar(n) in our implementation seems like a reasonable choice.
thanks @martint for detailed analysis!
Returning
varchar(n)in our implementation seems like a reasonable choice.
Unless @martint @kokosing @electrum you have any other objections, I am gonna remove the syntax-needs-review.