Clickhouse: Float values are sometimes altered

Created on 18 Nov 2019  Β·  8Comments  Β·  Source: ClickHouse/ClickHouse

When inserting Float32 or Float64 values over HTTP, I noticed that _sometimes_ their decimal part is missing, or divided by 10000 or 100000, or the float becames inf. This is not systematic and happens only for a few rows (11 out of 500000 in the example below).
The same thing doesn't happen when inserting the same data from a CSV file.

ClickHouse server version : 19.16.2.2, also reproduced with 19.17.3.7.

To reproduce:

  • Create Tables
DROP TABLE IF EXISTS strange_bug_http;
CREATE TABLE strange_bug_http
(
    `row_id` Int32, 
    `longitude` Float32 CODEC(Gorilla, ZSTD), 
    `latitude` Float32 CODEC(Gorilla, ZSTD), 
    `geohash` FixedString(3) CODEC(ZSTD)
)
ENGINE = MergeTree()
ORDER BY row_id;

// for comparison
DROP TABLE IF EXISTS strange_bug_csv;
CREATE TABLE strange_bug_csv
(
    `row_id` Int32, 
    `longitude` Float32 CODEC(Gorilla, ZSTD), 
    `latitude` Float32 CODEC(Gorilla, ZSTD), 
    `geohash` FixedString(3) CODEC(ZSTD)
)
ENGINE = MergeTree()
ORDER BY row_id;
  • Generate data :

You can download the generated files from here.

# pip install python-geohash
# pip install numpy
import numpy as np
import geohash

NUM_POINTS = 500_000

longitude = np.random.rand(NUM_POINTS) * 360 - 180
latitude = np.random.rand(NUM_POINTS) * 180 - 90
@np.vectorize 
def geohashEncode(longitude, latitude, precision=3):
    return geohash.encode(latitude, longitude, precision)
geohash = geohashEncode(longitude, latitude)

data = b','.join([b'(' + b','.join(map(lambda x: repr(x).encode(),(i,)+tup)) + b')' for i,tup in enumerate(zip(longitude, latitude, geohash))])

with open('floats.bin', 'wb') as f: 
    f.write(data)

csv = '\n'.join([','.join(map(lambda x: repr(x),(i,)+tup)) for i,tup in enumerate(zip(longitude, latitude, geohash))])
with open('floats.csv', 'w') as f: 
    f.write(csv)
  • Insert over HTTP:
# pip install uvloop
# pip install aiohttp
import asyncio
import uvloop
import aiohttp

with open('floats.bin', 'rb')  as f: 
    data = f.read()

uvloop.install()
loop = asyncio.get_event_loop()

params = {"database":"default", "query": "INSERT INTO strange_bug_http VALUES"}
async def main(data, params, url="http://localhost:8123/"):
    async with aiohttp.ClientSession() as session:
        async with session.post(url=url, params=params, data=data) as resp:
            return await resp.read()

loop.run_until_complete(main(data, params))
  • Insert from CSV:
cat floats.csv | clickhouse-client --query="INSERT INTO strange_bug_csv FORMAT CSV"
  • Perform the following query on both tables
SELECT 
    row_id + 1, 
    longitude, 
    latitude, 
    geohashEncode(longitude, latitude, 3) AS computed_geohash, 
    geohash
FROM [strange_bug_csv/strange_bug_http]
WHERE computed_geohash != geohash;

On the table filled from CSV, it retrurns one row, which is probably due to geohash miscalculation/float precision :

β”Œβ”€plus(row_id, 1)─┬─longitude─┬──latitude─┬─computed_geohash─┬─geohash─┐
β”‚          360307 β”‚ -171.5625 β”‚ -81.95313 β”‚ 01q              β”‚ 01m     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

On the table filled via HTTP, it retrurns the following :

─plus(row_id, 1)─┬──longitude─┬───latitude─┬─computed_geohash─┬─geohash─┐
β”‚          201272 β”‚ -95.921555 β”‚         52 β”‚ cc5              β”‚ cc7     β”‚
β”‚          241104 β”‚ -14.000025 β”‚  35.987846 β”‚ ewq              β”‚ ewm     β”‚
β”‚          261022 β”‚   7.000061 β”‚ -69.146805 β”‚ h5s              β”‚ h5t     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€plus(row_id, 1)─┬──longitude─┬───latitude─┬─computed_geohash─┬─geohash─┐
β”‚           81429 β”‚  174.77354 β”‚  49.000008 β”‚ zbs              β”‚ zbu     β”‚
β”‚          101695 β”‚   35.81316 β”‚        inf β”‚ uzc              β”‚ uu9     β”‚
β”‚          121610 β”‚ -109.74762 β”‚ -28.000027 β”‚ 3e1              β”‚ 3dc     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€plus(row_id, 1)─┬─longitude─┬──latitude─┬─computed_geohash─┬─geohash─┐
β”‚          300857 β”‚ 157.00002 β”‚ -69.10218 β”‚ p7x              β”‚ pe8     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€plus(row_id, 1)─┬─longitude─┬──latitude─┬─computed_geohash─┬─geohash─┐
β”‚          360307 β”‚ -171.5625 β”‚ -81.95313 β”‚ 01q              β”‚ 01m     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€plus(row_id, 1)─┬─longitude─┬───latitude─┬─computed_geohash─┬─geohash─┐
β”‚          360603 β”‚      -171 β”‚ -56.861885 β”‚ 0jy              β”‚ 0jv     β”‚
β”‚          380518 β”‚ 119.77515 β”‚ -46.000004 β”‚ nxv              β”‚ nxt     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€plus(row_id, 1)─┬──longitude─┬───latitude─┬─computed_geohash─┬─geohash─┐
β”‚          420349 β”‚ -113.02339 β”‚ -11.000004 β”‚ 3qp              β”‚ 3mz     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€plus(row_id, 1)─┬─longitude─┬───latitude─┬─computed_geohash─┬─geohash─┐
β”‚          480089 β”‚ -177.2999 β”‚ -42.000057 β”‚ 209              β”‚ 203     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

For instance, the second and third rows in the results have as longitude -14.000025 and 7.000061 respectively, which is different from the inserted values -14.250967871479986 and 7.608598672771308.

Checking the CSV data :

$ sed -n 241104p floats.csv
241103, -14.250967871479986, 35.98784501339779, 'ewm'

$ sed -n 261022p floats.csv
261021, 7.608598672771308, -69.14680134387808, 'h5t'

Checking the CSV table :

SELECT 
    row_id + 1, 
    longitude, 
    latitude, 
    geohashEncode(longitude, latitude, 3) AS computed_geohash, 
    geohash
FROM strange_bug_csv
WHERE (row_id + 1) IN (241104, 261022);

β”Œβ”€plus(row_id, 1)─┬──longitude─┬───latitude─┬─computed_geohash─┬─geohash─┐
β”‚          241104 β”‚ -14.250968 β”‚  35.987846 β”‚ ewm              β”‚ ewm     β”‚
β”‚          261022 β”‚  7.6085987 β”‚ -69.146805 β”‚ h5t              β”‚ h5t     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Checking the binary data :

def get_binary_row(row_id, data):
    p = data.find("({}".format(row_id).encode())
    return data[p : p + data[p:].find(b')') + 1]

>>> get_binary_row(241104-1, data)
b"(241103,-14.250967871479986,35.98784501339779,'ewm')"
>>> get_binary_row(261022-1, data)
b"(261021,7.608598672771308,-69.14680134387808,'h5t')"

Any workaround is much appreciated.
Thank you.

bug

All 8 comments

Not related to HTTP.

cat f.values |clickhouse-client -q "INSERT INTO strange_bug_val FORMAT Values"

CREATE TABLE strange_bug_val
(   a Float32, 
    b Float32)
ENGINE = Memory;

clickhouse-client -q "select '(-160.32605134916085,37.70584056842162),' from numbers(1000000)" >test.values
cat test.values |clickhouse-client -q "INSERT INTO strange_bug_val FORMAT Values"

SELECT DISTINCT *
FROM strange_bug_val

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€a─┬─────────b─┐
β”‚ -160.32605 β”‚  37.70584 β”‚
β”‚ -160.32605 β”‚ 37.000008 β”‚
β”‚       -160 β”‚  37.70584 β”‚
β”‚ -160.32605 β”‚       inf β”‚
β”‚       -inf β”‚  37.70584 β”‚
β”‚ -160.32605 β”‚        37 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Reproduced with 19.18.1.1667
No issue with 19.15.5.18

@den-crane Thanks for the concise example.

It seems that with less decimals in the input, wrong values such as 37.000008 disappear, but inf and values without decimals persist.

BTW, about CSV, the cause is FLOAT32 vs FLOAT64

β”Œβ”€plus(row_id, 1)─┬─longitude─┬──latitude─┬─computed_geohash─┬─geohash─┐
β”‚          360307 β”‚ -171.5625 β”‚ -81.95313 β”‚ 01q              β”‚ 01m     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜


select geohashEncode(cast(-171.5625053737408 as Float64) ,cast(-81.95313575217848 as Float64)) computed_geohash

β”Œβ”€computed_geohash─┐
β”‚ 01mvzbzuru5y     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

select geohashEncode(cast(-171.5625053737408 as Float32) ,cast(-81.95313575217848 as Float32)) computed_geohash
β”Œβ”€computed_geohash─┐
β”‚ 01qjb0bh2hb1     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

I have also encountered this problem, my workaround is:

  • Create a tmp table and change all Float32 and Float64 columns to String
  • Load data to the tmp table
  • Insert rows from tmp table, converting String values to Float (using toFloat64OrNull or other functions)

JFYI fixed: 19.17.4.11-stable , 19.16.5.15-stable

Thank you guys for your reactivity!

Was this page helpful?
0 / 5 - 0 ratings