Clickhouse: New IP address functions/types (CIDR mask)

Created on 13 Dec 2016  Β·  3Comments  Β·  Source: ClickHouse/ClickHouse

It would be great to have some kind of these IPv4-related (IPv6 too?) functions:

IPv4NumToStringClassC(num) isn't enough for proper subnet handling.

:) SELECT IPv4CIDRtoIPv4Range('192.168.1.0/24') AS ips_range;

-- Returns minimum and maximum IP addressses that belong to that subnet
-- (192.168.1.0, 192.168.1.255)

β”Œβ”€ips_range───────────────┐
β”‚ (3232235776,3232236031) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

It's also hard to check if IP belongs to a certain subnet.

:) SELECT IPv4StringToNum('192.168.1.42') AS ip, IPv4inRange(ip, '192.168.1.0/24') AS ip_in_subnet;

-- Some kind of helper to check if IP belongs to a certain subnet.
-- Like `ip BETWEEN range_min AND range_max`

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€ip─┬─ip_in_subnet─┐
β”‚ 3232235818 β”‚            1 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Finally, there should be a way to group IP addresses by its subnets:

:) SELECT IPv4StringToNum('192.168.1.42') AS ip, IPv4getSubnet(ip, 28) AS subnet;

-- Returns a CIDR subnet (or a range, or whatever) for the given IP and netmask

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€ip─┬─subnet──────────┐
β”‚ 3232235818 β”‚ 192.168.1.32/28 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
feature

Most helpful comment

It would be nice to have special storage type to store IPv4 and IPv6 subnets. For IPv4 subnets, we need 32 bits for the network address and uint8 to store CIDR mask length (valid range 0-32).

Yes, I could store this data in two fields but it does not look as great solution. Also I want to have ability to extract / handle this data with CH bundled functions without external additional processing.

All 3 comments

BTW, here's my snippet of a (not that well working πŸ˜€) CIDRtoRange function:

SELECT network, min_, max_ FROM (
        SELECT
            arrayJoin([
                '42.61.141.0/24',
                '10.1.1.16/29',
                '192.168.0.0/23',
                '8.8.8.8/24',
                '127.0.0.1/30',
                '127.0.0.1/32'
            ]) AS network,
            -bitShiftLeft(-1, 32 - toUInt8(extract(network, '/(\\d+)$'))) - 3 AS subnet_count,
            toUInt32(IPv4StringToNum(extract(network, '(\\d+.\\d+.\\d+.\\d+)')) + 1) AS min_ip,
            toUInt32(min_ip + subnet_count) AS max_ip,
            IPv4NumToString(min_ip) as min_,
            IPv4NumToString(max_ip) as max_
    )
β”Œβ”€network────────┬─proper_min──┬─proper_max────┬─min_────────┬─max_──────────┬─OK?────┐
β”‚ 42.61.141.0/24 β”‚ 42.61.141.1 β”‚ 42.61.141.254 β”‚ 42.61.141.1 β”‚ 42.61.141.254 β”‚ true   β”‚
β”‚ 10.1.1.16/29   β”‚ 10.1.1.17   β”‚ 10.1.1.22     β”‚ 10.1.1.17   β”‚ 10.1.1.22     β”‚ true   β”‚
β”‚ 192.168.0.0/23 β”‚ 192.168.0.1 β”‚ 192.168.1.254 β”‚ 192.168.0.1 β”‚ 192.168.1.254 β”‚ true   β”‚
β”‚ 8.8.8.8/24     β”‚ 8.8.8.1     β”‚ 8.8.8.254     β”‚ 8.8.8.9     β”‚ 8.8.9.6       β”‚ false  β”‚
β”‚ 127.0.0.1/30   β”‚ 127.0.0.1   β”‚ 127.0.0.2     β”‚ 127.0.0.2   β”‚ 127.0.0.3     β”‚ false  β”‚
β”‚ 127.0.0.1/32   β”‚ 127.0.0.1   β”‚ 127.0.0.1     β”‚ 127.0.0.2   β”‚ 127.0.0.0     β”‚ false  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜

It would be nice to have special storage type to store IPv4 and IPv6 subnets. For IPv4 subnets, we need 32 bits for the network address and uint8 to store CIDR mask length (valid range 0-32).

Yes, I could store this data in two fields but it does not look as great solution. Also I want to have ability to extract / handle this data with CH bundled functions without external additional processing.

Yes, I could store this data in two fields but it does not look as great solution. Also I want to have ability to extract / handle this data with CH bundled functions without external additional processing.

The functions have been added in #5095

About storage of IP address and mask length in separate columns - in fact it is the recommended way (for better compression and query performance). Although it is possible to introduce a data type that will store them in two separate files.

Was this page helpful?
0 / 5 - 0 ratings