Name: IP_PREFIX(ip_address, prefix_bits)
Arguments:
Return type: IPADDRESS
Return value: The IP address with only the prefix bits retained. Should handle both IPv4 and IPv6 correctly.
An IP_SUFFIX function be defined similarly. This could be used to get the host bits in IPv6.
https://prestodb.github.io/docs/current/functions/binary.html
Alternatively, if the substr, rpad, and lpad functions in a VARBINARY supported length in bits rather than bytes, we would not need the above UDFs.
https://prestodb.github.io/docs/current/functions/bitwise.html
Or, if the bitwise left/right shift functions supported VARBINARY, we would not need these UDFs.
Here's an implementation that breaks the IP into two bigints, applies a mask to each bigint, and concatenates them. It's ugly, but it works.
ip,
prefix_bits,
CAST(
IF(
-- Is this an IPv4 address?
ip BETWEEN IPADDRESS '0.0.0.0' AND IPADDRESS '255.255.255.255',
-- IPv4 prefix.
CAST(
CONCAT(
-- First 64 bits are 0.
TO_BIG_ENDIAN_64(0),
-- Apply a mask to get next 64 bits.
TO_BIG_ENDIAN_64(
BITWISE_AND(
FROM_BIG_ENDIAN_64(SUBSTR(CAST(ip AS VARBINARY), 9, 8)),
BITWISE_SHIFT_LEFT(
BITWISE_NOT(0),
GREATEST(32 - prefix_bits, 0),
64
)
)
)
) AS IPADDRESS
),
-- IPv6 prefix.
CAST(
CONCAT(
-- Apply a mask to first 64 bits.
TO_BIG_ENDIAN_64(
BITWISE_AND(
FROM_BIG_ENDIAN_64(SUBSTR(CAST(ip AS VARBINARY), 1, 8)),
BITWISE_SHIFT_LEFT(
BITWISE_NOT(0),
GREATEST(64 - prefix_bits, 0),
64
)
)
),
-- Apply a mask to second 64 bits.
TO_BIG_ENDIAN_64(
BITWISE_AND(
FROM_BIG_ENDIAN_64(SUBSTR(CAST(ip AS VARBINARY), 9, 8)),
BITWISE_SHIFT_LEFT(
BITWISE_NOT(0),
GREATEST(128 - prefix_bits, 0),
64
)
)
)
) AS IPADDRESS
)
) AS VARCHAR
) AS ip_prefix
FROM (
SELECT
CAST(ip_address AS IPADDRESS) AS ip,
IF(
CAST(ip_address AS IPADDRESS) BETWEEN IPADDRESS '0.0.0.0' AND IPADDRESS '255.255.255.255',
8,
68
) AS prefix_bits
FROM dim_one_row
CROSS JOIN UNNEST(ARRAY['1.2.3.4', 'face:face:face:face:face:face:face:face'])
AS t(ip_address)
)
Output is:
| ip | prefix_bits | ip_prefix |
| -----| -----| ----- |
| 1.2.3.4 | 8 | 1.0.0.0 |
| face:face:face:face:face:face:face:face | 68 | face:face:face:face:f000:: |
IP_PREFIX(ip_address, prefix_bits) is reasonable. Please go for it.
BTW, currently Presto does not support IP address in prefix format. So you probably first need to add support to that.
Thanks for your help!
The IP prefix can simply be returned as a regular IP address, with non-prefix bits set to 0.
The IP prefix can simply be returned as a regular IP address, with non-prefix bits set to 0.
That would be unreliable because 0.0.0.0/16 and 0.0.0.0/17 are different IP prefix.
You are totally correct, I was imagining that the user would also store the prefix_bits in the table and use that for joins and grouping.
Would that be acceptable? I worry that introducing a new type (e.g. IPPREFIX)here will make it difficult to write future UDFs that may need to support both IPADDRESS and IPPREFIX. What do you think?
IP prefix is a well known concept, if you claim something it an IP prefix but it doesn't act that way, there are all sort of potential problems you could produce. So I don't think it's a good idea to return an normal IP address as prefix. I'd look into whether prefix can be a valid IPADDRESS first. If not, we should support IP prefix as a separate type.
I believe IP prefixes are most commonly represented either in CIDR notation (e.g. 0.0.0.0/16) or as a subnet mask (e.g. 0.0.0.0 for both 0.0.0.0/16 and 0.0.0.0/17). A subnet mask and CIDR notation can both be converted into a regular IPADDRESS.
is that sufficient justification, or is there somewhere else you suggest I look for this information?
We should be able to tell whether an IP address is a prefix or a real address. That's the bottomline.
Thanks for the feedback! We'll proceed with the pure presto implementation above for now. Later on, if the need arises, we'll look into making a UDF.
I am also interested in support for ip prefix. I'll make an implementation of IP prefix as a separate type.
Most helpful comment
I am also interested in support for ip prefix. I'll make an implementation of IP prefix as a separate type.