2019.08.20 10:56:30.062278 [ 77 ] {e781c38b-55fb-4048-995c-804c0cd24544}
Table structure:
(RECORD_ID UInt64, PART_KEY UInt32, SPEC String, PROC_DATE Date, TRAN_DATE Date, TRAN_WEEK Date, TRAN_MONTH Date, TRAN_QUARTER Date, TRAN_WEEKDAY UInt8, TRAN_YEARMONTH UInt32, TRAN_DATETIME DateTime, LOCAL_DATETIME DateTime, GMT_DATETIME DateTime, MTI Int16, PS_ID Int8, CARD UInt64, CARD_MASK String, CARD_TRUNCATED String, CARD_LAST_DIGITS UInt16, CARD_TOKEN UInt64, CARD_LENGTH UInt8, CARD_SHA256 String, CARD_CITYHASH64 UInt64, CARD_SERVICE_CODE Int16, CARD_SEQUENCE Int16, CARD_PRODUCT String, BIN6 UInt64, BIN8 UInt64, PROC_CODE Int16, RESP_CODE String, PEM Int16, TRAN_CURRENCY Int16, BILL_CURRENCY Int16, TRAN_AMOUNT UInt64, TRAN_AMOUNT_RUB UInt64, TRAN_AMOUNT_STEP_100 UInt64, TRAN_AMOUNT_STEP_500 UInt64, TRAN_AMOUNT_STEP_1000 UInt64, TRAN_AMOUNT_STEP_5000 UInt64, TRAN_AMOUNT_RANGE UInt8, FEE_SIGN Int8, FEE_AMOUNT UInt64, ACQUIRING_MEMBER_ID UInt64, PROCESSING_MEMBER_ID UInt64, ISSUING_MEMBER_ID UInt64, ISSUING_MEMBER_COUNTRY UInt64, ACQUIRING_MEMBER_COUNTRY UInt64, TERMINAL_ID String, MERCHANT_ID String, TERMINAL_ID_INT Int64, MERCHANT_ID_INT Int64, MCC UInt64, TID_MID UInt64, TID_MID_MCC UInt64, ACCEPTOR_NAME_LOCATION String, SELLER_NAME String, MERCHANT_NAME String, GROUPED_MERCHANT_NAME String DEFAULT '-1', MERCHANT_CITY String, MERCHANT_COUNTRY UInt64, MERCHANT_POSTAL_CODE UInt64, PARTNER_ID UInt64, TOKEN_REQUESTOR UInt64, ECI Int8, TRANSACTION_STATUS String, CARDHOLDER_PRESENCE Int16, TERMINAL_TYPE String, TRANSFER_TYPE String, BENEFIT_TYPE UInt64, CARDHOLDER_BIRTHDATE UInt64, CARDHOLDER_BIRTH_YEAR Int16, IS_G2C UInt8, IS_NON_RUB UInt8, IS_APPROVED UInt8, IS_ISSUER_RESPONSE UInt8, IS_STANDIN UInt8, IS_CALLOUT UInt8, IS_TOKEN UInt8, IS_ZERO_AMOUNT UInt8, IS_PIN_BASED UInt8, IS_AUTH_ON_US UInt8, IS_CAT UInt8, IS_COLLECTION_ONLY UInt8, IS_REPORTING_ONLY UInt8, IS_OFFLINE UInt8, RRN UInt64, AUTH_CODE String, TRN String, ACQUIRER_GATE UInt64, ISSUER_GATE UInt64, POS_DATA String, VALID String, EMV_TAGS Array(String), EMV_VALUES Array(String), ADD_DATA_TAGS 
Array(String), ADD_DATA_VALUES Array(String), EXT_DATA_TAGS Array(String), EXT_DATA_VALUES Array(String), PAYMENT_SOURCE_COMPANY String, PAYMENT_SOURCE_ACCOUNT String, PAYMENT_SYSTEM String DEFAULT '-1', TRAFFIC_TYPE String DEFAULT '-1', TRAFFIC_SOURCE String DEFAULT '-1', MCC_KEY UInt64, TRAN_MC UInt8 DEFAULT 0, TRAN_VS UInt8 DEFAULT 0, TRAN_MR UInt8 DEFAULT 0, TRAN_MR_FULL UInt8 DEFAULT 0, TRAN_AMOUNT_MC UInt64 DEFAULT CAST(0, 'UInt64'), TRAN_AMOUNT_VS UInt64 DEFAULT CAST(0, 'UInt64'), TRAN_AMOUNT_MR UInt64 DEFAULT CAST(0, 'UInt64'), TRAN_AMOUNT_MR_FULL UInt64 DEFAULT CAST(0, 'UInt64'), CARD_MC UInt64 DEFAULT CAST(0, 'UInt64'), CARD_VS UInt64 DEFAULT CAST(0, 'UInt64'), CARD_MR UInt64 DEFAULT CAST(0, 'UInt64'), CARD_MR_FULL UInt64 DEFAULT CAST(0, 'UInt64'), INDEX i1 PART_KEY TYPE minmax GRANULARITY 4) ENGINE = MergeTree PARTITION BY PART_KEY ORDER BY (PS_ID, MERCHANT_ID, TERMINAL_ID) SETTINGS index_granularity = 8192
SELECT
DISTINCT general_number
FROM
invoice.item_test
WHERE like(lcase(name),lcase('%каша%'))
LIMIT 100000
Code: 246, e.displayText() = DB::Exception: Bad size of marks file '/var/lib/clickhouse/data/invoice/item_test/201607_1_1821_3/skp_idx_name_search.mrk2': 3936, must be: 4824 (version 19.13.1.11 (official build))
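As a sanity check on the sizes in that exception: assuming each entry in a `.mrk2` marks file is three UInt64 values (24 bytes — an assumption about this ClickHouse version's on-disk format, not stated in the log), both sizes divide evenly into whole marks, so entire marks are missing rather than the file being truncated mid-entry:

```python
# Sanity-check the sizes from the "Bad size of marks file" exception.
# Assumption: one mark in a .mrk2 file is 3 x UInt64 = 24 bytes.
MARK_SIZE = 24

expected_bytes = 4824  # size the server expected
actual_bytes = 3936    # size found on disk

assert expected_bytes % MARK_SIZE == 0  # whole number of marks expected
assert actual_bytes % MARK_SIZE == 0    # whole number of marks on disk

missing = (expected_bytes - actual_bytes) // MARK_SIZE
print(f"missing marks: {missing}")  # → missing marks: 37
```

Both sizes being exact multiples of the mark size is consistent with the later comment that marks for index columns are sometimes simply not written.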
CREATE TABLE invoice.item_test (
id UUID,
general_number String,
general_issuance_date DateTime,
name String,
code Nullable(String),
units Nullable(Decimal(38, 6)),
quantity Nullable(Decimal(38, 6)),
price Nullable(Decimal(38, 6)),
cost Nullable(Decimal(38, 6)),
summa_vat Nullable(Decimal(38, 6)),
cost_vat Nullable(Decimal(38, 6)),
rate_vat Nullable(Decimal(38, 6)),
INDEX name_search lcase(name) TYPE set(1000) GRANULARITY 4
) ENGINE = MergeTree() PARTITION BY toYYYYMM(general_issuance_date)
ORDER BY
(general_issuance_date, id) SETTINGS index_granularity = 8192
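For orientation on how many skip-index marks a part like this should carry: a rough model (ignoring adaptive granularity and the final partial granule) is one skip-index mark per `GRANULARITY` column granules, so `ceil(ceil(rows / index_granularity) / GRANULARITY)`:

```python
import math

def expected_skip_index_marks(rows, index_granularity=8192, index_gran=4):
    """Rough model: one skip-index mark per `index_gran` column granules.

    Ignores adaptive granularity, so it is only an approximation for
    real parts.
    """
    granules = math.ceil(rows / index_granularity)
    return math.ceil(granules / index_gran)

# Under this model, the 201 marks expected in the error above would
# correspond to a part of roughly 201 * 4 * 8192 rows.
print(expected_skip_index_marks(201 * 4 * 8192))  # → 201
```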
I get this error when I build a tokenbf_v1(256, 4, 7) GRANULARITY 8 or a bloom_filter(0.01) GRANULARITY 8 index on string columns.
I wonder if this has something to do with my FS mount options:
/dev/md0 on /mnt/data type ext4 (rw,noatime,nodiratime,block_validity,delalloc,nojournal_checksum,barrier,user_xattr,acl,stripe=1024)
tune2fs -O ^has_journal /dev/md0
I can't reproduce it locally with simple fake data :(. I would be glad if anybody could write a script to reproduce the issue. Still investigating.
I wonder if this has something to do with my FS mount options
It should not relate.
I can give you a 300GB dump of my table (it is not secret). I am tar-ing it.
I already have examples of broken columns (thanks to @EvgenyVinogradov), and the problem is clear -- in some cases, we don't write marks for index columns. But according to reports the number of missing marks is always different and I cannot find any similarities between them...
Reproduced another, possibly related, issue:
:) SET allow_experimental_data_skipping_indices=1;
:) CREATE TABLE some_table
(
`Key` UInt64,
`Val1` UInt64,
`Val2` UInt64,
INDEX i1 Val1 TYPE minmax GRANULARITY 1
)
ENGINE = MergeTree()
ORDER BY Key
SETTINGS vertical_merge_algorithm_min_rows_to_activate = 1, vertical_merge_algorithm_min_columns_to_activate = 1;
Ok.
:) insert into some_table select number, number + 1, number + 2 from numbers(1000);
Ok.
:) optimize table some_table final;
server:
2019.08.27 19:56:50.665809 [ 121 ] {} <Fatal> BaseDaemon: ########################################
2019.08.27 19:56:50.665864 [ 121 ] {} <Fatal> BaseDaemon: (version 19.14.1.1) (from thread 120) Received signal Segmentation fault (11).
2019.08.27 19:56:50.665886 [ 121 ] {} <Fatal> BaseDaemon: Address: NULL pointer. Access: read. Address not mapped to object.
2019.08.27 19:56:50.665907 [ 121 ] {} <Fatal> BaseDaemon: Stack trace: 0xa045964 0xa04add5 0x9f955ba 0x9e5ba41 0x9e6065c 0x9ae15a6 0x9d4e851 0x9d4e1e0 0x67eb096 0x67f54bc 0xa4a308c 0xa4a3536 0xaa3125a 0xaa2ee7b 0xaa306cb 0x7f78e68a56db 0x7f78e623088f
2019.08.27 19:56:50.739583 [ 121 ] {} <Fatal> BaseDaemon: 3. 0xa045964 DB::IMergedBlockOutputStream::calculateAndSerializeSkipIndices(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName> > const&, unsigned long) /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/memory:2616
2019.08.27 19:56:50.740290 [ 121 ] {} <Fatal> BaseDaemon: 4. 0xa04add5 DB::MergedColumnOnlyOutputStream::write(DB::Block const&) /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/ostream:864
2019.08.27 19:56:50.741093 [ 121 ] {} <Fatal> BaseDaemon: 5. 0x9f955ba DB::MergeTreeDataMergerMutator::mergePartsToTemporaryPart(DB::FutureMergedMutatedPart const&, DB::MergeListEntry&, DB::TableStructureReadLockHolder&, long, DB::DiskSpaceMonitor::Reservation*, bool, bool) /home/alesap/code/cpp/ClickHouse/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp:845
2019.08.27 19:56:50.741847 [ 121 ] {} <Fatal> BaseDaemon: 6. 0x9e5ba41 DB::StorageMergeTree::merge(bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) /home/alesap/code/cpp/ClickHouse/dbms/src/Storages/StorageMergeTree.cpp:627
2019.08.27 19:56:50.742667 [ 121 ] {} <Fatal> BaseDaemon: 7. 0x9e6065c DB::StorageMergeTree::optimize(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::IAST> const&, bool, bool, DB::Context const&) /home/alesap/code/cpp/ClickHouse/dbms/src/Storages/StorageMergeTree.cpp:926
2019.08.27 19:56:50.743143 [ 121 ] {} <Fatal> BaseDaemon: 8. 0x9ae15a6 DB::InterpreterOptimizeQuery::execute() /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/memory:4047
2019.08.27 19:56:50.743739 [ 121 ] {} <Fatal> BaseDaemon: 9. 0x9d4e851 DB::executeQueryImpl(char const*, char const*, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool) /home/alesap/code/cpp/ClickHouse/dbms/src/Interpreters/executeQuery.cpp:0
2019.08.27 19:56:50.744259 [ 121 ] {} <Fatal> BaseDaemon: 10. 0x9d4e1e0 DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool) /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/tuple:265
2019.08.27 19:56:50.744348 [ 121 ] {} <Fatal> BaseDaemon: 11. 0x67eb096 DB::TCPHandler::runImpl() /home/alesap/code/cpp/ClickHouse/dbms/programs/server/TCPHandler.cpp:0
2019.08.27 19:56:50.744623 [ 121 ] {} <Fatal> BaseDaemon: 12. 0x67f54bc DB::TCPHandler::run() /home/alesap/code/cpp/ClickHouse/dbms/programs/server/TCPHandler.cpp:0
2019.08.27 19:56:50.745297 [ 121 ] {} <Fatal> BaseDaemon: 13. 0xa4a308c Poco::Net::TCPServerConnection::start() /home/alesap/code/cpp/ClickHouse/contrib/poco/Net/src/TCPServerConnection.cpp:57
2019.08.27 19:56:50.745933 [ 121 ] {} <Fatal> BaseDaemon: 14. 0xa4a3536 Poco::Net::TCPServerDispatcher::run() /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/atomic:1036
2019.08.27 19:56:50.746647 [ 121 ] {} <Fatal> BaseDaemon: 15. 0xaa3125a Poco::PooledThread::run() /home/alesap/code/cpp/ClickHouse/contrib/poco/Foundation/include/Poco/Mutex_STD.h:132
2019.08.27 19:56:50.747352 [ 121 ] {} <Fatal> BaseDaemon: 16. 0xaa2ee7b Poco::ThreadImpl::runnableEntry(void*) /home/alesap/code/cpp/ClickHouse/contrib/poco/Foundation/include/Poco/SharedPtr.h:156
2019.08.27 19:56:50.748092 [ 121 ] {} <Fatal> BaseDaemon: 17. 0xaa306cb void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(void*), Poco::ThreadImpl*> >(void*) /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/memory:2648
2019.08.27 19:56:50.748115 [ 121 ] {} <Fatal> BaseDaemon: 18. 0x7f78e68a56db start_thread /lib/x86_64-linux-gnu/libpthread-2.27.so
2019.08.27 19:56:50.748207 [ 121 ] {} <Fatal> BaseDaemon: 19. 0x7f78e623088f clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:97
2019.08.27 19:56:51.140693 [ 49 ] {} <Debug> ConfigReloader: Loading config '/home/alesap/config/config.xml'
Here is the table: https://drive.google.com/file/d/1oyeD-oNuDkyKyhYXvdaXf_SeoUS4w0dK (230GB)
How to reproduce:
ALTER TABLE uasts ADD INDEX value value TYPE bloom_filter(0.01) GRANULARITY 4;
OPTIMIZE TABLE uasts FINAL;
-- 3 hours later...
SELECT count(*) FROM uasts WHERE value='numpy';
Two clarifications that might help: in my case the database is running in Docker, and the table was populated via this query:
INSERT INTO invoice.esf_item_test
SELECT
id,
general_number,
general_issuance_date,
name,
code,
oced,
description,
units,
quantity,
price,
cost,
summa_excise,
summa_vat,
cost_vat,
rate_vat
FROM
invoice.esf_item
https://github.com/yandex/ClickHouse/pull/6723 fixes another issue.
Do I need to recreate the index?
@vmarkovtsev Can you provide table definition: metadata/dbname/uasts.sql?
@alesapin
ATTACH TABLE uasts
(
`id` Int32,
`repo` String,
`lang` String,
`file` String,
`line` Int32,
`parents` Array(Int32),
`pkey` String,
`roles` Array(Int16),
`type` String,
`uptypes` Array(String),
`value` String
)
ENGINE = MergeTree()
ORDER BY (repo, file, id)
SETTINGS index_granularity = 8192
Reproduced on my machine.