Clickhouse: How do I use an HDFS engine in HA mode

Created on 12 Dec 2019  ·  32 Comments  ·  Source: ClickHouse/ClickHouse

Hi:
I now have two NameNodes, one active and one standby. How do I create the HDFS engine table?
NameNode1: 12.12.12.12:9000
NameNode2: 12.12.12.13:9000
Current CREATE statement:

CREATE TABLE xxxx
(...) ENGINE = HDFS('hdfs://[email protected]:9000/path/database.db/table/*', 'ORC');

How do I create the table for HA mode? I need an example, thank you.

comp-documentation comp-foreign-db comp-hdfs help wanted question

Most helpful comment

I also have this problem. No matter whether I put LIBHDFS3_CONF into the global environment or a user environment, it can never be loaded. But when I put it into the ClickHouse start scripts, it works; you can try it. If you use "service clickhouse-server start", you need to add "export LIBHDFS3_CONF=/etc/clickhouse-server/hdfs-client.xml" to /etc/init.d/clickhouse-server; if you use "systemctl start clickhouse-server", you need to add "Environment="LIBHDFS3_CONF=/etc/clickhouse-server/hdfs-client.xml"" to /etc/systemd/system/clickhouse-server.service.

All 32 comments

I also want to know how to configure the hdfs-engine path.

Currently HA mode is not supported.

Currently HA mode is not supported.

Is there any plan to support HA? Thanks.

HDFS HA mode is already supported. You need to put hdfs-client.xml in the working directory and then use the nameservice URI in the HDFS URI, as you would in other Hadoop components.
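For example (a sketch, assuming a nameservice named `ns` is defined in hdfs-client.xml; the table and path names are illustrative), the URI then carries the nameservice instead of a host:port:

```sql
-- 'ns' is the HDFS nameservice alias from hdfs-client.xml; note there is no host or port.
CREATE TABLE hdfs_ha_table (name String, value UInt32)
ENGINE = HDFS('hdfs://ns/path/database.db/table/*', 'ORC')
```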

@weiqxu
Hello, I have a few questions about your answer.

  1. Can hdfs-client.xml be understood as hdfs-site.xml?
  2. Which specific directory of ClickHouse do you mean by "working directory"? Is it /etc/clickhouse-server?
  3. Do you still need to configure a path in ClickHouse's config.xml pointing to hdfs-client.xml?
  4. In hdfs-site.xml I configure:

     <property>
       <name>dfs.nameservices</name>
       <value>ns</value>
     </property>

     Should the engine then be configured as ENGINE = HDFS('hdfs://user@ns:9000/path/database.db/table/*', 'ORC')?

Thank you!

@gubinjie You can simply copy hdfs-site.xml to hdfs-client.xml (note the rename). The file should be put in the working directory, or you can use the environment variable LIBHDFS3_CONF to point to the configuration file.

Actually, it uses libhdfs3 (from Apache HAWQ) to read HDFS, so the configuration should be the same.
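The environment-variable route can be sketched like this (the path is only an example; use wherever your hdfs-client.xml actually lives). As later comments in this thread note, the variable must be visible to the server process itself, not just an interactive shell:

```shell
# Point libhdfs3 at the HDFS client config before starting clickhouse-server.
export LIBHDFS3_CONF=/etc/clickhouse-server/hdfs-client.xml
```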

@weiqxu

  1. The ClickHouse host is a separate server and is not on any host in the Hadoop cluster.
  2. I copied hdfs-site.xml to the /etc/clickhouse-server directory and renamed the file to hdfs-client.xml.
  3. I configured LIBHDFS3_CONF=/etc/clickhouse-server/hdfs-client.xml in /etc/profile on the ClickHouse host.

After the above operations, I found that it didn't work.
Can you explain in more detail what this working directory is?
Or is there a more detailed example?
Thank you!

@gubinjie
Sorry, maybe I forgot something. You need to keep the configuration file name as 'hdfs-site.xml'. LIBHDFS3_CONF should then be set to /etc/clickhouse-server in your case.

If you still get an error, please include the error info.

@weiqxu
Hello, I tried again several times, but the error log shows dfs.ha.namenodes.ns not found.

  1. This is part of my hdfs-site.xml configuration file; see the configuration below.
  2. I put this hdfs-site.xml (hdfs-client.xml) file in the /etc/clickhouse-server folder.
  3. I set $LIBHDFS3_CONF = /etc/clickhouse-server.
  4. This is the HDFS table engine I created: ENGINE = HDFS('hdfs://ns/hive/*.db/test_ha/*', 'ORC')
  5. The error log shows:
     Code: 210. DB::Exception: Received from 127.0.0.1:9000. DB::Exception: Unable to connect to HDFS: InvalidParameter: Cannot parse URI: hdfs://ns, missing port or invalid HA configuration Caused by: HdfsConfigNotFound: Config key: dfs.ha.namenodes.ns not found
  6. Configuration file:

     <configuration>
       <property>
         <name>dfs.nameservices</name>
         <value>ns</value>
       </property>
       <property>
         <name>dfs.ha.namenodes.ns</name>
         <value>nn1,nn2</value>
       </property>
       <property>
         <name>dfs.namenode.rpc-address.ns.nn1</name>
         <value>BigData1:9000</value>
       </property>
       <property>
         <name>dfs.namenode.http-address.ns.nn1</name>
         <value>BigData1:50070</value>
       </property>
       <property>
         <name>dfs.namenode.rpc-address.ns.nn2</name>
         <value>BigData2:9000</value>
       </property>
       <property>
         <name>dfs.namenode.http-address.ns.nn2</name>
         <value>BigData2:50070</value>
       </property>
     </configuration>

I don't think ClickHouse has found the hdfs-site.xml (hdfs-client.xml) configuration file. What else needs to be configured?

@gubinjie Your URI seems to have a space between "hdfs:" and "//ns/hive/".

@filimonov I'd like to enhance the document this week.

@weiqxu
That must be a copy-paste issue; there is no space in the actual statement. I looked at the libhdfs3 source code, and it does read the config location from LIBHDFS3_CONF. I configured dfs.ha.namenodes.ns in the hdfs-client.xml file, but it seems the hdfs-client.xml file cannot be found, so is something missing? Or is it because the ClickHouse version I'm using is 19.16?
Could you add me on WeChat: 16621128482?

class DefaultConfig {
public:
    DefaultConfig() : conf(new Hdfs::Config) {
        bool reportError = false;
        const char * env = getenv("LIBHDFS3_CONF");
        std::string confPath = env ? env : "";

        if (!confPath.empty()) {
            size_t pos = confPath.find_first_of('=');

            if (pos != confPath.npos) {
                confPath = confPath.c_str() + pos + 1;
            }

            reportError = true;
        } else {
            confPath = "hdfs-client.xml";
        }

        init(confPath, reportError);
    }
};
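To make the constructor's behavior concrete, here is a small shell re-implementation of that lookup (a sketch; the function name is mine): libhdfs3 keeps only the part after the first '=' in the variable's value, and falls back to hdfs-client.xml in the working directory when the variable is empty.

```shell
# Emulates libhdfs3's DefaultConfig path resolution shown above.
resolve_conf_path() {
    env_value="$1"
    if [ -n "$env_value" ]; then
        # Mirror confPath.find_first_of('='): drop everything up to the first '='.
        case "$env_value" in
            *=*) env_value="${env_value#*=}" ;;
        esac
        printf '%s\n' "$env_value"
    else
        # No LIBHDFS3_CONF: look for hdfs-client.xml in the working directory.
        printf '%s\n' "hdfs-client.xml"
    fi
}
```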

Guys, once you have a result, please share the complete steps. Many thanks.

@filimonov I'd like to enhance the document this week.

Is this work in progress? If so, please provide the document url. Thank you!

@nickevin Here is the draft:

The path part of the URI may contain an HDFS nameservice instead of a host/IP and port. In this case ClickHouse will read the HDFS configuration values from hdfs-client.xml. By default, hdfs-client.xml is looked up in the working directory; alternatively, the environment variable LIBHDFS3_CONF can be used to point explicitly to the config file you wish to use. The supported configuration values of hdfs-client.xml can be found in Apache HAWQ. In the simple case, the user can copy the hdfs-site.xml currently in use to hdfs-client.xml, or set LIBHDFS3_CONF to the file path of hdfs-site.xml. The configuration file can't contain uppercase letters.

CREATE TABLE hdfs_engine_table (name String, value UInt32) ENGINE=HDFS('hdfs://mycluster/other_storage', 'TSV')

I also got the problem.


Following your steps, I still got the error: HdfsConfigNotFound: Config key: dfs.ha.namenodes.ns not found

@amosbird
I haven't solved this problem yet, but someone seems to have solved it by installing ClickHouse using Docker. I need to try that solution when I have time. If you have other good solutions, please post them to tell us, thank you!

I also tried all sorts of configurations and it gives me the same error:
"Cannot parse URI: hdfs://nameservice1, missing port or invalid HA configuration Caused by: HdfsConfigNotFound: Config key: dfs.ha.namenodes.nameservice1 not found"

I tried to use working directory as:

  • /etc/clickhouse-server
  • /var/lib/clickhouse-server

I put both hdfs-client.xml and hdfs-site.xml in each directory and set LIBHDFS3_CONF=/etc/clickhouse-server/hdfs-client.xml.
All of this while running ClickHouse in Docker, version 19.17.2.4 (official build).

Is there a solution yet?

@yehiaelbehery
It seems that someone solved the problem using a Docker installation.

@gubinjie OK, any idea about the detailed steps to make it work via docker?

I also have this problem. No matter whether I put LIBHDFS3_CONF into the global environment or a user environment, it can never be loaded. But when I put it into the ClickHouse start scripts, it works; you can try it. If you use "service clickhouse-server start", you need to add "export LIBHDFS3_CONF=/etc/clickhouse-server/hdfs-client.xml" to /etc/init.d/clickhouse-server; if you use "systemctl start clickhouse-server", you need to add "Environment="LIBHDFS3_CONF=/etc/clickhouse-server/hdfs-client.xml"" to /etc/systemd/system/clickhouse-server.service.

@inertance Thank you very much, it has been solved; I used the second method.

If you're using Docker, add to your docker run command:
-e "LIBHDFS3_CONF=/etc/clickhouse-server/hdfs-client.xml"
Tested on the yandex/clickhouse-server:20.3.8.53 image.
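Putting the Docker advice together, a full invocation might look like this (a sketch: the host-side path, mount, and container name are assumptions; the image tag is the one tested above):

```shell
docker run -d --name clickhouse-hdfs \
  -e "LIBHDFS3_CONF=/etc/clickhouse-server/hdfs-client.xml" \
  -v /path/to/hdfs-client.xml:/etc/clickhouse-server/hdfs-client.xml:ro \
  yandex/clickhouse-server:20.3.8.53
```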

Hello, have you solved the problem?

@wbigdata solved!


Can you share how you solved it? Thank you very much.

@wbigdata Solved!

@gubinjie Brother, could you tell me how you solved it? It's been bothering me for two days...

@wbigdata
You need to add "export LIBHDFS3_CONF=/etc/clickhouse-server/hdfs-client.xml" to /etc/init.d/clickhouse-server.

  1. Copy the hdfs-client.xml file from your Hadoop setup into the /etc/clickhouse-server folder on the server running ClickHouse.
  2. Then add the line export LIBHDFS3_CONF=/etc/clickhouse-server/hdfs-client.xml to /etc/init.d/clickhouse-server on that server.
  3. Then, when using the HDFS engine, write the NameNode address as the nameservice alias from hdfs-client.xml, without a port number.

  3. Then, when using the HDFS engine, write the NameNode address as the nameservice alias from hdfs-client.xml, without a port number.

Shouldn't this be written as hdfs://<cluster name>?

@wbigdata
You need to add "export LIBHDFS3_CONF=/etc/clickhouse-server/hdfs-client.xml" to /etc/init.d/clickhouse-server.

  1. Copy the hdfs-client.xml file from your Hadoop setup into the /etc/clickhouse-server folder on the server running ClickHouse.
  2. Then add the line export LIBHDFS3_CONF=/etc/clickhouse-server/hdfs-client.xml to /etc/init.d/clickhouse-server on that server.
  3. Then, when using the HDFS engine, write the NameNode address as the nameservice alias from hdfs-client.xml, without a port number.

Got it working, brother, awesome. I'd better write this down...

@wbigdata
Actually, the English part of the thread above already worked out the solution; you didn't read it carefully.

@wbigdata
Actually, the English part of the thread above already worked out the solution; you didn't read it carefully.

True, the problem was a big one; otherwise it would have been done on Friday. Sigh.
