Integrating Presto with Cloud Data Streaming Service

    Available in a VPC environment.

    This guide describes how to integrate NAVER Cloud Platform Cloud Hadoop with Cloud Data Streaming Service (CDSS).
    It is based on the Kafka Connector Tutorial provided in the official Presto documentation.

    Prerequisites

    1. Create a Cloud Hadoop cluster.
      • For details on creating a Cloud Hadoop cluster, see the Cloud Hadoop getting started guide.
    2. Create a Cloud Data Streaming Service cluster.
    3. Create and set up a VM for using Cloud Data Streaming Service.
    4. Configure the ACG.
      • Port 9092 must be allowed so that Cloud Hadoop can connect to the Cloud Data Streaming Service Broker nodes.
      • Add the Cloud Hadoop Subnet range to the access sources of the Cloud Data Streaming Service Broker node ACG (a simple connectivity check is sketched after this list).
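
    Once the ACG rules are in place, you can optionally confirm that a Cloud Hadoop node can reach a Broker node on port 9092. A minimal sketch, assuming nc is installed on the node and using 172.16.2.6 as the Broker IP (the address used later in this guide); replace it with your own Broker node IP:

    # Run from a Cloud Hadoop node: check TCP connectivity to the Kafka Broker
    [sshuser@e-001-example-pzt-hd ~]$ nc -zv 172.16.2.6 9092
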
    Note

    It is recommended to create Cloud Hadoop and Cloud Data Streaming Service in the same Subnet so that they can communicate within the same VPC.

    Upload data to CDSS (Kafka)

    Run Kafka on the Cloud Data Streaming Service VM.

    [root@s17e27e0cf6c ~ ]# cd kafka_2.12-2.4.0
    [root@s17e27e0cf6c kafka_2.12-2.4.0]# ./bin/kafka-server-start.sh -daemon config/server.properties
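
    To check that the broker started correctly, you can list the topics it serves (the list may still be empty at this point). A minimal sketch, assuming the default listener on port 9092:

    # Optional check: list topics to confirm the broker is reachable
    [root@s17e27e0cf6c kafka_2.12-2.4.0]# ./bin/kafka-topics.sh --list --bootstrap-server localhost:9092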
    

    Download the kafka-tpch program, which will generate and load the sample data into Kafka.

    [root@s17e27e0cf6c kafka_2.12-2.4.0]# curl -o kafka-tpch https://repo1.maven.org/maven2/de/softwareforge/kafka_tpch_0811/1.0/kafka_tpch_0811-1.0.sh
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 21.6M  100 21.6M    0     0  7948k      0  0:00:02  0:00:02 --:--:-- 7947k
    

    Load the sample data into Kafka.

    [root@s17e27e0cf6c kafka_2.12-2.4.0]# chmod 755 kafka-tpch
    [root@s17e27e0cf6c kafka_2.12-2.4.0]# ./kafka-tpch load --brokers 172.16.2.6:9092 --prefix tpch. --tpch-type tiny
    2022-02-07T10:30:09.426+0900     INFO   main    io.airlift.log.Logging  Logging to stderr
    2022-02-07T10:30:09.448+0900     INFO   main    de.softwareforge.kafka.LoadCommand      Processing tables: [customer, orders, lineitem, part, partsupp, supplier, nation, region]
    2022-02-07T10:30:09.859+0900     INFO   pool-1-thread-1 de.softwareforge.kafka.LoadCommand      Loading table 'customer' into topic 'tpch.customer'...
    2022-02-07T10:30:09.859+0900     INFO   pool-1-thread-2 de.softwareforge.kafka.LoadCommand      Loading table 'orders' into topic 'tpch.orders'...
    2022-02-07T10:30:09.859+0900     INFO   pool-1-thread-3 de.softwareforge.kafka.LoadCommand      Loading table 'lineitem' into topic 'tpch.lineitem'...
    2022-02-07T10:30:09.860+0900     INFO   pool-1-thread-4 de.softwareforge.kafka.LoadCommand      Loading table 'part' into topic 'tpch.part'...
    2022-02-07T10:30:09.860+0900     INFO   pool-1-thread-5 de.softwareforge.kafka.LoadCommand      Loading table 'partsupp' into topic 'tpch.partsupp'...
    2022-02-07T10:30:09.860+0900     INFO   pool-1-thread-6 de.softwareforge.kafka.LoadCommand      Loading table 'supplier' into topic 'tpch.supplier'...
    2022-02-07T10:30:09.860+0900     INFO   pool-1-thread-7 de.softwareforge.kafka.LoadCommand      Loading table 'nation' into topic 'tpch.nation'...
    2022-02-07T10:30:09.865+0900     INFO   pool-1-thread-8 de.softwareforge.kafka.LoadCommand      Loading table 'region' into topic 'tpch.region'...
    2022-02-07T10:30:13.079+0900     INFO   pool-1-thread-7 de.softwareforge.kafka.LoadCommand      Generated 25 rows for table 'nation'.
    2022-02-07T10:30:13.175+0900     INFO   pool-1-thread-6 de.softwareforge.kafka.LoadCommand      Generated 100 rows for table 'supplier'.
    2022-02-07T10:30:13.514+0900     INFO   pool-1-thread-8 de.softwareforge.kafka.LoadCommand      Generated 5 rows for table 'region'.
    2022-02-07T10:30:13.711+0900     INFO   pool-1-thread-1 de.softwareforge.kafka.LoadCommand      Generated 1500 rows for table 'customer'.
    2022-02-07T10:30:14.168+0900     INFO   pool-1-thread-4 de.softwareforge.kafka.LoadCommand      Generated 2000 rows for table 'part'.
    2022-02-07T10:30:14.895+0900     INFO   pool-1-thread-5 de.softwareforge.kafka.LoadCommand      Generated 8000 rows for table 'partsupp'.
    2022-02-07T10:30:15.078+0900     INFO   pool-1-thread-2 de.softwareforge.kafka.LoadCommand      Generated 15000 rows for table 'orders'.
    2022-02-07T10:30:16.335+0900     INFO   pool-1-thread-3 de.softwareforge.kafka.LoadCommand      Generated 60175 rows for table 'lineitem'.
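
    To verify that the topics were created and contain data, you can list the topics and read back a few records with the console consumer. A minimal sketch, assuming the broker address 172.16.2.6:9092 used above:

    # Confirm that the tpch.* topics now exist
    [root@s17e27e0cf6c kafka_2.12-2.4.0]# ./bin/kafka-topics.sh --list --bootstrap-server 172.16.2.6:9092
    # Read back a few records from one of the topics
    [root@s17e27e0cf6c kafka_2.12-2.4.0]# ./bin/kafka-console-consumer.sh --bootstrap-server 172.16.2.6:9092 --topic tpch.nation --from-beginning --max-messages 3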
    

    Add a connector to Presto

    In the Ambari UI, go to Presto > [CONFIGS] > Advanced connectors.properties, add the connectors.to.add value as shown below, and then click the [SAVE] button.

    {"kafka":["connector.name=kafka",         
    "kafka.nodes=172.16.2.6:9092",         
    "kafka.table-names=tpch.customer,tpch.orders,tpch.lineitem,tpch.part,tpch.partsupp,tpch.supplier,tpch.nation,tpch.region",         
    "kafka.hide-internal-columns=false"]         }
    


    A restart is required to apply the changed configuration. Click [ACTIONS] > Restart All in the upper right, and then click the [CONFIRM RESTART ALL] button in the pop-up window.
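
    For reference, the connectors.to.add value above corresponds to a standard Presto Kafka connector catalog. On a Presto installation managed by hand rather than through Ambari, the same settings would typically go into a catalog properties file on each node, for example etc/catalog/kafka.properties (a sketch; the exact path depends on your installation):

    # etc/catalog/kafka.properties -- example path; adjust to your installation
    connector.name=kafka
    kafka.nodes=172.16.2.6:9092
    kafka.table-names=tpch.customer,tpch.orders,tpch.lineitem,tpch.part,tpch.partsupp,tpch.supplier,tpch.nation,tpch.region
    kafka.hide-internal-columns=false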

    Query tables in Presto

    Connect to the Cloud Hadoop edge node and run the Presto CLI.

    • Set the catalog to kafka and the schema to tpch.
    [sshuser@e-001-example-pzt-hd ~]$ /usr/lib/presto/bin/presto-cli --server http://pub-210ab.vpc-hadoop.gov-ntruss.com:8285 --catalog kafka --schema tpch
    presto:tpch> SHOW TABLES;
      Table
    ----------
     customer
     lineitem
     nation
     orders
     part
     partsupp
     region
     supplier
    (8 rows)
    
    Query 20220128_064417_00003_96n53, FINISHED, 3 nodes
    Splits: 36 total, 36 done (100.00%)
    0:00 [8 rows, 166B] [57 rows/s, 1.16KB/s]
    

    Check the contents with a few simple queries.

    presto:tpch> DESCRIBE customer;
          Column       |  Type   | Extra |                   Comment
    -------------------+---------+-------+------------------------------------------
     _partition_id     | bigint  |       | Partition Id
     _partition_offset | bigint  |       | Offset for the message within the partiti
     _message_corrupt  | boolean |       | Message data is corrupt
     _message          | varchar |       | Message text
     _message_length   | bigint  |       | Total number of message bytes
     _key_corrupt      | boolean |       | Key data is corrupt
     _key              | varchar |       | Key text
     _key_length       | bigint  |       | Total number of key bytes
     _timestamp        | bigint  |       | Offset Timestamp
    (9 rows)
    
    presto:tpch> SELECT _message FROM customer LIMIT 5;
                                            _message
    --------------------------------------------------------------------------------
     {"rowNumber":1,"customerKey":1,"name":"Customer#000000001","address":"IVhzIApeR
     {"rowNumber":4,"customerKey":4,"name":"Customer#000000004","address":"XxVSJsLAG
     {"rowNumber":7,"customerKey":7,"name":"Customer#000000007","address":"TcGe5gaZN
     {"rowNumber":10,"customerKey":10,"name":"Customer#000000010","address":"6LrEaV6
     {"rowNumber":13,"customerKey":13,"name":"Customer#000000013","address":"nsXQu0o
    (5 rows)
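
    The _message column holds each Kafka message as raw JSON. As a follow-up sketch, individual fields can be pulled out with Presto's json_extract_scalar function (the field names name and address are taken from the output above):

    presto:tpch> SELECT json_extract_scalar(_message, '$.name') AS name, json_extract_scalar(_message, '$.address') AS address FROM customer LIMIT 5;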
    
    
    Note

    For more details on using Presto with Kafka, see the Kafka Connector tutorial.

