Data import guide

Last updated: 2020-05-08 23:09:45

This chapter describes common data import and export methods of KMR.

Data import to KS3

A KMR cluster can access the standard storage service (KS3) directly. Before you use KMR, we recommend that you enable the KS3 service and consolidate your computing programs and raw data in KS3 for easy management and persistent storage.

(1) Go to the KS3 console at http://ks3.ksyun.com/console.html#/ and create a storage space.
(2) Select the “Region” (it must be the same region as the KMR service, because KMR cannot access KS3 across regions) and enter the space name. If public read/write is not required, select “Private” for the access control.
(3) Enter the space and select "Content Management" to create a directory, or upload files directly through the browser. For files larger than 500 MB, upload them to KS3 with the SDK (https://github.com/ks3sdk) or other tools. The KS3 upload tool is available at: http://www.ksyun.com/doc/art/id/432

Data import to HDFS

Usually, the raw data that KMR needs to process is stored directly on KS3, and computing jobs can be executed against it there. To obtain better data processing performance and take full advantage of Hadoop data locality, you can copy the data from KS3 to the HDFS file system of the KMR cluster.
DistCp (Distributed Copy) is a tool for copying data within and between large-scale clusters. It uses MapReduce for file distribution, error handling and recovery, and report generation. Kingsoft Cloud KMR uses special technology so that the DistCp tool can copy data directly between HDFS and KS3. Steps:

  1. Connect to the master node through SSH. Please refer to the SSH Connection Guide.
  2. Enter the command su hadoop to switch to the hadoop user.
  3. Execute the command in the following format: hadoop distcp <source path> <destination path>

Example:
Upload from HDFS to KS3
```
hadoop distcp /user/hadoop/conf/hive-site.xml ks3://testbarcket/kmr/
```

Copy from KS3 to HDFS
```
hadoop distcp ks3://testbarcket/kmr/hive-site.xml /user/hadoop/conf/
```
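
If you run these copies often, the two commands above can be wrapped in a small helper. The following is only a minimal sketch, assuming it is run as the hadoop user on the master node. The function name kmr_distcp and the DRY_RUN variable are hypothetical names introduced for this illustration and are not part of KMR or Hadoop. With DRY_RUN left at 1, the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: wraps "hadoop distcp <source> <destination>".
# DRY_RUN is a hypothetical switch for this example: 1 (the default here)
# prints the command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# kmr_distcp SRC DST: hypothetical helper, not a KMR or Hadoop command.
kmr_distcp() {
    if [ "$#" -ne 2 ]; then
        echo "usage: kmr_distcp <source> <destination>" >&2
        return 2
    fi
    if [ "$DRY_RUN" = "1" ]; then
        # Show what would be executed.
        echo "hadoop distcp $1 $2"
    else
        # Run the real copy; requires the hadoop client on PATH.
        hadoop distcp "$1" "$2"
    fi
}

# Upload from HDFS to KS3 (paths taken from the examples above).
kmr_distcp /user/hadoop/conf/hive-site.xml ks3://testbarcket/kmr/

# Copy from KS3 back to HDFS.
kmr_distcp ks3://testbarcket/kmr/hive-site.xml /user/hadoop/conf/
```

Set DRY_RUN=0 in the environment to execute the copies for real. Checking the printed commands first is a cheap way to catch a wrong bucket name or path before a large job starts.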

For more DistCp use cases, please refer to the DistCp Guide.
