Data import guide

Last updated: 2020-05-08 23:09:45

This chapter describes common methods for importing data into and exporting data from KMR.

Data import to KS3

A KMR cluster can access the Standard Storage Service (KS3) directly. Before you use KMR, we recommend that you activate the KS3 service and consolidate your computing programs and raw data in KS3 for easy management and persistent storage.

(1) Go to the KS3 console at http://ks3.ksyun.com/console.html#/ and create a storage space.

(2) Select a “Region” and enter the space name. The region must be the same as that of your KMR service, because KMR cannot access KS3 across regions. If public read/write access is not required, select “Private” for access control.

(3) Open the space and select “Content Management” to create a directory, or upload files directly through the browser. For files larger than 500 MB, upload them to KS3 with an SDK (https://github.com/ks3sdk) or another tool.
The KS3 upload tool is available at: http://www.ksyun.com/doc/art/id/432
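Before uploading a local directory, it can help to know which files exceed the browser limit in step (3). The following is a minimal Python sketch that walks a directory and sorts files into those that can go through the browser and those that should be uploaded with an SDK or the KS3 upload tool. It assumes "500M" means 500 MiB; the function name split_by_upload_method is hypothetical and it does not call any KS3 API.

```python
import os
from typing import List, Tuple

# Assumption: the 500M browser limit is read as 500 MiB.
BROWSER_UPLOAD_LIMIT = 500 * 1024 * 1024


def split_by_upload_method(
    root: str, limit: int = BROWSER_UPLOAD_LIMIT
) -> Tuple[List[str], List[str]]:
    """Walk `root` and return (browser_ok, needs_sdk_or_tool), each sorted."""
    browser_ok: List[str] = []
    needs_tool: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip anything that is not a regular file (e.g. broken symlinks).
            if not os.path.isfile(path):
                continue
            if os.path.getsize(path) > limit:
                needs_tool.append(path)
            else:
                browser_ok.append(path)
    return sorted(browser_ok), sorted(needs_tool)
```

For example, small, large = split_by_upload_method("/data/to-upload") (a hypothetical path) returns two lists; the files in large are the ones to upload with an SDK or the KS3 upload tool.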

Data import to HDFS

Usually, the raw data that KMR processes is stored directly in KS3, where various computing jobs can read it. To obtain better processing performance and take full advantage of Hadoop data locality, you can copy the data from KS3 to the HDFS file system of the KMR cluster.

DistCp (Distributed Copy) is a tool for copying large amounts of data within and between clusters. It uses MapReduce for file distribution, error handling and recovery, and report generation. Kingsoft Cloud KMR integrates KS3 with Hadoop, so you can use the DistCp tool to copy data directly between HDFS and KS3.
Steps:

  1. Connect to the master node through SSH. For details, refer to the SSH Connection Guide.
  2. Run the command su hadoop to switch to the hadoop user.
  3. Run DistCp in the following format: hadoop distcp <source path> <destination path>

Example:

Upload from HDFS to KS3

Copy from KS3 to HDFS
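The steps above can also be scripted. The following is a minimal Python sketch that assembles the hadoop distcp <source path> <destination path> command for both directions and can run it. It assumes the hadoop command is on PATH and that you have already switched to the hadoop user. The ks3:// URI scheme, the bucket name example-bucket, the directory paths, and the helper names are hypothetical placeholders; check the DistCp Guide for the exact KS3 path format.

```python
import shlex
import subprocess
from typing import List


def build_distcp_command(src: str, dst: str) -> List[str]:
    """Build the argument list for `hadoop distcp <src> <dst>`."""
    if not src or not dst:
        raise ValueError("both source and destination paths are required")
    if src == dst:
        raise ValueError("source and destination must differ")
    return ["hadoop", "distcp", src, dst]


def run_distcp(src: str, dst: str) -> None:
    """Run DistCp; raises CalledProcessError if the copy job fails."""
    cmd = build_distcp_command(src, dst)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths: the ks3:// scheme and bucket name are assumptions.
    hdfs_dir = "hdfs:///user/hadoop/input"
    ks3_dir = "ks3://example-bucket/input"
    # Only print the commands here; call run_distcp(...) on the master node.
    print(shlex.join(build_distcp_command(hdfs_dir, ks3_dir)))  # HDFS -> KS3
    print(shlex.join(build_distcp_command(ks3_dir, hdfs_dir)))  # KS3 -> HDFS
```

Building the argument list separately from running it keeps the paths out of a shell string, so directory names with spaces are passed to DistCp unchanged.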

For more DistCp use cases, refer to the DistCp Guide.
