Last updated:2020-05-08 23:09:45
This chapter describes common data import and export methods of KMR.
A KMR cluster can access the standard storage service (KS3) directly. Before you use KMR, we recommend that you activate the KS3 service and consolidate your computing programs and raw data in KS3 for easy management and persistent storage.
(1) Go to the KS3 console at http://ks3.ksyun.com/console.html#/ and create a storage space.
(2) Select “Region” (you need to choose the same region as KMR service, because KMR cannot access KS3 across regions), and enter the space name. If public read/write is not required, select “Private” for the access control.
(3) Enter the space and select "Content Management" to create a directory, or upload files directly through the browser. For files over 500 MB, upload them to KS3 with the SDK or other tools: https://github.com/ks3sdk
The KS3 upload tool is available at: http://www.ksyun.com/doc/art/id/432
Usually, the raw data that KMR needs to process is stored directly on KS3, and computing jobs can run against it there. To obtain better data processing performance and take full advantage of Hadoop data locality, you can copy the data from KS3 to the HDFS file system of the KMR cluster.
DistCp (Distributed Copy) is a tool for copying data within and between large-scale clusters. It uses MapReduce for file distribution, error handling and recovery, and report generation. On Kingsoft Cloud KMR, you can use the DistCp tool to copy data directly between HDFS and KS3.
Steps:
Example:
Upload from HDFS to KS3
```
hadoop distcp /user/hadoop/conf/hive-site.xml ks3://testbarcket/kmr/
```
Copy from KS3 to HDFS
```
hadoop distcp ks3://testbarcket/kmr/hive-site.xml /user/hadoop/conf/
```
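The commands above copy a single file. For a whole directory, a small helper that prints the DistCp command before you run it can be convenient. The following is a minimal sketch, not a KMR-specific tool: `build_distcp_cmd` is a hypothetical helper name, `/user/hadoop/input/` and the map count of 20 are illustrative values, and it assumes the standard DistCp options `-update` (copy only files that are missing or differ at the destination) and `-m` (maximum number of map tasks) behave on `ks3://` paths as they do on HDFS.

```shell
#!/bin/sh
# Sketch: build a DistCp command for copying a directory between HDFS and KS3.
# The bucket name comes from the examples in this chapter; the HDFS target
# path and the map count are hypothetical placeholders.

# build_distcp_cmd SRC DST [MAPS]
# Prints the command instead of running it, so it can be reviewed first.
build_distcp_cmd() {
    src="$1"
    dst="$2"
    maps="${3:-}"
    # -update: skip files that already exist unchanged at the destination
    cmd="hadoop distcp -update"
    if [ -n "$maps" ]; then
        # -m: upper bound on the number of simultaneous map tasks
        cmd="$cmd -m $maps"
    fi
    printf '%s %s %s\n' "$cmd" "$src" "$dst"
}

# Print the command for copying a KS3 directory into HDFS.
build_distcp_cmd ks3://testbarcket/kmr/ /user/hadoop/input/ 20
```

Once the printed command looks right, run it on a cluster node, for example by pasting it into the shell. Swapping the two path arguments gives the HDFS to KS3 direction.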
For more use cases of DistCp, please refer to the DistCp Guide.