sqoop是apache旗下一款“Hadoop和关系数据库服务器之间传送数据”的工具。
导入数据:MySQL,Oracle导入数据到Hadoop的HDFS、HIVE、HBASE等数据存储系统;
导出数据:从Hadoop的文件系统中导出数据到关系数据库
将导入或导出命令翻译成mapreduce程序来实现,在翻译出的mapreduce中主要是对inputformat和outputformat进行定制.
1 sqoop安装
安装sqoop的前提是已经具备java和hadoop的环境
2 修改配置文件
$ cd $SQOOP_HOME/conf
$ mv sqoop-env-template.sh sqoop-env.sh
打开sqoop-env.sh并编辑下面几行:
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.3/
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.7.3/
export HIVE_HOME=/home/work/hive/
这里没用HBASE,用的时候再配。
3 将jdbc数据包mysql-connector-java-5.1.44.jar 拷贝到 $SQOOP_HOME/lib下
4 检查安装时否功
$ cd $SQOOP_HOME/bin
$ sqoop-version
预期的输出:
15/12/17 14:52:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Sqoop 1.4.6 git commit id 5b34accaca7de251fc91161733f906af2eddbe83
Compiled by abe on Fri Aug 1 11:19:26 PDT 2015
到这里,整个Sqoop安装工作完成。
5 但是,有时候虽然安装成功了,但在使用中会出现各种Bug
比如:
MySQL数据库服务器中的xing/ emp表导入HDFS。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97[root@shizhan1 bin]# ./sqoop import \
> --connect jdbc:mysql://localhost:3306/xing \
> --username root \
> --password xing \
> --table emp \
> --m 1
Warning: /home/work/sqoop/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/work/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/work/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/08/13 10:42:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
18/08/13 10:42:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/08/13 10:42:48 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/08/13 10:42:48 INFO tool.CodeGenTool: Beginning code generation
18/08/13 10:42:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
18/08/13 10:42:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
18/08/13 10:42:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-2.7.3
注: /tmp/sqoop-root/compile/0cbad6aa7f1605ddcd735b7cd9f9e47a/emp.java使用或覆盖了已过时的 API。
注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。
18/08/13 10:42:49 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/0cbad6aa7f1605ddcd735b7cd9f9e47a/emp.jar
18/08/13 10:42:49 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/08/13 10:42:49 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/08/13 10:42:49 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/08/13 10:42:49 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/08/13 10:42:49 INFO mapreduce.ImportJobBase: Beginning import of emp
18/08/13 10:42:49 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/08/13 10:42:50 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/08/13 10:42:50 INFO client.RMProxy: Connecting to ResourceManager at /192.168.14.151:8032
18/08/13 10:42:53 INFO db.DBInputFormat: Using read commited transaction isolation
18/08/13 10:42:54 INFO mapreduce.JobSubmitter: number of splits:1
18/08/13 10:42:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1532423449295_0031
18/08/13 10:42:54 INFO impl.YarnClientImpl: Submitted application application_1532423449295_0031
18/08/13 10:42:54 INFO mapreduce.Job: The url to track the job: http://shizhan1:8088/proxy/application_1532423449295_0031/
18/08/13 10:42:54 INFO mapreduce.Job: Running job: job_1532423449295_0031
18/08/13 10:43:02 INFO mapreduce.Job: Job job_1532423449295_0031 running in uber mode : false
18/08/13 10:43:02 INFO mapreduce.Job: map 0% reduce 0%
18/08/13 10:43:09 INFO mapreduce.Job: Task Id : attempt_1532423449295_0031_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:220)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:165)
... 9 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:989)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:341)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2189)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2222)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2017)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:779)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:389)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:330)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:302)
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:213)
... 10 more
Caused by: java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:211)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:300)
... 26 more
可以看到,报错1
Caused by: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
应该是主要和数据库相关,后面还有一个拒绝连接的异常。
在网上找了些资料,有说把--connect jdbc:mysql://localhost:3306/xing
这里的主机名改成IP,改了无果。
仔细想了下,这里连不上数据库,可能是数据库配置问题,是不是Mysql不允许外部连接导致的,于是进入Mysql1
2use mysql;
select host,user,password from user;
看到里面乱七八糟,host为shizhan1的user是空,还有一个host为shizhan1的user是root,但是密码为空,于是把第一个记录直接delete,第二个记录的password改成root的密码。
GRANT ALL PRIVILEGES ON . TO root@”shizhan1” IDENTIFIED BY “ password “; //为root添加远程连接的能力,(mysql安装shizhan1上)
最后flush一下
flush privileges;
重新跑一下我们的sqoop,但是会报错,提示一个exist。因为虽然之前没有跑通,但是在hdfs对应目录上生成了一个文件夹,就是这个文件夹exist了。
删了就好,hadoop fs -rm -r /user/root/emp
可以重新跑sqoop了
可以看到success,搞定!