# Hive Notes 1


## SELECT
Syntax:

    SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM table_reference
    [WHERE where_condition]
    [GROUP BY col_list [HAVING condition]]
    [CLUSTER BY col_list
      | [DISTRIBUTE BY col_list] [SORT BY | ORDER BY col_list]
    ]
    [LIMIT number]
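
As a minimal sketch of how these clauses combine in a concrete query, the example below assumes a hypothetical table `orders(user_id, amount, dt)`. The table and column names are illustrative only and do not come from these notes.

```sql
-- Hypothetical table: orders(user_id BIGINT, amount DOUBLE, dt STRING)
-- Top 10 users by total order amount on one day, keeping only totals above 100.
SELECT user_id, SUM(amount) AS total_amount
FROM orders
WHERE dt = '2020-01-01'
GROUP BY user_id
HAVING SUM(amount) > 100
ORDER BY total_amount DESC
LIMIT 10;
```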

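In this syntax, CLUSTER BY col_list is equivalent to DISTRIBUTE BY col_list followed by SORT BY col_list on the same columns.
With DISTRIBUTE BY, rows that share the same values in the listed columns are sent to the same reducer, so the number of groups the data is distributed into is determined by the number of reducers.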
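
The equivalence above can be sketched as follows. This assumes a hypothetical table `orders(user_id, amount, dt)` and the MapReduce execution engine; `mapreduce.job.reduces` is the Hadoop 2+ property name (older releases use `mapred.reduce.tasks`), and the value 3 is arbitrary.

```sql
-- Hypothetical table: orders(user_id BIGINT, amount DOUBLE, dt STRING)
-- Request 3 reducers, so rows are distributed into 3 groups.
SET mapreduce.job.reduces=3;

-- Short form: rows with the same user_id go to the same reducer,
-- and each reducer sorts its own rows by user_id.
SELECT user_id, amount
FROM orders
CLUSTER BY user_id;

-- Equivalent long form.
SELECT user_id, amount
FROM orders
DISTRIBUTE BY user_id
SORT BY user_id;

-- The long form also allows sorting by different columns or in descending
-- order, which CLUSTER BY alone cannot express.
SELECT user_id, amount
FROM orders
DISTRIBUTE BY user_id
SORT BY user_id, amount DESC;
```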
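
Difference between external tables and internal (managed) tables:
Dropping an external table removes only the table definition (its metadata); the underlying data is not deleted and remains on the cluster. Dropping an internal table removes both the metadata and the data.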
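
A minimal sketch of the difference, assuming hypothetical table names, columns, and an illustrative HDFS path `/data/logs` (none of these come from the original notes):

```sql
-- External table: Hive only records the schema and the location of the files.
CREATE EXTERNAL TABLE IF NOT EXISTS logs_ext (
  ts  STRING,
  msg STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/logs';

-- Removes only the table definition from the metastore;
-- the files under /data/logs stay where they are.
DROP TABLE logs_ext;

-- Internal (managed) table: Hive owns the data in its warehouse directory.
CREATE TABLE IF NOT EXISTS logs_managed (
  ts  STRING,
  msg STRING
);

-- Removes the table definition and its data files
-- (the files are moved to the trash instead if HDFS trash is enabled).
DROP TABLE logs_managed;
```

Because the data survives the drop, re-creating the external table with the same `LOCATION` makes the existing files queryable again.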

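Difference between SORT BY and ORDER BY:
SORT BY: rows are sorted within each reducer, so each reducer's output is ordered but the overall result is not.
ORDER BY: rows are sorted globally, which requires a single reducer.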
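
The contrast can be sketched with the same hypothetical table `orders(user_id, amount, dt)`; the reducer count of 3 is arbitrary, and the property name assumes Hadoop 2+ with the MapReduce engine.

```sql
-- Hypothetical table: orders(user_id BIGINT, amount DOUBLE, dt STRING)
SET mapreduce.job.reduces=3;

-- SORT BY: each of the 3 reducers sorts only the rows it receives,
-- so the combined output is not guaranteed to be in total order.
SELECT user_id, amount
FROM orders
SORT BY amount DESC;

-- ORDER BY: all rows pass through a single reducer to produce a total order.
-- In strict mode (hive.mapred.mode=strict) ORDER BY must be paired with LIMIT.
SELECT user_id, amount
FROM orders
ORDER BY amount DESC
LIMIT 100;
```

Since ORDER BY funnels everything through one reducer, it can be slow on large inputs; SORT BY scales with the number of reducers but gives up the global ordering.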