On Hive 0.11, and lower versions, if we set:
set hive.optimize.skewjoin=true; set hive.auto.convert.join=false;
A query with JOIN, GROUP BY and SUM does not return results.
But if we make the query a little more simple, using JOIN but not GROUP and SUM functions, We will GET RESULTS.
I have found that there are bugs reported recently:
https://issues.apache.org/jira/browse/HIVE-5888
and
https://issues.apache.org/jira/browse/HIVE-6041
This bug is related also with the previous one (https://issues.apache.org/jira/browse/HIVE-6520), already reported:
If we set:
set hive.optimize.skewjoin=true; set hive.auto.convert.join=true;
We will no have any output.
The reason is that the skew join in hive relies on a reduce phase to save skewed keys on local disk, but hive.auto.convert.join=true turns a mapreduce task into a mapjoin task in some scenarios.
As a result, there is no skewed keys generated by the mapjoin and the result is empty.
If you set hive.auto.convert.join=false to disable the auto conversion of a mapjoin, the performance is very bad because the reduce phase takes a very long time to process the skew keys.
This is expected to be resolved on version hive-0.13.0.