How to perform data compression in Hive?

12 months ago

Liam

In Hive, data compression is configured by setting table properties or session-level parameters. The general steps are as follows.

Specify the storage format and compression codec in the table properties:

CREATE TABLE my_table (
  col1 INT,
  col2 STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");

In the example above, we create a table called my_table, store its data in ORC format, and compress it with the ZLIB algorithm.
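
To check which codec a table uses, or to try a different one, something like the following should work. This is a minimal sketch: my_table_snappy is a hypothetical table name used only for illustration, and SNAPPY is just one of the other codecs ORC supports.

```sql
-- Show the compression property set on the table from the example above.
SHOW TBLPROPERTIES my_table("orc.compress");

-- Hypothetical second table: same columns, compressed with SNAPPY instead of ZLIB.
CREATE TABLE my_table_snappy (
  col1 INT,
  col2 STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- Copy the data; rows are compressed with the target table's codec as they are written.
INSERT INTO TABLE my_table_snappy
SELECT col1, col2 FROM my_table;
```

As a rough rule, ZLIB tends to produce smaller files while SNAPPY tends to compress and decompress faster, so the better choice depends on whether storage or CPU time is the tighter constraint.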

Enable compression of the output of Hive execution with a SET statement:

SET hive.exec.compress.output=true;

Then, before executing the query, enable compression of the job output with the mapred.output.compress parameter and choose the compression format with the mapred.output.compression.codec parameter, for example:

SET mapred.output.compress=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
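
Put together, a session that writes Gzip-compressed query results might look like the sketch below. The output directory /tmp/my_table_gzip is an assumed path for illustration. On Hadoop 2 and later, the mapred.* names above are deprecated aliases of mapreduce.output.fileoutputformat.compress and mapreduce.output.fileoutputformat.compress.codec, so either spelling should work.

```sql
-- Enable compression of the final query output.
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Assumed output path; the files written here should carry a .gz extension.
INSERT OVERWRITE DIRECTORY '/tmp/my_table_gzip'
SELECT col1, col2 FROM my_table;
```

Note that Gzip-compressed text files cannot be split, so each file is read by a single mapper. For very large text outputs, a splittable codec such as org.apache.hadoop.io.compress.BZip2Codec may be a better fit.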

By following the steps above, data compression can be performed in Hive. Compression reduces storage space and can improve query performance by cutting disk and network I/O, especially on large data sets, at the cost of some extra CPU time for compressing and decompressing.