How to perform data compression in Hive?

In Hive, data compression can be configured in two ways: through table properties set when a table is created, and through session-level settings that control how query output is written. The general steps are as follows.

  1. Set the storage format (STORED AS) and the compression property (TBLPROPERTIES) when creating the table:
CREATE TABLE my_table (
  col1 INT,
  col2 STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");

In the example above, we created a table called my_table that stores its data in ORC format and compresses it with the ZLIB algorithm.
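As a small follow-up sketch, the compression setting of an existing table can be inspected and changed with standard Hive statements. The choice of SNAPPY here is only an illustration; which codec fits best depends on the workload.

```sql
-- Show the compression codec currently configured for the table
SHOW TBLPROPERTIES my_table("orc.compress");

-- Switch the table to SNAPPY; this affects only data written afterwards,
-- files already stored keep the codec they were written with
ALTER TABLE my_table SET TBLPROPERTIES ("orc.compress"="SNAPPY");
```

To have existing data use the new codec, the table contents would need to be rewritten, for example with an INSERT OVERWRITE from the table itself.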

  2. Use SET to enable compression of the output of Hive queries:
SET hive.exec.compress.output=true;

Then, when executing the query, enable output compression with the mapred.output.compress parameter and choose the compression format with the mapred.output.compression.codec parameter, for example:

SET mapred.output.compress=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
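The settings above can be put together in one session, as in the following minimal sketch. It assumes the my_table table from the first step; the export path /tmp/my_table_export is a hypothetical example, not something defined in this article. On newer Hadoop releases the same two MapReduce options are named mapreduce.output.fileoutputformat.compress and mapreduce.output.fileoutputformat.compress.codec.

```sql
-- Enable compression of the final query output for this session
SET hive.exec.compress.output=true;
SET mapred.output.compress=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Write the query result as Gzip-compressed, comma-delimited text files
-- (the target directory is a hypothetical example path)
INSERT OVERWRITE DIRECTORY '/tmp/my_table_export'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT col1, col2
FROM my_table;
```

SET statements apply only to the current session, so they need to be repeated, or placed in the Hive configuration, for other sessions.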

By following the steps above, data compression can be performed in Hive. Compression reduces storage space and can improve query performance by cutting disk and network I/O, especially with large amounts of data, at the cost of some extra CPU time for compressing and decompressing.

 
