How to perform data sorting in Pig?

Sorting data in Pig can be achieved by using the ORDER BY statement. Below is a simple sorting example:

Suppose we have a dataset with names and ages, and we want to sort the data in ascending order based on age. We can achieve this using the following Pig Latin script:

-- Load the data
data = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, age:int);

-- Sort the data
sorted_data = ORDER data BY age ASC;

-- Store the result
STORE sorted_data INTO 'output' USING PigStorage(',');

In this example, we first load a comma-separated dataset of names and ages, then sort it in ascending order by age with ORDER BY. Finally, STORE writes the sorted data to the 'output' directory.

Note that ORDER BY is not limited to a single column. To sort on several columns, list them in one ORDER BY clause separated by commas, each with an optional ASC or DESC. Chaining separate ORDER statements is not a substitute, because a later ORDER re-sorts the whole relation by its own key and is not guaranteed to keep the earlier ordering for ties.
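
As a minimal sketch of the multi-column form, the script below sorts by age in descending order and breaks ties by name in ascending order. It reuses the input.txt file and schema from the example above; the relation name sorted_multi and the output path 'output_multi' are assumptions chosen for illustration.

-- Load the data (same file and schema as the example above)
data = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, age:int);

-- Sort by age, oldest first; rows with the same age are ordered by name
sorted_multi = ORDER data BY age DESC, name ASC;

-- Store the result (the output path is an assumption)
STORE sorted_multi INTO 'output_multi' USING PigStorage(',');

Each field in the list takes its own ASC or DESC keyword, and ASC is the default when neither is given.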
