How to find completely identical lines in a file with Python?

10 months ago

Jackson Davis

You can use Counter from Python's collections module to find lines that are exact duplicates. Open the file and read it line by line, add each line to a list, and pass the list to Counter to count how many times each line occurs. Any line with a count greater than 1 is a duplicate.

Here is an example:

from collections import Counter

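def find_duplicate_lines(file_path):
    """Return the lines that appear more than once in the file."""
    lines = []

    with open(file_path, 'r') as file:
        for line in file:
            # strip() removes leading and trailing whitespace, including the newline
            lines.append(line.strip())

    # Keep every line whose count is greater than 1
    duplicates = [line for line, count in Counter(lines).items() if count > 1]

    return duplicates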

file_path = 'file.txt'  # Replace with your file path
duplicate_lines = find_duplicate_lines(file_path)

if duplicate_lines:
    print('Completely duplicated lines:')
    for line in duplicate_lines:
        print(line)
else:
    print('No completely duplicated lines found.')

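Please note that the above code treats each line as a string and strips leading and trailing whitespace before comparing, so two lines that differ only in indentation or trailing spaces are counted as duplicates. Blank lines are compared as well, so a file with several empty lines will report the empty string as a duplicate. If whitespace should be significant, remove only the line ending, for example with line.rstrip('\n') instead of line.strip().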
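If whitespace needs to be significant, or if it helps to know where the duplicates are, a variation along the following lines may be useful. This is only a sketch, not part of the original answer: the function name find_exact_duplicates and the sample data are made up for illustration. It removes only the line terminator, so spaces and tabs still count, and it records the 1-based line numbers of each repeated line.

```python
from collections import defaultdict

def find_exact_duplicates(lines):
    """Map each line that occurs more than once to its 1-based line numbers."""
    positions = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # Drop only the line terminator so that spaces and tabs still count.
        positions[line.rstrip('\r\n')].append(number)
    # Keep only the lines that were seen at more than one position.
    return {text: nums for text, nums in positions.items() if len(nums) > 1}

# Hypothetical sample: ' apple' and 'pear ' differ from their neighbours by whitespace.
sample = ['apple\n', ' apple\n', 'apple\n', 'pear \n', 'pear\n']
print(find_exact_duplicates(sample))  # {'apple': [1, 3]}
```

Because a file object yields its lines when iterated, the same function can be called as find_exact_duplicates(file) inside a with open(file_path) as file: block.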