
Finding and Deleting Duplicate Files Using Python



This Python program locates and removes duplicate files within a specified root directory and its subdirectories, treating files that share the same name as duplicates.

It is particularly useful for folders containing a large number of files, where duplicates may unnecessarily occupy storage space.

 

How It Works:

Collecting Files: The program first traverses the specified root directory and its subdirectories, but only descends into a hard-coded whitelist of subdirectory names (and their 'ymir work' subfolders) where duplicates are expected.

Identifying Duplicates: For each filename, it creates a list containing all instances, including their full paths and modification dates.

Keeping the Latest Version: It retains only the most recent version of files with the same name. To achieve this, the program sorts the list of files by modification date, newest first, and keeps only the first item; a minimal standalone sketch of this step follows the list.

Deleting Redundant Files: It deletes all other instances of files with identical names from the respective subdirectories. During deletion, the program catches errors and reports any file it could not delete.

Creating a Deletion Log: The program logs the paths of deleted files into a log file. Additionally, it logs the total size of deleted files in megabytes.

Feedback: Upon completion, the program tells the user where to find the log file containing the list of deleted files and the amount of disk space freed up.
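To isolate the "keep the latest version" step described above, here is a minimal standalone sketch; the paths and timestamps are made up for illustration. The locations list for one filename is sorted newest-first by modification time, the first entry is kept, and the rest are marked for deletion:

duplicates = [
    ('C:/data/folder1/ymir work/tree.gr2', 1_600_000_000.0),  # (path, mtime)
    ('C:/data/folder2/ymir work/tree.gr2', 1_700_000_000.0),
]

# Sort newest-first by modification time; index 0 is the copy to keep.
duplicates.sort(key=lambda item: item[1], reverse=True)
keep, redundant = duplicates[0], duplicates[1:]

print(f'keeping  : {keep[0]}')
for path, mtime in redundant:
    print(f'deleting : {path}')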



 

import os

def find_duplicate_files(root_folder):
    files_list = []

    # Collect every file under the 'ymir work' subfolder of the whitelisted directories.
    for dirpath, dirnames, filenames in os.walk(root_folder):
        if os.path.basename(dirpath) in ['folder1', 'folder2', 'folder3', 'folder4', 'folder5']:
            ymir_work_path = os.path.join(dirpath, 'ymir work')
            if os.path.exists(ymir_work_path):
                for root, dirs, files in os.walk(ymir_work_path):
                    for filename in files:
                        full_path = os.path.join(root, filename)
                        files_list.append((full_path, filename, os.path.getmtime(full_path)))

    # Group the collected files by name: filename -> list of (full_path, mtime).
    files_dict = {}
    for full_path, filename, mtime in files_list:
        if filename in files_dict:
            files_dict[filename].append((full_path, mtime))
        else:
            files_dict[filename] = [(full_path, mtime)]

    deleted_files = []
    deleted_bytes = 0

    # For every name that occurs more than once, keep the newest copy and delete the rest.
    for filename, locations in files_dict.items():
        if len(locations) > 1:
            locations.sort(key=lambda x: x[1], reverse=True)
            for full_path, mtime in locations[1:]:
                try:
                    file_size = os.path.getsize(full_path)
                    os.remove(full_path)
                    deleted_files.append(full_path)
                    deleted_bytes += file_size
                except Exception as e:
                    print(f'Error while deleting file: {e}')

    # Write the deletion log into the root folder.
    log_file = os.path.join(root_folder, 'deleted_files.txt')
    with open(log_file, 'w') as f:
        f.write("Deleted files:\n")
        for deleted_file in deleted_files:
            f.write(deleted_file + "\n")
        f.write(f"\nTotal deleted: {deleted_bytes / (1024 * 1024):.2f} MB")

    print(f'Duplicate check and deletion finished. The list of deleted files can be found here: {log_file}')

root_folder = 'C:/Users/ADMIN/Desktop/PythonThings/YourFolder'
find_duplicate_files(root_folder)


 

 


The idea is good, but I would definitely not delete the files automatically. The best approach would be to move them to a new folder along with their entire structure. Why? Because unfortunately, Ymir used some of these files for other things (typically effects), even if they are duplicates, and this could cause problems. Until you open the file, you can't trace all the paths. It's a lengthy process and definitely not as simple as finding and deleting them.
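A minimal sketch of that move-instead-of-delete approach, assuming a hypothetical quarantine folder name ('_duplicates') and a helper function that is not part of the original script: each redundant file is moved under the quarantine folder with its path relative to the root preserved, so the move can be undone if an effect turns out to still reference the file.

import os
import shutil

def quarantine_file(full_path, root_folder, quarantine_name='_duplicates'):
    # Rebuild the file's path relative to the root inside the quarantine folder,
    # so the original directory structure is preserved and the move is reversible.
    rel_path = os.path.relpath(full_path, root_folder)
    target = os.path.join(root_folder, quarantine_name, rel_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.move(full_path, target)
    return target

# In the script above, this call would replace os.remove(full_path):
# quarantine_file(full_path, root_folder)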



