====== cksgen – Generate and compare MD5 checksum lists with Python ======

This small command-line program written in Python is an enhanced port of my [[code:powershell:checksum|PowerShell script]] for ensuring the data integrity of my photo archive. It identifies corrupted ''.jpg'', ''.jpeg'', ''.dng'' and ''.cr2'' files by generating lists of MD5 checksums and comparing them with the lists from previous runs. cksgen can also be used for any other file types.

The program is primarily intended for archived data that no longer changes. You can also use it for data that is edited from time to time, but keep in mind that the MD5 checksum of a file changes whenever its content changes, for example when the metadata of a JPG file is edited. The message ''ATTENTION: Different MD5 checksums found'' on the command prompt therefore does not necessarily indicate a corrupted file.

cksgen is Free Software licensed under the [[https://www.gnu.org/licenses/gpl-3.0.html|GNU General Public License (GPL), Version 3]].

===== Features =====

  * Determines the MD5 checksums of files and writes them to the file ''yyyyMMdd_HHmmss_checksum.txt''.
  * Compares the checksums in the two most recent checksum files and reports the result on the command prompt and in the log file ''yyyyMMdd_HHmmss_log.txt''.
  * Allowed file extensions can be specified.
  * The number of checksum files and log files to be kept can be specified.
  * The format of the time stamp in the file names of the log files and checksum files can be adjusted.
  * The folder names for the log files and checksum files can also be specified.
  * The command prompt shows which file is currently being processed.

===== Usage =====

General usage: ''cksgen [-h] [-c CONFIG] [-e]''

If you haven't compiled the cksgen.py file into an executable file (e.g.
**cksgen.exe** on a Windows system; you can do so with ''pyinstaller %%--%%onefile cksgen.py'') and haven't added it to the ''PATH'' environment variable, run it through the Python interpreter: ''python /path/to/cksgen.py [-h] [-c CONFIG] [-e]''

Help: ''cksgen -h''

Show an example configuration file: ''cksgen -e''

Example usage with a configuration file **photos.conf**: ''cksgen -c photos''

===== Source Code =====

Here is the source code of cksgen. You can copy or download it by clicking on the ''cksgen.py'' link in the upper left corner of the code block.

<code python cksgen.py>
"""
cksgen -- Generate and compare MD5 checksum lists with Python
=============================================================

Author: Helmut Kaczmarek
"""

# NOTE: The listing on this page is truncated between the author line and the
# comparison loop in compare_checksum_files(). The imports, load_config(),
# delete_old_files() and the first lines of compare_checksum_files() are
# reconstructed from the way they are used in the main block. They are
# assumptions, not the author's original code.

import argparse
import configparser
import datetime
import glob
import hashlib
import os


def load_config(config_filename):
    """Return the [USER_SETTINGS] section of a configuration file (reconstructed)."""
    parser = configparser.ConfigParser()
    parser.read(config_filename)
    return parser['USER_SETTINGS']


def delete_old_files(directory, files_to_keep):
    """Keep only the newest files in a directory (reconstructed).

    The file names start with a time stamp, so sorting by name sorts by age.
    """
    files = sorted(glob.glob(os.path.join(directory, '*')), reverse=True)
    for old_file in files[files_to_keep:]:
        os.remove(old_file)


def compare_checksum_files(checksum_file1, checksum_file2):
    """Return the files whose checksums differ between two checksum lists."""
    # Reconstructed: keep only "checksum<TAB>path" lines, which skips the header.
    with open(checksum_file1) as f:
        checksums1 = [line.rstrip('\n') for line in f if '\t' in line]
    with open(checksum_file2) as f:
        checksums2 = [line.rstrip('\n') for line in f if '\t' in line]

    different_files = []
    for line1 in checksums1:
        checksum1, file1 = line1.split('\t', 1)
        for line2 in checksums2:
            checksum2, file2 = line2.split('\t', 1)
            if file1 == file2 and checksum1 != checksum2:
                different_files.append(file1)
                break
    return different_files


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate checksums for files.")
    parser.add_argument("-conf", "--config", type=str,
                        help="Configuration file name without extension")
    args = parser.parse_args()

    if args.config:
        config_filename = args.config + ".conf"
        config = load_config(config_filename)

        conf_name = config.get('conf_name')
        # Normalize the extensions so that "JPG" or " jpg" in the config still match.
        allowed_extensions = [ext.strip().lower()
                              for ext in config.get('allowed_extensions').split(',')]
        data_directory = config.get('data_directory')
        files_to_keep = int(config.get('files_to_keep'))

        # Lists and logs are stored in a subfolder named after the configuration.
        script_directory = os.path.dirname(os.path.abspath(__file__))
        lists_directory = os.path.join(script_directory, conf_name, 'Lists')
        logs_directory = os.path.join(script_directory, conf_name, 'Logs')
        os.makedirs(lists_directory, exist_ok=True)
        os.makedirs(logs_directory, exist_ok=True)

        # Take the time once so that file names and header agree.
        now = datetime.datetime.now()
        current_timestamp = now.strftime('%Y%m%d_%H%M%S')
        current_datetime = now.strftime('%Y-%m-%d %H:%M:%S')

        checksum_filename = os.path.join(lists_directory,
                                         f'{current_timestamp}_checksum.txt')
        log_filename = os.path.join(logs_directory,
                                    f'{current_timestamp}_log.txt')

        def scan_directory(directory):
            """Write one "checksum<TAB>path" line per matching file."""
            with open(checksum_filename, 'a') as f:
                f.write(f"MD5 checksums on: {current_datetime}\n")
                for root, _, files in os.walk(directory):
                    for file in files:
                        file_path = os.path.join(root, file)
                        extension = os.path.splitext(file)[1][1:].lower()
                        if extension in allowed_extensions:
                            print(f'Processing {file_path}')
                            # Close the data file again after hashing it.
                            with open(file_path, 'rb') as data:
                                md5_checksum = hashlib.md5(data.read()).hexdigest()
                            f.write(f'{md5_checksum}\t{file_path}\n')

        scan_directory(data_directory)

        log_entry = f'MD5 checksums have been created and stored in {checksum_filename}.\n'
        with open(log_filename, 'a') as f:
            f.write(log_entry)

        delete_old_files(lists_directory, files_to_keep)
        delete_old_files(logs_directory, files_to_keep)

        # Compare the two most recent checksum lists, newest first.
        last_checksum_files = glob.glob(os.path.join(lists_directory, '*_checksum.txt'))
        if len(last_checksum_files) >= 2:
            last_checksum_files.sort(reverse=True)
            last_checksum_file1 = last_checksum_files[0]
            last_checksum_file2 = last_checksum_files[1]

            different_files = compare_checksum_files(last_checksum_file1,
                                                     last_checksum_file2)

            log_message = ''
            if different_files:
                print('ATTENTION: Different MD5 checksums found! See log file in',
                      log_filename)
                log_message += 'ATTENTION: The following files have different checksums:\n'
                for file in different_files:
                    log_message += file + '\n'
            else:
                log_message += 'INFO: No different MD5 checksums found.\n'
                print('INFO: No different MD5 checksums found.')

            with open(log_filename, 'a') as f:
                f.write(log_message)
        elif len(last_checksum_files) == 1:
            with open(log_filename, 'a') as f:
                f.write("INFO: Checksums could not be compared because there is "
                        "currently only one checksum file.\n")
            print("INFO: Checksums could not be compared because there is "
                  "currently only one checksum file.")

        print(log_entry)
    else:
        parser.print_help()
</code>

===== Configuration file =====

Here is an example configuration file.
You can copy or download it by clicking on the ''example.conf'' link in the upper left corner of the code block. Different configuration files can be used for different projects.

<code ini example.conf>
# example.conf

# Save the configuration files in the same
# folder where the program is located.

[USER_SETTINGS]

# The configuration's name.
# A subfolder with this name will be created.
conf_name = Example

# File types that cksgen will look for.
# Multiple file types are separated by commas without spaces.
allowed_extensions = jpg,jpeg,dng,md

# The actual directory where cksgen looks for files.
data_directory = C:\Users\Username\Path\To\Example

# The directory where lists of MD5 checksums are placed.
lists_directory = conf_name\Lists

# The directory in which the log files are stored.
logs_directory = conf_name\Logs

# Specifies how many checksum files and log files cksgen should keep.
files_to_keep = 10
</code>

===== Downloads =====

  * {{ :code:python:cksgen.zip |}}: Source code (contains the two files shown above). To run the program, Python needs to be installed on your computer.
  * {{ :code:python:cksgen_windows_bin.zip |}}: Contains an executable binary ''cksgen.exe'' for Microsoft Windows. Python does not need to be installed on the system (but you can also compile the source code yourself with ''pyinstaller %%--%%onefile cksgen.py'').
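
===== Example: chunked hashing and dictionary-based comparison =====

The source code above reads every file into memory in one piece before hashing it, and compares two checksum lists with a nested loop. For large RAW files or very long lists, the following sketch shows one possible alternative. It is not part of cksgen: the function names ''md5_of_file'', ''read_checksum_list'' and ''changed_files'' and the default chunk size are assumptions made for illustration. The only thing taken from cksgen is the list format, one ''checksum<TAB>path'' entry per line after a header line.

```python
import hashlib


# Hypothetical helper, not part of cksgen.
def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as fh:
        # iter() with a sentinel calls fh.read(chunk_size) until it returns b''.
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical helper, not part of cksgen.
def read_checksum_list(path):
    """Map file path -> checksum for every "checksum<TAB>path" line."""
    checksums = {}
    with open(path) as fh:
        for line in fh:
            if '\t' not in line:
                continue  # skip the "MD5 checksums on: ..." header line
            checksum, file_path = line.rstrip('\n').split('\t', 1)
            checksums[file_path] = checksum
    return checksums


# Hypothetical helper, not part of cksgen.
def changed_files(new_list, old_list):
    """Return paths that appear in both lists with different checksums."""
    new = read_checksum_list(new_list)
    old = read_checksum_list(old_list)
    return [path for path in new if path in old and new[path] != old[path]]
```

Under these assumptions, ''md5_of_file(file_path)'' could stand in for the call in ''scan_directory'' that hashes the whole file at once, and ''changed_files'' returns the same kind of list as ''compare_checksum_files'' while looking up each path in a dictionary instead of scanning the second list for every entry.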