Determine and compare MD5 checksums with PHP
This script is a PHP port of my PowerShell script for ensuring data integrity. It identifies corrupted .jpg
, .jpeg
, .dng
and .cr2
files by determining their MD5 checksums and comparing them to previous checks. Of course, this script can also be used for other file types. If PHP is installed on the local computer or web server, it can be run with php checksum.php
directly from the command prompt (cmd.exe
, PowerShell
, Bash
etc.).
This script is primarily intended for archived data stocks that no longer change. Of course, you can also use it for data that is being edited from time to time. In this case, one must not forget that the MD5 checksums already change if, for example, the metadata of a JPG file is edited. So the message ATTENTION: Different MD5 checksums found
on the command prompt does not necessarily indicate a corrupted file.
Features
- Determines the MD5 checksums of files and writes them to the file
yyyyMMdd_HHmmss_checksum.txt
. - Compares the checksums of the last two checksum files and outputs a corresponding message in the command prompt and in the log file
yyyyMMdd_HHmmss_log.txt
. - Allowed file extensions can be specified.
- The number of checksum files and log files to be kept can be specified.
- The form of the time stamp in the file name of the log files and checksum files can be adjusted.
- The folder names for the log files and checksum files can also be specified.
- The command prompt will let you know which file is being processed.
Source Code
- checksum.php
<?php /** * Determine and compare MD5 checksums with PHP * * @author Helmut Kaczmarek <email@helmutkaczmarek.de> * @link https://wiki.helmutkaczmarek.de/code:php:checksum */ // Allowed file extensions $allowedExtensions = array('jpg', 'jpeg', 'dng', 'cr2'); // Directory containing the files to be examined $dataDirectory = 'D:\Folder'; // Directory containing this script $scriptDirectory = 'D:\Folder\Checksum'; // Folder name for checksum files (will be subfolder of $scriptDirectory) $listsDirectory = $scriptDirectory . '\Lists'; // Folder name for log files (will be subfolder of $scriptDirectory) $logsDirectory = $scriptDirectory . '\Logs'; // Number of checksum files and log files to be kept $filesToKeep = 10; // Timezone (otherwise $currentTimestamp will use UTC) date_default_timezone_set('Europe/Berlin'); // Generate the current timestamp in format YYYYMMDD_HHMMSS $currentTimestamp = date('Ymd_His'); // Check if the subfolders exist, otherwise create them if (!is_dir($listsDirectory)) { mkdir($listsDirectory); } if (!is_dir($logsDirectory)) { mkdir($logsDirectory); } // File name for the checksum file and log file including time stamp $checksumFilename = $listsDirectory . '\\' . $currentTimestamp . '_checksum.txt'; $logFilename = $logsDirectory . '\\' . $currentTimestamp . '_log.txt'; // Function to search directories recursively function scanDirectory($directory) { global $allowedExtensions, $checksumFilename; $files = scandir($directory); foreach ($files as $file) { if ($file !== '.' && $file !== '..') { $filePath = $directory . '\\' . $file; if (is_dir($filePath)) { scanDirectory($filePath); } else { $extension = strtolower(pathinfo($file, PATHINFO_EXTENSION)); if (in_array($extension, $allowedExtensions)) { $md5Checksum = md5_file($filePath); // Write the MD5 checksum to the checksum file file_put_contents($checksumFilename, "$md5Checksum\t$filePath\n", FILE_APPEND); // Display the current filename on the command prompt echo "Processing $filePath\n"; } } } } } // Search the directory recursively for files scanDirectory($dataDirectory); // Write log file $logEntry = "MD5 checksums have been created and stored in $checksumFilename.\n"; file_put_contents($logFilename, $logEntry, FILE_APPEND); // Function to delete old files function deleteOldFiles($directory) { global $filesToKeep; $allFiles = glob($directory . '\*.*'); // Use array_multisort() to sort by modification time in descending order array_multisort(array_map('filemtime', $allFiles), SORT_DESC, $allFiles); // Only keep the last $filesToKeep $filesToKeepArray = array_slice($allFiles, 0, $filesToKeep); foreach ($allFiles as $file) { if (!in_array($file, $filesToKeepArray)) { unlink($file); } } } // Delete old checksum files and log files deleteOldFiles($listsDirectory); deleteOldFiles($logsDirectory); // Checksum file comparison function function compareChecksumFiles($file1, $file2) { $checksums1 = file($file1, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES); $checksums2 = file($file2, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES); $differentFiles = array(); foreach ($checksums1 as $line1) { list($checksum1, $file1) = explode("\t", $line1); foreach ($checksums2 as $line2) { list($checksum2, $file2) = explode("\t", $line2); if ($file1 === $file2 && $checksum1 !== $checksum2) { $differentFiles[] = $file1; break; } } } return $differentFiles; } // Paths to the last two checksum files $lastChecksumFiles = glob($listsDirectory . '\*_checksum.txt', GLOB_NOSORT); if (count($lastChecksumFiles) >= 2) { $lastChecksumFile1 = $lastChecksumFiles[count($lastChecksumFiles) - 1]; $lastChecksumFile2 = $lastChecksumFiles[count($lastChecksumFiles) - 2]; $differentFiles = compareChecksumFiles($lastChecksumFile1, $lastChecksumFile2); // Output and log file $logMessage = ''; if (count($differentFiles) > 0) { echo "ATTENTION: Different MD5 checksums found! See log file in $logFilename.\n"; $logMessage .= "ATTENTION: The following files have different checksums:\n"; foreach ($differentFiles as $file) { $logMessage .= "$file\n"; } } else { $logMessage .= "INFO: No different MD5 checksums found.\n"; echo "INFO: No different MD5 checksums found.\n"; } file_put_contents($logFilename, $logMessage, FILE_APPEND); } echo $logEntry; ?>