I usually run md5deep64.exe with the "-d" parameter, which writes the results as XML containing both the full path and the MD5 value of each file.
C:> md5deep64.exe -r -d * > C:\%COMPUTERNAME%_%DATE%.xml
The XML file produced by the command above looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<dfxml xmloutputversion='1.0'>
  <metadata xmlns='http://md5deep.sourceforge.net/md5deep/' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:dc='http://purl.org/dc/elements/1.1/'>
    <dc:type>Hash List</dc:type>
  </metadata>
  <creator version='1.0'>
    <program>MD5DEEP</program>
    <version>4.3</version>
    <build_environment>
      <compiler>GCC 4.7</compiler>
    </build_environment>
    <execution_environment>
      <command_line>c:\temp\md5deep64.exe -r -d *</command_line>
      <start_time></start_time>
    </execution_environment>
  </creator>
  <configuration>
    <algorithms>
      <algorithm name='md5' enabled='1'/>
      <algorithm name='sha1' enabled='0'/>
      <algorithm name='sha256' enabled='0'/>
      <algorithm name='tiger' enabled='0'/>
      <algorithm name='whirlpool' enabled='0'/>
    </algorithms>
  </configuration>
  <fileobject>
    <filename>C:\bootmgr</filename>
    <filesize>398356</filesize>
    <ctime></ctime>
    <mtime></mtime>
    <atime></atime>
    <hashdigest type='MD5'>55272fe96ad87017755fd82f7928fda0</hashdigest>
  </fileobject>
  <fileobject>
    <filename>C:\BOOTNXT</filename>
    <filesize>1</filesize>
    <ctime></ctime>
    <mtime></mtime>
    <atime></atime>
    <hashdigest type='MD5'>93b885adfe0da089cdf634904fd59f71</hashdigest>
  </fileobject>
</dfxml>
To extract the MD5 and file path from the XML, we can use Python's minidom library.
from xml.dom import minidom

# fn holds the path to the XML file produced by md5deep
xmldoc = minidom.parse(fn)
files = xmldoc.getElementsByTagName('fileobject')
for fileobject in files:
    # each fileobject carries one filename and one hashdigest element
    name = fileobject.getElementsByTagName('filename')[0]
    md5 = fileobject.getElementsByTagName('hashdigest')[0]
    print(name.firstChild.data + ", " + md5.firstChild.data)
However, when we run this Python code against a huge XML file, we can easily hit a MemoryError. To avoid this kind of error, I used BeautifulSoup.
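from bs4 import BeautifulSoup

# the 'xml' parser requires lxml to be installed
with open(fn, 'rb') as fp:
    soup = BeautifulSoup(fp, 'xml')
for node in soup.find_all('fileobject'):
    try:
        print("%s, %s" % (node.hashdigest.string, node.filename.string))
    except UnicodeEncodeError:
        # skip entries whose path cannot be encoded for the console
        continue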
The complete code is available on my GitHub:
https://github.com/hojinpk/CodeSnippets/blob/master/extracting_md5_from_XML.py
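If memory is still a concern with very large hash lists, one possible alternative is to stream the file instead of building the whole tree at once. The sketch below is not part of the script linked above. It uses the standard library's xml.etree.ElementTree.iterparse and assumes the fileobject elements sit directly under the root with no XML namespace, as in the sample output shown earlier. The function name iter_hashes and the in-memory sample are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_hashes(source):
    """Yield (filename, md5) pairs, handling one fileobject at a time.

    source may be a file path or a binary file-like object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it.
    _event, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == "fileobject":
            name = elem.findtext("filename")
            md5 = elem.findtext("hashdigest")
            if name and md5:
                yield name, md5
            # Detach the elements parsed so far so they can be freed.
            root.clear()


# Tiny in-memory sample shaped like the md5deep output (hypothetical test data).
sample = b"""<?xml version='1.0' encoding='UTF-8'?>
<dfxml xmloutputversion='1.0'>
<fileobject>
<filename>C:\\bootmgr</filename>
<hashdigest type='MD5'>55272fe96ad87017755fd82f7928fda0</hashdigest>
</fileobject>
</dfxml>"""

for name, md5 in iter_hashes(io.BytesIO(sample)):
    print("%s, %s" % (name, md5))
```

To use it on a real file, pass the path of the XML file in place of the io.BytesIO object. Because each fileobject is discarded once it has been reported, memory use should depend on the size of a single entry rather than on the size of the whole file.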