Extracting Data from XML by Python

Posted on May 31, 2016 by Hojin

I usually run md5deep64.exe with the “-d” parameter to produce XML output that includes both the full file path and the MD5 value of each file.

C:> md5deep64.exe -r -d * > C:\%COMPUTERNAME%_%DATE%.xml

The XML file produced by the command above looks like this.

<?xml version='1.0' encoding='UTF-8'?>
<dfxml xmloutputversion='1.0'>
<metadata
xmlns='http://md5deep.sourceforge.net/md5deep/'
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xmlns:dc='http://purl.org/dc/elements/1.1/'>
<dc:type>Hash List</dc:type>
</metadata>
<creator version='1.0'>
<program>MD5DEEP</program>
<version>4.3</version>
<build_environment>
<compiler>GCC 4.7</compiler>
</build_environment>
<execution_environment>
<command_line>c:\temp\md5deep64.exe -r -d *</command_line>
<start_time></start_time>
</execution_environment>
</creator>
<configuration>
<algorithms>
<algorithm name='md5' enabled='1'/>
<algorithm name='sha1' enabled='0'/>
<algorithm name='sha256' enabled='0'/>
<algorithm name='tiger' enabled='0'/>
<algorithm name='whirlpool' enabled='0'/>
</algorithms>
</configuration>
<fileobject>
<filename>C:\bootmgr</filename>
<filesize>398356</filesize>
<ctime></ctime>
<mtime></mtime>
<atime></atime>
<hashdigest type='MD5'>55272fe96ad87017755fd82f7928fda0</hashdigest>
</fileobject>
<fileobject>
<filename>C:\BOOTNXT</filename>
<filesize>1</filesize>
<ctime></ctime>
<mtime></mtime>
<atime></atime>
<hashdigest type='MD5'>93b885adfe0da089cdf634904fd59f71</hashdigest>
</fileobject>
</dfxml>

To extract the MD5 value and file path from the XML, we can use Python's minidom library.

from xml.dom import minidom

# fn holds the path to the md5deep XML output
xmldoc = minidom.parse(fn)
files = xmldoc.getElementsByTagName('fileobject')
for fileobject in files:
  # use a separate name so the input path in fn is not overwritten
  name = fileobject.getElementsByTagName('filename')[0]
  md5 = fileobject.getElementsByTagName('hashdigest')[0]
  print(name.firstChild.data + ", " + md5.firstChild.data)

When we run this code against a huge XML file, however, we can easily hit a MemoryError. To avoid this kind of error, I used BeautifulSoup.

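from bs4 import BeautifulSoup

# fn holds the path to the md5deep XML output; the 'xml' parser requires lxml
with open(fn, 'r') as fp:
  soup = BeautifulSoup(fp, 'xml')
for node in soup.find_all('fileobject'):
  try:
    print("%s, %s" % (node.hashdigest.string, node.filename.string))
  except UnicodeEncodeError:
    # skip paths that the console encoding cannot print
    continue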
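As an alternative to the two approaches above (not the method used in this post), a streaming parser keeps memory use low by handling one fileobject at a time. The following is a minimal sketch using iterparse from the standard library's xml.etree.ElementTree. The function name iter_hashes and the SAMPLE data are hypothetical; the sample is a trimmed copy of the XML shown earlier, and the sketch assumes, as in that output, that fileobject elements are not in an XML namespace.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the md5deep output shown above (illustrative only).
SAMPLE = rb"""<?xml version='1.0' encoding='UTF-8'?>
<dfxml xmloutputversion='1.0'>
<fileobject>
<filename>C:\bootmgr</filename>
<filesize>398356</filesize>
<hashdigest type='MD5'>55272fe96ad87017755fd82f7928fda0</hashdigest>
</fileobject>
<fileobject>
<filename>C:\BOOTNXT</filename>
<filesize>1</filesize>
<hashdigest type='MD5'>93b885adfe0da089cdf634904fd59f71</hashdigest>
</fileobject>
</dfxml>
"""

def iter_hashes(source):
    """Yield (filename, md5) pairs from a path or file object, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "fileobject":
            yield elem.findtext("filename"), elem.findtext("hashdigest")
            # Drop the children of the element we just handled so the
            # tree does not keep growing while the file is read.
            elem.clear()

for name, md5 in iter_hashes(io.BytesIO(SAMPLE)):
    print("%s, %s" % (name, md5))
```

For a real file, a path can be passed instead of the BytesIO object. The emptied fileobject elements remain attached to the root, so memory still grows slightly with very large inputs.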

The complete code is available on my GitHub.
https://github.com/hojinpk/CodeSnippets/blob/master/extracting_md5_from_XML.py

Analyzing HTTPS Websites with BurpSuite

Posted on April 28, 2016 by Hojin

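When analyzing an HTTPS website with BurpSuite, I needed a few extra steps before the analysis worked properly.

Installing the PortSwigger CA Certificate

When the web browser accesses a web server at an https address, BurpSuite connects to that server itself, analyzes the encrypted information it receives from the server, and then passes it on to the browser over HTTPS using its own certificate.

For this reason, BurpSuite's certificate must be installed on the client under “Trusted Root Certification Authorities”. The certificates installed locally can be checked with certmgr.msc.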

Installing the Java Cryptography Extension (JCE)

After installing the BurpSuite certificate on the client and connecting again, you may see the error message “Received fatal alert: handshake_failure”. To solve this problem, the JCE has to be updated. (At least that was the case for me.)

Download the JCE that matches the version of Java installed on the client, then (after unzipping it) copy the local_policy.jar and US_export_policy.jar files into the %JAVA_HOME%\lib\security\ folder. Don't forget to restart BurpSuite. In my case, the path of that folder was C:\Program Files (x86)\Java\jre1.8.0_91\lib\security\.

Critical Items for Incident Report

Posted on January 5, 2016 by Hojin

I am going to talk about the essential requirements of an incident report.

Today I saw an incident report that is a bad example of what I mean. In that report, the incident response consisted of reporting to the team manager, and the countermeasure was that the reporter would do something after getting the manager's approval.

First, in my opinion, the report should include at least the following.

Date and Time with UTC
What’s the incident?
Initial Response
Analysis
Countermeasure

Second, each of these items has a purpose. When we are responding to an incident, we tend to lose sight of what the incident actually is, because it is easy to become flustered under pressure. Once an incident is under way, we have to focus on keeping it from spreading and on preserving as much evidence as we can. During the initial response or after it, we should analyze the incident based on the evidence in order to identify its extent. It is also better to have at least two analysts work on the analysis so that the result is not biased. With an accurate analysis result, we can then build countermeasures to prevent recurrence.

In addition, there are several methods that can be applied at each step, for example MECE (Mutually Exclusive and Collectively Exhaustive), PDCA (Plan, Do, Check, Act), 5 Whys, and so on. If we apply these methods when responding to an incident, we can get a little closer to the purposes described above.

BitLocker without TPM

Posted on November 28, 2014 by Hojin

Most people believe BitLocker can be enabled only if there is a Trusted Platform Module (“TPM”) on the motherboard. However, BitLocker can also be used without a TPM chip. Here is how.


Press Windows+R on your keyboard to open the Run dialog.
Type “gpedit.msc” and click OK.
The Local Group Policy Editor window opens.
Click Administrative Templates under Computer Configuration.
Double-click Windows Components in the right pane.
Double-click BitLocker Drive Encryption.
Double-click Operating System Drives.
Double-click Require additional authentication at startup.
A new window opens.
Click “Enabled” and make sure “Allow BitLocker without a compatible TPM” is checked.
Click OK and close the window.

Now you are ready to use BitLocker for your drives.

Right-click C: or any other drive in the Computer folder.
Choose Turn On BitLocker.
Follow the steps.

If you want to know whether your Windows system has a TPM, you can check in Trusted Platform Module Management (tpm.msc).

Windows’ uptime

Posted on October 15, 2014 by Hojin

C:\> net stats srv
C:\> systeminfo | find "System Boot Time"
In Windows Task Manager, select the Performance tab.
GetTickCount64()
In Windows System Event Log, Event ID 6005

Note: Event ID 6006 tells us when the server went down, so if there is a large time gap between the 6006 and 6005 events, the server was down for a long time.
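GetTickCount64() returns the number of milliseconds since the system was started, so the value has to be converted before it reads as an uptime. A minimal sketch of that conversion in Python, calling the API through ctypes on Windows only; the helper names uptime_from_ticks and windows_uptime and the sample tick value are hypothetical.

```python
import ctypes
import sys
from datetime import timedelta

def uptime_from_ticks(tick_ms):
    """Convert a millisecond tick count into a timedelta."""
    return timedelta(milliseconds=tick_ms)

def windows_uptime():
    """Return the current uptime on Windows, or None on other platforms."""
    if sys.platform != "win32":
        return None
    get_ticks = ctypes.windll.kernel32.GetTickCount64
    get_ticks.restype = ctypes.c_uint64  # the API returns a 64-bit unsigned value
    return uptime_from_ticks(get_ticks())

# Hypothetical tick value, used only to show the conversion.
print(uptime_from_ticks(90061000))
print(windows_uptime())
```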

error C2679

Posted on October 11, 2014 by Hojin

binary ‘+=’ : no operator found which takes a right-hand operand of type ‘BYTE [6]’ (or there is no acceptable conversion)

CString ret;
typedef struct _SID_IDENTIFIER_AUTHORITY {
BYTE Value[6];
} SID_IDENTIFIER_AUTHORITY, *PSID_IDENTIFIER_AUTHORITY;

ret += sid.IdentifierAuthority.Value;

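I just added a “(TCHAR)” cast to change the value type, like this.

ret += (TCHAR)sid.IdentifierAuthority.Value;

Note that this only silences the compiler: the cast converts the address of the array, truncated to a single TCHAR, rather than the six bytes it holds, so the appended character is not the authority value. To append the real value, the bytes have to be formatted individually.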

CFile::ReadHuge

Posted on October 11, 2014 by Hojin

ReadHuge is provided only for backward compatibility. ReadHuge and Read have the same semantics under Win32.

Source: http://msdn.microsoft.com/en-us/library/aa270527(v=vs.60).aspx

error MSB8031

Posted on October 11, 2014 by Hojin

When you build source code that was originally written with VS6 in VS 2013, you may run into this error message.

Building an MFC project for a non-Unicode character set is deprecated. You must change the project property to Unicode or download an additional library. See http://go.microsoft.com/fwlink/p/?LinkId=286820 for more information.

To fix it, you can download the Multibyte MFC Library for VS 2013. This add-on contains the multibyte character set (MBCS) version of the Microsoft Foundation Class (MFC) Library.

http://www.microsoft.com/en-US/download/details.aspx?id=40770

Exclude something by regular expression on Splunk

Posted on May 14, 2014 by Hojin

Splunk supports regular expressions in searches, which is very helpful when you want to extract or exclude something. One day, I found suspicious domain names like ‘afyblkodyg’, ‘imdcbazmqh’, etc. in the proxy log. These strings are not actually valid domains. Anyway, I wanted to know how often this happens, so I decided to search the proxy log for such strange strings with Splunk.

Splunk> index=idxproxy
| rex field=cs_host "(?<xdomain>.*(?<!\.com|\.co|\.lu|\.net|\.org)$)"
| search xdomain!="" | table _time, c_ip, xdomain

   _time                   c_ip           xdomain
1  4/21/14 5:06:27.000 AM  10.10.250.252  afyblkodyg
2  4/21/14 5:06:27.000 AM  10.10.250.252  imdcbazmqh
3  4/21/14 5:06:27.000 AM  10.10.250.252  nidxikaxyh
4  4/17/14 9:39:43.000 PM  10.10.250.252  stqbnqsfok
5  4/17/14 9:39:43.000 PM  10.10.250.252  bbrsqktfut
6  4/17/14 9:39:43.000 PM  10.10.250.252  dnvujghghr
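The same exclusion idea can be tried outside Splunk. The following is a minimal sketch in Python, not part of the original search. Splunk's rex uses PCRE, which accepts alternatives of different lengths inside one negative lookbehind; Python's re module requires fixed-width lookbehinds, so the sketch chains one lookbehind per suffix instead. The function name suspicious_hosts and the sample host list are hypothetical; only ‘afyblkodyg’ and ‘imdcbazmqh’ come from the results above.

```python
import re

# One fixed-width negative lookbehind per suffix, checked at the end of the string.
NO_KNOWN_SUFFIX = re.compile(
    r"^.*(?<!\.com)(?<!\.co)(?<!\.lu)(?<!\.net)(?<!\.org)$"
)

def suspicious_hosts(hosts):
    """Return the hosts that do not end in one of the listed suffixes."""
    return [h for h in hosts if h and NO_KNOWN_SUFFIX.match(h)]

# Hypothetical sample values for illustration.
sample = ["afyblkodyg", "www.example.com", "imdcbazmqh", "example.org", "example.co"]
print(suspicious_hosts(sample))
```

The `h and` check plays the role of `search xdomain!=""` in the Splunk query by dropping empty values.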

Based on the log

Posted on April 2, 2014 by Hojin

A suspicious user has been repeatedly accessing a web login page to carry out a brute-force login attack. A security engineer is trying to block this with some kind of network appliance. To do so, he or she needs a concrete limit on the number of accesses within a given time.

Basically, we should gather various logs, and they should be searchable. Effectiveness comes next. Based on these systems, we have to extract the number from the logs, not estimate it, in order to get the limit.

In my case, I used Splunk for this, as below.

earliest="4/2/2014:13:50:00" latest="4/2/2014:14:00:00"
source=IIS cs_method=POST
(cs_uri_stem="/Authentication.asmx*" OR cs_uri_stem="/Registration.asmx*")
| where NOT cidrmatch("10.10.0.0/16",c_ip)
| timechart span=5s limit=5 count by c_ip usenull=f useother=f

In the end, based on the resulting graph, I could set the limit at 20 requests per 5 seconds.
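The counting that the timechart performs can be sketched in Python to show how the number comes from the log rather than from an estimate. This is a minimal illustration, assuming the log has already been reduced to (timestamp, client IP) pairs for the POST requests of interest; the function name peak_rate_by_ip and the sample events are hypothetical, and only the 10.10.0.0/16 range and the 5-second span come from the search above.

```python
from collections import Counter
from datetime import datetime, timedelta
import ipaddress

INTERNAL = ipaddress.ip_network("10.10.0.0/16")  # excluded, as in the cidrmatch filter
SPAN_SECONDS = 5                                 # same span as the timechart
EPOCH = datetime(1970, 1, 1)

def peak_rate_by_ip(events):
    """Return the highest request count in any 5-second bucket for each external IP."""
    buckets = Counter()
    for ts, ip in events:
        if ipaddress.ip_address(ip) in INTERNAL:
            continue
        bucket = int((ts - EPOCH).total_seconds()) // SPAN_SECONDS
        buckets[(ip, bucket)] += 1
    peaks = {}
    for (ip, _bucket), count in buckets.items():
        peaks[ip] = max(peaks.get(ip, 0), count)
    return peaks

# Hypothetical events for illustration.
start = datetime(2014, 4, 2, 13, 50, 0)
events = [
    (start, "203.0.113.7"),
    (start + timedelta(seconds=1), "203.0.113.7"),
    (start + timedelta(seconds=4), "203.0.113.7"),
    (start + timedelta(seconds=5), "203.0.113.7"),
    (start, "10.10.1.5"),
    (start + timedelta(seconds=1), "10.10.1.5"),
]
print(peak_rate_by_ip(events))
```

Comparing the peaks of normal clients with those of the suspicious client is what leads to a threshold such as the one chosen above.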

Practical Security

I'd like to discuss security in the practical digital world.

Category Archives: Analysis
