Doop Notes
Written from June 2023.
Doop Output:
Installation and Usage
- Project repository
- Tutorials (mostly on the plast-lab website)
- [[信息安全/程序分析/阅读/Pointer Analysis – Doop]]
- http://plast-lab.github.io/feb16-seminar/
- https://github.com/plast-lab/feb16-seminar
- (6) Seminar (2016): Program Analysis with Datalog – YouTube
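- In practice, the documentation of the BytecodeDL project is the better starting point
- BytecodeDL also uses Doop's generate-facts tool to produce its input facts
- Predicate names and their meanings are largely the same, and the author's manual is well written
- Local note: [[BytecodeDL 测试]]
- MockDetector: Detecting and tracking mock objects in unit tests (recommended)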
- master-thesis/thesis.pdf at master · cvrac/master-thesis (github.com)
- Discuss:
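Installation is not covered here; see the official docs or the References. If needed, a proxy can be configured for Gradle:
cat .gradle/gradle.properties
# Proxy server IP or domain (HTTP)
systemProp.http.proxyHost=192.168.98.1
# Proxy server port (HTTP)
systemProp.http.proxyPort=7890
# Proxy server IP or domain (HTTPS)
systemProp.https.proxyHost=192.168.98.1
# Proxy server port (HTTPS)
systemProp.https.proxyPort=7890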
- basic usage:
./doop --help
> Task :run
usage: doop -i <INPUT> -a <NAME> [OPTION]...
Run an analysis on a program (given as a combination of code inputs, code libraries, and a platform).
== Basic options ==
-a,--analysis <NAME> The analysis to use. Examples: context-insensitive, 1-call-site-sensitive, micro
-i,--input-file <INPUT> The (application) input files of the analysis. Accepted formats: .jar, .war, .apk, .aar, maven-id
--id <ID> The analysis id. If omitted, it is automatically generated.
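Usage (partial)
- Options I use:
-app-only analyze only the input application, not its dependencies or the platform
--id test the analysis id, used as the directory name out/${id}
-a context-insensitive context-insensitive analysis
--generate-jimple emit Jimple IR files
--extra-logic $BASE_DIR/rules/output.dl load a custom rules file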
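Analysis Structure
A Doop run can be roughly divided into three steps:
- Generate Jimple files with Soot
  - With the --generate-jimple option, the Jimple files are written to output/<ID>/database/jimple
- Convert the Jimple files into input facts (.facts) for the Datalog engine
- Run the selected analysis with the Souffle engine, which writes the resulting relations as .csv files (the analysis results)
The -a (--analysis) option selects the kind of analysis to run.
The official doop-101.md (yanniss / doop / docs on Bitbucket) uses -a micro as its example, which loads the rules in souffle-logic/analyses/micro/analysis.dl.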
To examine the other analyses of the Doop framework, follow this structure:
- Their input schema can be found in souffle-logic/facts/flow-insensitivite-schema.dl.
  - Note: no file with this name exists; the closest matches are souffle-logic/facts/flow-insensitive-facts.dl and souffle-logic/facts/flow-sensitive-schema.dl
- The rules for an analysis A can be found in souffle-logic/analyses/$A/analysis.dl.
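To see how command-line options map to the Souffle .dl files that get loaded, read the Groovy code in src/main/groovy/org/clyze/doop/core/SouffleAnalysis.groovy.
Its run method can be regarded as the starting point of Doop's Souffle-based analysis. The notes below walk through runAnalysisAndProduceStats and produceStats; in practice many more options influence what is loaded.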
void runAnalysisAndProduceStats(File analysis) {
mainAnalysis(analysis)
produceStats(analysis)
}
void mainAnalysis(File analysis) {
// Check the open programs argument before calling the preprocessor.
String openProgramsProfile = null
String openProgramsRules = options.OPEN_PROGRAMS.value
// Step 1:
// Derive openProgramsProfile from openProgramsRules.
if (openProgramsRules) {
// openProgramsRules is the value of Doop's --open-programs option
openProgramsProfile = "${Doop.souffleLogicPath}/addons/open-programs/rules-${openProgramsRules}.dl"
if (!(new File(openProgramsProfile)).exists())
throw DoopErrorCodeException.error35("Open program rules profile does not exist: " + openProgramsProfile)
}
// Step 2:
// Load the basic analysis logic:
// 1. basic/basic.dl
// 2. ${Doop.souffleAnalysesPath}/${analysis.name}/analysis.dl, where analysis.name is the value of -a | --analysis
// ${Doop.souffleAnalysesPath} is souffle-logic/analyses/
cpp.includeAtEnd("$analysis", "${Doop.souffleLogicPath}/basic/basic.dl")
cpp.includeAtEnd("$analysis", "${Doop.souffleAnalysesPath}/${getBaseName(analysis.name)}/analysis.dl")
// Step 3:
// Information-flow (taint) analysis.
// If options.ANALYSIS.value (i.e. -a) is data-flow, load addons/information-flow/rules-data-flow.dl.
// According to the paper, this is because Doop's flow-sensitive analysis uses SSA to turn the originally flow-insensitive analysis into a flow-sensitive one.
// Otherwise, load addons/information-flow/rules.dl.
// In both cases, ${options.INFORMATION_FLOW.value}-sources-and-sinks.dl is loaded as well.
// INFORMATION_FLOW is the value of Doop's --information-flow option (e.g. minimal).
if (options.INFORMATION_FLOW.value) {
String infoflowDir = "${Doop.souffleLogicPath}/addons/information-flow"
if (options.ANALYSIS.value == 'data-flow')
cpp.includeAtEnd("$analysis", "${infoflowDir}/rules-data-flow.dl")
else
cpp.includeAtEnd("$analysis", "${infoflowDir}/rules.dl")
cpp.includeAtEnd("$analysis", "${infoflowDir}/${options.INFORMATION_FLOW.value}${INFORMATION_FLOW_SUFFIX}.dl")
}
// Load the openProgramsProfile resolved in step 1.
if (openProgramsProfile) {
log.debug "Using open-programs rules: ${openProgramsRules}"
cpp.includeAtEnd("$analysis", openProgramsProfile)
}
// Step 4:
// Sanity check.
// Rule file: addons/sanity.dl
if (options.SANITY.value) {
cpp.includeAtEnd("$analysis", "${Doop.souffleLogicPath}/addons/sanity.dl")
if (options.DISTINGUISH_REFLECTION_ONLY_STRING_CONSTANTS.value) {
log.warn("WARNING: The sanity check is not fully compatible with --" + options.DISTINGUISH_REFLECTION_ONLY_STRING_CONSTANTS.name)
}
if (options.DISTINGUISH_ALL_STRING_CONSTANTS.value) {
log.warn("WARNING: The sanity check is not fully compatible with --" + options.DISTINGUISH_ALL_STRING_CONSTANTS.name)
}
if (options.NO_MERGES.value) {
log.warn("WARNING: The sanity check is not fully compatible with --" + options.NO_MERGES.name)
}
}
// Step 5:
// Add the user-supplied rule files (a list).
if (options.EXTRA_LOGIC.value) {
Collection<String> extras = options.EXTRA_LOGIC.value as List<String>
for (String extraFile : extras) {
File extraLogic = new File(extraFile)
if (!extraLogic.exists())
throw new RuntimeException("Extra logic file does not exist: ${extraLogic}")
String extraLogicPath = extraLogic.canonicalPath
// Safety: check file extension to avoid using this mechanism
// to read files from anywhere in the system.
if (extraLogicPath.endsWith('.dl')) {
log.info "Adding extra logic file ${extraLogicPath}"
cpp.includeAtEnd("${analysis}", extraLogicPath)
} else
log.warn "WARNING: Ignoring file not ending in .dl: ${extraLogicPath}"
}
}
}
void produceStats(File analysis) {
// This part mainly deals with statistics.
def statsPath = "${Doop.souffleLogicPath}/addons/statistics"
// options.X_EXTRA_METRICS corresponds to --extra-metrics;
// when set, addons/statistics/metrics.dl is loaded.
if (options.X_EXTRA_METRICS.value) {
cpp.includeAtEnd("$analysis", "${statsPath}/metrics.dl")
}
// Experimental option:
// Doop option: Xstats-none
// Do not load logic for collecting statistics.
if (options.X_STATS_NONE.value) return
// Load analyses/$name/statistics.dl if it exists,
// where $name is the value of -a / --analysis.
def specialStats = new File("${Doop.souffleAnalysesPath}/${name}/statistics.dl")
if (specialStats.exists()) {
cpp.includeAtEnd("$analysis", specialStats.toString())
return
}
cpp.includeAtEnd("$analysis", "${statsPath}/statistics-simple.dl")
if (options.X_STATS_FULL.value || options.X_STATS_DEFAULT.value) {
cpp.includeAtEnd("$analysis", "${statsPath}/statistics.dl")
}
}
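The loading order in mainAnalysis can be condensed into a small Java sketch. This is not Doop code: the class IncludePlan and its parameters are hypothetical names introduced here, and paths are written relative to souffle-logic/ as in the comments above. It only mirrors the order of the cpp.includeAtEnd calls (statistics are left out).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Doop) mirroring the include order of mainAnalysis. */
final class IncludePlan {
    /**
     * @param analysis     value of -a, e.g. "context-insensitive"
     * @param infoFlow     value of --information-flow, or null if unset
     * @param openPrograms value of --open-programs, or null if unset
     * @param sanity       true if --sanity was given
     * @param extraLogic   files passed via --extra-logic
     */
    static List<String> plan(String analysis, String infoFlow, String openPrograms,
                             boolean sanity, List<String> extraLogic) {
        List<String> includes = new ArrayList<>();
        // Step 2: basic logic, then the selected analysis.
        includes.add("basic/basic.dl");
        includes.add("analyses/" + analysis + "/analysis.dl");
        // Step 3: information-flow rules plus the chosen sources and sinks.
        if (infoFlow != null) {
            String dir = "addons/information-flow/";
            includes.add(dir + ("data-flow".equals(analysis) ? "rules-data-flow.dl" : "rules.dl"));
            includes.add(dir + infoFlow + "-sources-and-sinks.dl");
        }
        // Open-programs profile resolved in step 1.
        if (openPrograms != null) {
            includes.add("addons/open-programs/rules-" + openPrograms + ".dl");
        }
        // Step 4: sanity checks.
        if (sanity) {
            includes.add("addons/sanity.dl");
        }
        // Step 5: extra logic; files not ending in .dl are skipped.
        for (String extra : extraLogic) {
            if (extra.endsWith(".dl")) {
                includes.add(extra);
            }
        }
        return includes;
    }

    public static void main(String[] args) {
        List<String> extras = new ArrayList<>();
        extras.add("rules/output.dl");
        for (String file : plan("context-insensitive", "minimal", "concrete-types", false, extras)) {
            System.out.println(file);
        }
    }
}
```

Reading the list top to bottom gives the order in which rules end up in the generated gen_xxx.dl file, which helps when searching that file for a specific predicate.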
Running custom logic
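There are two ways to run an analysis with custom rules:
- Load the custom rules as part of the analysis itself
- Run the custom rules over the results after the analysis has finished
The docs describe both, using the variable-points-to information of Example.test() as the example.
Running custom logic inside the analysis (recommended)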
- extra.dl
.decl Temp(v: Var, h: Value)
Temp(v, h) :-
// Note: some relations belong to a specific component, hence the mainAnalysis. prefix
mainAnalysis.VarPointsTo(_, h, _, v),
Var_DeclaringMethod(v, "<Example: void test(int)>").
.output Temp
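Run the following command to load the custom rules:
./doop -a context-insensitive -i docs/doop-101-examples/Example.jar --stats none --extra-logic extra.dl
The drawback is that any change to the custom rules requires re-running the whole analysis.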
Running custom logic after the analysis
- Temp.dl
.decl Var_DeclaringMethod(v: symbol, m: symbol)
.input Var_DeclaringMethod(IO="file", filename="Var-DeclaringMethod.facts", delimiter="\t")
.decl VarPointsTo(c1: symbol, h: symbol, c2: symbol, v: symbol)
.input VarPointsTo(IO="file", filename="VarPointsTo.csv", delimiter="\t")
.decl Temp(v: symbol, h: symbol)
Temp(v, h) :-
VarPointsTo(_, h, _, v),
Var_DeclaringMethod(v, "<Example: void test(int)>").
.output Temp
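With this approach, changing the rules does not require re-running the analysis, but the rules must .decl every relation they use and .input it from the corresponding .facts or .csv file.
From here on it is ordinary Souffle usage:
souffle -F facts-dir -D output-dir Temp.dl -j 4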
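Once Souffle has written Temp.csv into the output directory, the rows can be consumed by any small program. The sketch below assumes the file is tab-separated with two columns (variable, value), matching the .decl Temp(v, h) above; TempRows and its methods are hypothetical names, not part of Doop or Souffle.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the two-column Temp relation (assumed tab-separated). */
final class TempRows {
    /** Splits one line into its (variable, value) columns. */
    static String[] parseRow(String line) {
        String[] cols = line.split("\t", -1);
        if (cols.length != 2) {
            throw new IllegalArgumentException("expected 2 columns, got " + cols.length);
        }
        return cols;
    }

    /** Collects every value that the given variable may point to. */
    static List<String> valuesOf(List<String> lines, String var) {
        List<String> values = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // tolerate a trailing blank line
            }
            String[] row = parseRow(line);
            if (row[0].equals(var)) {
                values.add(row[1]);
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: TempRows <path-to-Temp.csv> <variable-id>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String value : valuesOf(lines, args[1])) {
            System.out.println(value);
        }
    }
}
```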
souffle – rules
- All code methods are in Method.facts (and thus Method_DeclaringType which is just a projection of Method that only records the class/interface that contains the method). Some of these methods may not be reachable (= not called by anything).
- Relation ApplicationMethod contains only the methods that are part of the application (not the Java platform). Again, these methods may be reachable or not.
- Relation Reachable contains the reachable methods. These methods are reachable because something invoked them, so they must appear in AnyCallGraphEdge. Almost always they also appear in CallGraphEdge (which also records the contexts if you ran a context-sensitive analysis).
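Viewed as sets, these relations make "dead" application code a simple set difference: ApplicationMethod minus Reachable. The sketch below is only an illustration; RelationSets is a hypothetical name and the method ids in main are made-up sample data, not the output of a real run.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper treating Doop relations as plain sets of method ids. */
final class RelationSets {
    /** Application methods that the analysis did not find reachable. */
    static Set<String> unreachableApplicationMethods(Set<String> applicationMethod,
                                                     Set<String> reachable) {
        Set<String> result = new TreeSet<>(applicationMethod);
        result.removeAll(reachable);
        return result;
    }

    public static void main(String[] args) {
        // Illustrative sample data only.
        Set<String> app = new TreeSet<>(Arrays.asList(
                "<Test: void main(java.lang.String[])>",
                "<Test: void calledByMain()>",
                "<Test: void noEntry()>"));
        Set<String> reachable = new TreeSet<>(Arrays.asList(
                "<Test: void main(java.lang.String[])>",
                "<Test: void calledByMain()>"));
        System.out.println(unreachableApplicationMethods(app, reachable));
    }
}
```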
The rules themselves live under doop/souffle-logic. Below are reading notes on some of the files.
basic/basic.dl
Determines the main method(s) and includes the other rule files.
#include "../commonMacros.dl"
.comp Basic {
#include "exceptions.dl"
#include "finalization.dl"
#include "method-lookup.dl"
#include "method-resolution.dl"
#include "type-hierarchy.dl"
#include "native-strings.dl"
.decl AnyMainMethodDeclaration(?method:Method, ?type:ClassType)
AnyMainMethodDeclaration(?method, ?type) :-
Method_DeclaringType(?method, ?type),
Method_SimpleName(?method, "main"),
Method_Descriptor(?method, "void(java.lang.String[])"),
Method_Modifier("public", ?method),
Method_Modifier("static", ?method).
.decl MainMethodDeclaration(?method:Method)
MainMethodDeclaration(?method) :-
// DISCOVER_MAIN_METHODS is set by Doop's --discover-main-methods option
#ifdef DISCOVER_MAIN_METHODS
// Consider every main() in the application a "main method".
// ApplicationClass is set up via Soot and marks the classes of the analyzed application,
// i.e. the type ?type occurs in the application under analysis.
// If the jar contains a class A, ApplicationClass.facts contains the value "A".
ApplicationClass(?type),
#else
// Use input facts for "main" methods.
// Doop option: --main <MAIN>
// Names the class whose main method is to be analyzed.
// This also comes from a Soot-generated file, MainClass.facts.
MainClass(?type),
#endif // DISCOVER_MAIN_METHODS
// ?type is the class whose main() serves as the "main method" of the analysis.
AnyMainMethodDeclaration(?method, ?type).
#ifdef ANDROID
// Android apps start by running ActivityThread.main() or other
// internal entry points.
// TODO: this rule should only fire when analyzing an APK, not an AAR.
MainMethodDeclaration(?method) :-
( ?method = "<android.app.ActivityThread: void main(java.lang.String[])>"
; ?method = "<com.android.internal.os.RuntimeInit: void main(java.lang.String[])>"),
isMethod(?method).
#endif // ANDROID
}
// Souffle component syntax:
// after a component is defined, it must be instantiated with .init
.init basic = Basic
#ifdef CFG_ANALYSIS
#include "../addons/cfg-analysis/analysis.dl"
#endif
Experiment
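See DeSerSniffer 1.6.0 and earlier. Later versions introduce the BIFS (Bottom-up Information flow Summary) algorithm to tackle the path explosion caused by polymorphism.
Doop Souls begins here. Keep suffering, and praise the sun, Ẏ!
Motivation
Use the Doop framework to implement taint analysis and custom entry points. The end goal is static analysis of deserialization vulnerabilities with Doop, followed by property-based fuzzing driven by the results.
- [x] Custom entry points; otherwise the analysis space is too large to be practical
- [x] Handle polymorphism
- [x] Taint analysis: pruning
- [x] Why is P/Taint an orthogonal analysis, and how should the information be merged for an efficient taint analysis?
- [x] How to configure Transform, Sink, and Sanitize
- [x] Small-sample experiments
- [x] Basic simple sample working
- Test samples for various Java language features
- [x] Reflection
- [ ] Try symbolic execution: see Datalog Based Symbolic Program Reasoning for Java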
Entry Points
test.java
class Test {
public void noEntry() {
calledByEntry();
}
public void calledByEntry() {
System.out.println("called by entry method");
}
public void calledByMain() {
System.out.println("called by main method");
}
public static void main(String[] args) {
Test t = new Test();
t.calledByMain();
}
}
manifest.txt
Manifest-Version: 1.0
Main-Class: Test
📦 jar
javac test.java
mv Test.class classes/
jar cvfm example.jar manifest.txt -C classes/ .
doop analysis
ID=withConcreteTest
BASE_DIR=/Users/fe1w0/Project/SoftWareAnalysis/DataSet/testjar
INPUT=$BASE_DIR/example.jar
DOOP_HOME=/Users/fe1w0/Project/SoftWareAnalysis/StaticAnalysis/doop
# doop setup
ANALYSIS="context-insensitive"
# INFO_FLOW="--information-flow minimal"
# JIMPLE=" --generate-jimple"
# OPEN PROGRAM
OPEN_PROGRAM="--open-programs concrete-types"
# souffle
SOUFFLE_JOBS="--souffle-jobs 16"
SOUFFLE_MODE="--souffle-mode interpreted"
# extra logic
EXTRA_LOGIC="--extra-logic $BASE_DIR/rules/output.dl"
# Note: `-app-only` must come first.
# Otherwise Doop generates no facts at all (a strange error).
EXTRA_ARG="-app-only ${SOUFFLE_MODE} ${SOUFFLE_JOBS} ${EXTRA_LOGIC} ${OPEN_PROGRAM} ${JIMPLE}"
cd $DOOP_HOME
CMD="${DOOP_HOME}/doop -a $ANALYSIS -i ${INPUT} --id ${ID} ${EXTRA_ARG}"
echo "doop: $CMD"
eval "${DOOP_HOME}/doop -a $ANALYSIS -i ${INPUT} --id ${ID} ${EXTRA_ARG}"
In the generated gen_xxx.dl file, components, their instances, and other predicates relate roughly as follows:
# ComponentName [Relation] StartLine EndLine
# - predicate Name
# - instance StatementLine
- Basic 2036 2698
- .init basic = Basic 2700
- AbstractContextSensitivity<Configuration> 2750, 6056
- Predicate: ApplicationEntryPoint
- .init configuration = Configuration
- configuration = ContextInsensitiveConfiguration
- BasicContextSensitivity extends AbstractContextSensitivity 6073 6099
- .init mainAnalysis = BasicContextSensitivity 6102
- AbstractConfiguration 6104 6120
- ContextInsensitiveConfiguration extends AbstractConfiguration 6128 6158
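Judging from the output and the generated gen_xxx.dl file (worth reading), the stock concrete-types rules treat every method that satisfies the following conditions, together with its class, as an entry point: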
ClassHasPossibleOpenProgramEntryPoint(?class),
PossibleOpenProgramEntryPoint(?method) :-
Method_DeclaringType(?method, ?class),
Method_Modifier("public", ?method),
!Method_Modifier("abstract", ?method),
!ClassModifier("private", ?class).
OpenProgramEntryPoint(?method) :-
PossibleOpenProgramEntryPoint(?method).
ContextForOpenProgramEntryPoint(?calleeCtx, ?method) :-
ThisVar(?method, ?receiver),
Method_DeclaringType(?method, ?class),
MockObjectForType(?value, ?class),
mainAnalysis.isImmutableHContext(?immutablehctx),
mainAnalysis.isImmutableContext(?callerCtx),
mainAnalysis.configuration.ContextResponse(?callerCtx, ?immutablehctx, ?invo, ?value, ?method, ?calleeCtx),
mainAnalysis.configuration.ContextRequest(?callerCtx, ?immutablehctx, ?invo, ?value, ?method, 1),
OpenProgramEntryPoint(?method).
// ReachableContext then feeds into Reachable and VarPointsTo,
// and propagates to CallGraphEdge through ContextResponse and ContextRequest in configuration.
mainAnalysis.ReachableContext(?ctx, ?method) :-
ContextForOpenProgramEntryPoint(?ctx, ?method).
// In ContextInsensitiveConfiguration, a ContextRequest
// directly yields a ContextResponse and a CallGraphEdge.
mainAnalysis.configuration.ContextRequest(?callerCtx, ?immutablehctx, ?invo, ?value, ?method, 1) :-
MockObjectForType(?value, ?class),
Method_DeclaringType(?method, ?class),
MockInvocationForEntryPoint(?value, ?method, ?invo),
mainAnalysis.isImmutableHContext(?immutablehctx),
mainAnalysis.isImmutableContext(?callerCtx).
MockInvocationForEntryPoint(?value, ?method, cat(cat(cat("<mock-invo ", ?value), ?method), ">")),
isInstruction(cat(cat(cat("<mock-invo ", ?value), ?method), ">")), isMethodInvocation(cat(cat(cat("<mock-invo ", ?value), ?method), ">")) :-
Method_DeclaringType(?method, ?class),
MockObjectForType(?value, ?class).
MockObjectForType(?value, ?staticType) :-
MockObject(?value, ?class),
StaticToActualType(?class, ?staticType).
MockObject(?mockObj, ?class),
mainAnalysis.Value_isMock(?mockObj), isValue(?mockObj), mainAnalysis.Value_Type(?mockObj, ?class), mainAnalysis.Value_DeclaringType(?mockObj, "java.lang.Object"),
mainAnalysis.Value_DeclaringType(?mockObj, ?class) :-
ObjToMock(?class),
// ?mockObj equals "${?class}::MockObject"
?mockObj = cat(?class, "::MockObject").
// Mock the declaring class of every public, non-abstract method
MockObjFromOpenProgramEntryPoint(?class),
ObjToMock(?class) :-
// every public, non-abstract method
OpenProgramEntryPoint(?method),
Method_DeclaringType(?method, ?class),
isReferenceType(?class),
!ClassModifier("abstract", ?class).
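To make the string-based ids concrete, here is a minimal Java sketch of three pieces of the rules above: the entry-point filter, the mock object id, and the mock invocation id. OpenProgramSketch and its method names are hypothetical; only the conditions and string formats are taken from the rules as quoted.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical mirror of parts of the concrete-types open-programs rules. */
final class OpenProgramSketch {
    /** PossibleOpenProgramEntryPoint: public, non-abstract method in a non-private class. */
    static boolean isPossibleEntryPoint(Set<String> methodModifiers, Set<String> classModifiers) {
        return methodModifiers.contains("public")
                && !methodModifiers.contains("abstract")
                && !classModifiers.contains("private");
    }

    /** Mirrors ?mockObj = cat(?class, "::MockObject"). */
    static String mockObjectId(String className) {
        return className + "::MockObject";
    }

    /** Mirrors cat(cat(cat("<mock-invo ", ?value), ?method), ">"). */
    static String mockInvocationId(String mockValue, String method) {
        return "<mock-invo " + mockValue + method + ">";
    }

    public static void main(String[] args) {
        String value = mockObjectId("Test");
        System.out.println(value);
        System.out.println(mockInvocationId(value, "<Test: void noEntry()>"));
        System.out.println(isPossibleEntryPoint(
                Collections.singleton("public"), Collections.<String>emptySet()));
    }
}
```

Because the ids are plain strings, grepping the result relations for "::MockObject" or "<mock-invo " is a quick way to see which entry points were actually mocked.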
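Instead of using --open-programs concrete-types, one can mimic the concrete-types implementation but declare only specific entry-point methods, which avoids wasting resources. For example, edit the rules in addons/open-programs/entry-points.dl directly.
// forked from Serhybrid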
ClassHasPossibleOpenProgramEntryPoint(?class),
PossibleOpenProgramEntryPoint(?method) :-
Method_DeclaringType(?method, ?class),
?class = "instrumenter.trampolines.Trampolines",
Method_Modifier("public", ?method),
Method_SimpleName(?method, "entry"),
!Method_Modifier("abstract", ?method).
CHA algorithm
Implemented in souffle-logic/basic/method-resolution.dl. ResolveInvocation is later used to construct configuration.ContextRequest.
/**
* Encodes a CHA callgraph, effectively
*/
.decl ResolveInvocation(?type:Type, ?invocation:MethodInvocation, ?tomethod:Method)
//.output ResolveInvocation
// Auxiliary
.decl VirtualMethodInvocation_BaseType(?invocation:MethodInvocation, ?type:Type)
VirtualMethodInvocation_BaseType(?invocation, ?basetype) :-
VirtualMethodInvocation_Base(?invocation, ?base),
Var_Type(?base, ?basetype).
// Virtual calls: any subtype of the base variable's declared type may be the receiver
ResolveInvocation(?type, ?invocation, ?tomethod) :-
VirtualMethodInvocation_SimpleName(?invocation, ?simplename),
VirtualMethodInvocation_Descriptor(?invocation, ?descriptor),
VirtualMethodInvocation_BaseType(?invocation, ?basetype),
SubtypeOf(?type, ?basetype),
MethodLookup(?simplename, ?descriptor, ?type, ?tomethod).
// Super calls: look the method up starting from the direct superclass
ResolveInvocation(?basetype, ?invocation, ?tomethod) :-
SuperMethodInvocation_SimpleName(?invocation, ?simplename),
SuperMethodInvocation_Descriptor(?invocation, ?descriptor),
SuperMethodInvocation_Base(?invocation, ?base),
Var_Type(?base, ?basetype),
DirectSuperclass(?basetype, ?supertype),
MethodLookup(?simplename, ?descriptor, ?supertype, ?tomethod).
// Turn the resolution result into context requests
configuration.ContextRequest(?callerCtx, ?hctx, ?invocation, ?value, ?tomethod, 1) :-
OptVirtualMethodInvocationBase(?invocation, ?base),
// A ContextRequest is only allowed when there is an actual VarPointsTo fact
VarPointsTo(?hctx, ?value, ?callerCtx, ?base),
Value_Type(?value, ?valuetype),
basic.ResolveInvocation(?valuetype, ?invocation, ?tomethod).
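The two ResolveInvocation rules can be imitated in plain Java. The sketch below is a simplification under stated assumptions: it models only single class inheritance (no interfaces or arrays), identifies a method by one signature string instead of simple name plus descriptor, and does not model the VarPointsTo filtering done by the ContextRequest rule. ChaSketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical CHA sketch over a single-inheritance class hierarchy. */
final class ChaSketch {
    private final Map<String, String> superclass = new HashMap<>();    // class -> direct superclass
    private final Map<String, Set<String>> declared = new HashMap<>(); // class -> declared signatures

    void addClass(String name, String parent, String... methods) {
        superclass.put(name, parent);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    /** SubtypeOf, restricted to classes: reflexive and transitive. */
    boolean subtypeOf(String type, String base) {
        for (String t = type; t != null; t = superclass.get(t)) {
            if (t.equals(base)) {
                return true;
            }
        }
        return false;
    }

    /** MethodLookup: nearest declaration of sig, starting at type and walking up. */
    String methodLookup(String sig, String type) {
        for (String t = type; t != null; t = superclass.get(t)) {
            Set<String> methods = declared.get(t);
            if (methods != null && methods.contains(sig)) {
                return t + "." + sig;
            }
        }
        return null;
    }

    /** Virtual call: every subtype of the declared base type is a possible receiver. */
    Set<String> resolveVirtual(String baseType, String sig) {
        Set<String> targets = new TreeSet<>();
        for (String type : superclass.keySet()) {
            if (subtypeOf(type, baseType)) {
                String target = methodLookup(sig, type);
                if (target != null) {
                    targets.add(target);
                }
            }
        }
        return targets;
    }

    /** Super call: the lookup starts at the direct superclass of the base type. */
    String resolveSuper(String baseType, String sig) {
        String parent = superclass.get(baseType);
        return parent == null ? null : methodLookup(sig, parent);
    }

    public static void main(String[] args) {
        ChaSketch cha = new ChaSketch();
        cha.addClass("Animal", null, "speak()");
        cha.addClass("Dog", "Animal", "speak()");
        cha.addClass("Puppy", "Dog");
        System.out.println(cha.resolveVirtual("Animal", "speak()"));
        System.out.println(cha.resolveSuper("Dog", "speak()"));
    }
}
```

The over-approximation is visible in resolveVirtual: a call on a variable declared as Animal resolves to both Animal.speak() and Dog.speak(), regardless of which objects actually flow there. The ContextRequest rule narrows this by keying the lookup on the type of the pointed-to value.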
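Adding extended call edges to handle polymorphism
Set up hypothetical configuration.ContextRequest facts. These requests do not correspond to anything in the source code; they are added to detect deserialization vulnerabilities.
Taint Analysis
Background reading:
- P/Taint: Unified Points-to and Taint Analysis
- A Hybrid Analysis to Detect Java Serialisation Vulnerabilities
According to the Doop README, the relevant option is --information-flow. How the rule files are loaded is described in the comments on src/main/groovy/org/clyze/doop/core/SouffleAnalysis.groovy above.
In short, either rules-data-flow.dl or rules.dl is loaded depending on whether the data-flow analysis is used, and the value of --information-flow decides which ${options.INFORMATION_FLOW.value}-sources-and-sinks.dl file is loaded (for example minimal-sources-and-sinks.dl).
Also, you usually have to define your own sources and sinks in Doop; the built-in ones are rather few.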
--information-flow minimal
With --information-flow minimal, the following files are loaded in order:
- souffle-logic/addons/information-flow/rules.dl // taint transfer for deserialization plus the basic taint-transfer logic
- souffle-logic/addons/information-flow/macros.dl // macro definitions
- souffle-logic/addons/information-flow/declarations.dl // predicate declarations
- souffle-logic/addons/information-flow/delta.dl // isInformationLabel
- souffle-logic/addons/information-flow/core.dl
- souffle-logic/addons/information-flow/minimal-sources-and-sinks.dl // the customized source and sink facts
The files under souffle-logic/addons/information-flow/ show how Doop lets you define custom Source and Sink information:
// taken from declarations.dl
/** Taint specifications that may come from the user. */
.decl TaintSpec(?type:symbol, ?tag:symbol, ?id:symbol)
.input TaintSpec(filename="TaintSpec.facts")
// taken from core.dl
// User-provided taint sources (methods).
TaintSourceMethod(?label, ?method) :-
TaintSpec("TAINT_SOURCE", ?label, ?method),
isMethod(?method).
// User-provided taint sinks (methods).
LeakingSinkMethod(?label, ?method) :-
TaintSpec("TAINT_SINK", ?label, ?method),
isMethod(?method).
minimal-sources-and-sinks.dl contains the default configuration:
#include "common-transfer-methods.dl"
//InformationLabel("default").
TaintSourceMethod("default", "<java.io.BufferedReader: java.lang.String readLine()>").
TaintSourceMethod("default", "<java.io.BufferedReader: int read(char[],int,int)>").
// The latter is not a great taint source (since it returns ints) but it's good for minimal testing
LeakingSinkMethodArg("default", 0, "<java.io.PrintWriter: void println(java.lang.String)>").
The rest of this section looks at TaintSourceMethod and LeakingSinkMethod separately.
TaintSourceMethod
// Requirement: the invocation must occur inside an application method
CallTaintingMethod(?label, ?ctx, ?invocation) :-
TaintSourceMethod(?label, ?tomethod),
MethodInvocationInContext(?ctx, ?invocation, ?tomethod),
Instruction_Method(?invocation, ?inmethod),
ApplicationMethod(?inmethod).
// See the definition of TaintedValueIntroduced in macros.dl
// (#define is a handy mechanism here)
/*
TaintedValue, SourceFromTaintedValue, LabelFromSource,
mainAnalysis.VarPointsTo, mainAnalysis.Value_isMock, mainAnalysis.Value_Type, mainAnalysis.Value_DeclaringTyp
*/
#define ValueIdMacro(id, type, breadcrumb) \
cat(cat(cat(cat(id, "::: "), type), "::: "), breadcrumb)
// id is the invocation instruction
#define TaintedValueIntroduced(declaringType, id, type, label, value) \
mainAnalysis_MockValueConsMacro(value, type), \
TaintedValue(value), \
SourceFromTaintedValue(value, id), \
LabelFromSource(id, label), \
mainAnalysis.Value_DeclaringType(value, declaringType)
TaintedValueIntroduced(?declaringType, ?invo, ?type, ?label, ValueIdMacro(?invo, ?type, DEFAULT_BREADCRUMB)),
mainAnalysis.VarPointsTo(?hctx, ValueIdMacro(?invo, ?type, DEFAULT_BREADCRUMB), ?ctx, ?to) :-
CallTaintingMethod(?label, ?ctx, ?invo),
mainAnalysis.isImmutableHContext(?hctx),
// ?invo: ?to = vcall
TypeForReturnValue(?type, ?to, ?invo),
Instruction_Method(?invo, ?method),
Method_DeclaringType(?method, ?declaringType).
// TaintedValue -> TaintedVarPointsTo
TaintedVarPointsTo(?value, ?ctx, ?var) :-
TaintedValue(?value),
mainAnalysis.VarPointsTo(_, ?value, ?ctx, ?var).
// Open question: what is the label actually for?
LeakingTaintedInformation(?sourceLabel, ?destLabel, ?ctx, ?invocation, ?source) :-
SourceFromTaintedValue(?value, ?source),
LabelFromSource(?source, ?sourceLabel),
TaintedVarPointsTo(?value, ?ctx, ?var),
LeakingSinkVariable(?destLabel, ?invocation, ?ctx, ?var).
// It is used for statistics, but the label itself is not really used there either
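The LeakingTaintedInformation rule is essentially a join on (context, variable) between tainted points-to facts and sink variables. The Java sketch below imitates that join over in-memory lists. LeakJoin, Tainted, and SinkVar are hypothetical names, and the Tainted record is assumed to already combine TaintedVarPointsTo with SourceFromTaintedValue and LabelFromSource. Note that, as in the quoted rule, the source label and the sink label are carried along but never compared.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory version of the LeakingTaintedInformation join. */
final class LeakJoin {
    /** A tainted value reaching variable var in context ctx, with its source and label. */
    static final class Tainted {
        final String sourceLabel, source, ctx, var;
        Tainted(String sourceLabel, String source, String ctx, String var) {
            this.sourceLabel = sourceLabel;
            this.source = source;
            this.ctx = ctx;
            this.var = var;
        }
    }

    /** LeakingSinkVariable(?destLabel, ?invocation, ?ctx, ?var). */
    static final class SinkVar {
        final String destLabel, invocation, ctx, var;
        SinkVar(String destLabel, String invocation, String ctx, String var) {
            this.destLabel = destLabel;
            this.invocation = invocation;
            this.ctx = ctx;
            this.var = var;
        }
    }

    /** One "sourceLabel|destLabel|ctx|invocation|source" row per matching pair. */
    static List<String> leaks(List<Tainted> tainted, List<SinkVar> sinks) {
        List<String> out = new ArrayList<>();
        for (Tainted t : tainted) {
            for (SinkVar s : sinks) {
                // Join condition: same context and same variable; labels are not compared.
                if (t.ctx.equals(s.ctx) && t.var.equals(s.var)) {
                    out.add(t.sourceLabel + "|" + s.destLabel + "|" + s.ctx
                            + "|" + s.invocation + "|" + t.source);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tainted> tainted = new ArrayList<>();
        tainted.add(new Tainted("default", "invo-source", "ctx", "strId"));
        List<SinkVar> sinks = new ArrayList<>();
        sinks.add(new SinkVar("default", "invo-sink", "ctx", "strId"));
        System.out.println(leaks(tainted, sinks));
    }
}
```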
LeakingSinkMethod
LeakingSinkMethodArg(?label, ?index, ?method) :-
LeakingSinkMethod(?label, ?method),
FormalParam(?index, ?method, _).
// In case method has some arguments, assume arg variable
LeakingSinkVariable(?label, ?invocation, ?ctx, ?var) :-
LeakingSinkMethodArg(?label, ?index, ?tomethod),
MethodInvocationInContextInApplication(?ctx, ?invocation, ?tomethod),
ActualParam(?index, ?invocation, ?var).
// In case method has no arguments, assume base variable.
LeakingSinkVariable(?label, ?invocation, ?ctx, ?var) :-
LeakingSinkMethod(?label, ?tomethod),
!FormalParam(_, ?tomethod, _),
MethodInvocationInContextInApplication(?ctx, ?invocation, ?tomethod),
MethodInvocation_Base(?invocation, ?var).
When a result is found, it is output in the two relations LeakingTaintedInformation and LeakingTaintedInformationVars.
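The two LeakingSinkVariable rules boil down to one decision per call site: if the sink method has formal parameters, the actual arguments are the sink variables; otherwise the base (receiver) variable is. A minimal Java sketch of that decision follows. SinkVariables is a hypothetical name, and the sketch assumes every actual argument lines up with a formal parameter index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical mirror of the two LeakingSinkVariable rules. */
final class SinkVariables {
    /**
     * @param formalParamCount number of formal parameters of the sink method
     * @param actualParams     argument variables at the call site, by index
     * @param base             receiver variable at the call site, or null for none
     */
    static List<String> sinkVariables(int formalParamCount, List<String> actualParams, String base) {
        if (formalParamCount > 0) {
            // Method has arguments: every argument variable is a sink variable.
            return new ArrayList<>(actualParams);
        }
        if (base != null) {
            // Method has no arguments: fall back to the base variable.
            return Collections.singletonList(base);
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        List<String> actuals = new ArrayList<>();
        actuals.add("strId");
        System.out.println(sinkVariables(1, actuals, "testTaint"));
        System.out.println(sinkVariables(0, Collections.<String>emptyList(), "testTaint"));
    }
}
```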
Experiments
Basic taint analysis
Custom Source and Sink methods are still defined with Souffle logic; writing them as facts is too cumbersome.
test.java
class Test {
String name = "TEST";
public void calledByMain() {
System.out.println("called by main method");
}
public static void main(String[] args) {
Test t = new Test();
t.calledByMain();
// t.noEntry();
// source
int sourceId = Taint.source();
// Transform
String strId = t.entry(sourceId);
// Sink
Taint testTaint = new Taint();
testTaint.maybeEvil(strId);
}
public String entry(int str) {
String idStr = Taint.tranform(str);
return idStr;
}
}
class Taint {
String name = "Taint";
public static int source(){
return 1;
}
public static String tranform(int id){
if (id == 1){
return "One";
} else {
return "Two";
}
}
public void maybeEvil(String str){
System.out.println("Evil" + str);
}
}
- Added rules
// taken from define-source-and-sink-method-name.dl
.decl DefineSourceMethodName(simpleName: symbol)
DefineSourceMethodName("source").
.decl DefineSinkMethodName(simpleName: symbol)
DefineSinkMethodName("maybeEvil").
// taken from definition-information.dl
// Taint-analysis definitions
#define INFO_FLOW_LABEL "UnSafeSerCheck"
#include "define-source-and-sink-method-name.dl"
TaintSourceMethod(?label, ?method) :-
?label = INFO_FLOW_LABEL,
DefineSourceMethodName(simpleMethodName),
Method_Modifier("public", ?method),
Method_SimpleName(?method, simpleMethodName),
!Method_Modifier("abstract", ?method).
LeakingSinkMethod(?label, ?method) :-
?label = INFO_FLOW_LABEL,
DefineSinkMethodName(simpleMethodName),
Method_Modifier("public", ?method),
Method_SimpleName(?method, simpleMethodName),
!Method_Modifier("abstract", ?method).
Other tools in Doop
- bytecode2jimple
- BytecodeDL/soot-fact-generator: generate facts from bytecode (source is https://github.com/plast-lab/doop-mirror/tree/master/generators)
- …
References:
- 🌟 指针分析工具 Doop 使用指南 | Jckling’s Blog
- Java和Android应用points-to analysis工具Doop的基本使用方法_souffle安装_蛐蛐蛐的博客-CSDN博客
- 使用doop识别最近commons text漏洞的污点信息流 – 先知社区 (aliyun.com)
Related work from plast-lab:
- LLVM and Souffle: plast-lab/cclyzer-souffle: CClyzer port to souffle lang (github.com)
- LLVM and Datalog: plast-lab/cclyzer: A tool for analyzing LLVM bitcode using Datalog. (github.com)
- Doop's underlying framework: plast-lab/clue-common: Common functionality shared by the components of the Clyze unified analysis framework. (github.com)
- JNI scanner: plast-lab/native-scanner: An analyzer of JNI code that matches native code information with Java code (github.com)
Appendix
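Taint propagation rules in Doop (used as summaries)
- ParamToRetTaintTransferMethod
- Taint flows from a method parameter to the method's return value
- ParamToBaseTaintTransferMethod
- Taint flows from a method parameter to the receiver object
- BaseToParamTaintTransferMethod
- The receiver object is tainted, and the taint flows to a method parameter
- BaseToRetTaintTransferMethod
- The receiver object is tainted, and the taint flows to the return value
- MockBaseToRetTaintTransferMethod
- Mock refers to a mocked (simulated) object
- MockParamToRetTaintTransferMethod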
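The four non-mock transfer kinds can be read as "if the from-slot is tainted at a call, the to-slot becomes tainted too". The Java sketch below encodes exactly that reading. TaintTransfer, Slot, and Kind are hypothetical names, the mock variants are not modelled, and real Doop rules work on values and variables rather than on three abstract slots.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of taint transfer summaries over three call-site slots. */
final class TaintTransfer {
    enum Slot { PARAM, BASE, RET }

    enum Kind {
        PARAM_TO_RET(Slot.PARAM, Slot.RET),
        PARAM_TO_BASE(Slot.PARAM, Slot.BASE),
        BASE_TO_PARAM(Slot.BASE, Slot.PARAM),
        BASE_TO_RET(Slot.BASE, Slot.RET);

        final Slot from;
        final Slot to;

        Kind(Slot from, Slot to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Slots tainted after a call to a method summarised by the given kind. */
    static Set<Slot> afterCall(Kind kind, Set<Slot> tainted) {
        Set<Slot> result = EnumSet.noneOf(Slot.class);
        result.addAll(tainted);
        if (tainted.contains(kind.from)) {
            result.add(kind.to);
        }
        return result;
    }

    public static void main(String[] args) {
        // A tainted argument passed to a PARAM_TO_RET method taints the return value.
        System.out.println(afterCall(Kind.PARAM_TO_RET, EnumSet.of(Slot.PARAM)));
    }
}
```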
Object Mocking
What is object mocking, what is it for, and how is it implemented?
Quoting an answer from the Discord:
Doop starts from the real "heap allocations" of the program (i.e. one allocation per new T() site). However, there are cases where an object is needed but there is no heap allocation available, so the analysis creates new "mock" values and passes them around. Examples are special objects simulated by logic (such as lambdas and proxies) or filling in pseudo-objects in entry points (as you witness open programs doing).
File souffle-logic/commonMacros.dl contains the macros that create such objects: MockHeapConsMacro, MockValueConsMacro. To create an object with such a macro, you place it in the head of your rule. For example this rule from dynamic-proxies.dl
MockValueConsMacro(?proxyObject, ?proxyClass),
ProxyClassInstance(?iface, ?invo, ?proxyObject) :-
java_lang_reflect_Proxy_newProxyInstance(?invo, _, _, _),
isInterfaceType(?iface),
ProxyClassOfInterface(?iface, ?proxyClass),
?proxyObject = cat(cat(cat(cat("<proxy object for interface ", ?iface), " at "), ?invo), ">").
creates ?proxyObject with type ?proxyClass. Note that the object is basically a unique string created with string concatenation (cat()) in the last line (objects in Doop logic are represented by strings).
This pattern also shows how you can track the rules that create mock objects: you start from the mock object ids you find in the results and search for text that matches these strings. In the case above, if you found in your results objects with ids of the form <proxy object for interface, searching for this text would lead you to the rule above.
Summary:
Mocking models objects that have no new allocation site in the program.
Reflection
Handling reflection
Doop has the following reflection-related switches:
== Reflection ==
--distinguish-reflection-only-string-constants Merge all string constants except those useful for reflection.
--distinguish-string-buffers-per-package Merges string buffer objects only on a per-package basis (default behavior for reflection-classic).
--light-reflection-glue Handle some shallow reflection patterns without full reflection support.
--reflection Enable logic for handling Java reflection.
--reflection-classic Enable (classic subset of) logic for handling Java reflection.
--reflection-dynamic-proxies Enable handling of the Java dynamic proxy API.
--reflection-high-soundness-mode Enable extra rules for more sound handling of reflection.
--reflection-invent-unknown-objects
--reflection-method-handles Reflection-based handling of the method handle APIs.
--reflection-refined-objects
--reflection-speculative-use-based-analysis
--reflection-substring-analysis Allows reasoning on what substrings may yield reflection objects.
--tamiflex <FILE> Use file with tamiflex data for reflection.
I usually use --light-reflection-glue and --reflection-classic. With --distinguish-reflection-only-string-constants enabled, the analysis becomes very slow.
Complete Usage
- Full usage output (kept here for reference):
Starting a Gradle Daemon (subsequent builds will be faster)
> Task :run
usage: doop -i <INPUT> -a <NAME> [OPTION]...
== Configuration options ==
-a,--analysis <NAME> The name of the analysis. Valid values: 1-call-site-sensitive, 1-call-site-sensitive+heap,
1-object-1-type-sensitive+heap, 1-object-sensitive, 1-object-sensitive+heap, 1-type-sensitive,
1-type-sensitive+heap, 2-call-site-sensitive+2-heap, 2-call-site-sensitive+heap, 2-object-sensitive+2-heap,
2-object-sensitive+heap, 2-type-object-sensitive+2-heap, 2-type-object-sensitive+heap, 2-type-sensitive+heap,
3-object-sensitive+3-heap, 3-type-sensitive+2-heap, 3-type-sensitive+3-heap, adaptive-2-object-sensitive+heap,
basic-only, context-insensitive, context-insensitive-plus, context-insensitive-plusplus, data-flow,
dependency-analysis, fully-guided-context-sensitive, micro, partitioned-2-object-sensitive+heap,
selective-2-object-sensitive+heap, sound-may-point-to, sticky-2-object-sensitive, types-only, xtractor, ----- (LB
analyses) -----, 2-object-sensitive+heap-plus, adaptive-insens-2objH, adaptive2-insens-2objH, must-point-to, naive,
paddle-2-object-sensitive, paddle-2-object-sensitive+heap, partial-insens-s2objH, refA-2-call-site-sensitive+heap,
refA-2-object-sensitive+heap, refA-2-type-sensitive+heap, refB-2-call-site-sensitive+heap,
refB-2-object-sensitive+heap, refB-2-type-sensitive+heap, scc-2-object-sensitive+heap,
selective-2-type-sensitive+heap, selective_A-1-object-sensitive, selective_B-1-object-sensitive,
special-2-object-sensitive+heap, stutter-2-object-sensitive+heap, uniform-1-object-sensitive,
uniform-2-object-sensitive+heap, uniform-2-type-sensitive+heap
--android Force Android mode for code inputs that are not in .apk format.
--app-only Only analyze the application input(s), ignore libraries/platform.
--auto-app-regex-mode <MODE> When no app regex is given, either compute an app regex for the first input ('first') or for all inputs ('all').
--cfg Perform a CFG analysis.
--coarse-grained-allocation-sites Aggressively merge allocation sites for all regular object types, in lib and app alike.
--constant-folding Enable constant folding logic.
--cs-library Enable context-sensitive analysis for internal library objects.
--dacapo Load additional logic for DaCapo (2006) benchmarks properties.
--dacapo-bach Load additional logic for DaCapo (Bach) benchmarks properties.
--define-cpp-macro <MACRO> Define a C preprocessor macro that will be available in analysis logic.
--disable-merge-exceptions Do not merge exception objects.
--disable-points-to Disable (most) points-to analysis reasoning. This should only be combined with analyses that compensate (e.g.,
types-only).
--distinguish-all-string-buffers Avoids merging string buffer objects (not recommended).
--distinguish-all-string-constants Treat string constants as regular objects.
--dry-run Do a dry run of the analysis (generate facts and compile but don't run analysis logic).
--extra-logic <FILE> Include files with extra rules.
--featherweight-analysis Perform a featherweight analysis (global state and complex objects immutable).
--gen-opt-directives Generate additional relations for code optimization uses.
-h,--help <SECTION> Display help and exit. Valid values: all, configuration, data-flow, datalog-engine, entry-points, fact-generation,
heap-snapshots, information-flow, native-code, open-programs, python, reflection, server-logic, statistics, xtras
-i,--input-file <INPUT> The (application) input files of the analysis. Accepted formats: .jar, .war, .apk, .aar, maven-id
--id <ID> The analysis id. If omitted, it is automatically generated.
-L,--level <LOG_LEVEL> Set the log level: debug, info or error (default: info).
-l,--library-file <LIBRARY> The dependency/library files of the application. Accepted formats: .jar, .apk, .aar
--max-memory <MEMORY_SIZE> The maximum memory that the analysis can consume (does not include memory needed by fact generation). Example
values: 2m, 4g.
--no-merge-library-objects Disable the default policy of merging library (non-collection) objects of the same type per-method.
--no-merges No merges for string constants.
--no-standard-exports Do not export standard relations.
-p,--properties <PROPERTIES> The path to a properties file containing analysis options. This option can be mixed with any other and is processed
first.
--platform <PLATFORM> The platform on which to perform the analysis. For Android, the platform suffix can either be 'stubs' (provided by
the Android SDK), 'fulljars' (a custom Android build), or 'apks' (custom Dalvik equivalent). Default: java_8. Valid
values: java_3, java_4, java_5, java_6, java_7, java_7_debug, java_8, java_8_debug, java_8_mini, java_9, java_10,
java_11, java_12, java_13, java_14, java_15, java_16, android_22_fulljars, android_25_fulljars, android_2_stubs,
android_3_stubs, android_4_stubs, android_5_stubs, android_6_stubs, android_7_stubs, android_8_stubs,
android_9_stubs, android_10_stubs, android_11_stubs, android_12_stubs, android_13_stubs, android_14_stubs,
android_15_stubs, android_16_stubs, android_17_stubs, android_18_stubs, android_19_stubs, android_20_stubs,
android_21_stubs, android_22_stubs, android_23_stubs, android_24_stubs, android_25_stubs, android_26_stubs,
android_27_stubs, android_28_stubs, android_29_stubs, android_25_apks, android_26_robolectric, python_2
--regex <EXPRESSION> A regex expression for the Java package names of the analyzed application.
--run-jphantom Run jphantom for non-existent referenced code.
--sanity Load additional logic for sanity checks.
--sarif Output SARIF results for specific relations.
--special-cs-methods <FILE> Use a file that specifies special context sensitivity for some methods.
--symbolic-reasoning Symbolic reasoning for expressions.
-t,--timeout <TIMEOUT> The analysis execution timeout in minutes (default: 90 minutes).
--use-local-java-platform <PATH> The path to the Java platform to use.
--user-defined-partitions <FILE> Use a file that specifies the partitions of the analyzed program.
-v,--version Display version and exit.
== Data flow ==
--data-flow-goto-lib Allow data-flow logic to go into library code using CHA.
--data-flow-only-lib Run data-flow logic only for library code.
== Datalog engine ==
--souffle-debug Enable profiling in the Souffle binary.
--souffle-force-recompile Force recompilation of Souffle logic.
--souffle-incremental-output Use the functor for incremental output in Souffle.
--souffle-jobs <NUMBER> Specify number of Souffle jobs to run (default: 4).
--souffle-live-profile Enable live profiling in the Souffle binary.
--souffle-mode <MODE> How to run Souffle: compile to binary, use interpreter, only translate to C++. Valid values: compiled, interpreted,
translated
--souffle-profile Enable profiling in the Souffle binary.
--souffle-provenance Call the provenance browser.
--souffle-use-functors Enable the use of user-defined functors in Souffle.
--use-analysis-binary <PATH> Use precompiled analysis binary (for Windows compatibility).
== Entry points ==
--discover-main-methods Discover main() methods.
--discover-tests Discover and treat test code (e.g. JUnit) as entry points.
--exclude-implicitly-reachable-code Don't make any method implicitly reachable.
--ignore-main-method If main class is not given explicitly, do not try to discover it from jar/filename info. Open-program analysis
variant may be triggered in this case.
--keep-spec <FILE> Give a 'keep' specification.
--main <MAIN> Specify the main class(es) separated by spaces.
== Fact generation ==
--also-resolve <CLASS> Force resolution of class(es) by Soot.
--cache The analysis will use the cached facts, if they exist.
--dont-cache-facts Don't cache generated facts.
--extract-more-strings Extract more string constants from the input code (may degrade analysis performance).
--fact-gen-cores <NUMBER> Number of cores to use for parallel fact generation.
--facts-only Only generate facts and exit.
--generate-artifacts-map Generate artifacts map.
--generate-jimple Generate Jimple/Shimple files along with .facts files.
--generate-tac Generate Three Address Code experimental representation, along with .facts files.
--input-id <ID> Import facts from dir with id ID and start the analysis. Application/library inputs are ignored.
--report-phantoms Report phantom methods/types during fact generation.
--thorough-fact-gen Attempt to resolve as many classes during fact generation (may take more time).
--unique-facts Eliminate redundancy from .facts files.
--wala-fact-gen Use WALA to generate the facts.
--Xfacts-subset <SUBSET> Produce facts only for a subset of the given classes. Valid values: PLATFORM, APP, APP_N_DEPS
--Xignore-factgen-errors Continue with analysis despite fact generation errors.
--Xsymlink-input-facts Use symbolic links instead of copying cached facts. Used with --cache or --input-id.
== Heap snapshots ==
--heapdl-dvpt Import dynamic var-points-to information.
--heapdl-file <HEAPDLS> Use dynamic information from memory dump, using HeapDL. Takes one or more files (`.hprof` format or stack traces).
--heapdl-nostrings Do not model string values uniquely in a memory dump.
--import-dynamic-facts <FACTS_FILE> Use dynamic information from file.
== Information flow ==
--information-flow <APPLICATION_PLATFORM> Load additional logic to perform information flow analysis. Valid values: alfresco, android, beans, minimal, spring,
webapps
--information-flow-extra-controls <CONTROLS> Load additional sensitive layout control from string triplets "id1,type1,parent_id1,...".
--information-flow-high-soundness Enter high soundness mode for information flow microbenchmarks.
== Native code ==
--native-code-backend <BACKEND> Use back-end to scan native code (portable built-in, system binutils, Radare2). Valid values: builtin, binutils,
radare
--only-precise-native-strings Skip strings without enclosing function information.
--scan-native-code Scan native code for specific patterns.
--simulate-native-returns Assume native methods return mock objects.
== Open programs ==
--open-programs <STRATEGY> Create analysis entry points and environment using various strategies (such as 'concrete-types' or 'jackee').
--open-programs-context-insensitive-entrypoints
--open-programs-heap-context-insensitive-entrypoints
== Python ==
--full-tensor-precision Full precision tensor shape analysis (not guaranteed to finish).
--single-file-analysis Flag to be passed to WALAs IR translator to produce IR that makes the analysis of a single script file easier.
--tensor-shape-analysis Enable tensor shape analysis for Python.
== Reflection ==
--distinguish-reflection-only-string-constants Merge all string constants except those useful for reflection.
--distinguish-string-buffers-per-package Merges string buffer objects only on a per-package basis (default behavior for reflection-classic).
--light-reflection-glue Handle some shallow reflection patterns without full reflection support.
--reflection Enable logic for handling Java reflection.
--reflection-classic Enable (classic subset of) logic for handling Java reflection.
--reflection-dynamic-proxies Enable handling of the Java dynamic proxy API.
--reflection-high-soundness-mode Enable extra rules for more sound handling of reflection.
--reflection-invent-unknown-objects
--reflection-method-handles Reflection-based handling of the method handle APIs.
--reflection-refined-objects
--reflection-speculative-use-based-analysis
--reflection-substring-analysis Allows reasoning on what substrings may yield reflection objects.
--tamiflex <FILE> Use file with tamiflex data for reflection.
== Server logic ==
--server-cha Run server queries related to CHA.
--server-logic Run server queries under addons/server-logic.
--server-logic-threshold <THRESHOLD> Threshold when reporting points-to information in server logic (per points-to set). default: 1000
== Statistics ==
--extra-metrics Run extra metrics logic under addons/statistics.
--stats <LEVEL> Set statistics collection logic. Valid values: none, default, full
== Xtras ==
--Xcontext-dependency-heuristic Run context dependency heuristics logic under addons/oracular.
--Xcontext-remover Run the context remover for reduced memory use (only available in context-insensitive analysis).
--Xdex Use custom front-end to generate facts for .apk inputs, using Soot for other inputs.
--Xextra-facts <FILE> Include files with extra facts.
--Xgenerics-pre Enable precise generics pre-analysis to infer content types for Collections and Maps.
--Xignore-wrong-staticness Ignore 'wrong static-ness' errors in Soot.
--Ximport-partitions <FILE> Specify the partitions.
--Xisolate-fact-generation Isolate invocations to the fact generator.
--Xlb Use legacy LB engine.
--Xlegacy-android-processing If true the analysis uses the legacy processor for Android resources.
--Xlegacy-soot-invocation If true, Soot will be invoked using a custom classloader (may use less memory, only supported on Java < 9).
--Xlow-mem Use less memory. Does not support all options.
--Xmodel-stdlib Model standard library APIs instead of analyzing their code.
--Xno-ssa Disable the default policy of using SSA transformation on input.
--Xoracular-heuristics Run sensitivity heuristics logic under addons/oracular.
--Xprecise-generics Precise handling for maps and collections.
--XR-out-dir <R_OUT_DIR> When linking .aar inputs, place generated R code in <R_OUT_DIR>.
--Xreflection-coloring Merge strings that will not conflict in reflection resolution.
--Xreflection-context-sensitivity Enable context-sensitive handling of reflection.
--Xscaler-pre Enable the analysis to be the pre-analysis of Scaler, and outputs the information required by Scaler.
--Xvia-ddlog Convert and run Souffle with DDlog.
--Xzipper <FILE> Use file with precision-critical methods selected by Zipper, these methods are analyzed context-sensitively.
--Xzipper-pre Enable the analysis to be the pre-analysis of Zipper, and outputs the information required by Zipper.
Use --help <SECTION> for more information, available sections: all, configuration, data-flow, datalog-engine, entry-points, fact-generation, heap-snapshots, information-flow,
native-code, open-programs, python, reflection, server-logic, statistics, xtras
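# Options of the form `--X...` are experimental and are not guaranteed to work with every analysis. Such a flag may be a small feature added by a single commit; avoid them unless you need one, since each generally targets one specific analysis.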
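To tie a few of the options above together, here is a minimal sketch that only assembles and prints a Doop command line instead of running it. The helper name `build_doop_cmd`, the input `app.jar`, the id `test`, the 30-minute timeout and the 8 Souffle jobs are placeholder assumptions for illustration; the flags themselves are copied from the help text.

```shell
#!/bin/sh
# Sketch: build a Doop invocation from options listed in the help output.
# build_doop_cmd is a hypothetical helper; it prints the command, it does not run Doop.
build_doop_cmd() {
  input_jar="$1"     # application input (-i), placeholder value
  analysis_id="$2"   # analysis id (--id), results should end up under this id
  # -a: analysis name, --platform: java_8 is the documented default,
  # --generate-jimple: also emit Jimple files, -t: timeout in minutes,
  # --souffle-jobs: number of Souffle jobs (help text lists 4 as default)
  echo "./doop -i $input_jar -a context-insensitive --id $analysis_id --platform java_8 --generate-jimple -t 30 --souffle-jobs 8"
}

build_doop_cmd app.jar test
```

Check the printed line, then run it by hand from the Doop checkout. No `--X...` option is included, for the reason given in the note above.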