Doop Notes – Java binary static analysis tool

Written from June 2023.
Doop Output:

Installation and Usage


cat .gradle/ 
  • basic usage:
./doop --help    

> Task :run
usage: doop -i <INPUT> -a <NAME> [OPTION]...

Run an analysis on a program (given as a combination of code inputs, code libraries, and a platform). 

== Basic options ==
-a,--analysis <NAME>          The analysis to use. Examples: context-insensitive, 1-call-site-sensitive, micro
-i,--input-file <INPUT>       The (application) input files of the analysis. Accepted formats: .jar, .war, .apk, .aar, maven-id
   --id <ID>                  The analysis id. If omitted, it is automatically generated.


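  • Options I use
--app-only: analyze only the input application, not its dependencies or the platform

--id test: the analysis id, used for the output directory out/${id}

-a context-insensitive: context-insensitive analysis

--generate-jimple: generate Jimple IR files

--extra-logic $BASE_DIR/rules/output.dl: load a custom rules file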

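Analysis Structure

Doop's execution flow can be roughly divided into three steps:

  1. Use Soot to generate Jimple files
    • With --generate-jimple, the Jimple files are written to the output/<ID>/database/jimple directory
  2. Convert the Jimple files into input facts (.facts) for the Datalog engine
  3. Run the selected analysis with the Soufflé engine and write the resulting relations as .csv files, which are the analysis results

When running Doop, -a (i.e. --analysis) selects the type of analysis.
According to the official docs (yanniss / doop / docs, Bitbucket), -a micro, for example, uses the rules in souffle-logic/analyses/micro/analysis.dl.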

To examine the other analyses of the Doop framework, follow this structure:

  • Their input schema can be found in souffle-logic/facts/flow-insensitivite-schema.dl.
    • Hmm, this file does not exist; the closest matches are souffle-logic/facts/flow-insensitive-facts.dl and souffle-logic/facts/flow-sensitive-schema.dl
  • The rules for an analysis A, can be found in souffle-logic/analyses/$A/analysis.dl.

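To see how command-line options determine which Soufflé .dl files are loaded, read the Groovy code.

In src/main/groovy/org/clyze/doop/core/SouffleAnalysis.groovy, run can be regarded as the starting point of Doop's Soufflé-based analysis. The walkthrough below covers runAnalysisAndProduceStats and produceStats; in practice many more options control what gets loaded.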

void runAnalysisAndProduceStats(File analysis) {

	void mainAnalysis(File analysis) {

		// Check the open programs argument before calling the preprocessor.
		String openProgramsProfile = null
		String openProgramsRules = options.OPEN_PROGRAMS.value
		// step 1:
		// Set openProgramsProfile according to openProgramsRules
		if (openProgramsRules) {
			// openProgramsRules is the value of Doop's --open-programs option
			openProgramsProfile = "${Doop.souffleLogicPath}/addons/open-programs/rules-${openProgramsRules}.dl"
			if (!(new File(openProgramsProfile)).exists())
				throw DoopErrorCodeException.error35("Open program rules profile does not exist: " + openProgramsProfile)
		}

		// step 2:
		// Load the basic analysis logic:
		// 1. basic/basic.dl
		// 2. ${Doop.souffleAnalysesPath}/${}/analysis.dl, selected by Doop's -a | --analysis
		// ${Doop.souffleAnalysesPath} is souffle-logic/analyses/

		cpp.includeAtEnd("$analysis", "${Doop.souffleLogicPath}/basic/basic.dl")
		cpp.includeAtEnd("$analysis", "${Doop.souffleAnalysesPath}/${getBaseName(}/analysis.dl")

		// step 3:
		// Used for information-flow analysis (taint analysis)
		// If options.ANALYSIS.value (i.e. -a) is data-flow, load addons/information-flow/rules-data-flow.dl
		// According to the paper, this is because Doop's flow-sensitive analysis uses SSA to turn the otherwise flow-insensitive analysis into a flow-sensitive one.
		// Otherwise, load addons/information-flow/rules.dl
		// Either way, ${options.INFORMATION_FLOW.value}-sources-and-sinks.dl is always loaded
		// The INFORMATION_FLOW option corresponds to Doop's --information-flow-XXXX
		if (options.INFORMATION_FLOW.value) {
			String infoflowDir = "${Doop.souffleLogicPath}/addons/information-flow"
			if (options.ANALYSIS.value == 'data-flow')
				cpp.includeAtEnd("$analysis", "${infoflowDir}/rules-data-flow.dl")
			else
				cpp.includeAtEnd("$analysis", "${infoflowDir}/rules.dl")
			cpp.includeAtEnd("$analysis", "${infoflowDir}/${options.INFORMATION_FLOW.value}${INFORMATION_FLOW_SUFFIX}.dl")
		}
		// Load the openProgramsProfile chosen in step 1
		if (openProgramsProfile) {
			log.debug "Using open-programs rules: ${openProgramsRules}"
			cpp.includeAtEnd("$analysis", openProgramsProfile)
		}

		// step 4:
		// sanity check
		// Rules file: addons/sanity.dl
		if (options.SANITY.value) {
			cpp.includeAtEnd("$analysis", "${Doop.souffleLogicPath}/addons/sanity.dl")
				log.warn("WARNING: The sanity check is not fully compatible with --" +
				log.warn("WARNING: The sanity check is not fully compatible with --" +
			if (options.NO_MERGES.value) {
				log.warn("WARNING: The sanity check is not fully compatible with --" +

		// step 5:
		// Add user-supplied rule files (a list)
		if (options.EXTRA_LOGIC.value) {
			Collection<String> extras = options.EXTRA_LOGIC.value as List<String>
			for (String extraFile : extras) {
				File extraLogic = new File(extraFile)
				if (!extraLogic.exists())
					throw new RuntimeException("Extra logic file does not exist: ${extraLogic}")
				String extraLogicPath = extraLogic.canonicalPath
				// Safety: check file extension to avoid using this mechanism
				// to read files from anywhere in the system.
				if (extraLogicPath.endsWith('.dl')) {
					// Log message: "Adding extra logic file ${extraLogicPath}"
					cpp.includeAtEnd("${analysis}", extraLogicPath)
				} else
					log.warn "WARNING: Ignoring file not ending in .dl: ${extraLogicPath}"
			}
		}

	void produceStats(File analysis) {
		// This part mainly handles statistics
		def statsPath = "${Doop.souffleLogicPath}/addons/statistics"

		// options.X_EXTRA_METRICS corresponds to --extra-metrics
		// This option loads addons/statistics/metrics.dl
		if (options.X_EXTRA_METRICS.value) {
			cpp.includeAtEnd("$analysis", "${statsPath}/metrics.dl")
		}

		// Experimental option:
		// Doop option: Xstats-none
		// Do not load logic for collecting statistics.
		if (options.X_STATS_NONE.value) return

		// Load analyses/$name/statistics.dl
		// $name is the value of -a / --analysis
		def specialStats = new File("${Doop.souffleAnalysesPath}/${name}/statistics.dl")
		if (specialStats.exists()) {
			cpp.includeAtEnd("$analysis", specialStats.toString())

		cpp.includeAtEnd("$analysis", "${statsPath}/statistics-simple.dl")

		if (options.X_STATS_FULL.value || options.X_STATS_DEFAULT.value) {
			cpp.includeAtEnd("$analysis", "${statsPath}/statistics.dl")
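
The include logic above boils down to a function from options to an ordered list of .dl files. Below is a minimal Java sketch of that ordering, written only to make the order easy to see. It is not Doop's code: the class IncludePlan, its method, and its parameters are hypothetical, the path prefix souffle-logic is assumed, the analysis directory is assumed to be the -a name, and only the options discussed in these notes are modeled (statistics are left out).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the include order described in the notes above. */
final class IncludePlan {
    static List<String> plan(String analysis, String infoFlow, String openPrograms,
                             boolean sanity, List<String> extraLogic) {
        List<String> files = new ArrayList<>();
        // step 2: basic logic, then the analysis selected with -a
        files.add("souffle-logic/basic/basic.dl");
        files.add("souffle-logic/analyses/" + analysis + "/analysis.dl");
        // step 3: information-flow rules plus the chosen sources-and-sinks profile
        if (infoFlow != null) {
            String dir = "souffle-logic/addons/information-flow";
            files.add(dir + ("data-flow".equals(analysis) ? "/rules-data-flow.dl" : "/rules.dl"));
            files.add(dir + "/" + infoFlow + "-sources-and-sinks.dl");
        }
        // open-programs profile chosen in step 1
        if (openPrograms != null) {
            files.add("souffle-logic/addons/open-programs/rules-" + openPrograms + ".dl");
        }
        // step 4: sanity checks
        if (sanity) {
            files.add("souffle-logic/addons/sanity.dl");
        }
        // step 5: extra logic, ignoring anything that does not end in .dl
        for (String extra : extraLogic) {
            if (extra.endsWith(".dl")) {
                files.add(extra);
            }
        }
        return files;
    }
}
```

For example, IncludePlan.plan("context-insensitive", "minimal", "concrete-types", false, extras) lists basic.dl first and the extra-logic files last, which is why custom rules can refer to every relation defined earlier.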

Running custom logic


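  • Apply custom rules during the analysis
  • Apply custom rules after the analysis has finished

The docs describe both approaches, using each to obtain the variable-points-to information of Example.test()

Running custom logic inside the analysis (recommended)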

  • extra.dl
.decl Temp(v: Var, h: Value)

Temp(v, h) :-
  // Note that some relations belong to a specific component
  mainAnalysis.VarPointsTo(_, h, _, v),
  Var_DeclaringMethod(v, "<Example: void test(int)>").

.output Temp


./doop -a context-insensitive -i docs/doop-101-examples/Example.jar --stats none --extra-logic extra.dl


Running custom logic after the analysis

  • Temp.dl file
.decl Var_DeclaringMethod(v: symbol, m: symbol)
.input Var_DeclaringMethod(IO="file", filename="Var-DeclaringMethod.facts", delimiter="\t")

.decl VarPointsTo(c1: symbol, h: symbol, c2: symbol, v: symbol)
.input VarPointsTo(IO="file", filename="VarPointsTo.csv", delimiter="\t")

.decl Temp(v: symbol, h: symbol)
Temp(v, h) :-
  VarPointsTo(_, h, _, v),
  Var_DeclaringMethod(v, "<Example: void test(int)>").

.output Temp

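With this approach, changing the rules does not require rerunning the analysis from scratch, but every relation used must be declared with .decl and loaded with .input from the facts or CSV files.

After that, usage is the same as for plain Soufflé.

souffle -F facts-dir -D output-dir Temp.dl -j 4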
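
Soufflé writes each .output relation as tab-separated text by default, so Temp ends up as rows of (v, h). A minimal Java sketch for post-processing such a file follows; the class TsvRows and its methods are hypothetical helpers, not part of Doop or Soufflé.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for reading a relation that Souffle wrote as tab-separated text. */
final class TsvRows {
    /** Splits each non-empty line into its tab-separated columns. */
    static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            rows.add(line.split("\t", -1)); // limit -1 keeps trailing empty columns
        }
        return rows;
    }

    /** Collects column valueCol from every row whose column keyCol equals key. */
    static List<String> select(List<String[]> rows, int keyCol, String key, int valueCol) {
        List<String> values = new ArrayList<>();
        for (String[] row : rows) {
            if (row.length > Math.max(keyCol, valueCol) && row[keyCol].equals(key)) {
                values.add(row[valueCol]);
            }
        }
        return values;
    }
}
```

The lines can come from java.nio.file.Files.readAllLines on the Temp.csv file in the output directory; select(rows, 0, someVar, 1) then returns every heap value that the variable may point to.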

souffle – rules

  • All code methods are in Method.facts (and thus Method_DeclaringType which is just a projection of Method that only records the class/interface that contains the method). Some of these methods may not be reachable (= not called by anything).
  • Relation ApplicationMethod contains only the methods that are part of the application (not the Java platform). Again, these methods may be reachable or not.
  • Relation Reachable contains the reachable methods. These methods are reachable because something invoked them, so they must appear in AnyCallGraphEdge. Almost always they also appear in CallGraphEdge (which also records the contexts if you ran a context-sensitive analysis).

The rules themselves live under doop/souffle-logic; below are reading notes on some of the files.


Determines the main methods and includes the other rule files.

#include "../commonMacros.dl"

.comp Basic {

#include "exceptions.dl"
#include "finalization.dl"
#include "method-lookup.dl"
#include "method-resolution.dl"
#include "type-hierarchy.dl"
#include "native-strings.dl"

.decl AnyMainMethodDeclaration(?method:Method, ?type:ClassType)
AnyMainMethodDeclaration(?method, ?type) :-
   Method_DeclaringType(?method, ?type),
   Method_SimpleName(?method, "main"),
   Method_Descriptor(?method, "void(java.lang.String[])"),
   Method_Modifier("public", ?method),
   Method_Modifier("static", ?method).

.decl MainMethodDeclaration(?method:Method)

MainMethodDeclaration(?method) :-
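// DISCOVER_MAIN_METHODS is set by Doop's --discover-main-methods option
  // Consider every main() in the application a "main method".
  // ApplicationClass marks the classes that Soot treats as part of the analyzed application,
  // i.e. the type appears in the application under analysis
  // e.g. if the jar contains class A, ApplicationClass.facts contains the value "A"
  // Use input facts for "main" methods.
  // The corresponding Doop option is --main <MAIN>
  // It specifies the class whose main method should be analyzed
  // and likewise relies on the MainClass.facts generated by Soot
  // ?type is the class whose main method is taken as a "main method"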
  AnyMainMethodDeclaration(?method, ?type).

#ifdef ANDROID
// Android apps start by running ActivityThread.main() or other
// internal entry points.
// TODO: this rule should only fire when analyzing an APK, not an AAR.
MainMethodDeclaration(?method) :-
  ( ?method = "< void main(java.lang.String[])>"
  ; ?method = "< void main(java.lang.String[])>"),
#endif // ANDROID


// Soufflé component syntax:
// after a component is defined, it must be instantiated with .init
.init basic = Basic

#include "../addons/cfg-analysis/analysis.dl"


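See DeSerSniffer 1.6.0 and earlier; later versions introduce the BIFS (Bottom-up Information flow Summary) algorithm to deal with the path explosion caused by polymorphism.

Doop Souls begins here. Keep suffering, and praise the sun, Ẏ!


The goal is to use the Doop framework to implement taint analysis with custom entry points, and ultimately a static analysis for deserialization vulnerabilities, whose results then drive property-based fuzzing.

  • [x] Custom entry points; otherwise the analysis space is too large to be practical
  • [x] Handle polymorphism
  • [x] Taint analysis: pruning
    • [x] Why P/Taint is an orthogonal analysis, and how to merge its information to get an efficient taint analysis
    • [x] How to define Transform, Sink, and Sanitize
  • [x] Small-sample experiments
    • [x] Basic simple sample working
    • Test samples for various Java language features
      • [x] Reflection
  • [ ] Try symbolic execution: see Datalog Based Symbolic Program Reasoning for Java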

Entry Points

class Test {

    public void noEntry() {
    }

    public void calledByEntry() {
        System.out.println("called by entry method");
    }

    public void calledByMain() {
        System.out.println("called by main method");
    }

    public static void main(String[] args) {
        Test t = new Test();
    }
}

Manifest-Version: 1.0
Main-Class: Test

📦 jar

 mv Test.class classes/
 jar cvfm example.jar manifest.txt -C classes/ .

doop analysis


# doop setup

# INFO_FLOW="--information-flow minimal"
# JIMPLE=" --generate-jimple"

OPEN_PROGRAM="--open-programs concrete-types"

# souffle
SOUFFLE_JOBS="--souffle-jobs 16"
SOUFFLE_MODE="--souffle-mode interpreted"

# extra logic
EXTRA_LOGIC="--extra-logic $BASE_DIR/rules/output.dl"

# Remember: `--app-only` must come first!
# Otherwise Doop generates no facts at all.
# A strange error!


CMD="${DOOP_HOME}/doop -a $ANALYSIS -i ${INPUT} --id ${ID} ${EXTRA_ARG}"
echo "doop: $CMD"
eval "$CMD"

In the generated gen_xxx.dl file, the components, their instances, and the other predicates relate roughly as follows:

# ComponentName [Relation] StartLine EndLine
# - predicate Name
# - instance StatementLine
- Basic 2036 2698
    - .init basic = Basic 2700

- AbstractContextSensitivity<Configuration> 2750, 6056
    - predicate: ApplicationEntryPoint
    - .init configuration = Configuration
        - configuration = ContextInsensitiveConfiguration

- BasicContextSensitivity extends AbstractContextSensitivity 6073 6099
	- .init mainAnalysis = BasicContextSensitivity 6102

- AbstractConfiguration 6104 6120

- ContextInsensitiveConfiguration extends AbstractConfiguration 6128 6158

Judging from the output and the generated gen_xxx.dl file (which is worth reading), the original concrete-types rules treat every method that satisfies the following conditions as an entry point, together with its declaring class:

PossibleOpenProgramEntryPoint(?method) :-
  Method_DeclaringType(?method, ?class),
  Method_Modifier("public", ?method),
  !Method_Modifier("abstract", ?method),
  !ClassModifier("private", ?class).

OpenProgramEntryPoint(?method) :-

ContextForOpenProgramEntryPoint(?calleeCtx, ?method) :-
  ThisVar(?method, ?receiver),
  Method_DeclaringType(?method, ?class),
  MockObjectForType(?value, ?class),
  mainAnalysis.configuration.ContextResponse(?callerCtx, ?immutablehctx, ?invo, ?value, ?method, ?calleeCtx),
  mainAnalysis.configuration.ContextRequest(?callerCtx, ?immutablehctx, ?invo, ?value, ?method, 1),

// ReachableContext then feeds into Reachable and VarPointsTo,
// and reaches CallGraphEdge through the configuration's ContextResponse and ContextRequest
mainAnalysis.ReachableContext(?ctx, ?method) :-
  ContextForOpenProgramEntryPoint(?ctx, ?method).

// In ContextInsensitiveConfiguration, a ContextRequest
// directly yields the ContextResponse and the CallGraphEdge
mainAnalysis.configuration.ContextRequest(?callerCtx, ?immutablehctx, ?invo, ?value, ?method, 1) :-
  MockObjectForType(?value, ?class),
  Method_DeclaringType(?method, ?class),
  MockInvocationForEntryPoint(?value, ?method, ?invo),

MockInvocationForEntryPoint(?value, ?method, cat(cat(cat("<mock-invo ", ?value), ?method), ">")),
isInstruction(cat(cat(cat("<mock-invo ", ?value), ?method), ">")), isMethodInvocation(cat(cat(cat("<mock-invo ", ?value), ?method), ">")) :-
  Method_DeclaringType(?method, ?class),
  MockObjectForType(?value, ?class).

MockObjectForType(?value, ?staticType) :-
  MockObject(?value, ?class),
  StaticToActualType(?class, ?staticType).

MockObject(?mockObj, ?class),
mainAnalysis.Value_isMock(?mockObj), isValue(?mockObj), mainAnalysis.Value_Type(?mockObj, ?class), mainAnalysis.Value_DeclaringType(?mockObj, "java.lang.Object"),
mainAnalysis.Value_DeclaringType(?mockObj, ?class) :-
  // mockObj is the string "${?class}::MockObject"
  ?mockObj = cat(?class, "::MockObject").

// Mock the declaring class of every public, non-abstract method
ObjToMock(?class) :-
  // every public, non-abstract method
  Method_DeclaringType(?method, ?class),
  !ClassModifier("abstract", ?class).
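
Mock objects and mock invocations are plain strings built with cat(), so their ids can be predicted and searched for in the output relations. The sketch below reproduces the two concatenations from the rules above in Java; the class MockIds is hypothetical and exists only to show the resulting id shapes.

```java
/** Hypothetical helper mirroring the string ids built by the open-programs rules. */
final class MockIds {
    /** Mirrors ?mockObj = cat(?class, "::MockObject"). */
    static String mockObject(String className) {
        return className + "::MockObject";
    }

    /** Mirrors cat(cat(cat("<mock-invo ", ?value), ?method), ">"). */
    static String mockInvocation(String mockValue, String method) {
        return "<mock-invo " + mockValue + method + ">";
    }
}
```

For the Test class from the entry-point example, the mock receiver would be Test::MockObject, and grepping the output for "<mock-invo " finds the synthetic call sites.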

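If you do not use --open-programs concrete-types, you can mimic its implementation and declare only specific entry-point methods to avoid wasting resources, for example by editing the rules in addons/open-programs/entry-points.dl directly.

// forked from Serhybrid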
PossibleOpenProgramEntryPoint(?method) :-
  Method_DeclaringType(?method, ?class),
  ?class = "instrumenter.trampolines.Trampolines",
  Method_Modifier("public", ?method),
  Method_SimpleName(?method, "entry"),
  !Method_Modifier("abstract", ?method).

CHA algorithm


// Encodes a CHA callgraph, effectively
.decl ResolveInvocation(?type:Type, ?invocation:MethodInvocation, ?tomethod:Method)
//.output ResolveInvocation

// Auxiliary
.decl VirtualMethodInvocation_BaseType(?invocation:MethodInvocation, ?type:Type)
VirtualMethodInvocation_BaseType(?invocation, ?basetype) :-
    VirtualMethodInvocation_Base(?invocation, ?base),
    Var_Type(?base, ?basetype).

// Subclass case: resolve the call against every subtype of the receiver's declared type
ResolveInvocation(?type, ?invocation, ?tomethod) :-
    VirtualMethodInvocation_SimpleName(?invocation, ?simplename),
    VirtualMethodInvocation_Descriptor(?invocation, ?descriptor),
    VirtualMethodInvocation_BaseType(?invocation, ?basetype),
    SubtypeOf(?type, ?basetype),
    MethodLookup(?simplename, ?descriptor, ?type, ?tomethod).

// Superclass case: resolve a super call in the direct superclass
ResolveInvocation(?basetype, ?invocation, ?tomethod) :-
    SuperMethodInvocation_SimpleName(?invocation, ?simplename),
    SuperMethodInvocation_Descriptor(?invocation, ?descriptor),
    SuperMethodInvocation_Base(?invocation, ?base),
    Var_Type(?base, ?basetype),
    DirectSuperclass(?basetype, ?supertype),
    MethodLookup(?simplename, ?descriptor, ?supertype, ?tomethod).

// Turn the resolution results into context requests
configuration.ContextRequest(?callerCtx, ?hctx, ?invocation, ?value, ?tomethod, 1) :-
  OptVirtualMethodInvocationBase(?invocation, ?base),
  // a ContextRequest is only derived when an actual VarPointsTo fact exists
  VarPointsTo(?hctx, ?value, ?callerCtx, ?base),
  Value_Type(?value, ?valuetype),
  basic.ResolveInvocation(?valuetype, ?invocation, ?tomethod).
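
To make the subclass case concrete, here is a small Java sketch of CHA-style resolution for virtual calls: every subtype of the receiver's declared type is a candidate, and each candidate is resolved by walking up to the nearest class that declares the method. This is a simplification under stated assumptions, not Doop's implementation: the class Cha and its methods are hypothetical, only single inheritance between classes is modeled (no interfaces, no abstract methods), and a signature string stands in for the simple name plus descriptor.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified class-hierarchy analysis over single inheritance. */
final class Cha {
    private final Map<String, String> superOf = new HashMap<>();       // class -> direct superclass (null for roots)
    private final Map<String, Set<String>> declared = new HashMap<>(); // class -> signatures it declares

    void addClass(String name, String superName, String... signatures) {
        superOf.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(signatures)));
    }

    /** True if type is base itself or has base among its superclasses (cf. SubtypeOf). */
    boolean subtypeOf(String type, String base) {
        for (String t = type; t != null; t = superOf.get(t)) {
            if (t.equals(base)) {
                return true;
            }
        }
        return false;
    }

    /** Walks up from type to the nearest class declaring the signature (cf. MethodLookup). */
    String lookup(String signature, String type) {
        for (String t = type; t != null; t = superOf.get(t)) {
            Set<String> sigs = declared.get(t);
            if (sigs != null && sigs.contains(signature)) {
                return "<" + t + ": " + signature + ">";
            }
        }
        return null;
    }

    /** All possible targets of a virtual call whose receiver has declared type baseType. */
    Set<String> resolveVirtual(String signature, String baseType) {
        Set<String> targets = new TreeSet<>();
        for (String type : superOf.keySet()) {
            if (subtypeOf(type, baseType)) {
                String target = lookup(signature, type);
                if (target != null) {
                    targets.add(target);
                }
            }
        }
        return targets;
    }
}
```

Unlike this sketch, the ContextRequest rule above only follows a resolved target when VarPointsTo actually contains a value of the matching type, which is what keeps the call graph from being pure CHA.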


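Set up a hypothetical configuration.ContextRequest. This ContextRequest has no counterpart in the source code; it is added in order to detect deserialization vulnerabilities.

Taint – Analysis

Further reading:

According to the Doop README, the relevant option is --information-flow; the logic that loads the corresponding rule files is described in the code comments for src/main/groovy/org/clyze/doop/core/SouffleAnalysis.groovy above.


In addition, you usually have to define your own sinks and sources in Doop, since the built-in ones are rather few.

--information-flow minimal

With --information-flow minimal, the following files are loaded in order: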

  • souffle-logic/addons/information-flow/rules.dl // taint transfer for deserialization plus the basic taint-transfer logic
    • souffle-logic/addons/information-flow/macros.dl // macro definitions
    • souffle-logic/addons/information-flow/declarations.dl // declarations (predicates)
    • souffle-logic/addons/information-flow/delta.dl // isInformationLabel
    • souffle-logic/addons/information-flow/core.dl
  • souffle-logic/addons/information-flow/minimal-sources-and-sinks.dl // profile-specific settings


// taken from declarations.dl

/** Taint specifications that may come from the user. */
.decl TaintSpec(?type:symbol, ?tag:symbol, ?id:symbol)
.input TaintSpec(filename="TaintSpec.facts")

// taken from core.dl

// User-provided taint sources (methods).
TaintSourceMethod(?label, ?method) :-
  TaintSpec("TAINT_SOURCE", ?label, ?method),
// User-provided taint sinks (methods).
LeakingSinkMethod(?label, ?method) :-
  TaintSpec("TAINT_SINK", ?label, ?method),

minimal-sources-and-sinks.dl contains the default configuration:

#include "common-transfer-methods.dl"


TaintSourceMethod("default", "< java.lang.String readLine()>").
TaintSourceMethod("default", "< int read(char[],int,int)>").
// The latter is not a great taint source (since it returns ints) but it's good for minimal testing

LeakingSinkMethodArg("default", 0, "< void println(java.lang.String)>").


// requirement: the method must be in ApplicationMethod
CallTaintingMethod(?label, ?ctx, ?invocation) :-
  TaintSourceMethod(?label, ?tomethod),
  MethodInvocationInContext(?ctx, ?invocation, ?tomethod),
  Instruction_Method(?invocation, ?inmethod),

// See the definition of TaintedValueIntroduced in macros.dl
// #define is a handy technique here
TaintedValue, SourceFromTaintedValue, LabelFromSource,
mainAnalysis.VarPointsTo, mainAnalysis.Value_isMock, mainAnalysis.Value_Type, mainAnalysis.Value_DeclaringTyp

#define ValueIdMacro(id, type, breadcrumb) \
  cat(cat(cat(cat(id, "::: "), type), "::: "), breadcrumb)

// id is the invocation instruction
#define TaintedValueIntroduced(declaringType, id, type, label, value) \
  mainAnalysis_MockValueConsMacro(value, type), \
  TaintedValue(value), \
  SourceFromTaintedValue(value, id), \
  LabelFromSource(id, label), \
  mainAnalysis.Value_DeclaringType(value, declaringType)

TaintedValueIntroduced(?declaringType, ?invo, ?type, ?label, ValueIdMacro(?invo, ?type, DEFAULT_BREADCRUMB)),
mainAnalysis.VarPointsTo(?hctx, ValueIdMacro(?invo, ?type, DEFAULT_BREADCRUMB), ?ctx, ?to) :-
  CallTaintingMethod(?label, ?ctx, ?invo),
  // ?invo: ?to = vcall
  TypeForReturnValue(?type, ?to, ?invo),
  Instruction_Method(?invo, ?method),
  Method_DeclaringType(?method, ?declaringType).

// TaintedValue -> TaintedVarPointsTo
TaintedVarPointsTo(?value, ?ctx, ?var) :-
  mainAnalysis.VarPointsTo(_, ?value, ?ctx, ?var).

// I am curious what the label is actually for
LeakingTaintedInformation(?sourceLabel, ?destLabel, ?ctx, ?invocation, ?source) :-
  SourceFromTaintedValue(?value, ?source),
  LabelFromSource(?source, ?sourceLabel),
  TaintedVarPointsTo(?value, ?ctx, ?var),
  LeakingSinkVariable(?destLabel, ?invocation, ?ctx, ?var).

// Used for statistics, though the label itself is not really used
LeakingSinkMethodArg(?label, ?index, ?method) :-
  LeakingSinkMethod(?label, ?method),
  FormalParam(?index, ?method, _).

// In case method has some arguments, assume arg variable 
LeakingSinkVariable(?label, ?invocation, ?ctx, ?var) :-
  LeakingSinkMethodArg(?label, ?index, ?tomethod),
  MethodInvocationInContextInApplication(?ctx, ?invocation, ?tomethod),
  ActualParam(?index, ?invocation, ?var).

// In case method has no arguments, assume base variable.
LeakingSinkVariable(?label, ?invocation, ?ctx, ?var) :-
   LeakingSinkMethod(?label, ?tomethod),
   !FormalParam(_, ?tomethod, _),
   MethodInvocationInContextInApplication(?ctx, ?invocation, ?tomethod),
   MethodInvocation_Base(?invocation, ?var).

Finally, when leaks are found, they are written to two relations: LeakingTaintedInformation and LeakingTaintedInformationVars.
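
The leak rule is essentially a join: a sink variable leaks if it may point to a value that was introduced by a taint source. A minimal Java sketch of that join is shown below, ignoring contexts and labels. The class LeakJoin and the three map parameters are hypothetical stand-ins for SourceFromTaintedValue, VarPointsTo, and LeakingSinkVariable, not Doop's data structures.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, context-free model of the LeakingTaintedInformation join. */
final class LeakJoin {
    /**
     * @param sourceOfValue       tainted value -> source invocation that introduced it
     * @param pointsTo            variable -> values it may point to
     * @param sinkInvocationOfVar sink variable -> sink invocation it is passed to
     * @return one "sink <- source" entry per leak, sorted
     */
    static Set<String> leaks(Map<String, String> sourceOfValue,
                             Map<String, Set<String>> pointsTo,
                             Map<String, String> sinkInvocationOfVar) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> sink : sinkInvocationOfVar.entrySet()) {
            Set<String> values = pointsTo.getOrDefault(sink.getKey(), Collections.<String>emptySet());
            for (String value : values) {
                String source = sourceOfValue.get(value);
                if (source != null) {
                    // the sink variable may hold a value created by a taint source
                    result.add(sink.getValue() + " <- " + source);
                }
            }
        }
        return result;
    }
}
```

A sink variable that only points to untainted values contributes nothing, which matches the Datalog rule producing no tuple when TaintedVarPointsTo has no match.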



To define custom source and sink methods, I still use Soufflé logic; writing them as facts is too cumbersome.

class Test {

    String name = "TEST";

    public void calledByMain() {
        System.out.println("called by main method");
    }

    public static void main(String[] args) {
        Test t = new Test();
        // t.noEntry();

        // source
        int sourceId = Taint.source();

        // Transform
        String strId = t.entry(sourceId);

        // Sink
        Taint testTaint = new Taint();
    }

    public String entry(int str) {
        String idStr = Taint.transform(str);
        return idStr;
    }
}


class Taint {

    String name = "Taint";

    public static int source(){
        return 1;
    }

    public static String transform(int id){
        if (id == 1){
            return "One";
        } else {
            return "Two";
        }
    }

    public void maybeEvil(String str){
        System.out.println("Evil" + str);
    }
}

  • Adding the definitions
// taken from define-source-and-sink-method-name.dl
.decl DefineSourceMethodName(simpleName: symbol)


.decl DefineSinkMethodName(simpleName: symbol)


// taken from definition-information.dl
// Define the taint-analysis information

#define INFO_FLOW_LABEL "UnSafeSerCheck"

#include "define-source-and-sink-method-name.dl"

TaintSourceMethod(?label, ?method) :-
	?label = INFO_FLOW_LABEL,
	DefineSourceMethodName(simpleMethodName),
	Method_Modifier("public", ?method),
	Method_SimpleName(?method, simpleMethodName),
	!Method_Modifier("abstract", ?method).

LeakingSinkMethod(?label, ?method) :-
	?label = INFO_FLOW_LABEL,
	DefineSinkMethodName(simpleMethodName),
	Method_Modifier("public", ?method),
	Method_SimpleName(?method, simpleMethodName),
	!Method_Modifier("abstract", ?method).



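Other similar work from Plast-lab:


Taint propagation rules in Doop (used as summaries)

  • ParamToRetTaintTransferMethod
    • taint flows from a method parameter to the method's return value
  • ParamToBaseTaintTransferMethod
    • taint flows from a method parameter to the receiver object
  • BaseToParamTaintTransferMethod
    • the receiver object is tainted, and the taint flows to a method parameter
  • BaseToRetTaintTransferMethod
    • the receiver object is tainted, and the taint flows to the return value
  • MockBaseToRetTaintTransferMethod
    • Mock refers to a mocked (simulated) object
  • MockParamToRetTaintTransferMethod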
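
The four non-mock transfer kinds listed above can be read as summaries applied at a call ret = base.m(param). The following Java sketch shows that reading on a plain set of tainted variable names. It is only an illustration: the class TaintTransfer and its enum are hypothetical, a single parameter is assumed, and the Mock variants are left out.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of taint-transfer summaries at a call `ret = base.m(param)`. */
final class TaintTransfer {
    enum Kind { PARAM_TO_RET, PARAM_TO_BASE, BASE_TO_PARAM, BASE_TO_RET }

    /** Returns the tainted set after applying one summary; the input set is not modified. */
    static Set<String> apply(Kind kind, Set<String> tainted, String base, String param, String ret) {
        Set<String> out = new HashSet<>(tainted);
        switch (kind) {
            case PARAM_TO_RET:
                if (tainted.contains(param)) out.add(ret);
                break;
            case PARAM_TO_BASE:
                if (tainted.contains(param)) out.add(base);
                break;
            case BASE_TO_PARAM:
                if (tainted.contains(base)) out.add(param);
                break;
            case BASE_TO_RET:
                if (tainted.contains(base)) out.add(ret);
                break;
        }
        return out;
    }
}
```

In the Test example, treating entry as a PARAM_TO_RET method would carry the taint from sourceId to strId without analyzing the method body.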

Object Mocking

What is object mocking, what is it for, and how is it implemented?

Here is an excerpt from an answer on Discord:

Doop starts from the real "heap allocations" of the program (i.e. one allocation per new T() site). However, there are cases where an object is needed but there is no heap allocation available, so the analysis creates new "mock" values and passes them around. Examples are special objects simulated by logic (such as lambdas and proxies) or filling in pseudo-objects in entry points (as you witness open programs doing).
File souffle-logic/commonMacros.dl contains the macros that create such objects: MockHeapConsMacro, MockValueConsMacro. To create an object with such a macro, you place it in the head of your rule. For example this rule from dynamic-proxies.dl

MockValueConsMacro(?proxyObject, ?proxyClass),
ProxyClassInstance(?iface, ?invo, ?proxyObject) :-
   java_lang_reflect_Proxy_newProxyInstance(?invo, _, _, _),
   ProxyClassOfInterface(?iface, ?proxyClass),
   ?proxyObject = cat(cat(cat(cat("<proxy object for interface ", ?iface), " at "), ?invo), ">").

creates ?proxyObject with type ?proxyClass. Note that the object is basically a unique string created with string concatenation (cat()) in the last line (objects in Doop logic are represented by strings).

This pattern also shows how you can track the rules that create mock objects: you start from the mock object ids you find in the results and search for text that matches these strings. In the case above, if you found in your results objects with ids of the form <proxy object for interface, searching for this text would lead you to the rule above.

In short, mocking models objects that have no new allocation site.




== Reflection ==
    --distinguish-reflection-only-string-constants         Merge all string constants except those useful for reflection.
    --distinguish-string-buffers-per-package               Merges string buffer objects only on a per-package basis (default behavior for reflection-classic).
    --light-reflection-glue                                Handle some shallow reflection patterns without full reflection support.
    --reflection                                           Enable logic for handling Java reflection.
    --reflection-classic                                   Enable (classic subset of) logic for handling Java reflection.
    --reflection-dynamic-proxies                           Enable handling of the Java dynamic proxy API.
    --reflection-high-soundness-mode                       Enable extra rules for more sound handling of reflection.
    --reflection-method-handles                            Reflection-based handling of the method handle APIs.
    --reflection-substring-analysis                        Allows reasoning on what substrings may yield reflection objects.
    --tamiflex <FILE>                                      Use file with tamiflex data for reflection.

I usually use --light-reflection-glue or --reflection-classic; with --distinguish-reflection-only-string-constants enabled, the analysis becomes very slow.

Complete Usage

  • Full usage (kept as a backup for my own reference):
Starting a Gradle Daemon (subsequent builds will be faster)

> Task :run
usage: doop -i <INPUT> -a <NAME> [OPTION]...

== Configuration options ==
 -a,--analysis <NAME>                                      The name of the analysis. Valid values: 1-call-site-sensitive, 1-call-site-sensitive+heap,
                                                           1-object-1-type-sensitive+heap, 1-object-sensitive, 1-object-sensitive+heap, 1-type-sensitive,
                                                           1-type-sensitive+heap, 2-call-site-sensitive+2-heap, 2-call-site-sensitive+heap, 2-object-sensitive+2-heap,
                                                           2-object-sensitive+heap, 2-type-object-sensitive+2-heap, 2-type-object-sensitive+heap, 2-type-sensitive+heap,
                                                           3-object-sensitive+3-heap, 3-type-sensitive+2-heap, 3-type-sensitive+3-heap, adaptive-2-object-sensitive+heap,
                                                           basic-only, context-insensitive, context-insensitive-plus, context-insensitive-plusplus, data-flow,
                                                           dependency-analysis, fully-guided-context-sensitive, micro, partitioned-2-object-sensitive+heap,
                                                           selective-2-object-sensitive+heap, sound-may-point-to, sticky-2-object-sensitive, types-only, xtractor, ----- (LB
                                                           analyses) -----, 2-object-sensitive+heap-plus, adaptive-insens-2objH, adaptive2-insens-2objH, must-point-to, naive,
                                                           paddle-2-object-sensitive, paddle-2-object-sensitive+heap, partial-insens-s2objH, refA-2-call-site-sensitive+heap,
                                                           refA-2-object-sensitive+heap, refA-2-type-sensitive+heap, refB-2-call-site-sensitive+heap,
                                                           refB-2-object-sensitive+heap, refB-2-type-sensitive+heap, scc-2-object-sensitive+heap,
                                                           selective-2-type-sensitive+heap, selective_A-1-object-sensitive, selective_B-1-object-sensitive,
                                                           special-2-object-sensitive+heap, stutter-2-object-sensitive+heap, uniform-1-object-sensitive,
                                                           uniform-2-object-sensitive+heap, uniform-2-type-sensitive+heap
    --android                                              Force Android mode for code inputs that are not in .apk format.
    --app-only                                             Only analyze the application input(s), ignore libraries/platform.
    --auto-app-regex-mode <MODE>                           When no app regex is given, either compute an app regex for the first input ('first') or for all inputs ('all').
    --cfg                                                  Perform a CFG analysis.
    --coarse-grained-allocation-sites                      Aggressively merge allocation sites for all regular object types, in lib and app alike.
    --constant-folding                                     Enable constant folding logic.
    --cs-library                                           Enable context-sensitive analysis for internal library objects.
    --dacapo                                               Load additional logic for DaCapo (2006) benchmarks properties.
    --dacapo-bach                                          Load additional logic for DaCapo (Bach) benchmarks properties.
    --define-cpp-macro <MACRO>                             Define a C preprocessor macro that will be available in analysis logic.
    --disable-merge-exceptions                             Do not merge exception objects.
    --disable-points-to                                    Disable (most) points-to analysis reasoning. This should only be combined with analyses that compensate (e.g.,
    --distinguish-all-string-buffers                       Avoids merging string buffer objects (not recommended).
    --distinguish-all-string-constants                     Treat string constants as regular objects.
    --dry-run                                              Do a dry run of the analysis (generate facts and compile but don't run analysis logic).
    --extra-logic <FILE>                                   Include files with extra rules.
    --featherweight-analysis                               Perform a featherweight analysis (global state and complex objects immutable).
    --gen-opt-directives                                   Generate additional relations for code optimization uses.
 -h,--help <SECTION>                                       Display help and exit. Valid values: all, configuration, data-flow, datalog-engine, entry-points, fact-generation,
                                                           heap-snapshots, information-flow, native-code, open-programs, python, reflection, server-logic, statistics, xtras
 -i,--input-file <INPUT>                                   The (application) input files of the analysis. Accepted formats: .jar, .war, .apk, .aar, maven-id
    --id <ID>                                              The analysis id. If omitted, it is automatically generated.
 -L,--level <LOG_LEVEL>                                    Set the log level: debug, info or error (default: info).
 -l,--library-file <LIBRARY>                               The dependency/library files of the application. Accepted formats: .jar, .apk, .aar
    --max-memory <MEMORY_SIZE>                             The maximum memory that the analysis can consume (does not include memory needed by fact generation). Example
                                                           values: 2m, 4g.
    --no-merge-library-objects                             Disable the default policy of merging library (non-collection) objects of the same type per-method.
    --no-merges                                            No merges for string constants.
    --no-standard-exports                                  Do not export standard relations.
 -p,--properties <PROPERTIES>                              The path to a properties file containing analysis options. This option can be mixed with any other and is processed
    --platform <PLATFORM>                                  The platform on which to perform the analysis. For Android, the plaftorm suffix can either be 'stubs' (provided by
                                                           the Android SDK), 'fulljars' (a custom Android build), or 'apks' (custom Dalvik equivalent). Default: java_8. Valid
                                                           values: java_3, java_4, java_5, java_6, java_7, java_7_debug, java_8, java_8_debug, java_8_mini, java_9, java_10,
                                                           java_11, java_12, java_13, java_14, java_15, java_16, android_22_fulljars, android_25_fulljars, android_2_stubs,
                                                           android_3_stubs, android_4_stubs, android_5_stubs, android_6_stubs, android_7_stubs, android_8_stubs,
                                                           android_9_stubs, android_10_stubs, android_11_stubs, android_12_stubs, android_13_stubs, android_14_stubs,
                                                           android_15_stubs, android_16_stubs, android_17_stubs, android_18_stubs, android_19_stubs, android_20_stubs,
                                                           android_21_stubs, android_22_stubs, android_23_stubs, android_24_stubs, android_25_stubs, android_26_stubs,
                                                           android_27_stubs, android_28_stubs, android_29_stubs, android_25_apks, android_26_robolectric, python_2
    --regex <EXPRESSION>                                   A regex expression for the Java package names of the analyzed application.
    --run-jphantom                                         Run jphantom for non-existent referenced code.
    --sanity                                               Load additional logic for sanity checks.
    --sarif                                                Output SARIF results for specific relations.
    --special-cs-methods <FILE>                            Use a file that specifies special context sensitivity for some methods.
    --symbolic-reasoning                                   Symbolic reasoning for expressions.
 -t,--timeout <TIMEOUT>                                    The analysis execution timeout in minutes (default: 90 minutes).
    --use-local-java-platform <PATH>                       The path to the Java platform to use.
    --user-defined-partitions <FILE>                       Use a file that specifies the partitions of the analyzed program.
 -v,--version                                              Display version and exit.

== Data flow ==
    --data-flow-goto-lib                                   Allow data-flow logic to go into library code using CHA.
    --data-flow-only-lib                                   Run data-flow logic only for library code.

== Datalog engine ==
    --souffle-debug                                        Enable profiling in the Souffle binary.
    --souffle-force-recompile                              Force recompilation of Souffle logic.
    --souffle-incremental-output                           Use the functor for incremental output in Souffle.
    --souffle-jobs <NUMBER>                                Specify number of Souffle jobs to run (default: 4).
    --souffle-live-profile                                 Enable live profiling in the Souffle binary.
    --souffle-mode <MODE>                                  How to run Souffle: compile to binary, use interpreter, only translate to C++. Valid values: compiled, interpreted,
    --souffle-profile                                      Enable profiling in the Souffle binary.
    --souffle-provenance                                   Call the provenance browser.
    --souffle-use-functors                                 Enable the use of user-defined functors in Souffle.
    --use-analysis-binary <PATH>                           Use precompiled analysis binary (for Windows compatibility).

== Entry points ==
    --discover-main-methods                                Discover main() methods.
    --discover-tests                                       Discover and treat test code (e.g. JUnit) as entry points.
    --exclude-implicitly-reachable-code                    Don't make any method implicitly reachable.
    --ignore-main-method                                   If main class is not given explicitly, do not try to discover it from jar/filename info. Open-program analysis
                                                           variant may be triggered in this case.
    --keep-spec <FILE>                                     Give a 'keep' specification.
    --main <MAIN>                                          Specify the main class(es) separated by spaces.

== Fact generation ==
    --also-resolve <CLASS>                                 Force resolution of class(es) by Soot.
    --cache                                                The analysis will use the cached facts, if they exist.
    --dont-cache-facts                                     Don't cache generated facts.
    --extract-more-strings                                 Extract more string constants from the input code (may degrade analysis performance).
    --fact-gen-cores <NUMBER>                              Number of cores to use for parallel fact generation.
    --facts-only                                           Only generate facts and exit.
    --generate-artifacts-map                               Generate artifacts map.
    --generate-jimple                                      Generate Jimple/Shimple files along with .facts files.
    --generate-tac                                         Generate Three Address Code experimental representation, along with .facts files.
    --input-id <ID>                                        Import facts from dir with id ID and start the analysis. Application/library inputs are ignored.
    --report-phantoms                                      Report phantom methods/types during fact generation.
    --thorough-fact-gen                                    Attempt to resolve as many classes during fact generation (may take more time).
    --unique-facts                                         Eliminate redundancy from .facts files.
    --wala-fact-gen                                        Use WALA to generate the facts.
    --Xfacts-subset <SUBSET>                               Produce facts only for a subset of the given classes. Valid values: PLATFORM, APP, APP_N_DEPS
    --Xignore-factgen-errors                               Continue with analysis despite fact generation errors.
    --Xsymlink-input-facts                                 Use symbolic links instead of copying cached facts. Used with --cache or --input-id.

== Heap snapshots ==
    --heapdl-dvpt                                          Import dynamic var-points-to information.
    --heapdl-file <HEAPDLS>                                Use dynamic information from memory dump, using HeapDL. Takes one or more files (`.hprof` format or stack traces).
    --heapdl-nostrings                                     Do not model string values uniquely in a memory dump.
    --import-dynamic-facts <FACTS_FILE>                    Use dynamic information from file.

== Information flow ==
    --information-flow <APPLICATION_PLATFORM>              Load additional logic to perform information flow analysis. Valid values: alfresco, android, beans, minimal, spring,
    --information-flow-extra-controls <CONTROLS>           Load additional sensitive layout control from string triplets "id1,type1,parent_id1,...".
    --information-flow-high-soundness                      Enter high soundness mode for information flow microbenchmarks.

== Native code ==
    --native-code-backend <BACKEND>                        Use back-end to scan native code (portable built-in, system binutils, Radare2). Valid values: builtin, binutils,
    --only-precise-native-strings                          Skip strings without enclosing function information.
    --scan-native-code                                     Scan native code for specific patterns.
    --simulate-native-returns                              Assume native methods return mock objects.

== Open programs ==
    --open-programs <STRATEGY>                             Create analysis entry points and environment using various strategies (such as 'concrete-types' or 'jackee').

== Python ==
    --full-tensor-precision                                Full precision tensor shape analysis (not guaranteed to finish).
    --single-file-analysis                                 Flag to be passed to WALAs IR translator to produce IR that makes the analysis of a single script file easier.
    --tensor-shape-analysis                                Enable tensor shape analysis for Python.

== Reflection ==
    --distinguish-reflection-only-string-constants         Merge all string constants except those useful for reflection.
    --distinguish-string-buffers-per-package               Merges string buffer objects only on a per-package basis (default behavior for reflection-classic).
    --light-reflection-glue                                Handle some shallow reflection patterns without full reflection support.
    --reflection                                           Enable logic for handling Java reflection.
    --reflection-classic                                   Enable (classic subset of) logic for handling Java reflection.
    --reflection-dynamic-proxies                           Enable handling of the Java dynamic proxy API.
    --reflection-high-soundness-mode                       Enable extra rules for more sound handling of reflection.
    --reflection-method-handles                            Reflection-based handling of the method handle APIs.
    --reflection-substring-analysis                        Allows reasoning on what substrings may yield reflection objects.
    --tamiflex <FILE>                                      Use file with tamiflex data for reflection.

== Server logic ==
    --server-cha                                           Run server queries related to CHA.
    --server-logic                                         Run server queries under addons/server-logic.
    --server-logic-threshold <THRESHOLD>                   Threshold when reporting points-to information in server logic (per points-to set). default: 1000

== Statistics ==
    --extra-metrics                                        Run extra metrics logic under addons/statistics.
    --stats <LEVEL>                                        Set statistics collection logic. Valid values: none, default, full

== Xtras ==
    --Xcontext-dependency-heuristic                        Run context dependency heuristics logic under addons/oracular.
    --Xcontext-remover                                     Run the context remover for reduced memory use (only available in context-insensitive analysis).
    --Xdex                                                 Use custom front-end to generate facts for .apk inputs, using Soot for other inputs.
    --Xextra-facts <FILE>                                  Include files with extra facts.
    --Xgenerics-pre                                        Enable precise generics pre-analysis to infer content types for Collections and Maps.
    --Xignore-wrong-staticness                             Ignore 'wrong static-ness' errors in Soot.
    --Ximport-partitions <FILE>                            Specify the partitions.
    --Xisolate-fact-generation                             Isolate invocations to the fact generator.
    --Xlb                                                  Use legacy LB engine.
    --Xlegacy-android-processing                           If true the analysis uses the legacy processor for Android resources.
    --Xlegacy-soot-invocation                              If true, Soot will be invoked using a custom classloader (may use less memory, only supported on Java < 9).
    --Xlow-mem                                             Use less memory. Does not support all options.
    --Xmodel-stdlib                                        Model standard library APIs instead of analyzing their code.
    --Xno-ssa                                              Disable the default policy of using SSA transformation on input.
    --Xoracular-heuristics                                 Run sensitivity heuristics logic under addons/oracular.
    --Xprecise-generics                                    Precise handling for maps and collections.
    --XR-out-dir <R_OUT_DIR>                               When linking .aar inputs, place generated R code in <R_OUT_DIR>.
    --Xreflection-coloring                                 Merge strings that will not conflict in reflection resolution.
    --Xreflection-context-sensitivity                      Enable context-sensitive handling of reflection.
    --Xscaler-pre                                          Enable the analysis to be the pre-analysis of Scaler, and outputs the information required by Scaler.
    --Xvia-ddlog                                           Convert and run Souffle with DDlog.
    --Xzipper <FILE>                                       Use file with precision-critical methods selected by Zipper, these methods are analyzed context-sensitively.
    --Xzipper-pre                                          Enable the analysis to be the pre-analysis of Zipper, and outputs the information required by Zipper.

Use --help <SECTION> for more information, available sections: all, configuration, data-flow, datalog-engine, entry-points, fact-generation, heap-snapshots, information-flow, 
native-code, open-programs, python, reflection, server-logic, statistics, xtras

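# Options of the form `--X...` are experimental and are not guaranteed to work with every analysis. Such an option may be a small feature added by a single commit. Using them is not recommended, because they are only meant for specific analyses.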



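As a rough illustration of how several of the options listed above combine, the sketch below assembles a command line and only prints it (a dry run). The input `app.jar` and the id `test` are placeholder values assumed for the example. The flags themselves are taken from the help text, and the numbers for `--souffle-jobs` and `-t` simply restate the documented defaults.

```shell
#!/bin/sh
# Dry-run sketch: build a doop command line from options shown in the help text.
# app.jar and the id "test" are hypothetical placeholders, not from a real project.
set -eu

INPUT="app.jar"                  # hypothetical application jar
ANALYSIS="context-insensitive"   # one of the example analyses named by -a

# Collect the options as positional parameters.
set -- \
  -i "$INPUT" \
  -a "$ANALYSIS" \
  --id test \
  --platform java_8 \
  --souffle-jobs 4 \
  -t 90 \
  --generate-jimple

# Print the command instead of running it.
cmd="./doop $*"
echo "$cmd"
```

To actually run the analysis, execute the printed command from inside a Doop checkout, with `app.jar` replaced by a real input.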