补充之前没有太了解的一些 tricks,添加一下examples和description。
若是该文中没有提及的,推荐参考 官方Docs
还望见谅
总体上的语法:
Program | Soufflé • A Datalog Synthesis Tool for Static Analysis (souffle-lang.github.io)
语法铁路图
见以下讨论,完整铁路见附录
安利 LSP – Souffle Vscode 插件
Soufflé Language Server · souffle-lang/souffle · Discussion #2350 (github.com)
Type
Algebraic Data Types
Zhihu 讨论
https://www.zhihu.com/question/24460419
10 分钟 了解 ADT -> https://github.com/niltok/magic-in-ten-mins/blob/main/doc/ADT.md
Algebraic Data Types are extended versions of records, which permit several type shapes for an ADT. Each shape in the ADT we call a branch. A branch is defined similar to a record, however, it has a unique branch identifier.
基本语法如下
.type <new-adt> = <branch-id> { <name_1>: <type_1>, ..., <name_k>: <type_k> } | ...
where <new-adt>
is the name of the ADT. Each branch has a unique identifier <branch-id>
followed by the fields in that branch. The fields of a branch are defined in curly braces.
Below is given an example for an ADT definition,
.type T = A { x: number }
| B { x: symbol }
.type Nat = S {x : Nat}
| Zero {}
.type Expression = Number { x : number }
| Variable { v : symbol}
| Add {e_1 : Expression, e_2 :Expression}
| Minus {e_1 : Expression, e_2 : Expression}
| Mult {e_1 : Expression, e_2 : Expression}
| Divide {e_1 : Expression, e_2 : Expression}
.type Tree = Empty {}
| Node {t1: Tree, val: unsigned, t2: Tree}
Note that nil
cannot be used for ADTs and termination branches for recursive definition must be expressed explicitly by the programmer.
nil 无法在ADT中使用。
In rules, branches can be constructed using the dollar sign followed by the branch identifier, i.e., $branch_constructor(args...)
. The following example demonstrate how branch constructors can be used in rules:
.type Expression = Number { x : number }
| Variable { v : symbol}
| Add {e_1 : Expression, e_2 :Expression}
| Imaginary {}
.decl A(x:Expression)
A($Number(10)).
A($Add($Number(10),$Imaginary())).
A($Add($Number(10), $Variable("x"))).
A($Number(x+1)) :- A($Number(x)), x < 20.
.output A
Note that a constructor can only be used for an ADT once. For example, the following type definition is illegal:
.type A = Number { x:number }
| Symbol { v:symbol }
.type B = Number { x:number } // error: reuse of branch identifier
| Symbol { v:symbol } // error: reuse of branch identifier
Since the branch identifier Number
and Symbol
are reused in the ADT B
. One way to work around this is to use Components:
.type OuterType = Number { n: number }
| Symbol { s: symbol }
.comp MyComponent {
.type InnerType = Number { n: number }
| Symbol { s: symbol }
}
.init component = MyComponent
.decl Relation(x:component.InnerType)
Relation($component.Symbol("x")).
Type Conversion
Soufflé permits type conversion of an argument using a functor notation. The type of argument <expr>
is converted to a new type <new-type>
using the type cast as(<expr>, <new-type>)
. The correctness of the cast is left to the user.
cast grammar:
as(<expr>, <new-type>)
例如
.type Variable <: symbol
.type StackIndex <: symbol
.type VariableOrStackIndex = Variable | StackIndex
.decl A(a: VariableOrStackIndex)
.decl B(a: Variable)
// 将 expr a 转换为 Variable 类型
B(as(a, Variable)) :- A(a).
Subsumption
disjunction
disjunction 中 atom (pre_atom) 与 atom (next_atom) 为同一个 rules,但 relations 不同,若满足disjunction中的条件,pre_atom 将被 next_atom 替换,从而达到 将原 relation 删除的效果 (disjunction)。
disjunction 的语法规则见附录,可以理解为 涉及同类关系的 谓词条件,符合disjunction的atom 将被删除。
rule ::= atom '<=' atom ':-' disjunction '.' query_plan?
Example
.decl test(s: symbol, code: number)
test("a", 1).
test("b", 2).
test("a", 3).
test(t_s, t_c) <= test(n_s, n_c) :-
n_s = t_s,
t_c >= n_c.
.output test
Output:
---------------
test
s code
===============
a 1
b 2
===============
issue 中 作者给的 example
.type Node <: symbol
.type Path = [
rest : Path,
node : Node
]
.decl E(x:Node, y:Node)
E("s","1").
E("1","2").
E("2","2").
E("2","3").
.decl S(p:Path, l:number)
S([nil,"s"],1).
.decl P(p:Path) btree_delete
P([nil, "s"]).
P([[p,u],v]), S([[p,u],v],l+1) :-
P([p,u]),
E(u,v),
S([p,u],l).
.output P,S
P([p1,u]) <= P([p2, u]) :-
S(p1, l1),
S(p2, l2),
l2 <= l1.
example: https://github.com/souffle-lang/souffle/discussions/2050
type Domain = Number { x : number } | Top {}
.decl value(var: symbol, val: Domain) btree_delete
.decl decl(var: symbol, val: number)
.decl add(res: symbol, var1: symbol, var2: symbol)
// Deal with constants
value(Var,$Number(Val)) :- decl(Var,Val).
value(Var,$Number(Result)) :- add(Var, Var1, Var2), value(Var1, $Number(Val1)), value(Var2, $Number(Val2)), Result = Val1 + Val2.
// Top always wins
value(Var,_) <= value(Var, $Top()) :- true.
// Multiple values implies Top
value(Var,$Top()) :- value(Var,$Number(V1)), value(Var,$Number(V2)), V1 != V2.
// Top propagates
value(Var,$Top()) :- add(Var, Var1, _Var2), value(Var1, $Top()).
value(Var,$Top()) :- add(Var, _Var1, Var2), value(Var2, $Top()).
.output value
decl("x", 1).
decl("y", 2).
add("a", "x", "y").
add("b", "x", "x").
add("c", "x", "y").
add("c", "y", "x").
add("d", "x", "x").
add("d", "y", "y").
Relations
souffle 中也提供了多样的指定==Representation==的Relations, 主要是不同数据类型上的性能差异。
Representation:
- default
- B-tree
- others
- Brie
- Eqrel
- Nullaries
Representation of Relations
B-tree
.decl A(x:number, y:symbol) btree
btree_delete
The relation qualifier "btree_delete" will automatically be set for default relations if there is a subsumptive clause for the relation.
.decl A(x:number)
A(1).
A(2).
A(x1) <= A(x2) :- x1 <= x2.
.output A
Brie
定义如下
.decl A(x:number, y:symbol) brie
适用于稠密数据 (可以理解为数据被压缩)
// dense data
// Consider a 3-arity relation containing tuples
// 0, 0 is the common prefix
(0,0,0), (0,0,1), (0,0,2)
// following data is not dense, and has different prefix
(0,0,0), (1,2,3), (4,5,6)
Equivalence
Equivalence
has three properties:
- reflexivity 自反性
- symmetry 对称性
- transitivity 传递性
.decl eqrel_fast(x : number, y : number) eqrel
eqrel_fast(a,b) :- rel1(a), rel2(b).
// 可以自动推出
eqrel_fast(a, a) :- eqrel_fast(a, _).
eqrel_fast(a, b) :- eqrel_fast(b, a).
eqrel_fast(a, c) :- eqrel_fast(a, b), eqrel_fast(b, c).
Nullaries
可以用于布尔运算。
.decl judge()
.decl data(n: number)
data(1).
data(2).
data(3).
.decl isjudge()
isjudge() :-
judge().
judge(),
data(5) :-
data(3).
data(10) :-
judge().
.output data, judge
Inlining Relations
inline 是指在内存中存储计算结果,而是将relation直接 inline 到指定 relation。
.decl natural_number(x:number)
natural_number(0).
natural_number(x+1) :- natural_number(x), x < 10000.
.decl natural_pairs(x:number, y:number) inline
natural_pairs(x,y) :- natural_number(x), natural_number(y).
.decl query(x:number)
query(x) :- natural_pairs(x,y), x < 5, y < x.
转换后结果
.decl natural_number(x:number)
natural_number(0).
natural_number(x+1) :- natural_number(x), x < 10000.
.decl query(x:number)
query(x) :- natural_number(x), natural_number(y), x < 5, y < x.
Trickers
Plan
手册看的,我有点迷糊😧
Rules may have qualifiers, which are used to change the execution behavior / semantics. Qualifiers are used to set a query plan for a rule. The qualifer .plan
let’s the programmer chose a query plan for a rule.
简而言之,就是查询的优先级,涉及到执行顺序和优化部分
默认存在的.plan 0(1,2,3,...,n), n 为实际的约束条件
- example
A(x,y) :-
B(y,z,w),
C(z,x),
D(u,v,y).
.plan 1:(2, 1, 3)
同时存在默认的查询计划 .plan 0:(1, 2, 3)
最终实际上的查询计划是两个查询结果的合集,即 .plan 0
.plan 1
。
这时候就有小伙伴要问了,这个查询计划有什么用吗?
souffle上,我没怎么见过讨论,doop 的 discussion 中,doop 作者给我的理解是,和 谓词/逻辑 之间的循环依赖有关,souffle 的整体上的分析都是讲究一个不动点分析。
讨论的原文链接: https://discord.com/channels/759681783309795359/818589881261096971/835126720482115597 , 下文进行简要描述。
假设上面example, A 依赖 B, C, D,若当B,C 也分别依赖A时,分别存在循环依赖,就需要添加一个 .plan 1:(2, 1, 3)
进行查询,最终的plan为 .plan 0:(1, 2, 3), 1:(2, 1, 3)
,也表面在plan 0中 B 谓词是一个与A循环依赖的谓词,在plan 1 中 C是一个与A循环依赖的谓词。
个人对优化的理解是,D 不受到A的约束,所在在递归查询时,可以放入递归的最里层;B, C 受到 A 的约束,若一开始就放到递归的最里层,运行速度将受到很大影响。合理的设计是,将这些需要循环递归查询的谓词,放到循环查询的外层。
DAG 图构造
https://github.com/souffle-lang/souffle/issues/2126
使用 btree_delete
.type Node <: symbol
.type Path = [
rest : Path,
node : Node
]
.decl E(x:Node, y:Node)
E("s","1").
E("1","2").
E("2","2").
E("2","3").
.decl S(p:Path, l:number)
S([nil,"s"],1).
.decl P(p:Path) btree_delete
P([nil, "s"]).
P([[p,u],v]), S([[p,u],v],l+1) :-
P([p,u]),
E(u,v),
S([p,u],l).
.output P,S
P([p1,u]) <= P([p2, u]) :-
S(p1, l1),
S(p2, l2),
l2 <= l1.
==可行路径检测,慢的出奇,不太推荐!==
.type Node <: symbol
.type Path = [
rest: Path,
node: Node
]
.type Method = Node
.type InformationLabel = symbol
.type Lable = symbol
.decl fromSourceCallGraph(?fromMethod: Method, ?toMethod: Method)
.decl LeakingSinkMethod(?lable:InformationLabel, ?method:Method)
.input fromSourceCallGraph(filename="fromSourceCallGraph.csv")
.input LeakingSinkMethod(filename="LeakingSinkMethod.csv")
.decl SourceMethod(method: Method)
.decl SinkMethod(method: Method)
SinkMethod(method) :-
LeakingSinkMethod(_, method).
SourceMethod("<sun.reflect.annotation.AnnotationInvocationHandler: void readObject(java.io.ObjectInputStream)>").
.decl isOpaqueMethod(method: Method)
isOpaqueMethod(method) :-
SinkMethod(method),
match("<javax[.].*", method).
isOpaqueMethod(method) :-
fromSourceCallGraph(_, method),
match("<javax[.].*", method).
isOpaqueMethod(method) :-
fromSourceCallGraph(method, _),
match("<javax[.].*", method).
.decl PotentialVulnGraphStep(current_path: Path, step: number)
.decl PotentialVulnPath(path:Path) btree_delete
PotentialVulnGraphStep(path, step) :-
path = [nil, method],
SourceMethod(method),
step = 1.
PotentialVulnPath(path) :-
path = [nil, method],
SourceMethod(method).
PotentialVulnPath(current_path),
PotentialVulnGraphStep(current_path, new_step) :-
PotentialVulnPath(path),
path = [oldFromMethod, oldToMethod],
oldToMethod = fromMethod,
fromSourceCallGraph(fromMethod, toMethod),
current_path = [oldFromMethod, toMethod],
PotentialVulnGraphStep(path, step),
new_step = step + 1,
new_step < 10.
PotentialVulnPath(remove) <= PotentialVulnPath(new) :-
remove = [remove_path, node],
new = [new_path, node],
PotentialVulnGraphStep(remove_path, remove_step),
PotentialVulnGraphStep(new_path, new_step),
new_step <= remove_step.
.output PotentialVulnPath, PotentialVulnGraphStep
printsize
,打印 atom 数量。
.decl data(n: number)
data(1).
.printsize data
limitsize
官方Docs,忘记有没有提及了,下面是从 issue 找到的
limitsize 将会限制迭代次数,只针对迭代规则。
.decl A(n: number)
A(1).
A(x + 1) :-
A (x), x < 1000.
.printsize A
.output A
.limitsize A(n=10)
执行显示,只迭代10次
➜ souffle limitsize.dl -D-
A 10
---------------
A
n
===============
1
2
3
4
5
6
7
8
9
10
===============
Appendix
soufflé language grammar:
//
// From tree-sitter-souffle/src/grammar.json
//
//
// EBNF to generate railroad diagram at https://www.bottlecaps.de/rr/ui
//
program ::= ( block_comment | line_comment | pragma | functor_decl | component_decl | component_init | directive | _rule | fact | relation_decl | type_decl | legacy_type_decl | preprocessor ) *
preprocessor ::= '#line' ( [0-9]+ ) ( string )
//block_comment ::= '/* [^*]*\*+([^/*][^*]*\*+)* /'
line_comment ::= '//' [^#x0A]* '\n'
pragma ::= '.pragma' ( string ) ( string ) ?
functor_decl ::= '.functor' ( ident ) ( '(' ( ( ( _type_name ) | ( attribute ) ) ? ( ',' ( ( _type_name ) | ( attribute ) ) ) * ) ')' ) ':' ( _type_name ) 'stateful' ?
component_decl ::= '.comp' ( component_type ) ( ':' ( ( component_type ) ( ',' ( component_type ) ) * ) ) ? '{' ( ( block_comment | line_comment | component_decl | component_init | directive | fact | relation_decl | _rule | type_decl | preprocessor | ( '.override' ident ) ) * ) '}'
component_type ::= ( ident ) ( '<' ( ( ident ) ( ',' ( ident ) ) * ) '>' ) ?
component_init ::= '.init' ( ident ) '=' ( component_type )
directive ::= ( _directive_qualifier ) ( ( qualified_name ) ( ',' ( qualified_name ) ) * ) ( '(' ( ( ( ident ) '=' ( _directive_value ) ) ? ( ',' ( ( ident ) '=' ( _directive_value ) ) ) * ) ')' ) ?
_directive_qualifier ::= '.input' | '.limitsize' | '.output' | '.printsize'
_directive_value ::= string | ident | number
bool ::= 'true' | 'false'
_rule ::= monotonic_rule | subsumptive_rule
monotonic_rule ::= ( ( atom ( ',' atom ) * ) ) ':-' ( disjunction ) '.' ( query_plan? )
subsumptive_rule ::= ( atom ) '<=' ( atom ) ':-' ( disjunction ) '.' ( query_plan? )
query_plan ::= '.plan' ( ( ( [0-9]+ ) ':' ( '(' ( ( [0-9]+ ) ? ( ',' ( [0-9]+ ) ) * ) ')' ) ) ( ',' ( ( [0-9]+ ) ':' ( '(' ( ( [0-9]+ ) ? ( ',' ( [0-9]+ ) ) * ) ')' ) ) ) * )
disjunction ::= ( ( conjunction ( ';' conjunction ) * ) )
negation ::= ( '!' )
conjunction ::= ( negation* ( atom | _constraint | ( '(' disjunction ')' ) ) ) ( ',' ( negation* ( atom | _constraint | ( '(' disjunction ')' ) ) ) ) *
_constraint ::= bool | comparison | match | contains
match ::= 'match' ( '(' ( ( _argument ) ? ( ',' ( _argument ) ) * ) ')' )
contains ::= 'contains' ( '(' ( ( _argument ) ? ( ',' ( _argument ) ) * ) ')' )
comparison ::= ( _argument ) ( ( '=' | '!=' | '<=' | '>=' | '<' | '>' ) ) ( _argument )
fact ::= ( atom ) '.'
atom ::= ( qualified_name ) ( '(' ( ( _argument ) ? ( ',' ( _argument ) ) * ) ')' )
_argument ::= anonymous | constant | variable | nil | record_constructor | adt_constructor | ( '(' _argument ')' ) | as | functor_call | aggregator | unary_op | binary_op
adt_constructor ::= '$' ( qualified_name ) ( '(' ( ( _argument ) ? ( ',' ( _argument ) ) * ) ')' ) ?
record_constructor ::= ( '[' ( _argument? ( ',' _argument ) * ) ']' )
constant ::= ipv4 | number | string
//ipv4 ::= "[0-9]{0,3}\.[0-9]{0,3}\.[0-9]{0,3}\.[0-9]"/*{0,3}*/
//string ::= '([^"]|\\")*'
number ::= float | integer | unsigned
integer ::= decimal | hex | binary
decimal ::= ( '-'?[0-9]+ )
unsigned ::= [0-9]+u
hex ::= '0x'([0-9]|[A-F]|[a-f])+
binary ::= '0b'[0-1]+
float ::= ( '-'?[0-9]+'.'[0-9]+ )
variable ::= ident
nil ::= 'nil'
anonymous ::= '_'
as ::= 'as' '(' ( _argument ) ',' ( _type_name ) ')'
functor_call ::= ( ( user_defined_functor | intrinsic_functor ) ) ( '(' ( ( _argument ) ? ( ',' ( _argument ) ) * ) ')' )
user_defined_functor ::= '@' ( ident )
intrinsic_functor ::= ( ( 'acos' | 'acosh' | 'asin' | 'asinh' | 'atan' | 'atanh' | 'cos' | 'cosh' | 'exp' | 'log' | 'sin' | 'sinh' | 'tan' | 'tanh' | 'autoinc' | 'cat' | 'max' | 'min' | 'ord' | 'strlen' | 'substr' | 'to_float' | 'to_number' | 'to_string' | 'to_unsigned' ) )
aggregator ::= ( ( ( ( ( ( 'max' | 'mean' | 'min' | 'sum' ) ) ( _argument ) ) | 'count' ) ':' ( ( atom | ( '{' disjunction '}' ) ) ) ) | range )
range ::= 'range' ( '(' ( ( _argument ) ',' ( _argument ) ( ',' ( _argument ) ) ? ) ')' )
_unary_operator ::= ( ( 'bnot' | 'lnot' | '-' | '~' | '!' ) )
unary_op ::= ( ( ( _unary_operator ) ( _argument ) ) )
binary_op ::= ( ( ( _argument ) ( '+' ) ( _argument ) ) ) | ( ( ( _argument ) ( '-' ) ( _argument ) ) ) | ( ( ( _argument ) ( '*' ) ( _argument ) ) ) | ( ( ( _argument ) ( '/' ) ( _argument ) ) ) | ( ( ( _argument ) ( '%' ) ( _argument ) ) ) | ( ( ( _argument ) ( '^' ) ( _argument ) ) ) | ( ( ( _argument ) ( 'land' ) ( _argument ) ) ) | ( ( ( _argument ) ( 'lor' ) ( _argument ) ) ) | ( ( ( _argument ) ( 'lxor' ) ( _argument ) ) ) | ( ( ( _argument ) ( 'band' ) ( _argument ) ) ) | ( ( ( _argument ) ( 'bor' ) ( _argument ) ) ) | ( ( ( _argument ) ( 'bxor' ) ( _argument ) ) ) | ( ( ( _argument ) ( 'bshl' ) ( _argument ) ) ) | ( ( ( _argument ) ( 'bshr' ) ( _argument ) ) ) | ( ( ( _argument ) ( 'bshru' ) ( _argument ) ) ) | ( ( ( _argument ) ( '&' ) ( _argument ) ) ) | ( ( ( _argument ) ( '|' ) ( _argument ) ) ) | ( ( ( _argument ) ( '&&' ) ( _argument ) ) ) | ( ( ( _argument ) ( '||' ) ( _argument ) ) ) | ( ( ( _argument ) ( '**' ) ( _argument ) ) ) | ( ( ( _argument ) ( '^^' ) ( _argument ) ) ) | ( ( ( _argument ) ( '<<' ) ( _argument ) ) ) | ( ( ( _argument ) ( '>>' ) ( _argument ) ) ) | ( ( ( _argument ) ( '>>>' ) ( _argument ) ) )
relation_decl ::= '.decl' ( ( ident? ( ',' ident ) * ) ) ( '(' ( ( attribute ) ? ( ',' ( attribute ) ) * ) ')' ) ( _relation_qualifier* ) ( choice_domain ) ?
_relation_qualifier ::= 'brie' | 'btree' | 'btree_delete' | 'eqrel' | 'inline' | 'magic' | 'no_inline' | 'no_magic' | 'override' | 'overridable' | 'input' | 'output' | 'printsize'
choice_domain ::= 'choice-domain' ( ( ident | ( '(' ( ident ( ',' ident ) * ) ')' ) ) ( ',' ( ident | ( '(' ( ident ( ',' ident ) * ) ')' ) ) ) * )
legacy_type_decl ::= ( ( '.number_type' | '.symbol_type' ) ) ( _type_name )
type_decl ::= '.type' ( ( type_synonym | subtype | type_union | type_record | adt ) | legacy_bare_type_decl )
legacy_bare_type_decl ::= ( ident )
subtype ::= ( ident ) '<:' ( _type_name )
type_synonym ::= ( ident ) '=' ( _type_name )
type_union ::= ( ( ( ident ) '=' ( ( _type_name ( '|' _type_name ) + ) ) ) )
type_record ::= ( ( ( ident ) '=' ( '[' ( ( attribute ) ? ( ',' ( attribute ) ) * ) ']' ) ) )
adt ::= ( ( ( ident ) '=' ( ( _adt_branch ( '|' _adt_branch ) * ) ) ) )
_adt_branch ::= ( ident ) ( '{' ( ( attribute ) ? ( ',' ( attribute ) ) * ) '}' )
_type_name ::= primitive_type | qualified_name
primitive_type ::= 'number' | 'symbol' | 'unsigned' | 'float'
attribute ::= ( ident ) ':' ( _type_name )
qualified_name ::= ( ( ident ( '.' ident ) * ) )
ident ::= ('_'|'?'|[A-Z]|[a-z])(([A-Z]|[a-z])|[0-9]|'_'|'?')*