Decompilation Pipeline
At the core of Ghidra's decompilation transformation are Action and Rule objects. An Action represents a transform or analysis across a whole function, whereas a Rule is used for matching and rewriting small sequences of P-Code opcodes. Both primitives are chained together in order to transform the low-level P-Code output of the lifter into high-level P-Code that is then "rendered" as decompiled code. At the time of writing, all actions and rules in Ghidra are defined in ruleaction.cc and coreaction.cc, with their declarations split over several header files. The most important definitions for understanding the decompilation flow can be found towards the end of coreaction.cc: the definitions of buildDefaultGroups and universalAction.
Instantiating Actions and Action Groups
The universal action contains all possible actions and rules and specifies the order of their execution. The following code is an excerpt from universalAction, the function that defines it:
// ...
act = new ActionRestartGroup(Action::rule_onceperfunc,"universal",1);
registerAction(universalname,act);
act->addAction( new ActionStart("base"));
act->addAction( new ActionConstbase("base"));
act->addAction( new ActionNormalizeSetup("normalanalysis"));
act->addAction( new ActionDefaultParams("base"));
// act->addAction( new ActionParamShiftStart("paramshift") );
act->addAction( new ActionExtraPopSetup("base",stackspace) );
act->addAction( new ActionPrototypeTypes("protorecovery"));
act->addAction( new ActionFuncLink("protorecovery") );
act->addAction( new ActionFuncLinkOutOnly("noproto") );
{
  actfullloop = new ActionGroup(Action::rule_repeatapply,"fullloop");
  {
    actmainloop = new ActionGroup(Action::rule_repeatapply,"mainloop");
    actmainloop->addAction( new ActionUnreachable("base") );
    actmainloop->addAction( new ActionVarnodeProps("base") );
    actmainloop->addAction( new ActionHeritage("base") );
    actmainloop->addAction( new ActionParamDouble("protorecovery") );
    actmainloop->addAction( new ActionSegmentize("base"));
    actmainloop->addAction( new ActionInternalStorage("base") );
    // ...
This excerpt shows the creation of several Action objects, as well as action containers, which allow grouping actions into a sequence. The code snippet contains two of these: ActionRestartGroup and ActionGroup. The ActionRestartGroup is unique here, as it will execute its contained actions from the beginning if a certain flag is set. Because this flag is set per function, there can only be one such group in the decompilation pipeline, and it is therefore used to mark the root of the universal action. Besides that, ActionGroup and ActionRestartGroup are both Action objects themselves, meaning that groups can also be added as actions to other groups. Together with the flags that can be specified for actions (the excerpt shows, for example, Action::rule_onceperfunc and Action::rule_repeatapply), this can be used to create loops of related actions that are applied repeatedly until they no longer trigger a change.
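The nesting and repeat-until-stable behaviour described above can be sketched with a small toy model. This is only an illustration under stated assumptions: ToyAction, ToyGroup, HalveIfEven, DecrementIfOdd and runToyPipeline are hypothetical names that do not exist in Ghidra, the "function" being transformed is reduced to a single integer, and the boolean repeat flag merely stands in for the idea behind Action::rule_repeatapply.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Ghidra's real classes.
struct ToyAction {
    virtual ~ToyAction() = default;
    // Applies the action to the state and returns the number of changes made.
    virtual int apply(int &state) = 0;
};

// Hypothetical leaf action: halves the state if it is even and non-zero.
struct HalveIfEven : ToyAction {
    int apply(int &state) override {
        if (state != 0 && state % 2 == 0) { state /= 2; return 1; }
        return 0;
    }
};

// Hypothetical leaf action: subtracts one if the state is odd and above 1.
struct DecrementIfOdd : ToyAction {
    int apply(int &state) override {
        if (state > 1 && state % 2 != 0) { state -= 1; return 1; }
        return 0;
    }
};

// A group is itself an action, so groups can be nested inside other groups.
struct ToyGroup : ToyAction {
    bool repeat;  // stand-in for a repeat-apply flag
    std::vector<std::unique_ptr<ToyAction>> children;

    explicit ToyGroup(bool repeatUntilStable) : repeat(repeatUntilStable) {}

    void addAction(std::unique_ptr<ToyAction> a) { children.push_back(std::move(a)); }

    int apply(int &state) override {
        int total = 0;
        for (;;) {
            int changes = 0;
            for (auto &child : children) changes += child->apply(state);
            total += changes;
            // A non-repeating group runs one pass; a repeating group stops
            // once a full pass over its children changed nothing.
            if (!repeat || changes == 0) break;
        }
        return total;
    }
};

// Builds a root group containing one repeating inner group and runs it.
int runToyPipeline(int start) {
    assert(start >= 0);  // the toy actions are only meant for non-negative input
    auto inner = std::make_unique<ToyGroup>(true);
    inner->addAction(std::make_unique<HalveIfEven>());
    inner->addAction(std::make_unique<DecrementIfOdd>());

    ToyGroup root(false);
    root.addAction(std::move(inner));

    int state = start;
    root.apply(state);
    return state;
}
```

Calling runToyPipeline(12) drives the inner group through several passes (12, 6, 3, 2, 1) before a pass produces no change and the loop exits. The root group only runs once, but because its child is a repeating group, the loop still happens inside it.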
Instantiating Rules
There is another special container called ActionPool, which contains Rule objects instead of actions. As can be seen in the code snippet below, the API of an ActionPool is analogous to that of an ActionGroup, except that it takes rules instead of actions.
/// ...
actprop = new ActionPool(Action::rule_repeatapply,"oppool1");
actprop->addRule( new RuleEarlyRemoval("deadcode"));
actprop->addRule( new RuleTermOrder("analysis"));
actprop->addRule( new RuleSelectCse("analysis"));
actprop->addRule( new RuleCollectTerms("analysis"));
actprop->addRule( new RulePullsubMulti("analysis"));
actprop->addRule( new RulePullsubIndirect("analysis"));
actprop->addRule( new RulePushMulti("nodejoin"));
actprop->addRule( new RuleSborrow("analysis") );
actprop->addRule( new RuleIntLessEqual("analysis") );
actprop->addRule( new RuleTrivialArith("analysis") );
actprop->addRule( new RuleTrivialBool("analysis") );
actprop->addRule( new RuleTrivialShift("analysis") );
/// ...
In contrast to actions, rules cannot contain other rules, as they are meant to be simple and atomic. They therefore represent the leaf nodes of the analysis hierarchy.
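To make the idea of a pool of atomic rewrite rules more concrete, here is a minimal sketch. It rests on several assumptions: ToyOp, ToyOpcode, ToyRule, ToyAddZero, ToyMultOne, ToyPool and countCopiesAfterPool are hypothetical names, each toy rule looks at a single operation rather than a short sequence, and the way the pool sweeps over the operations is a simplification that is not taken from Ghidra's implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model for illustration only; none of these types exist in Ghidra.
enum class ToyOpcode { Copy, IntAdd, IntMult };

struct ToyOp {
    ToyOpcode code;
    std::string input;  // name of the variable operand
    long constant;      // constant operand (ignored for Copy)
};

// A rule is a leaf: it inspects one op and rewrites it if it matches.
struct ToyRule {
    virtual ~ToyRule() = default;
    virtual bool apply(ToyOp &op) const = 0;  // true if the op was rewritten
};

// Hypothetical rule: x + 0  =>  copy of x
struct ToyAddZero : ToyRule {
    bool apply(ToyOp &op) const override {
        if (op.code != ToyOpcode::IntAdd || op.constant != 0) return false;
        op.code = ToyOpcode::Copy;
        return true;
    }
};

// Hypothetical rule: x * 1  =>  copy of x
struct ToyMultOne : ToyRule {
    bool apply(ToyOp &op) const override {
        if (op.code != ToyOpcode::IntMult || op.constant != 1) return false;
        op.code = ToyOpcode::Copy;
        return true;
    }
};

// The pool holds rules only and sweeps them over all ops until nothing changes.
struct ToyPool {
    std::vector<std::unique_ptr<ToyRule>> rules;

    void addRule(std::unique_ptr<ToyRule> r) { rules.push_back(std::move(r)); }

    int apply(std::vector<ToyOp> &ops) const {
        int total = 0;
        bool changed = true;
        while (changed) {
            changed = false;
            for (ToyOp &op : ops)
                for (const auto &rule : rules)
                    if (rule->apply(op)) { changed = true; ++total; }
        }
        return total;
    }
};

// Runs the pool over three sample ops and counts how many became copies.
int countCopiesAfterPool() {
    std::vector<ToyOp> ops = {
        {ToyOpcode::IntAdd,  "a", 0},  // matches ToyAddZero
        {ToyOpcode::IntMult, "b", 1},  // matches ToyMultOne
        {ToyOpcode::IntAdd,  "c", 4},  // matches neither rule
    };
    ToyPool pool;
    pool.addRule(std::make_unique<ToyAddZero>());
    pool.addRule(std::make_unique<ToyMultOne>());

    int rewrites = pool.apply(ops);
    assert(rewrites == 2);
    (void)rewrites;  // avoid an unused-variable warning when asserts are disabled

    int copies = 0;
    for (const ToyOp &op : ops)
        if (op.code == ToyOpcode::Copy) ++copies;
    return copies;
}
```

The sketch mirrors the addRule style of the excerpt: the pool is the only place where iteration happens, while each rule stays a small, self-contained match-and-rewrite step.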
Default Groups
In a somewhat confusing naming clash, Ghidra has a concept of groups that is interwoven with the definition of the actions inside the universal action, but is not related to the ActionGroup objects. There is the notion of default groups, which specify the members of certain default actions. For example, this excerpt of buildDefaultGroups shows the members of the decompile group, which is a top-level analysis action executed by Ghidra:
const char *members[] = {
  "base", "protorecovery", "protorecovery_a", "deindirect",
  "localrecovery", "deadcode", "typerecovery", "stackptrflow",
  "blockrecovery", "stackvars", "deadcontrolflow", "switchnorm",
  "cleanup", "splitcopy", "splitpointer", "merge", "dynamic", "casts",
  "analysis", "fixateglobals", "fixateproto", "constsequence",
  "segment", "returnsplit", "nodejoin", "doubleload", "doubleprecis",
  "unreachable", "subvar", "floatprecision",
  "conditionalexe", ""
};
setGroup("decompile",members);
Some member names are familiar from the excerpts of the universal action definition, e.g. base, protorecovery, deadcode and others. This is no coincidence, but rather a mechanism for selecting relevant actions from the universal action. To this end, every Action and Rule has a name and a group identifier. The name itself is typically set in the constructor, whereas the group identifier is set during the construction of the universal action. That means all strings in the universalAction function (i.e. the previous excerpts) are group identifiers, with one exception: action and rule containers have an empty group identifier, and the strings shown in universalAction represent their names. For example, the oppool1 in ActionPool(Action::rule_repeatapply,"oppool1") refers to the name of the action, but base in ActionStart("base") refers to the group identifier.
This might seem illogical at first, but it makes sense once you consider how the default actions, such as decompile, are constructed. In order to derive this action, Ghidra walks through the hierarchy of the universal action. At every step it tests whether the current action or rule has a group identifier that matches any of the member names of the default group. If so, the action or rule is included; otherwise it is skipped. This keeps the overall structure of the universal action and simply deactivates certain actions; deactivating action groups, however, would change the overall structure and may lead to unpredictable results.
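The selection walk described above can be sketched as a recursive filter over a tree. This is a simplified model, not Ghidra's code: ToyNode, makeLeaf, makeContainer, filterByGroups, countLeaves and countDecompileLeaves are hypothetical names, the node names are plain labels chosen for readability, and the sketch always keeps containers, even if none of their children survive, which is an assumption made for simplicity.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy tree node for illustration only; not a Ghidra type.
struct ToyNode {
    std::string name;          // label of the action, rule or container
    std::string group;         // group identifier; empty for containers
    bool isContainer = false;
    std::vector<std::unique_ptr<ToyNode>> children;
};

std::unique_ptr<ToyNode> makeLeaf(const std::string &name, const std::string &group) {
    auto n = std::make_unique<ToyNode>();
    n->name = name;
    n->group = group;
    return n;
}

std::unique_ptr<ToyNode> makeContainer(const std::string &name) {
    auto n = std::make_unique<ToyNode>();
    n->name = name;
    n->isContainer = true;
    return n;
}

// Copies the tree, keeping a leaf only if its group is one of the members.
// Containers are always copied, so the nesting of the original is preserved.
std::unique_ptr<ToyNode> filterByGroups(const ToyNode &node,
                                        const std::set<std::string> &members) {
    if (!node.isContainer) {
        if (members.count(node.group) == 0) return nullptr;  // leaf is skipped
        return makeLeaf(node.name, node.group);
    }
    auto copy = makeContainer(node.name);
    for (const auto &child : node.children) {
        if (auto kept = filterByGroups(*child, members))
            copy->children.push_back(std::move(kept));
    }
    return copy;
}

int countLeaves(const ToyNode &node) {
    if (!node.isContainer) return 1;
    int n = 0;
    for (const auto &child : node.children) n += countLeaves(*child);
    return n;
}

// Builds a tiny "universal" tree and derives a filtered copy from it.
int countDecompileLeaves() {
    auto universal = makeContainer("universal");
    universal->children.push_back(makeLeaf("ActionStart", "base"));
    universal->children.push_back(makeLeaf("ActionFuncLink", "protorecovery"));
    universal->children.push_back(makeLeaf("ActionFuncLinkOutOnly", "noproto"));

    auto mainloop = makeContainer("mainloop");
    mainloop->children.push_back(makeLeaf("ActionUnreachable", "base"));
    mainloop->children.push_back(makeLeaf("ActionHeritage", "base"));
    universal->children.push_back(std::move(mainloop));

    // Only a subset of the member list is used here to keep the example short.
    const std::set<std::string> members = {"base", "protorecovery"};
    auto filtered = filterByGroups(*universal, members);
    assert(filtered != nullptr);
    return countLeaves(*filtered);
}
```

In this toy tree, the leaf tagged noproto is dropped because noproto is not among the selected members, while the mainloop container and its position in the hierarchy survive unchanged.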