Using AST technology to restore obfuscated JavaScript code

Original author: Brother K Reptile | Published on 2022-04-28


1. First, understand: what exactly is an AST?

An AST (Abstract Syntax Tree) is a tree representation of the syntactic structure of source code. Each node in the tree corresponds to a syntactic unit in the code, such as a variable declaration, a function call, or an if conditional.

You can think of a piece of obfuscated JavaScript code as a messy pile of building blocks. AST analysis pinpoints the type and position of every block, and by modifying the AST we can rearrange the blocks in the correct logical order and finally assemble readable code.

Wide applications of AST

The AST is not a toy reserved for reverse engineering. Many development tools you use every day are built on it:

  • IDE features: syntax highlighting, smart completion, automatic formatting
  • Code transpilation: Babel converts ES6+ syntax to ES5, and at its core it operates on the AST
  • Code minification: UglifyJS, Terser and similar tools remove dead code and shorten variable names, also by working on the AST
  • Deobfuscation in reverse engineering: restoring scrambled or encrypted JS code (the focus of this article)

Getting started: an online AST visualization tool

It is highly recommended to follow along at https://astexplorer.net/, making changes as you read:

  1. Top bar: select JavaScript as the language and @babel/parser as the parser (the setup used throughout this article)
  2. Area ①: paste the obfuscated or normal code you want to analyze
  3. Area ②: view the corresponding JSON tree in real time; clicking a node highlights the matching source code, and vice versa
  4. Area ③: write a Babel transform script (adding, deleting, modifying and querying nodes of the syntax tree)
  5. Area ④: preview the code generated after the transform

[Figure: AST online parsing tool]


2. Groundwork: where the AST fits in the compilation process

Most front-end developers never write a compiler, but understanding the following three steps is a basic skill for deobfuscation:

Source code → Lexical analysis → Syntax analysis → AST → [we modify it here] → Code generation → Target code

[Figure: compilation process diagram]

1. Lexical analysis: split into "words"

The lexer scans the code character by character from left to right and splits it into a stream of tokens, each meaningful on its own. For example, log('Hi') is split into:

  • log → identifier
  • ( → left parenthesis
  • 'Hi' → string literal
  • ) → right parenthesis
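
The toy lexer below illustrates the idea. tokenize is a hypothetical helper that only knows identifiers, parentheses and single-quoted strings (with no escape handling), which is just enough for log('Hi'). Real lexers, such as the one inside @babel/parser, handle the full language.

```javascript
// Toy lexer for illustration only: `tokenize` is a hypothetical helper, not part of Babel.
// It recognizes identifiers, parentheses and single-quoted strings (no escapes).
function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      i++; // skip whitespace
    } else if (/[A-Za-z_$]/.test(ch)) {
      // identifier: a letter, _ or $, followed by word characters or $
      let j = i + 1;
      while (j < source.length && /[\w$]/.test(source[j])) j++;
      tokens.push({ type: "identifier", value: source.slice(i, j) });
      i = j;
    } else if (ch === "(" || ch === ")") {
      tokens.push({ type: ch === "(" ? "leftParen" : "rightParen", value: ch });
      i++;
    } else if (ch === "'") {
      // string literal: read up to the closing quote
      const end = source.indexOf("'", i + 1);
      if (end === -1) throw new SyntaxError("Unterminated string literal");
      tokens.push({ type: "string", value: source.slice(i + 1, end) });
      i = end + 1;
    } else {
      throw new SyntaxError(`Unexpected character: ${ch}`);
    }
  }
  return tokens;
}

console.log(tokenize("log('Hi')"));
```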

2. Syntax analysis: assemble complete "sentences"

The parser combines the token sequence according to the grammar rules, establishing nesting, ordering and other relationships, and finally produces an AST. For log('Hi'), the AST looks roughly like this:

ExpressionStatement
  └── CallExpression (function call)
      ├── callee: Identifier "log"
      └── arguments
          └── StringLiteral "Hi"

3. Code generation: turn the modified tree back into code

Once trimming, replacing, reordering and other operations are done at the AST layer, the tree is turned back into an executable code string. All of our deobfuscation work happens at this middle layer: the AST.


3. Core weapon: a quick start with the Babel toolchain

Babel is currently the most widely used JavaScript compiler. Its mature, easy-to-use AST manipulation API fully covers the needs of entry-level reverse engineering.

Installation

npm install @babel/core @babel/parser @babel/traverse @babel/generator @babel/types

Quick overview of core packages

Package name | Function
@babel/parser | Parses a code string into an AST (a JSON tree structure)
@babel/traverse | Traverses and modifies AST nodes in batches using the visitor pattern
@babel/generator | Turns the modified AST back into a code string
@babel/types | Checks node types and quickly creates new AST nodes

@babel/parser: Parse code into AST

The most commonly used method is parse:

const parser = require("@babel/parser");

const code = "const a = 1;";
const ast = parser.parse(code, { 
  sourceType: "module" // enables ES Module syntax (optional)
});
console.log(ast); // prints deeply nested JSON; astexplorer shows it more clearly

@babel/traverse: processing nodes in batches

For a single, one-off change, you can reach the node directly by its property path (such as ast.program.body[0]). When many nodes need processing, however, you need traverse together with a visitor.

Basic example: batch modification of numbers and strings

const parser = require("@babel/parser");
const generate = require("@babel/generator").default;
const traverse = require("@babel/traverse").default;

const code = `
const a = 1500;
const b = "hi";
const c = 787;
`;
const ast = parser.parse(code);

// visitor object: keys are node types, values are handler functions
const visitor = {
  NumericLiteral(path) {
    path.node.value = (path.node.value + 100) * 2;
  },
  StringLiteral(path) {
    path.node.value = "I Love JavaScript!";
  }
};

traverse(ast, visitor);
console.log(generate(ast).code);

Output result:

const a = 3200;
const b = "I Love JavaScript!";
const c = 1774;

Several common ways to write a visitor

The following three forms are all valid ways to register handlers; choose whichever fits the task:

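// 1. Shorthand method (most common)
const visitor1 = {
  NumericLiteral(path) {
    console.log(path.node.value);
  }
};

// 2. Single entry point + type check (useful when nodes need uniform handling)
const visitor2 = {
  enter(path) {
    if (path.node.type === "NumericLiteral") {
      console.log(path.node.value);
    }
  }
};

// 3. Several node types sharing one handler
const visitor3 = {
  "NumericLiteral|StringLiteral"(path) {
    console.log(path.node.value);
  }
};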

Tip: enter is the hook called when traversal enters a node, and exit is the hook called when it leaves the node (generally used for post-order processing).


@babel/types: Build new nodes

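When you need to add new code to the AST, use @babel/types. Its builder functions are usually named after the node type with the first letter lowercased: types.stringLiteral() builds a StringLiteral node, and types.callExpression() builds a CallExpression.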

Efficiency tip

If you are not sure what parameters a builder expects, hold Ctrl and click the method name in an editor with TypeScript support to jump to its type definition. This is much faster than reading the official documentation.


4. Hands-on: restoring five typical obfuscation patterns

4.1 Unicode/Hex string restoration

Obfuscation tools often convert ordinary strings into an encoded form, such as console['\u006c\u006f\u0067'], which is just console['log'].

Key observation

In the AST, the extra.raw property of such a string still holds the jumble of escape sequences, but its value property has already been decoded into normal text by the parser.

Restore code

const visitor = {
  StringLiteral(path) {
    // Delete raw (and rawValue) so the generator re-emits the string from value
    delete path.node.extra?.raw;
    delete path.node.extra?.rawValue;
  }
};

4.2 Static expression evaluation and restoration

Obfuscated code replaces simple constants with long chains of operations, such as const a = !![]+!![]+!![] (which evaluates to 3).

Core methods

Use Babel's built-in path.evaluate(). It tries to compute an expression's value statically, without executing the code, and sets confident to true only when the result can be determined reliably.

Restore code

const types = require("@babel/types");

const visitor = {
  "BinaryExpression|CallExpression|ConditionalExpression"(path) {
    const { confident, value } = path.evaluate();
    if (confident) {
      // Convert the computed value into an AST node and replace the expression in place
      path.replaceInline(types.valueToNode(value));
    }
  }
};

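Note: if the expression depends on a value that cannot be determined statically (for example, an undeclared global, a function parameter, or a reassigned variable), confident will be false. Do not force a replacement in that case, or the program logic will be lost.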


4.3 Removing unused variables and functions

Obfuscated code is often stuffed with large numbers of useless statements whose only purpose is to interfere with analysis.

Core methods

Use path.scope.getBinding() to check how a variable or function is used:

  • referenced: whether it is referenced anywhere else in the code
  • constant: whether it is never reassigned after its declaration (reassignments are recorded in constantViolations)

Restore code

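const types = require("@babel/types");

const visitor = {
  "VariableDeclarator|FunctionDeclaration"(path) {
    const { id } = path.node;
    // Only handle plain identifiers (skip destructuring patterns)
    if (!types.isIdentifier(id)) return;
    // Look the name up from the parent scope: a FunctionDeclaration's own scope is its body
    const binding = path.parentPath.scope.getBinding(id.name);
    // No binding found, or reassigned, or referenced → keep
    if (!binding || binding.constantViolations.length > 0 || binding.referenced) return;
    // Keep declarators whose initializer may have side effects (e.g. a function call)
    if (path.isVariableDeclarator() && path.node.init && !path.scope.isPure(path.node.init)) return;
    // Otherwise it is "zombie code": delete it
    path.remove();
  }
};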

4.4 Deleting if-else branches whose condition is always true or false

Obfuscated code often contains conditions such as if (false) or if (1). Some of these branches can never execute, and others always execute.

Restore code

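const types = require("@babel/types");

// Statements of a branch: a block's body, or the single unbraced statement itself
const branchBody = (node) => (types.isBlockStatement(node) ? node.body : [node]);

const visitor = {
  IfStatement(path) {
    const test = path.node.test;
    // Only handle boolean or numeric literal conditions (variable conditions can't be decided statically)
    if (!types.isBooleanLiteral(test) && !types.isNumericLiteral(test)) return;

    if (test.value) {
      // Condition is truthy → replace the if statement with the statements of the if branch
      path.replaceInline(branchBody(path.node.consequent));
    } else if (path.node.alternate) {
      // Condition is falsy and there is an else → replace it with the else branch
      path.replaceInline(branchBody(path.node.alternate));
    } else {
      // Condition is falsy with no else → remove the whole if
      path.remove();
    }
  }
};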

4.5 Undoing switch-case control flow flattening (beginner version)

Control flow flattening is one of the most common obfuscation techniques. It cuts code that originally ran sequentially into pieces, places them in the cases of a while-switch-case structure in scrambled order, and uses an order array to "replay" them in the original sequence.
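
The hand-written sketch below shows the pattern on a tiny function. It is loosely modeled on typical obfuscator output, but the names, case labels and order string are made up, and real tools differ in detail. Both versions return the same result.

```javascript
// Original, sequential code
function original() {
  const out = [];
  out.push("first");
  out.push("second");
  out.push("third");
  return out;
}

// The same logic after hand-written control flow flattening:
// the pieces sit under shuffled case labels, and the order string replays them
function flattened() {
  const out = [];
  const order = "2|0|1".split("|");
  let i = 0;
  while (true) {
    switch (order[i++]) {
      case "0":
        out.push("second");
        continue; // jump back to the while loop for the next label
      case "1":
        out.push("third");
        continue;
      case "2":
        out.push("first");
        continue;
    }
    break; // order exhausted: order[i++] is undefined and no case matches
  }
  return out;
}

console.log(original(), flattened());
```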

Core idea

  1. Locate the array that controls the execution order (such as '3,4,0,5,1,2'['split'](','))
  2. Take out the contents of the corresponding case blocks in the order given by the array
  3. Remove the continue statement at the end of each case
  4. Replace the entire while structure with the spliced, sequential code

Restore code (version that searches the preceding sibling statements for the order array)

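const types = require("@babel/types");

const visitor = {
  WhileStatement(path) {
    const switchNode = path.node.body.body?.[0];
    // Only handle loops shaped like while (...) { switch (arr[i++]) { ... } }
    if (!types.isSwitchStatement(switchNode) || !types.isMemberExpression(switchNode.discriminant)) return;
    // Get the variable name of the control-flow array
    const arrayName = switchNode.discriminant.object.name;

    let controlFlowArr = [];
    // Search the sibling nodes before the while loop for the array definition
    path.getAllPrevSiblings().forEach(prevPath => {
      const { id, init } = prevPath.node.declarations?.[0] || {};
      if (id?.name === arrayName) {
        // Simplified simulation of split; real samples may build the array in more complex ways
        const str = init.callee.object.value;
        const separator = init.arguments[0].value;
        controlFlowArr = str.split(separator);
        // Delete the array definition, which has now been used
        prevPath.remove();
      }
    });
    // Array not found: leave the loop alone instead of replacing it with nothing
    if (controlFlowArr.length === 0) return;

    // Concatenate the case bodies in execution order
    let replaceNodes = [];
    controlFlowArr.forEach(index => {
      // Match the case by its label (case '3':), not by its position in the cases array
      const switchCase = switchNode.cases.find(c => c.test && String(c.test.value) === index);
      const caseBody = switchCase.consequent;
      // If the last statement is continue, drop it
      if (types.isContinueStatement(caseBody.at(-1))) caseBody.pop();
      replaceNodes = replaceNodes.concat(caseBody);
    });

    // Replace the entire while loop
    path.replaceWithMultiple(replaceNodes);
  }
};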

5. Quick reference and learning resources

Core techniques at a glance

Operation | Core method
Parse code | @babel/parser: parse()
Traverse/modify in batches | @babel/traverse: traverse(ast, visitor)
Generate code | @babel/generator: generate(ast)
Create a new node | @babel/types: types.xxx()
Evaluate static expressions | path.evaluate()
Check variable references | path.scope.getBinding()
Replace a node inline | path.replaceInline()
Delete a node | path.remove()

Resource | Link
AST online visualization | astexplorer.net
Babel Chinese official website | babeljs.cn
Babel Traverse Chinese documentation | evilrecluse.top/Babel-traverse-api-doc

There is no hard-and-fast universal formula for deobfuscation. The core idea is to study the specific output of the target obfuscation tool and use AST tooling to restore the real logic step by step. Once you have mastered the basics and routines in this article, most entry-level JavaScript obfuscation will no longer stop you.