Using AST to Restore Obfuscated JavaScript Code
Original author: Brother K Reptile | Published on 2022-04-28


1. First, what exactly is an AST?
An AST (Abstract Syntax Tree) is a tree representation of the syntactic structure of source code: each node in the tree corresponds to a syntactic unit in the code, such as a variable declaration, a function call or an `if` conditional.
You can think of a piece of obfuscated JavaScript code as a pile of building blocks jumbled together. AST analysis lets us pinpoint the type and position of every block; by modifying the AST we can rearrange the blocks in the correct logical order and finally assemble readable code.
Where ASTs are used
AST is not a toy reserved for reverse engineering; it powers many development tools you use every day:
- IDE features: syntax highlighting, smart completion, automatic formatting
- Code transpilation: Babel transpiles ES6+ syntax into ES5, and its core is AST manipulation
- Code compression: UglifyJS, Terser and similar tools remove dead code and shorten variable names, also relying on the AST
- Reverse deobfuscation: restoring scrambled/encrypted JS code ← the focus of this article
Getting started: an online AST visualization tool
It is highly recommended to follow along at https://astexplorer.net/, experimenting as you read:
- Top bar: select JavaScript as the language and @babel/parser as the parser (the default setup for this article)
- Area ①: paste the obfuscated or normal code you want to analyze
- Area ②: view the corresponding JSON tree in real time; clicking a node highlights the matching source code, and vice versa
- Area ③: write a Babel transform script (this is where you add, delete, modify and query syntax-tree nodes)
- Area ④: preview the code generated by the transform

2. Background: where the AST fits in the compilation process
Most front-end developers never write a compiler, but understanding the following three steps is a basic skill for deobfuscation:

1. Lexical analysis: splitting code into "words"
The code is scanned character by character from left to right and split into a stream of tokens, each meaningful on its own. For example, `log('Hi')` is split into:
`log` → identifier, `(` → left parenthesis, `'Hi'` → string literal, `)` → right parenthesis
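To make the idea of a token stream concrete, here is a deliberately tiny tokenizer. It is a toy sketch, not how Babel's own tokenizer works: `tokenize` is an illustrative name, and it only understands identifiers, single-quoted strings (without escape handling) and parentheses, which is just enough for `log('Hi')`.

```javascript
// Toy tokenizer for illustration only (tokenize is a hypothetical helper, not a Babel API).
// It understands identifiers, single-quoted strings and parentheses, nothing more.
function tokenize(src) {
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (/\s/.test(ch)) {
      i++; // skip whitespace
    } else if (ch === "(" || ch === ")") {
      tokens.push({ type: ch === "(" ? "leftParen" : "rightParen", value: ch });
      i++;
    } else if (ch === "'") {
      const end = src.indexOf("'", i + 1); // no escape handling in this sketch
      if (end === -1) throw new SyntaxError("Unterminated string literal");
      tokens.push({ type: "string", value: src.slice(i, end + 1) });
      i = end + 1;
    } else {
      const match = /^[A-Za-z_$][\w$]*/.exec(src.slice(i));
      if (!match) throw new SyntaxError(`Unexpected character ${ch} at ${i}`);
      tokens.push({ type: "identifier", value: match[0] });
      i += match[0].length;
    }
  }
  return tokens;
}

console.log(tokenize("log('Hi')").map((tok) => `${tok.value} → ${tok.type}`).join("\n"));
```

Real lexers also track positions, handle escapes, numbers, operators and comments, which is why the rest of this article leaves tokenizing to `@babel/parser`.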
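2. Syntactic analysis: assembling complete "sentences"
The token sequence is combined according to the grammar rules, establishing nesting, ordering and other relationships, and the result is an AST. For `log('Hi')`, the AST is roughly: a `Program` containing an `ExpressionStatement` whose expression is a `CallExpression`, with the `Identifier` `log` as its callee and the `StringLiteral` `'Hi'` as its only argument.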
3. Code generation: turning the modified tree back into code
Once trimming, replacement, reordering and other operations are finished at the AST layer, an executable code string is generated from the tree. **All of our deobfuscation work happens at the AST layer in the middle.**
3. Core weapon: getting started quickly with the Babel toolchain
Babel is currently the most widely used JavaScript compiler. It provides a mature, easy-to-use AST manipulation API that fully covers the needs of entry-level reverse engineering.
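Installation
The packages used in this article are published on npm, for example: `npm install @babel/parser @babel/traverse @babel/types @babel/generator`
Quick overview of the core packages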
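@babel/parser: parses code into an AST
Its most commonly used method is `parse(code, options)`, which returns the AST root (a `File` node whose `program` property holds the top-level statements).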
@babel/traverse: batch-processing nodes
For a single targeted change you can reach the node directly by its path in the tree (such as `ast.program.body[0]`), but when processing many nodes at once you need `traverse` plus a `visitor`.
Basic example: batch modification of numbers and strings
Several common ways to write a visitor
There are four common, equivalent ways to write one; choose whichever suits your habits.
Tip:
`enter` is the hook called when traversal enters a node, and `exit` is the hook called when it leaves the node (generally used for post-order processing).
@babel/types: building new nodes
When you need to add new code to the AST, use `@babel/types`. Its builder method names are usually the node type name with the first letter lowercased, e.g. `StringLiteral` → `t.stringLiteral()`.
Efficiency tip
If you're not sure what parameters a method takes, in an editor with TypeScript support **hold Ctrl and click the method name** to jump to its type definition, which is much faster than reading the official documentation.
4. In practice: five typical obfuscation restoration techniques
4.1 Restoring Unicode/hex strings
Obfuscation tools often convert ordinary strings into an encoded form, such as `console['\u006c\u006f\u0067']`.
Key finding
In the AST, the `extra.raw` of such a string is a mess of escape sequences, but its `value` property has already been decoded into normal text by the parser!
Restore code
4.2 Static expression evaluation and restoration
Obfuscated code replaces simple constants with long chains of operations, such as `const a = !![]+!![]+!`.
Core method
Use Babel's built-in `path.evaluate()`, which determines whether an expression can be computed statically (without executing the code) and returns `{ confident, value }`.
Restore code
Note: if the expression references a variable whose value cannot be determined statically, `confident` will be `false`. Do not force a replacement in that case, or the program logic will be broken.
4.3 Removing unused variables and functions
Obfuscated code is often stuffed with large numbers of useless statements whose only purpose is to get in the way of analysis.
Core method
Use `path.scope.getBinding()` to inspect how a variable/function is used:
- `referenced`: whether it is referenced by any other code
- `constant`: whether it is a constant (i.e. never reassigned)
Restore code
4.4 Removing if-else branches that are always true/false
Obfuscated code often contains conditions such as `if (false)` or `if (1)`: some of these branches can never execute, and others always execute.
Restore code
4.5 Reversing switch-case control-flow flattening (beginner version)
Control-flow flattening is one of the most common obfuscation techniques. It combines a `while-switch-case` structure with a shuffled order array: code that originally ran sequentially is cut into pieces, which are then "replayed" in the order given by the array.
Core idea
- Locate the array that controls the order (such as `'3,4,0,5,1,2'['split'](',')`)
- Take out the contents of the matching `case` blocks in index order
- Remove the `continue` statement at the end of each `case`
- Replace the entire `while` structure with the spliced, sequential code
Restore code (pre-dependency search version)
5. Quick reference and learning resources
Core techniques at a glance
Recommended learning resources
There is no universal formula for deobfuscation. The core idea is to study the specific output of the target obfuscator and use AST tooling to restore the real logic bit by bit. Once you have mastered the basics and patterns in this article, 90% of entry-level JavaScript obfuscation will no longer stop you.

