Tutorial: Running JavaScript from Python

Table of contents

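  1. Why is this skill needed?
  2. Quick environment setup
  3. Hands-on: cracking the encrypted token on a star-player site
  4. Pitfalls and performance tips
  5. Summary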

Why is this skill needed?

Today, the core logic of web development is no longer simply back-end rendering of HTML. Front-end rendering, interface parameter encryption, and dynamic cookie generation are almost all performed by JavaScript. As a developer or crawler engineer, you will most likely encounter the following scenarios:

  • To scrape Bilibili series or comments, you first have to produce encrypted parameters such as bili_jct_signature.
  • To crawl product inventory from an e-commerce platform, you need a dynamically generated anti_scrape_token.
  • In automated testing of a project with separated front end and back end, data constructed in Python must go through the front-end encryption before it passes back-end verification.

Faced with these situations, being able to run JavaScript logic directly in Python can save a lot of time in reverse engineering and rewriting. This tutorial will use PyExecJS, a lightweight library, to teach you how to seamlessly move front-end JS code into Python. The whole process does not involve advanced mathematics, but only practical execution skills.


Quick environment setup

Core dependencies

To simulate JavaScript execution in Python, you need two things:

  1. PyExecJS: The bridge between Python and JS runtime environments.
  2. A JS runtime: Node.js is highly recommended. Its V8 engine supports ES6+ natively and runs third-party encryption libraries such as crypto-js without trouble, which makes it much easier to work with than the JScript engine that ships with Windows.

Installation step by step

1. Install PyExecJS

pip install PyExecJS

2. Install Node.js

Go to the official website to download the latest LTS version (stable and worry-free):

https://nodejs.org/

After the installation is complete, open a terminal to verify:

node -v   # prints something like v20.x.x
npm -v    # needed later to install third-party libraries

3. Explicitly select Node.js (important!)

Many newcomers hit errors the moment they run code after installing PyExecJS. On Windows, when PyExecJS cannot find Node.js, it can fall back to the built-in JScript engine, which does not support ES6+. So check which runtimes are available and select Node.js explicitly:

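import execjs

# List every JS runtime PyExecJS can find on this machine
print("Available JS runtimes:", list(execjs.runtimes().keys()))

# Explicitly select Node.js (this fails if Node.js is not on PATH)
ctx = execjs.get(execjs.runtime_names.Node)
print("Selected JS runtime:", ctx.name)  # should print something like "Node.js (V8)"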

If you see Node.js (V8), the environment is ready.
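
Note that execjs.get(name) only returns a runtime object; module-level helpers such as execjs.compile() choose the default runtime on their own. One way to make Node.js that default is the EXECJS_RUNTIME environment variable. The sketch below assumes PyExecJS consults this variable when it picks its default runtime; pin_node_runtime is a hypothetical helper name, not part of PyExecJS.

```python
import os
import shutil


def pin_node_runtime() -> bool:
    """Request Node.js as the default PyExecJS runtime.

    Returns False (and changes nothing) when no `node` executable is on PATH.
    """
    if shutil.which("node") is None:
        return False
    # Assumption: PyExecJS reads EXECJS_RUNTIME when choosing its default
    # runtime, so set it before the first execjs.compile() / execjs.eval().
    os.environ["EXECJS_RUNTIME"] = "Node"
    return True
```

Call pin_node_runtime() once at program start, before any execjs.compile() call. If it returns False, install Node.js or fix PATH first.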


Hands-on: cracking the encrypted token on a star-player site

We practice on the officially provided SPA7 crawler practice site (https://spa7.scrape.center/). On this site, each player's information is protected by an encrypted token; only with a correctly generated token can you request the complete player data.

Step 1: Reverse front-end encryption logic

Don't rush to write Python code yet. Open the browser's developer tools (F12) and follow the steps below to locate the encryption logic:

  1. Switch to the Network tab.
  2. Refresh the page and look for the request that returns complete player information (its URL usually contains api or detail).
  3. Check the request parameters and you will see a conspicuous encrypted field: token.
  4. Click the Initiator tab and find the JS file that issued the request.
  5. Open that file, press Ctrl+F, and search for getToken (a common name for encryption functions) to locate the encryption code.

Step 2: Extract and prepare encryption code

On the practice site, the getToken function relies on the third-party library crypto-js to perform DES encryption. To make it run properly in Node.js, we need to do two things:

  1. Get crypto-js.min.js, the minified build of crypto-js (it can be downloaded directly from npm or a CDN).
  2. Merge the encryption function and crypto-js.min.js into one JS file, and handle the global-variable mounting by hand.

Step 3: Python seamless call

First, prepare the encryption file (crypto_utils.js)

Node.js does not expose third-party libraries as global objects by default, so we wrap the library in a self-executing function that returns CryptoJS and assign the result to a global variable:

// 1. Mount CryptoJS globally by hand (works around the missing global in Node.js)
var CryptoJS = (function() {
    // Paste the full contents of crypto-js.min.js here (omitted for brevity)
    return e();
})();

// 2. Encryption function extracted from the practice site
function getToken(player) {
    // Fixed key (recovered by reversing the practice site; real projects need their own analysis)
    const key = "XwKsGlMcdPMEhR1B";
    // Keep only the fields needed for encryption
    const plainObj = {
        name: player.name,
        birthday: player.birthday,
        height: player.height,
        weight: player.weight
    };
    // Serialize to a JSON string
    const plainText = JSON.stringify(plainObj);
    // Encrypt with DES in ECB mode and Pkcs7 padding
    const encrypted = CryptoJS.DES.encrypt(
        CryptoJS.enc.Utf8.parse(plainText),
        CryptoJS.enc.Utf8.parse(key),
        {
            mode: CryptoJS.mode.ECB,
            padding: CryptoJS.pad.Pkcs7
        }
    );
    // Return the Base64-encoded ciphertext
    return encrypted.toString();
}

Then write the Python calling code

import execjs

def generate_spa7_token(player_info):
    # 1. Read and compile the JS file. Compiling on every call keeps this example
    #    short; the performance tip later on shows how to compile once and reuse it.
    with open('crypto_utils.js', 'r', encoding='utf-8') as f:
        js_ctx = execjs.compile(f.read())

    # 2. Call the JS function directly
    token = js_ctx.call('getToken', player_info)
    return token

if __name__ == "__main__":
    # Player data from the practice site
    lebron = {
        "name": "LeBron James",
        "birthday": "1984-12-30",
        "height": "2.06m",
        "weight": "113.4kg"
    }

    # Generate the token
    spa7_token = generate_spa7_token(lebron)
    print(f"Encryption succeeded! Token:\n{spa7_token}")

Run this code and you should see a Base64-encoded token string, identical to the one in the browser request.
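
Before comparing against the browser, a quick sanity check on the token's shape can catch obvious mistakes. The sketch below relies only on two facts from the JS code above: the result is Base64, and DES is a block cipher with 8-byte blocks, so the decoded ciphertext should be a non-empty multiple of 8 bytes. The helper name looks_like_des_token is hypothetical, and passing this check does not prove the token is correct.

```python
import base64
import binascii

DES_BLOCK_SIZE = 8  # DES encrypts data in 8-byte blocks


def looks_like_des_token(token: str) -> bool:
    """Return True if token is valid Base64 that decodes to whole DES blocks."""
    try:
        raw = base64.b64decode(token, validate=True)
    except (binascii.Error, ValueError):
        # Not Base64 at all (bad characters, bad padding, non-ASCII input)
        return False
    return len(raw) > 0 and len(raw) % DES_BLOCK_SIZE == 0
```

If looks_like_des_token(spa7_token) returns False, the JS side probably returned something other than the ciphertext, for example an error string or an empty value.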


Pitfalls and performance tips

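1. Pitfall: CryptoJS is not defined

Cause: in Node.js, third-party libraries are not global. A library loaded with require('crypto-js') is only reachable through the variable it is assigned to; it is never attached to the global scope automatically. If the self-executing function does not return the library object correctly, you also get a CryptoJS is not defined error.

Solution:

  • Stick to the self-executing function + global mounting approach shown above.
  • Or install the library with npm and load it from the JS source itself: put const CryptoJS = require('crypto-js') at the top of crypto_utils.js. A separate ctx.eval("const CryptoJS = require('crypto-js')") call is unlikely to help, because a declaration made in one eval does not carry over to later calls.
    This requires that npm install crypto-js has been run in the directory where Node.js executes.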
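
The npm-based route can be sketched as follows. It assumes npm install crypto-js has been run in project_dir, and it uses the cwd argument of execjs.compile() so that Node.js starts in that directory and require() can find node_modules. JS_SOURCE, desEncrypt, and compile_with_require are hypothetical names for illustration, not code from the practice site.

```python
# Hypothetical JS source: loads crypto-js through require() instead of
# pasting the minified library into the file.
JS_SOURCE = r"""
const CryptoJS = require('crypto-js');

function desEncrypt(plainText, key) {
    return CryptoJS.DES.encrypt(
        CryptoJS.enc.Utf8.parse(plainText),
        CryptoJS.enc.Utf8.parse(key),
        { mode: CryptoJS.mode.ECB, padding: CryptoJS.pad.Pkcs7 }
    ).toString();
}
"""


def compile_with_require(project_dir: str):
    """Compile JS_SOURCE with Node.js running in project_dir.

    project_dir is assumed to contain node_modules/crypto-js.
    """
    import execjs  # imported lazily; needs PyExecJS and Node.js installed

    return execjs.compile(JS_SOURCE, cwd=project_dir)
```

Usage would look like ctx = compile_with_require(".") followed by ctx.call("desEncrypt", "hello", "XwKsGlMcdPMEhR1B"). Because the require() line is part of the compiled source, it runs again on every call and CryptoJS is always defined.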

2. Pitfall: garbled Chinese characters

Symptom: if the object being encrypted contains Chinese characters, the generated token may differ from the browser's, and the request is rejected.

Solution:

  • Save the JS file as UTF-8 without BOM (avoid Windows Notepad's default encoding).
  • Always specify encoding='utf-8' when Python reads the JS file.
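
When the JS file comes from someone else's editor, it may be easier to tolerate a BOM than to forbid it. The sketch below uses Python's standard utf-8-sig codec, which decodes plain UTF-8 and also drops a leading BOM when one is present. It is a tolerant variant of the encoding='utf-8' advice above, and both helper names are hypothetical.

```python
import codecs


def has_utf8_bom(path: str) -> bool:
    """Return True if the file starts with the 3-byte UTF-8 BOM."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def read_js_source(path: str) -> str:
    """Read a JS file as UTF-8, stripping a leading BOM if there is one."""
    with open(path, "r", encoding="utf-8-sig") as f:
        return f.read()
```

The string returned by read_js_source can be passed to execjs.compile() directly. has_utf8_bom is only there to report which files should be re-saved.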

3. Performance: don't compile the JS on every iteration!

Calling execjs.compile inside a loop re-reads the file and rebuilds the context on every iteration for no benefit. With an external runtime such as Node.js, PyExecJS also hands each evaluation to a Node.js child process, so per-call overhead is already significant. If you need tokens for 1,000 players, redundant work inside the loop adds up quickly.

Optimization: move execjs.compile() outside the loop so that it runs only once per script, and reuse the same context object throughout.

import execjs
import time

# Compile once, outside the loop
with open('crypto_utils.js', 'r', encoding='utf-8') as f:
    js_ctx = execjs.compile(f.read())

# Simulate crawling 10 players
players = [
    {"name": "LeBron James", "birthday": "1984-12-30", "height": "2.06m", "weight": "113.4kg"},
    # ...the other 9 players are omitted
]

start_time = time.time()
for p in players:
    token = js_ctx.call('getToken', p)
end_time = time.time()

print(f"Generating tokens for {len(players)} players took {end_time - start_time:.2f}s")
# Recompiling inside the loop would repeat the file read and context setup on every iteration

This small habit pays off when you call JS functions in batches.
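
The compile-once idea can also be wrapped in a helper so that every part of a crawler shares one context without passing it around. This is a minimal sketch, assuming the crypto_utils.js file and getToken function from this tutorial; get_js_context and make_tokens are hypothetical names.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_js_context(js_path: str):
    """Compile js_path on first use; later calls return the cached context."""
    import execjs  # imported lazily; needs PyExecJS and Node.js installed

    with open(js_path, "r", encoding="utf-8") as f:
        return execjs.compile(f.read())


def make_tokens(players, js_path="crypto_utils.js"):
    """Generate one token per player, reusing a single compiled context."""
    ctx = get_js_context(js_path)
    return [ctx.call("getToken", player) for player in players]
```

The cache is keyed by the path string, so get_js_context("crypto_utils.js") compiles only once no matter how many times it is called. Call get_js_context.cache_clear() after editing the JS file in a long-running process.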


Summary

Through this tutorial, you have mastered the following core capabilities:

  1. Installing PyExecJS + Node.js correctly and selecting the right runtime.
  2. The basic approach to reversing front-end encryption logic with the browser's developer tools.
  3. Migrating JS encryption code that depends on third-party libraries to Python for execution.
  4. Fixing common pitfalls such as CryptoJS is not defined and garbled Chinese characters, and improving execution performance by compiling once.

If you run into more complex encryption later, such as heavily obfuscated JS or WebAssembly, there are two paths to choose from:

  • Dig deeper into JS reverse engineering (for example AST-based deobfuscation and script behavior analysis).
  • Use a browser automation tool such as Selenium or Playwright, trading some performance for skipping the manual reversing step.

The complete sample code can be found in the corresponding GitHub repository: https://github.com/Python3WebSpider/ScrapeSpa7