Tutorial: executing JavaScript from Python
Why is this skill needed?
Modern web development no longer centers on rendering HTML on the back end. Front-end rendering, API parameter encryption, and dynamic cookie generation are now almost all handled by JavaScript. As a developer or scraping engineer, you are likely to run into scenarios like these:
- To scrape Bilibili videos or comments, you first have to produce encrypted parameters such as `bili_jct` and `_signature`.
- To scrape product inventory from an e-commerce platform, you need a dynamically generated `anti_scrape_token`.
- To run automated tests for a project with separate front and back ends, data built in Python must be encrypted the same way the front end does it before the back end will accept it.
In these situations, running the original JavaScript logic directly from Python saves a lot of time compared with reverse engineering the algorithm and rewriting it in Python. This tutorial uses PyExecJS, a lightweight library, to show how to bring front-end JS code into Python. No advanced mathematics is involved, only practical execution techniques.
Quick environment setup
Core dependencies
To execute JavaScript from Python, you need two things:
- PyExecJS: the bridge between Python and a JS runtime.
- A JS runtime: Node.js is strongly recommended. Its V8 engine supports ES6+ natively and handles third-party encryption libraries such as `crypto-js`, so it is much easier to work with than the JScript engine built into Windows.
Installation step by step
1. Install PyExecJS with pip: `pip install PyExecJS` (the module is imported as `execjs`).
2. Install Node.js
Download the current LTS release (stable and low-maintenance) from the official website, https://nodejs.org.
After installation, open a terminal and run `node -v`. It should print the installed version number.
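3. Bind PyExecJS to Node.js explicitly (important!)
On Windows, if PyExecJS cannot find Node.js it falls back to the built-in JScript engine, which does not support ES6+. The code then fails as soon as it runs, and many beginners hit this right after installing PyExecJS. So always check which runtime is active with `execjs.get().name`, and bind Node.js explicitly, either by setting the environment variable `EXECJS_RUNTIME` to `Node` or by requesting the runtime by name with `execjs.get("Node")`.
If the printed runtime name is `Node.js (V8)`, the environment is ready.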
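Below is a minimal Python sketch of that check. It assumes PyExecJS and Node.js are installed. `node_version` and `node_runtime` are hypothetical helper names. PyExecJS is imported inside the function only so that this file also loads on a machine where it is missing.

```python
import os
import shutil
import subprocess
from typing import Optional

# Ask PyExecJS to prefer Node.js for plain execjs.get() / execjs.compile() calls.
os.environ["EXECJS_RUNTIME"] = "Node"


def node_version() -> Optional[str]:
    """Return the output of `node -v` (e.g. 'v20.x.y'), or None if node is not on PATH."""
    if shutil.which("node") is None:
        return None
    result = subprocess.run(["node", "-v"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def node_runtime():
    """Return PyExecJS's Node.js runtime object (hypothetical helper)."""
    import execjs  # deferred: the module loads even without PyExecJS installed

    # Request Node.js by name instead of taking the first engine PyExecJS finds.
    return execjs.get("Node")
```

On a correctly configured machine, `print(node_version(), node_runtime().name)` should print the Node.js version followed by `Node.js (V8)`. If Node.js is missing, `execjs.get("Node")` raises an error instead of quietly picking another engine.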
Hands-on: cracking the encrypted token on a football-star practice site
We'll practice on the SPA7 scraping practice site (https://spa7.scrape.center/). Each player's information on this site is protected by an encrypted `token`. You can request the complete player data only after generating the correct `token`.
Step 1: Reverse front-end encryption logic
Don't write any Python yet. Open the browser's developer tools (F12) and follow these steps to locate the encryption logic:
- Switch to the Network tab.
- Refresh the page and find the request that returns the complete player information (its URL usually contains `api` or `detail`).
- Check the request parameters. One encrypted field stands out immediately: `token`.
- Open the Initiator tab to find the JS file that sent the request.
- Open that file, press `Ctrl+F`, and search for `getToken` (a common name for encryption functions) to locate the encryption code.
Step 2: Extract and prepare encryption code
On the practice site, the `getToken` function uses the third-party library `crypto-js` to perform DES encryption. To run it in Node.js, we need to do two things:
- Get the minified build of `crypto-js`, `crypto-js.min.js` (available from npm or a CDN).
- Merge the encryption function and `crypto-js.min.js` into one JS file, and solve the global-variable mounting problem by hand so that the function can reach `CryptoJS`.
Step 3: Python seamless call
First, assemble the encryption file (`crypto_utils.js`)
Node.js does not expose a third-party library as a global object by default. We therefore wrap the library in a self-executing function that returns `CryptoJS` explicitly, and mount the result as a global variable.
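One way to generate `crypto_utils.js` from Python is sketched below. It assumes the following:
- `crypto-js.min.js` sits in the working directory.
- The library uses crypto-js's usual UMD header, which checks for `exports`.
- `TOKEN_JS` is a hypothetical placeholder, not the site's code. Paste the real `getToken` from DevTools over it. The key and plaintext layout are made up.

The wrapper gives the library a private `module`/`exports` pair, so the library stores itself in `module.exports`. The self-executing function then returns that value as `CryptoJS`.

```python
from pathlib import Path

# Hypothetical placeholder for the site's getToken: replace it with the real
# function copied from DevTools. The key and plaintext layout are made up.
TOKEN_JS = r"""function getToken(player) {
    var key = CryptoJS.enc.Utf8.parse("REPLACE_WITH_REAL_KEY");
    var plain = JSON.stringify(player);
    return CryptoJS.DES.encrypt(plain, key, {
        mode: CryptoJS.mode.ECB,
        padding: CryptoJS.pad.Pkcs7
    }).toString();
}"""


def build_bundle(lib_source: str, token_js: str = TOKEN_JS) -> str:
    """Wrap crypto-js in a self-executing function, then append the token code."""
    parts = [
        "var CryptoJS = (function () {",
        # Private module/exports: crypto-js's UMD header takes its CommonJS
        # branch and stores the library in module.exports.
        "    var module = { exports: {} };",
        "    var exports = module.exports;",
        lib_source,
        "    return module.exports;",
        "})();",
        token_js,
    ]
    return "\n".join(parts)


def write_bundle(lib_path: str = "crypto-js.min.js", out_path: str = "crypto_utils.js") -> None:
    """Read crypto-js.min.js and write the merged crypto_utils.js, both as UTF-8."""
    lib_source = Path(lib_path).read_text(encoding="utf-8")
    Path(out_path).write_text(build_bundle(lib_source), encoding="utf-8")
```

Calling `write_bundle()` once produces the merged file. Generating it from a script rather than pasting by hand makes it easy to rebuild when the site changes its `getToken`.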
Then write the Python calling code
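Below is a minimal sketch of the calling side. It assumes `crypto_utils.js` from the previous step is in the working directory and defines `getToken(player)`. `make_context` and `get_token` are hypothetical helper names, and the player fields you pass must be whatever the real `getToken` expects.

```python
from pathlib import Path
from typing import Any, Dict

JS_FILE = Path("crypto_utils.js")  # assumed location of the merged JS file


def make_context(js_path: Path = JS_FILE):
    """Compile the merged JS once with the Node.js runtime and return the context."""
    import execjs  # PyExecJS; imported lazily so this module loads without it

    source = js_path.read_text(encoding="utf-8")
    return execjs.get("Node").compile(source)


def get_token(ctx, player: Dict[str, Any]) -> str:
    """Call the JS getToken(player); PyExecJS passes the dict to JS as JSON."""
    return ctx.call("getToken", player)
```

Usage: create the context once with `ctx = make_context()`, then call `get_token(ctx, player)` for each player dict. The returned string is what goes into the request's `token` parameter.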
Run it and you should get a Base64-encoded `token` string that matches the one in the browser's request for the same player.
Common pitfalls and performance tips
1. Pitfall: CryptoJS is not defined
Cause: in Node.js, `crypto-js` registers itself through the module system (`require('crypto-js')` / `module.exports`) rather than attaching `CryptoJS` to the global scope. Likewise, if the self-executing function does not return `CryptoJS` correctly, you get a `CryptoJS is not defined` error.
Solution:
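- Use the self-executing function plus global mounting approach described above, exactly as written.
- Or install `crypto-js` with npm and load it at the top of the JS source you compile: `const CryptoJS = require('crypto-js');`.
This requires running `npm install crypto-js` in the directory Node.js starts in: the current directory, or the `cwd` you pass to `execjs.compile`. Don't try to add the `require` afterwards with `ctx.eval(...)`. Every `eval` or `call` runs in a new Node.js process containing only the compiled source, so a variable defined by an earlier `eval` is gone by the next call.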
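The sketch below shows the `require` variant. It assumes three things:
- `npm install crypto-js` has been run in `project_dir`, a hypothetical path.
- Node.js is the runtime.
- `md5Hex` and `make_require_context` are illustrative names.

The `require` line is part of the compiled source, so it runs again in every Node.js process PyExecJS starts for this context.

```python
# JS source compiled once. PyExecJS prepends it to every call, so the
# require() below executes in each Node.js process it starts.
REQUIRE_JS = """
const CryptoJS = require('crypto-js');

function md5Hex(text) {
    return CryptoJS.MD5(text).toString();
}
"""


def make_require_context(project_dir: str):
    """Compile REQUIRE_JS with Node.js, resolving require() from project_dir."""
    import execjs  # lazy import; needs PyExecJS and Node.js at call time

    # Node looks for node_modules in cwd and its parent directories,
    # so project_dir must be the npm project folder or a folder below it.
    return execjs.get("Node").compile(REQUIRE_JS, cwd=project_dir)
```

For example, `make_require_context("path/to/project").call("md5Hex", "hello")` should return the MD5 hex digest of `hello`, `5d41402abc4b2a76b9719d911017c592`.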
2. Pitfall: garbled Chinese characters
Symptom: if the data being encrypted contains Chinese characters, the generated `token` does not match the browser's, and the request is rejected.
Solution:
- Save JS files as UTF-8 without a BOM (don't rely on Windows Notepad's default encoding).
- Always pass `encoding='utf-8'` when reading JS files in Python.
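Below is a slightly more forgiving variant of that advice, sketched with hypothetical helper names. It decodes with Python's `utf-8-sig` codec instead of plain `utf-8`. That codec reads normal UTF-8 unchanged and also strips a BOM if an editor added one, so a stray BOM cannot end up in the compiled JS.

```python
from pathlib import Path


def decode_js(raw: bytes) -> str:
    """Decode JS source bytes as UTF-8, dropping a leading BOM if present."""
    return raw.decode("utf-8-sig")


def read_js(path: str) -> str:
    """Read a JS file for execjs.compile without mangling Chinese text."""
    return decode_js(Path(path).read_bytes())
```

`read_js("crypto_utils.js")` can be used wherever the tutorial reads the JS file.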
3. Performance: don't recompile the JS on every call
PyExecJS is slow mainly because of process startup. With the Node.js runtime, every `ctx.call()` or `ctx.eval()` launches a new `node` child process that re-parses the entire compiled source, `crypto-js` included. If you also call `execjs.compile` (and re-read the JS file) inside a loop that generates tokens for 1,000 players, you pay that setup cost on every iteration as well.
Optimization: move `execjs.compile()` outside the loop so the script is read and compiled only once, and reuse the same `ctx` object throughout. When the number of calls is large, also send many inputs in a single call (for example, a JS helper that maps `getToken` over an array), so that one Node.js process produces a whole batch of tokens.
These two changes make a noticeable difference when calling JS functions in bulk.
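The sketch below combines both ideas: compile once, then batch. `TokenGenerator`, `BATCH_JS` and the JS helper `getTokens` are hypothetical names, and `getToken` is assumed to be defined in the merged source. The compile function is passed in (for example `execjs.get("Node").compile`), so the class itself does not import PyExecJS.

```python
from typing import Any, Callable, Dict, List

# Hypothetical JS helper appended to the merged source: it maps getToken over
# an array, so a whole batch costs one Node.js process instead of one per player.
BATCH_JS = """
function getTokens(players) {
    return players.map(function (p) { return getToken(p); });
}
"""


class TokenGenerator:
    """Compile the JS source once and reuse the same context for every call."""

    def __init__(self, js_source: str, compile_fn: Callable[[str], Any]):
        # compile_fn is called exactly once, outside any loop.
        self._ctx = compile_fn(js_source + "\n" + BATCH_JS)

    def token(self, player: Dict[str, Any]) -> str:
        """One token per call (one Node.js process per call with PyExecJS)."""
        return self._ctx.call("getToken", player)

    def tokens(self, players: List[Dict[str, Any]], batch_size: int = 200) -> List[str]:
        """Many tokens, using one call per chunk of players."""
        result: List[str] = []
        for start in range(0, len(players), batch_size):
            chunk = players[start:start + batch_size]
            result.extend(self._ctx.call("getTokens", chunk))
        return result
```

Typical use: `gen = TokenGenerator(Path("crypto_utils.js").read_text(encoding="utf-8"), execjs.get("Node").compile)`, then `gen.tokens(players)`. With the Node.js runtime, `tokens()` starts one process per chunk of 200 players instead of one per player. The chunk size is an arbitrary choice that keeps each JSON payload modest.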
Summary
In this tutorial, you learned how to:
- Install PyExecJS and Node.js and bind PyExecJS to the Node.js runtime.
- Locate front-end encryption logic with the browser's developer tools.
- Run JS encryption code that depends on third-party libraries from Python.
- Fix common pitfalls such as `CryptoJS is not defined` and garbled Chinese characters, and speed up execution by compiling once and reusing the context.
For more complex protection, such as heavily obfuscated JS or WebAssembly, you have two options:
- Go deeper into JS reverse engineering (for example, AST-based deobfuscation and analysis of script behavior).
- Use a browser automation tool such as Selenium or Playwright, trading some performance for skipping the manual reversing step.
The complete sample code can be found in the corresponding GitHub repository: https://github.com/Python3WebSpider/ScrapeSpa7

