pdd-recruitment-anticontent-reverse

Want to collect Pinduoduo social recruitment postings in bulk for technical research such as industry salary analysis or talent profiling? First you have to get past the dynamic gate in front of you: anti-content, the core anti-crawling parameter under Ruishu Dynamic Security Protection.

This article uses the real-world scenario of Pinduoduo's social recruitment site as an anchor and walks through the complete reverse-engineering process, from environment detection to hooking the core function, and ends with a reusable Python + Node.js (execjs) implementation.

Prerequisites: be familiar with the basic operations of Chrome DevTools, understand common JavaScript reverse-engineering techniques, and know how to call JavaScript code from Python with execjs.


1. Overview

1.1 Scenario and core parameters

The API of the Pinduoduo social recruitment platform (<URL0>) returns data normally only when the request carries anti-content. This parameter is generated dynamically by obfuscated JavaScript. Each refresh of the front-end code may produce subtle changes in the obfuscation, but the core encryption framework is stable.

1.2 The hard parts this time

Compared with static encryption (such as MD5 + salt, or AES), Ruishu's dynamic security protection product brings several headaches:

  1. Dynamic obfuscation, so static analysis easily misses dependencies: the extracted JS code is often incomplete because it relies on Webpack modules or on global variables that Ruishu injects automatically. If these are not filled in, the code throws an error immediately.
  2. Strict environment fingerprint detection: it carefully checks navigator.webdriver, window.chrome, the browser plugin list, and so on, to determine whether you are faking the environment with automation tools.
  3. Anti-replay and link binding: the Cookie, Referer, and User-Agent carried in the request must be consistent with each other, and a timestamp is embedded in the parameter to prevent old requests from being reused.
  4. Dynamic constructor call: the encryption function is not a static top-level function such as md5(), but a dynamic constructor generated by the obfuscation, such as window.hhh(4).
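
Point 3 explains why a captured anti-content value cannot simply be replayed later. The real server-side check is not public, so the sketch below only illustrates the general idea of a timestamp freshness window. The function name is_fresh and the 60-second tolerance are assumptions made for illustration, not Pinduoduo's actual rule.

```python
import time

# Hypothetical freshness window; the real tolerance is not documented.
MAX_AGE_MS = 60_000


def is_fresh(embedded_ms, now_ms=None, max_age_ms=MAX_AGE_MS):
    """Return True if a millisecond timestamp lies within the allowed window."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return abs(now_ms - embedded_ms) <= max_age_ms


now = 1_700_000_000_000
print(is_fresh(now - 5_000, now_ms=now))    # parameter generated 5 s ago
print(is_fresh(now - 600_000, now_ms=now))  # parameter generated 10 min ago
```

Under a rule of this kind, the parameter has to be generated right before each request rather than cached and reused.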

2. Web page reverse analysis ideas

2.1 Locate the target interface

Press F12 to open DevTools, switch to the Network panel, then refresh the page or click "Next Page" or a filter option. You will soon see the request that returns the job list:

POST https://careers.pinduoduo.com/api/recruit/position/list

Click this request and view the Payload tab. There you will find anti-content, the core parameter we want to reverse this time.

2.2 Locate where the parameter is generated

Ruishu parameters are generally not written directly in a static JS file, but are usually hidden in dynamically injected code. Two common positioning methods:

Method 1: Global search keywords

In the Sources panel of DevTools, press Ctrl+Shift+F (Windows) or Cmd+Option+F (Mac) to open global search, then enter one of the following keywords:

  • anti-content
  • antiContent
  • getAnti
  • hhh (you will find out later that this is the name of the core constructor)

Generally, you can quickly locate the place where the constructor is called.

Method 2: XHR/Fetch breakpoint

In the Sources panel, expand "XHR/fetch Breakpoints" in the debugger sidebar, add a breakpoint that matches a fragment of the target URL (for example /api/recruit/position/list), and then refresh the page. When the request is sent, execution pauses at the code that sends it. Trace back along the call stack to find the function that generates anti-content.


3. Simplified analysis of core encryption logic

Through debugging and Hook, we split the core logic into three steps:

3.1 Step 1: Complete the environment

Ruishu checks a large number of environment variables before encrypting. If these features are not filled in, backend verification fails even when the generated anti-content has the correct format. The following are the key variables that must be filled in:

// The most basic automation marker: hide the webdriver trace
Object.defineProperty(navigator, 'webdriver', {
    get: () => false
});

// Simulate the extension APIs of a Chrome environment (Ruishu checks whether these objects exist)
window.chrome = {
    runtime: {},
    loadTimes: () => {},
    csi: () => {}
};

// Simulate a legitimate plugin list
navigator.plugins = [
    { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer' },
    { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai' }
];

3.2 Step 2: Call the dynamic constructor

Ruishu uses a piece of self-executing obfuscated code to inject a constructor, named something like hhh, into window. The name of this constructor may change, but the way it is called and the meaning of its arguments are fixed. For example, the number 4 stands for "anti-content encryption of the job list interface".

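// Simplified core call
function get_anti() {
  // window.hhh(4) is the encryption constructor dedicated to the job list interface
  // serverTime passes in the current millisecond timestamp to prevent replay attacks
  const encryptor = new window.hhh(4)({
    serverTime: new Date().getTime()
  });
  // Finally, call the serialization method to produce the final string
  return encryptor.messagePack();
}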

3.3 Step 3: Serialized output

The messagePack() method encodes the encrypted binary data into a Base64-like string (not standard Base64; it uses a custom character mapping). This string is the anti-content we ultimately want.
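
To make "Base64 with a custom character mapping" concrete, here is a minimal sketch of the general technique: encode with standard Base64, then remap every character through a substitution table. The alphabet below (the standard one rotated by 13 positions) is purely hypothetical and is not the mapping used by messagePack(), which the article does not disclose.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
# Hypothetical custom alphabet: the standard one rotated by 13 positions.
CUSTOM = STANDARD[13:] + STANDARD[:13]

ENCODE_TABLE = str.maketrans(STANDARD, CUSTOM)
DECODE_TABLE = str.maketrans(CUSTOM, STANDARD)


def custom_b64encode(data: bytes) -> str:
    """Standard Base64 first, then remap each character to the custom alphabet."""
    return base64.b64encode(data).decode("ascii").translate(ENCODE_TABLE)


def custom_b64decode(text: str) -> bytes:
    """Map characters back to the standard alphabet, then decode normally."""
    return base64.b64decode(text.translate(DECODE_TABLE))


encoded = custom_b64encode(b"hello")
print(encoded)
print(custom_b64decode(encoded))
```

A string produced this way looks like Base64 but decodes to garbage with a standard decoder, which is a quick way to recognize a custom mapping while debugging.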


4. Reusable Python implementation

We can use Python's execjs library to run the JavaScript file that completes the environment and generates anti-content, then send the request with requests.

4.1 The environment-completion JS file (demo.js)

⚠️ Note: the following code gives only the framework for environment completion. For the real window.hhh, you need to extract it, together with all the modules it depends on, from Ruishu's obfuscated code. Copying and pasting this framework alone will not work.

// ========== Environment completion ==========
// Node.js has no `window`, and older versions have no `navigator`, so create both first
var window = globalThis;
Object.defineProperty(globalThis, 'navigator', { value: {}, writable: true, configurable: true });

Object.defineProperty(navigator, 'webdriver', { get: () => false });
window.chrome = { runtime: {}, loadTimes: () => {}, csi: () => {} };
navigator.plugins = [
    { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer' },
    { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai' }
];
// The User-Agent also needs to be filled in
navigator.userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36';

// ========== Key step: extract and complete window.hhh and its related modules from Ruishu's obfuscated code ==========
// Here you need to debug and hook in Chrome DevTools, then port the complete constructor logic over

// ========== Function exposed to Python ==========
function get_anti() {
  const encryptor = new window.hhh(4)({
    serverTime: new Date().getTime()
  });
  return encryptor.messagePack();
}
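
Because the User-Agent is declared twice, once in demo.js and once in the Python request headers, the two can drift apart unnoticed. The sketch below is one possible sanity check, not part of the original workflow: it pulls the navigator.userAgent assignment out of the JS source with a regular expression and compares it with the header. The helper names and the sample strings are hypothetical.

```python
import re

# Matches: navigator.userAgent = '...'  or  navigator.userAgent = "..."
UA_PATTERN = re.compile(r'''navigator\.userAgent\s*=\s*(['"])(.*?)\1''')


def extract_js_user_agent(js_source):
    """Return the string assigned to navigator.userAgent, or None if absent."""
    match = UA_PATTERN.search(js_source)
    return match.group(2) if match else None


def user_agents_match(js_source, headers):
    """True when the JS environment and the HTTP headers declare the same UA."""
    js_ua = extract_js_user_agent(js_source)
    return js_ua is not None and js_ua == headers.get("user-agent")


sample_js = "navigator.userAgent = 'Mozilla/5.0 (X11) Example/1.0';"
print(user_agents_match(sample_js, {"user-agent": "Mozilla/5.0 (X11) Example/1.0"}))
print(user_agents_match(sample_js, {"user-agent": "python-requests/2.31"}))
```

Running a check like this before sending requests catches the common mistake of updating the browser version in one place only.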

4.2 Python calling code

# -*- coding: utf-8 -*-
import json
import requests
import execjs

def get_anti_content():
    """Generate anti-content by calling the JS file with the completed environment."""
    try:
        with open('demo.js', 'r', encoding='utf-8') as f:
            js_code = f.read()
        ctx = execjs.compile(js_code)
        return ctx.call('get_anti')
    except Exception as e:
        print(f"Failed to generate anti-content: {e}")
        return None

def fetch_pdd_jobs(page=1, page_size=10):
    """Fetch the Pinduoduo social recruitment job list."""
    anti_content = get_anti_content()
    if not anti_content:
        return None

    # Headers and cookies must match the environment in which anti-content was generated
    headers = {
        "accept": "*/*",
        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
        "content-type": "application/json",
        "origin": "https://careers.pinduoduo.com",
        "referer": "https://careers.pinduoduo.com/jobs",
        "sec-ch-ua": '"Not/A)Brand";v="8", "Chromium";v="132", "Google Chrome";v="132"',
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": '"Windows"',
        "sec-fetch-dest": "empty",
        "sec-fetch-mode": "cors",
        "sec-fetch-site": "same-origin",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36"
    }
    cookies = {
        # ⚠️ You need to obtain valid values for these cookies from your own browser
        "_nano_fp": "replace-with-your-own-_nano_fp",
        "api_uid": "replace-with-your-own-api_uid"
    }
    url = "https://careers.pinduoduo.com/api/recruit/position/list"
    payload = {
        "job": "",
        "page": page,
        "pageSize": page_size,
        "name": "",
        "workLocationList": [],
        "anti_content": anti_content
    }

    try:
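        # The payload must be serialized as compact JSON, otherwise backend validation may fail
        response = requests.post(
            url,
            headers=headers,
            cookies=cookies,
            data=json.dumps(payload, separators=(',', ':')),
            timeout=10  # avoid hanging indefinitely if the server does not respond
        )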
        response.raise_for_status()
        return response.json()
    except Exception as e:
        print(f"Failed to fetch the job list: {e}")
        return None

if __name__ == "__main__":
    jobs = fetch_pdd_jobs(page=1, page_size=10)
    if jobs:
        print("Fetched successfully!")
        print(json.dumps(jobs, indent=2, ensure_ascii=False))
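
The separators argument in the request above matters because Python's json.dumps inserts a space after every comma and colon by default, while a browser's JSON.stringify does not. The short comparison below uses a trimmed-down, illustrative payload to show the difference; whether the backend really rejects the spaced form is the article's claim and is not verified here.

```python
import json

# Trimmed-down payload used only for illustration
payload = {"page": 1, "pageSize": 10, "workLocationList": []}

default_body = json.dumps(payload)
compact_body = json.dumps(payload, separators=(",", ":"))

print(default_body)  # {"page": 1, "pageSize": 10, "workLocationList": []}
print(compact_body)  # {"page":1,"pageSize":10,"workLocationList":[]}
```

The compact form is byte-for-byte what the browser would send for the same object, which keeps the request body consistent with the one observed in DevTools.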

5. Notes and Summary

5.1 Notes

  1. Control request frequency: Pinduoduo’s risk control will detect the frequency of requests. Too high a frequency can easily lead to IP or cookies being blocked.
  2. Cookies need to be refreshed regularly: the _nano_fp and api_uid fields in the Cookie expire, and after that you need to copy fresh values from the browser.
  3. Ruishu code is updated dynamically: obfuscated names (such as window.hhh) and the internal logic may change from time to time, so you have to re-debug and follow up regularly.
  4. For technical learning purposes only: Batch scraping may violate Pinduoduo’s terms of service, so please be sure to use it legally and compliantly.
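
For note 1, a minimal way to keep the request rate low is to pause between pages and back off when a call fails. The sketch below assumes a fetch callable that returns None on failure, as fetch_pdd_jobs does; the helper name polite_fetch and the delay values are illustrative choices, not recommended thresholds.

```python
import random
import time


def polite_fetch(fetch, pages, base_delay=2.0, max_retries=3, sleep=time.sleep):
    """Call fetch(page) for each page, pausing between calls and backing off on failure.

    `fetch` is any callable that returns a result, or None on failure.
    """
    results = {}
    for page in pages:
        for attempt in range(max_retries):
            result = fetch(page)
            if result is not None:
                results[page] = result
                break
            # Exponential backoff before the next attempt: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
        # Pause between pages, with a random component, to keep the overall rate low
        sleep(base_delay + random.uniform(0, base_delay))
    return results


# Demo with a fake fetcher and a no-op sleep so it runs instantly
calls = []


def fake_fetch(page):
    calls.append(page)
    # Fail the first attempt for page 2 to exercise the retry path
    if page == 2 and calls.count(2) == 1:
        return None
    return {"page": page}


demo_results = polite_fetch(fake_fetch, [1, 2], sleep=lambda seconds: None)
print(demo_results)
```

In real use you would pass fetch_pdd_jobs as fetch and leave sleep at its default so that the pauses actually take effect.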

5.2 Review of core knowledge points

Key step              | Specific content
Locate the interface  | Find the POST job list interface through the Network panel
Locate the parameter  | Use XHR/fetch breakpoints or global keyword search to find where it is generated
Complete environment  | Focus on navigator.webdriver, window.chrome, and the plugin list
Extract the logic     | Extract the complete window.hhh module and its dependencies from the obfuscated code
Send the request      | Use execjs to call JavaScript to generate the parameter, and keep Cookie/Referer consistent

I hope this article helps you get past the threshold of Ruishu dynamic protection. If you run into problems while reproducing it, feel free to discuss them in the comment area.