JD.com e-commerce batch data collection: reverse engineering h5st

Case study site: https://www.jd.com/

Overview

h5st is a signature token (a "signature pass") that JD.com's web client uses to protect its core interfaces. It is generated by a base framework shared by the PC and H5 clients; this case focuses on the PC client. By combining dynamically obfuscated JavaScript, a mix of algorithms (such as hashes and signatures), and environment-fingerprint binding, it is designed to block automated requests that lack a real browser context.

This walkthrough targets the infinite-scroll feed interface on the JD.com homepage: it traces how h5st is generated, locates the key code quickly, and outlines the implementation approach.


Web analysis

First, open the JD.com homepage, press F12 to open the developer tools, and switch to the Network panel:

  1. Refresh the page and scroll down to trigger infinite loading.
  2. Enter functionId=pc_home_feed in the filter field (functionId is the fixed identifier of the interface).
  3. Find the request that returns the feed content and view the request parameters.

The key request parameters are shown in the screenshots:

[Request parameter screenshot 1] [Request parameter screenshot 2]


Core technical points

Anti-debugging and code obfuscation

  • Variable/function name obfuscation: all identifiers are replaced with meaningless names that begin with _$.
  • Control flow flattening: scrambles the normal execution order and branching logic of the code, which makes it much harder to read.
  • Aggressive minification: spaces, newlines, and comments are removed, leaving the code on a single line.
  • Anti-debugging: detects the debugger and the state of the developer tools in order to interfere with breakpoint debugging (the anti-debugging around the feed interface logic is relatively weak).

Key encryption parameters

Core parameters extracted from the request, and what each one does:

Parameter name   Description
appid            Fixed application identifier; for core PC-side interfaces it is usually www-jd-com
body             SHA256 hash of the request body, used to verify data integrity
functionId       Fixed function identifier; for the feed interface it is pc_home_feed
t                Millisecond timestamp, valid for about 30 to 60 seconds, used to prevent replay
h5st             The final signature parameter, which combines the environment fingerprint, the algorithm combination, and the parameter verification results
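
As a rough illustration of the table, the following Node.js sketch assembles these parameters. It assumes the request body is serialized as JSON before hashing and that the digest is hex-encoded; the captured request does not confirm either detail, and buildParams and the sample payload are hypothetical names used only for this example.

```javascript
const { createHash } = require('node:crypto');

// Hex-encoded SHA-256 digest of a UTF-8 string.
function sha256Hex(text) {
    return createHash('sha256').update(text, 'utf8').digest('hex');
}

// Hypothetical helper: assembles the parameters listed in the table.
// Assumption: the body is JSON-serialized before it is hashed.
function buildParams(bodyObject, now = Date.now()) {
    return {
        appid: 'www-jd-com',
        body: sha256Hex(JSON.stringify(bodyObject)),
        functionId: 'pc_home_feed',
        t: now, // millisecond timestamp
    };
}

const params = buildParams({ page: 1 }); // { page: 1 } is a made-up payload
console.log(params.body.length); // 64 hex characters
```

Because t is only valid for a short window, the parameters should be built immediately before signing rather than cached.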

Environment completion and key code location

Quick completion of basic environment

When running obfuscated JS in a non-browser environment such as Node.js, the first step is to stub the core browser global objects; otherwise the code will not run. There is no need to fill in every property up front: proxy monitoring can later pinpoint the key properties that are missing.

// Quickly stub the basic global object skeleton
globalThis.navigator = {
  userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36",
  platform: "Win32",
  // Add other properties as proxy monitoring reveals them
};
globalThis.window = globalThis; // Align window with the global object under Node.js
globalThis.document = {
  cookie: "", // A cookie can be bound here later if needed
};
globalThis.location = {
  href: "https://www.jd.com/",
  host: "www.jd.com",
};

Proxy monitoring (an essential technique)

Proxy monitoring quickly shows which environment objects and properties the obfuscated code accesses, which avoids filling in the environment blindly. Here we focus on monitoring window and canvas, the latter because it may be used for environment fingerprinting.

Although the following code runs in Node.js, it relies only on the standard JavaScript Proxy mechanism.

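// Generic proxy-monitoring helper
function setProxy(proxyObjArr) {
    const isObject = (v) => v !== null && (typeof v === 'object' || typeof v === 'function');
    // Describe a value without coercing objects, so logging never re-enters a trap
    const describe = (v) => (isObject(v) ? `[${typeof v}]` : String(v));

    for (const objName of proxyObjArr) {
        const handler = {
            get(target, property, receiver) {
                const value = Reflect.get(target, property, receiver);
                // String(property) is needed because property keys can be Symbols
                console.log(`[GET] object: ${objName} | property: ${String(property)} | key type: ${typeof property} | value: ${describe(value)}`);
                return value;
            },
            set(target, property, value, receiver) {
                console.log(`[SET] object: ${objName} | property: ${String(property)} | key type: ${typeof property} | old value: ${describe(target[property])} | new value: ${describe(value)}`);
                return Reflect.set(target, property, value, receiver);
            }
        };

        // Objects that do not exist yet (such as canvas in Node.js) start as empty stubs
        const existing = globalThis[objName];
        globalThis[objName] = new Proxy(isObject(existing) ? existing : {}, handler);
    }
}

// Objects to monitor
const proxyArray = ['window', 'canvas'];
setProxy(proxyArray);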

Load the proxy script together with the obfuscated library. When it runs, the console prints every access record, which makes properties that have not been filled in yet easy to spot.
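
One possible way to load a script into the prepared environment is Node's built-in vm module, sketched below. The runInGlobal helper, the DemoSign global, and the inline "library" string are stand-ins invented for this example; the real obfuscated file would be read from disk instead.

```javascript
const vm = require('node:vm');

// Runs source text in the current global context, so the script sees
// the stubbed globals (location, document, ...) and any proxies.
function runInGlobal(source, filename = 'inline.js') {
    return vm.runInThisContext(source, { filename });
}

// Minimal stubbed environment for the demo.
globalThis.location = { host: 'www.jd.com' };

// Stand-in for the real library: it reads one environment value and
// exposes a global constructor, much like an obfuscated script would.
const fakeLibrary = 'globalThis.DemoSign = function () { this.host = location.host; };';
runInGlobal(fakeLibrary, 'fake-library.js');

console.log(new globalThis.DemoSign().host); // www.jd.com
```

With a real file, the source string would come from fs.readFileSync(path, 'utf8'), and passing the path as the filename keeps stack traces readable when a missing property throws.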


Key code location and analysis

Approaches to locating the code

  1. Global keyword search: in the Sources panel of the developer tools, search for h5st or ParamsSign (the keyword of the observed global object).
  2. XHR/fetch breakpoint: in the Sources panel, add an entry under "XHR/fetch Breakpoints" that matches the target interface's URL (for example functionId=pc_home_feed), scroll down to trigger the breakpoint, and then inspect the call stack.
  3. Hook the key object: if the global search turns up the exposed object directly, hook it directly.
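
The third approach can be sketched with a small generic wrapper. hookMethod and DemoSigner are hypothetical names for this illustration; in the browser the target would be whichever object the search exposed.

```javascript
// Wraps obj[name] so every call logs its arguments and return value.
// Returns a function that restores the original method.
function hookMethod(obj, name, log = console.log) {
    const original = obj[name];
    if (typeof original !== 'function') {
        throw new TypeError(`${name} is not a function`);
    }
    obj[name] = function (...args) {
        log(`[HOOK] ${name} args:`, args);
        const result = Reflect.apply(original, this, args);
        log(`[HOOK] ${name} result:`, result);
        return result;
    };
    return () => { obj[name] = original; };
}

// Stand-in class so the sketch runs on its own.
class DemoSigner {
    sign(params) { return `signed:${params.functionId}`; }
}

const unhook = hookMethod(DemoSigner.prototype, 'sign');
new DemoSigner().sign({ functionId: 'pc_home_feed' });
unhook(); // restore the original method
```

Hooking the prototype rather than a single instance catches calls from instances that the page creates later.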

Core parameters and calling process

In this case, the keyword search directly turned up the globally exposed ParamsSign constructor, so all that remains is to call it with the parameters:

// Build the request parameters (body uses a fixed test value for now)
const reqParams = {
    "appid": "www-jd-com",
    "body": "224029fa85a1a3b9d6e229f4d578057f080a2f6738837120a79a91934252476f",
    "clientVersion": "1.0.0",
    "client": "pc",
    "functionId": "pc_home_feed",
    "t": Date.now()
};

// Run directly in the browser console, or in Node.js once the environment is stubbed
const signer = new window.ParamsSign();
const h5stResult = signer.sign(reqParams);
console.log("Generated h5st:", h5stResult);

At this point, we have obtained the h5st signature that the server verifies.
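
As a sketch of the next step, the signed value can be serialized together with the other parameters into a query string. signToQuery and the stub signer are hypothetical; the sketch also assumes sign() returns the signature as a string, either directly or through a Promise, which should be checked against the real return value.

```javascript
// Signs the parameters and serializes them into a query string.
// Assumption: sign() returns the signature directly or as a Promise,
// so the result is awaited either way.
async function signToQuery(signer, params) {
    const h5st = await signer.sign(params);
    const query = new URLSearchParams();
    for (const [key, value] of Object.entries(params)) {
        query.set(key, String(value));
    }
    query.set('h5st', String(h5st));
    return query.toString();
}

// Stub signer so the sketch runs without the real library.
const stubSigner = { sign: (p) => `stub;${p.functionId};${p.t}` };

signToQuery(stubSigner, { functionId: 'pc_home_feed', t: 1700000000000 })
    .then((qs) => console.log(qs));
```

URLSearchParams percent-encodes reserved characters in the values, so the signature does not need to be escaped by hand.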


Other completion ideas

If the obfuscated code does not expose a global constructor, or the environment fingerprint is deeply bound, a browser plug-in can also be used to fill in the environment in one step. The plug-in automatically simulates the browser context and directly outputs usable signature logic.

Plug-in example screenshots: [Plug-in screenshot 1] [Plug-in screenshot 2]


Frequently Asked Questions and Answers

Incomplete environment completion

Symptom: Node.js throws an error when running the obfuscated code, such as Cannot read properties of undefined (reading 'xxx').

Solution steps:

  1. Add the object or property named in the error to proxyArray and rerun.
  2. Check the [GET] entries printed on the console to find the missing properties.
  3. Add a matching mock value to the base environment configuration (a fully realistic fingerprint is usually unnecessary; it only needs to pass the obfuscated library's "weak verification").
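
Scanning long [GET] logs by hand can be tedious, so step 2 can also be automated. The sketch below records only the reads that came back undefined; watchMissing is a hypothetical helper, not part of the monitoring code above.

```javascript
// Wraps target in a Proxy that records every property read that came
// back undefined, i.e. a candidate for environment completion.
function watchMissing(objName, target, missing = new Set()) {
    const proxy = new Proxy(target, {
        get(obj, property, receiver) {
            const value = Reflect.get(obj, property, receiver);
            if (value === undefined && typeof property === 'string') {
                missing.add(`${objName}.${property}`);
            }
            return value;
        },
    });
    return { proxy, missing };
}

const { proxy: nav, missing } = watchMissing('navigator', { userAgent: 'demo-agent' });
nav.userAgent;   // present, not recorded
nav.platform;    // absent, recorded
nav.webdriver;   // absent, recorded
console.log([...missing]); // [ 'navigator.platform', 'navigator.webdriver' ]
```

Passing the same Set to several watchMissing calls gathers the missing properties of all monitored objects in one place.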

Summary

This case followed the standard process of "request parameter observation → base environment completion → proxy monitoring and locating → keyword search / hooking the key object" and quickly located the h5st generation entry point for the JD.com PC feed interface.

Deeper algorithm recovery (such as AES key extraction and the SHA256 combination rules) requires further analysis of the internal logic of the obfuscated sign() method; this will be covered in follow-up notes.