title: 📘 Hands-on Tutorial: Automated Xiaohongshu Data Collection with DrissionPage
description: This guide walks you through the entire Xiaohongshu (XHS) workflow, from automated collection to data analysis. We use DrissionPage in Listen (packet-listening) mode, in our experience the most stable and efficient practical way to sidestep Xiaohongshu's complex request encryption (the x-s signature).
Introduction: Why do we need "listen-mode" XHS collection?
If you have tried crawling Xiaohongshu with `requests` or `Scrapy`, you have almost certainly hit these two pitfalls:
- Reverse-engineering costs explode: just analyzing dynamic signatures such as `x-s` and `x-t` can eat most of a day, and the signing algorithm is changed frequently as part of risk control.
- The DOM structure keeps shifting: Xiaohongshu's front end often changes class names and nesting, so an XPath you painstakingly wrote can stop working within a week.
The DrissionPage + Listen approach covered here sidesteps both minefields. The browser generates and signs its own requests (x-s included), and we simply intercept the API responses it receives before the page renders them. That gives you clean, complete, structured JSON, and you no longer have to wrestle with messy HTML.
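To make the idea concrete, here is a minimal sketch of listen mode. It assumes DrissionPage 4.x, where `page.listen.start()` sets the URL fragment to watch, `page.listen.wait()` returns a captured packet (or `False` on timeout), and `packet.response.body` holds the already-parsed JSON. The helper name `capture_api_json` is hypothetical.

```python
from typing import Any, Optional


def capture_api_json(page: Any, api_path: str, url: str, timeout: float = 15) -> Optional[Any]:
    """Open `url` and return the parsed JSON of the first response whose URL contains `api_path`.

    `page` is expected to be a DrissionPage ChromiumPage (or any object with the same
    `listen` / `get` interface). Returns None if nothing matches within `timeout` seconds.
    """
    page.listen.start(api_path)  # start listening first, or the packet may slip past
    try:
        page.get(url)  # the browser builds and signs its own requests (x-s included)
        packet = page.listen.wait(timeout=timeout)  # a DataPacket, or False on timeout
        if not packet:
            return None
        return packet.response.body  # JSON responses arrive already parsed into dict/list
    finally:
        page.listen.stop()  # stop capturing so later packets don't pile up
```

With a real browser you would pass `ChromiumPage()` from DrissionPage as `page`. The API fragment (for example `api/sns/web/v1/search/notes` for search results) is an assumption you should confirm in the F12 Network panel. Starting the listener before `page.get()` is the key design choice, because requests sent before listening begins are not captured.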
🛠️ Minimal environment setup
- Python: version ≥ 3.6 (3.9–3.11 recommended for best compatibility)
- Install the core libraries in one step (`pip install DrissionPage loguru DataRecorder`):
  - DrissionPage: browser automation plus packet capture. It drives Chromium directly instead of through WebDriver, so it avoids WebDriver fingerprints and needs no ChromeDriver.
  - loguru: a replacement for the built-in `print()`. Log lines carry timestamps and colors, which makes debugging much faster.
  - DataRecorder: the official companion library to DrissionPage, which writes Excel/CSV/JSON files in a single call.
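You can confirm the setup with a small check script before the first run. This sketch uses only the standard library (`importlib.util.find_spec`). It assumes the import names match the PyPI package names listed above, and `check_environment` is a hypothetical helper.

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple


def check_environment(
    min_version: Tuple[int, int] = (3, 6),
    packages: Iterable[str] = ("DrissionPage", "loguru", "DataRecorder"),
) -> List[str]:
    """Return human-readable problems; an empty list means the setup looks fine."""
    problems = []
    if sys.version_info[:2] < min_version:
        problems.append("Python {}.{} is older than {}.{}".format(
            sys.version_info[0], sys.version_info[1], min_version[0], min_version[1]))
    for name in packages:
        # find_spec() locates a top-level module without actually importing it
        if importlib.util.find_spec(name) is None:
            problems.append("missing package: {} (pip install {})".format(name, name))
    return problems
```

Running `print(check_environment() or "environment OK")` lists anything still missing. The code uses `str.format` rather than f-strings, so the version message can still be reported on older interpreters.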
🧠 Core principles, step by step (mapped to the code logic)
The entire collection process can be condensed into seven closed-loop actions. Each step has been verified in practice and is far more stable than blindly chaining click logic together:
🚀 Complete and executable code
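Below is a condensed, hypothetical sketch of such a loop (not the author's original script). It logs in, listens for the note-detail API, clicks each card, parses the JSON, presses ESC, waits a random interval, and records the results. Several values are assumptions you should verify in F12 before use: the API path `api/sns/web/v1/feed`, the search URL format, the card XPath, and the JSON key names. DrissionPage 4.x (`ChromiumPage`, `DrissionPage.common.Keys`) and DataRecorder (`Recorder`) are imported lazily inside the functions that need them.

```python
import random
import time
from typing import Any, Dict, List, Optional
from urllib.parse import quote

# Hypothetical values: confirm them in F12 (Network / Elements) before use.
HOME_URL = "https://www.xiaohongshu.com"
SEARCH_URL = HOME_URL + "/search_result?keyword={}"
FEED_API = "api/sns/web/v1/feed"  # request assumed to carry a note's detail JSON
CARD_XPATH = 'x://section[contains(@class, "note-item")]'  # generic outer <section>


def random_wait(low: float = 1.5, high: float = 4.0) -> None:
    """Sleep a random interval so actions do not follow a fixed rhythm."""
    time.sleep(random.uniform(low, high))


def extract_note_data(body: Any) -> Optional[Dict[str, Any]]:
    """Pull a flat row out of one feed response (key names are assumptions)."""
    try:
        card = body["data"]["items"][0]["note_card"]
    except (KeyError, IndexError, TypeError):
        return None  # unexpected shape: skip this note
    interact = card.get("interact_info") or {}
    return {
        "title": card.get("title", ""),
        "author": (card.get("user") or {}).get("nickname", ""),
        "likes": interact.get("liked_count", ""),
    }


def collect_keyword(page: Any, keyword: str, max_notes: int = 100,
                    timeout: float = 10) -> List[Dict[str, Any]]:
    """Search one keyword, open cards one by one and capture each note's JSON."""
    from DrissionPage.common import Keys  # requires DrissionPage to be installed

    rows: List[Dict[str, Any]] = []
    page.listen.start(FEED_API)  # listen before any navigation or click
    page.get(SEARCH_URL.format(quote(keyword)))
    random_wait()
    for index in range(max_notes):
        cards = page.eles(CARD_XPATH)  # locate cards by the generic outer section
        if index >= len(cards):
            page.scroll.to_bottom()  # load more results, then re-query
            random_wait()
            cards = page.eles(CARD_XPATH)
            if index >= len(cards):
                break  # nothing new was loaded
        cards[index].click()  # opening a note should fire the feed request
        packet = page.listen.wait(timeout=timeout)  # False on timeout (e.g. a CAPTCHA)
        row = extract_note_data(packet.response.body) if packet else None
        if row:
            row["keyword"] = keyword
            rows.append(row)
        page.actions.key_down(Keys.ESCAPE).key_up(Keys.ESCAPE)  # ESC closes the overlay
        random_wait()
    page.listen.stop()
    return rows


def main(keywords: List[str], out_path: str = "xhs_notes.csv") -> None:
    """Log in once by hand, then collect each keyword and write results per keyword."""
    from DrissionPage import ChromiumPage
    from DataRecorder import Recorder

    page = ChromiumPage()
    page.get(HOME_URL)
    input("Log in inside the browser window, then press Enter here... ")
    recorder = Recorder(out_path)
    for keyword in keywords:
        for row in collect_keyword(page, keyword):
            recorder.add_data(row)  # assumes add_data() accepts a dict row
        recorder.record()  # write to disk once per keyword
```

Start it with `main(["keyword1", "keyword2"])`. Indexing cards by position assumes the result list keeps its DOM order. If the page re-renders cards while scrolling, it is safer to de-duplicate the captured notes.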
📝 Beginner pitfalls + advanced optimization guide
💥 Most common pitfalls
- **Listening keeps timing out?**
  - Check whether a picture CAPTCHA or SMS verification prompt has popped up in the browser. Once you handle it manually, the script continues automatically.
  - Confirm that the listened API path is correct. Xiaohongshu occasionally changes its API versions, so press F12, open the Network panel, and inspect an actual request to verify the path.
- **XPath can't find the cards?**
  - Prefer the generic outer-section locator used in the code, and don't rely on specific class names.
  - If the page structure really has changed, right-click any card in F12 → Copy → Copy XPath. Then replace rigid positional steps such as `div[1]/div[2]` with `contains()`-based fuzzy matching, which makes the locator much more robust.
- **Getting hit by risk control after only a few items?**
  - Log in! Log in! Log in!
  - Keep the random waits (already included in the code). Do not replace them with a fixed 0.5-second delay.
  - Do not collect more than 100 notes per keyword in a single run. Collecting in batches is safer.
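Two of the tips above translate directly into code. The first helper retries a timed-out wait so you have time to clear a CAPTCHA by hand. It assumes that DrissionPage's `page.listen.wait()` returns `False` on timeout and keeps queuing captured packets between calls. The second helper splits one keyword's target count into batches of at most 100 notes. Both helper names are hypothetical.

```python
import time
from typing import Any, List, Optional, Tuple


def wait_packet_patiently(page: Any, timeout: float = 10, retries: int = 3,
                          pause: float = 5) -> Optional[Any]:
    """Wait for the next listened packet, retrying so a human can clear a CAPTCHA."""
    for attempt in range(1, retries + 1):
        packet = page.listen.wait(timeout=timeout)  # assumed: False on timeout
        if packet:
            return packet
        print("Attempt {}/{} timed out: check the browser for a CAPTCHA or SMS prompt "
              "and confirm the listened API path in F12 -> Network.".format(attempt, retries))
        time.sleep(pause)  # time to act in the browser; queued packets are read next round
    return None


def plan_batches(total: int, batch_size: int = 100) -> List[Tuple[int, int]]:
    """Split one keyword's target count into (start, end) ranges of at most batch_size."""
    return [(start, min(start + batch_size, total)) for start in range(0, total, batch_size)]
```

For example, `plan_batches(250)` yields `[(0, 100), (100, 200), (200, 250)]`, so each run stays at or under 100 notes.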
🎯 Advanced optimization
- Improve collection efficiency
  - You can move `recorder.record()` out of the inner loop and call it once after each keyword finishes, which suits large-scale collection.
  - Want to speed things up with multithreading? **Strongly discouraged!** Xiaohongshu has very low tolerance for multiple windows or threads on a single account, and single-threaded collection is the most stable.
- Collect more fields
  - Just add the corresponding dictionary keys in the `extract_note_data` function, for example tags, image URLs, or IP location. All of this information is already in the captured JSON.
- Avoid account bans
  - Use a spare account dedicated to collection, never your main account.
  - Keep your total collection under 500 items per day.
  - You can add low-probability actions such as random likes or favorites to mimic a real person's browsing and further reduce risk.
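Here is a hypothetical `extract_note_data` that also pulls tags, image URLs, and the IP location, to illustrate adding fields. The key names (`note_card`, `interact_info`, `tag_list`, `image_list`, `url_default`, `ip_location`) are assumptions based on a typical note-detail response. Check them against the JSON you actually capture.

```python
from typing import Any, Dict, Optional


def extract_note_data(body: Any) -> Optional[Dict[str, Any]]:
    """Flatten one note-detail response into a row; every key name here is an assumption."""
    try:
        card = body["data"]["items"][0]["note_card"]
    except (KeyError, IndexError, TypeError):
        return None  # unexpected shape: skip this note instead of crashing the loop
    user = card.get("user") or {}
    interact = card.get("interact_info") or {}
    return {
        "title": card.get("title", ""),
        "desc": card.get("desc", ""),
        "author": user.get("nickname", ""),
        "likes": interact.get("liked_count", ""),
        "collects": interact.get("collected_count", ""),
        "comments": interact.get("comment_count", ""),
        # extra fields: tags, image URLs and IP location, joined for Excel/CSV cells
        "tags": ",".join(t.get("name", "") for t in card.get("tag_list") or []),
        "image_urls": ",".join(i.get("url_default", "") for i in card.get("image_list") or []),
        "ip_location": card.get("ip_location", ""),
    }
```

Missing keys fall back to empty strings, so one note with an incomplete response does not break the whole export.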
💡 Final takeaway: the biggest value of this approach is that you say goodbye to x-s reverse engineering entirely and work directly with JSON. Meanwhile, random waits, a logged-in session, ESC resets, and similar measures keep risk-control triggers to a minimum. Follow the steps and even a beginner can get a stable automated Xiaohongshu collection pipeline running quickly. If you run into problems, adjust the XPath or the API path based on the network packets you actually capture, and iterate.

