Douyin APP packet capture analysis practice
Whether you are doing content analysis or user-behavior research, data from top short-video apps such as Douyin is hard to avoid. Unfortunately, the web version is either feature-limited or protected by anti-crawling mechanisms that change almost daily. The app side is closer to the real data, but it comes with the hurdles of SSL pinning, complex request signatures, and device binding.
Today we start with an "entry-level minimalist solution" and walk through a complete workflow: capture traffic → identify the core APIs → parse the captured responses → get the data without dealing with signatures. Later in the article there is a Python parsing tool that runs directly on captured data, so you can verify the data structure and the extraction logic.
1. Preparation: first get readable plaintext traffic
Packet capture is the first step in all API analysis and crawler development. If this step fails, everything that follows is wasted.
Why choose the "old combination"?
Mainly to get past Douyin's two most basic anti-crawling hurdles:
- System certificate trust restrictions: since Android 7.0, apps (those targeting API level 24 or higher) no longer trust user-installed CA certificates by default, so the certificate has to be placed in the system certificate directory, which requires root. iOS is likewise hard to handle without a jailbreak. The most hassle-free option is therefore Android 9 or earlier, on an older emulator or an older real device with root or Frida available.
- SSL pinning (certificate pinning): Douyin versions released after mid-2024 have pinning enabled almost across the board and only trust ByteDance's built-in root certificate, so installing your capture tool's CA does not help. Look for older APKs between 23.5.0 and 23.9.0 first; they get around most pinning restrictions.
Minimalist entry-level solution: LDPlayer 9 emulator + Mitmproxy + an older version of Douyin
This combination has the lowest barrier to entry; most beginners can get plaintext traffic within about 15 minutes.
1. Tool installation and basic configuration
💡 Common pitfalls:
- Do not enter 127.0.0.1 as the proxy IP: inside the emulator it points to the emulator itself. Enter your host machine's LAN IP instead.
- When you install the certificate, the system may ask you to set a lock-screen password. Just follow the prompts.
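Because the proxy address must be the host's LAN IP rather than 127.0.0.1, a small helper can print it for you. This is a sketch using a common trick: calling connect() on a UDP socket only selects a route without sending any packet, and getsockname() then reveals the local interface address. The probe address 8.8.8.8 is an arbitrary assumption (any routable address works), and 8080 is mitmproxy's default proxy port unless you changed it.

```python
import socket
from typing import Optional


def host_lan_ip(probe_addr: str = "8.8.8.8") -> Optional[str]:
    """Return the local IPv4 address the OS would use to reach probe_addr.

    connect() on a UDP socket only selects a route; no packet is sent.
    Returns None if no route is available (e.g. the machine is offline).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.connect((probe_addr, 80))
        except OSError:
            return None
        return s.getsockname()[0]


if __name__ == "__main__":
    ip = host_lan_ip()
    if ip is None:
        print("No network route found; check the host's network connection.")
    else:
        # 8080 is mitmproxy's default proxy port
        print(f"Set the emulator's Wi-Fi proxy to {ip}:8080")
```

On a machine with several adapters (VPN, virtual switches), double-check the result against ipconfig or ip addr, since the emulator must actually be able to reach that address.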
2. Verify that the configuration works
- First open the Mitmweb web interface (http://localhost:8081) and check whether basic network requests appear.
- Install an older Douyin version around 23.7.0 (don't search the app store; get the installation package from APKPure or a historical-version archive site).
- Open Douyin, browse 3 to 5 recommended videos, then return to the web interface and look for requests starting with aweme.snssdk.com whose responses are plaintext JSON. If you see them, this step is done and you can move on.
2. Traffic filtering: keep only the useful "core APIs"
A single capture session produces hundreds or even thousands of requests. Don't panic: start by focusing on the key domain names and response formats.
Three rules for quick filtering
- Ignore static resources: requests to p*.douyinpic.com (images) and v*.douyinvod.com (video files) only download media and are of little value for data collection.
- Keep only plaintext JSON: focus on interfaces under aweme.snssdk.com whose Content-Type is application/json.
- Note down the high-frequency core interfaces that are easy to work with, such as the user post list /aweme/v1/aweme/post/ used later in this article.
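The first two filtering rules can be written as a small pure function, which makes them easy to test before wiring them into a capture tool. A minimal sketch, assuming the host patterns exactly as listed above; the function name classify_flow and the three category labels are hypothetical.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Rule 1: hosts that only serve media (patterns taken from the filtering rules)
STATIC_PATTERNS = ("p*.douyinpic.com", "v*.douyinvod.com")
# Rule 2: the API host worth inspecting
CORE_HOST = "aweme.snssdk.com"


def classify_flow(host: str, content_type: str) -> str:
    """Label one captured request as 'static', 'core' or 'other'."""
    host = host.lower()
    if any(fnmatchcase(host, pattern) for pattern in STATIC_PATTERNS):
        return "static"
    if host == CORE_HOST and "application/json" in content_type.lower():
        return "core"
    return "other"


if __name__ == "__main__":
    # Illustrative (host, content-type) pairs, not real capture data
    sample = [
        ("p3.douyinpic.com", "image/jpeg"),
        ("v26.douyinvod.com", "video/mp4"),
        ("aweme.snssdk.com", "application/json; charset=utf-8"),
        ("aweme.snssdk.com", "text/html"),
    ]
    print(Counter(classify_flow(h, ct) for h, ct in sample))
```

Inside a mitmproxy addon, the function could be called from the response(self, flow) hook with flow.request.pretty_host and flow.response.headers.get("content-type", "").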
📌 In short:
sec_user_id is a more stable form of the user ID that does not become invalid when the purely numeric ID changes, so prefer it. max_cursor is the paging cursor: each response carries a has_more field plus the max_cursor value to use in the next request, and passing that value back fetches the next page.
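The has_more / max_cursor loop can be sketched independently of how each page is obtained. Here fetch_page is a hypothetical callable that takes a cursor and returns one parsed response (for example, loaded from saved capture files), and aweme_list as the name of the array holding the videos is an assumption to check against your own capture. The loop stops when has_more is falsy, when the cursor stops advancing, or after max_pages pages, so a malformed response cannot make it spin forever.

```python
from typing import Callable, Iterator, Optional


def iter_posts(fetch_page: Callable[[int], dict], start_cursor: int = 0,
               max_pages: int = 50) -> Iterator[dict]:
    """Yield video items page by page, following has_more / max_cursor."""
    cursor = start_cursor
    for _ in range(max_pages):
        page = fetch_page(cursor)
        # aweme_list is the assumed name of the array holding the videos
        yield from page.get("aweme_list") or []
        if not page.get("has_more"):
            break
        next_cursor: Optional[int] = page.get("max_cursor")
        if next_cursor is None or next_cursor == cursor:
            break  # cursor did not advance; stop rather than loop forever
        cursor = next_cursor


if __name__ == "__main__":
    # Two fake pages keyed by cursor, standing in for saved capture files
    pages = {
        0: {"aweme_list": [{"aweme_id": "a1"}], "has_more": 1, "max_cursor": 1700000000000},
        1700000000000: {"aweme_list": [{"aweme_id": "a2"}], "has_more": 0, "max_cursor": 1690000000000},
    }
    print([item["aweme_id"] for item in iter_posts(pages.__getitem__)])
```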
3. Code practice: an "API response parsing tool" for captured data
⚠️ Important note
Douyin's real signature parameters (such as _signature, x-gorgon, x-khronos, x-ss-stub, etc.) are extremely complex: they involve the native layer and dynamically generated functions, and beginners can hardly reproduce them in a short time.
So the tool provided here only parses and saves the real, complete URLs and JSON you have already captured. It does not touch signatures at all; it works directly on your captured plaintext data so you can verify the API response structure and the data-extraction logic.
Complete Python code
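A minimal sketch of such a tool follows. It assumes the response body contains an aweme_list array whose items carry fields such as aweme_id, desc, create_time (Unix seconds), author.nickname, statistics and video.play_addr.url_list; these field names and the output file name pattern are assumptions, so compare them with your own capture and adjust. Missing fields fall back to empty values instead of raising errors.

```python
import csv
import json
import sys
from datetime import datetime
from pathlib import Path

# Assumed layout of an /aweme/v1/aweme/post/ response; verify against your own capture.
FIELDS = ["aweme_id", "desc", "create_time", "author_nickname",
          "digg_count", "comment_count", "share_count", "play_url"]


def extract_videos(data: dict) -> list:
    """Flatten each item of the (assumed) aweme_list array into one CSV row."""
    rows = []
    for item in data.get("aweme_list") or []:
        stats = item.get("statistics") or {}
        play_addr = (item.get("video") or {}).get("play_addr") or {}
        url_list = play_addr.get("url_list") or []
        ts = item.get("create_time")
        rows.append({
            "aweme_id": item.get("aweme_id", ""),
            "desc": item.get("desc", ""),
            # create_time is assumed to be a Unix timestamp in seconds
            "create_time": datetime.fromtimestamp(ts).isoformat()
            if isinstance(ts, (int, float)) else "",
            "author_nickname": (item.get("author") or {}).get("nickname", ""),
            "digg_count": stats.get("digg_count", 0),
            "comment_count": stats.get("comment_count", 0),
            "share_count": stats.get("share_count", 0),
            "play_url": url_list[0] if url_list else "",
        })
    return rows


def save_csv(rows: list, directory: Path = Path(".")) -> Path:
    """Write rows to videos_<timestamp>.csv in directory and return the path."""
    path = Path(directory) / f"videos_{datetime.now():%Y%m%d_%H%M%S}.csv"
    # utf-8-sig adds a BOM so Excel displays Chinese text correctly
    with path.open("w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return path


def main() -> None:
    # response.json is expected next to the script, as in the usage steps
    src = Path(sys.argv[0]).resolve().parent / "response.json"
    if not src.exists():
        print(f"{src} not found: paste a captured response body into it first.")
        return
    data = json.loads(src.read_text(encoding="utf-8"))
    rows = extract_videos(data)
    out = save_csv(rows)
    print(f"Extracted {len(rows)} videos -> {out}")


if __name__ == "__main__":
    main()
```

Keeping extraction (extract_videos) separate from output (save_csv) means you can check the field mapping against one captured response before writing any files.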
🛠 Usage:
- In the Mitmweb web interface, find one response of the /aweme/v1/aweme/post/ interface and copy its full response body.
- Create response.json in the script directory, paste the copied content into it, and save it.
- Run the script. A timestamped CSV file containing structured video information is generated in the current directory.
4. Next steps and compliance tips
Advanced data collection (avoiding or solving signatures)
If you want to move beyond the manual "capture first, then parse" workflow toward semi-automatic or even fully automatic collection, two relatively beginner-friendly directions are worth a look:
- Appium + Mitmproxy: use Appium to simulate human swipes and taps while Mitmproxy sits in the middle, intercepting the real API requests and responses and saving the structured data directly. This requires no signature reproduction at all, because every request you intercept is a legitimate one made by the real app.
- Hooking the signature functions with Frida: if you already have some reverse-engineering background, you can use Frida to hook Douyin's libxgorgon.so or the Java-layer class that generates signatures, call these functions in real time to produce genuine x-gorgon and related parameters, and then send the requests with Python requests, so the interface can be called without packet capture.
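For the Appium + Mitmproxy route, the saving side can be a small mitmproxy addon. Addons are plain objects whose methods are named after event hooks, and response(flow) runs once per completed response. This sketch appends every JSON response from aweme.snssdk.com to a JSON Lines file; the class name SaveAwemeJson and the output file name are hypothetical. It could be loaded with mitmweb -s save_aweme.py (the script file name is also just an example).

```python
import json
from pathlib import Path


class SaveAwemeJson:
    """mitmproxy addon sketch: append aweme.snssdk.com JSON responses to a JSONL file."""

    def __init__(self, out_path: str = "aweme_responses.jsonl") -> None:
        self.out_path = Path(out_path)

    def response(self, flow) -> None:
        # mitmproxy calls this hook once per completed response
        if flow.request.pretty_host != "aweme.snssdk.com":
            return
        if "application/json" not in flow.response.headers.get("content-type", ""):
            return
        try:
            body = json.loads(flow.response.get_text())
        except (TypeError, ValueError):
            return  # body missing or not valid JSON; skip it
        record = {"path": flow.request.path, "body": body}
        with self.out_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# mitmproxy looks for a module-level "addons" list when loading a script
addons = [SaveAwemeJson()]
```

Each line of the output file holds one request path plus its parsed body, so the parser from section 3 can later be run on the body of any line.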
⚠️ Compliance Tips (Very Important)
Please be sure to comply with relevant laws and regulations such as the Cybersecurity Law of the People's Republic of China, the Data Security Law of the People's Republic of China, and the Personal Information Protection Law of the People's Republic of China:
- Only crawl publicly visible non-sensitive data;
- Do not send high-frequency requests that could affect Douyin's normal service;
- Do not use the captured data for commercial purposes;
- Do not spread crawled personal information (such as mobile phone number, address, real-name information, etc.).
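To keep request frequency low once collection is automated, a simple throttle can enforce a minimum gap between consecutive requests. A minimal sketch; the 3-second default is an arbitrary assumption, not a documented limit, and the injectable clock and sleep functions exist only so the behavior can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval: float = 3.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous call

    def wait(self) -> None:
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(min_interval=1.0)
    for i in range(3):
        throttle.wait()  # call before each request
        print(f"request {i} at {time.monotonic():.2f}")
```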
That is the zero-to-one practical approach to Douyin app packet capture analysis. It comes down to three core points: use older versions to get past the basic protections, use Mitmproxy to see the API traffic, and validate your parsing logic on captured data first before considering automation. Once these points are solid, both data analysis and more advanced reverse engineering become much easier.

