E-commerce App product sliding capture project
This article presents a lightweight, easy-to-deploy solution for collecting product data from Android e-commerce apps by scrolling through their product lists, with no root required. It is built on uiautomator2, a Python wrapper around Google's UI Automator testing tool; compared with Appium, it is quicker to configure and less likely to trigger risk control. An adapter pattern makes it quick to support multiple platforms, and the project includes built-in SQLite persistent storage plus basic data analysis based on pandas/matplotlib. The core code can be run directly.
⚠️ Legal and Compliance Statement: This project is only used for personal technical learning and research. Batch capture of data from unauthorized platforms is strictly prohibited. Please be sure to abide by the "User Agreement", "Privacy Policy" and relevant laws and regulations of each platform, and reasonably control the collection frequency and single/total collection volume!
1. Core architecture: three-module sliding acquisition engine
We split the system into three loosely coupled modules: Configuration Management, SQLite Light Storage, and UI Interaction and Extraction. This separation makes it easier to iterate on and extend each part quickly.
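To make the module split concrete, here is a minimal sketch of the first two modules. It is not the project's actual code: the class names `ScraperConfig` and `ProductStore`, the default values, and the `products` table layout are assumptions for illustration. Only the three configuration keys come from this article.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScraperConfig:
    """Configuration Management: the three keys used in this article.

    The default values are placeholders, not recommendations.
    """
    app_package: str = "com.taobao.taobao"
    category_keywords: List[str] = field(default_factory=lambda: ["keyboard"])
    max_products_per_category: int = 20


class ProductStore:
    """SQLite Light Storage: one table, de-duplicated on (keyword, title).

    The schema is an assumption; a real project may store more fields.
    """

    def __init__(self, db_path: str = "products.db") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS products (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   keyword TEXT NOT NULL,
                   title TEXT NOT NULL,
                   price REAL,
                   UNIQUE (keyword, title)
               )"""
        )
        self.conn.commit()

    def save(self, keyword: str, title: str, price: Optional[float]) -> bool:
        """Insert one product; return False if it was already stored."""
        cur = self.conn.execute(
            "INSERT OR IGNORE INTO products (keyword, title, price) VALUES (?, ?, ?)",
            (keyword, title, price),
        )
        self.conn.commit()
        return cur.rowcount == 1

    def count(self, keyword: str) -> int:
        """Number of stored products for one keyword."""
        row = self.conn.execute(
            "SELECT COUNT(*) FROM products WHERE keyword = ?", (keyword,)
        ).fetchone()
        return row[0]
```

Keeping storage behind a small class like this means the UI module only needs to call `save()`, and the `UNIQUE` constraint absorbs the duplicates that overlapping scroll pages inevitably produce.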
Why choose uiautomator2 first?
The following comparison table can help you quickly understand the characteristics of different solutions:
Core code (lite version)
We have removed redundant fields (such as coupons and user reviews) and kept the core logic that applies to the basic product-list scenario in most e-commerce apps. Anti-automation-detection measures, such as randomized swipe and search flows, are built into the code.
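The scroll-and-collect idea described here can be sketched as follows. This is an illustrative outline under stated assumptions, not the project's code: `random_swipe_params`, `simulate_scroll_down`, `collect_titles`, and the `extract_titles` callback are hypothetical names, and the jitter ranges are arbitrary. The only device calls used are `window_size()` and `swipe(fx, fy, tx, ty, duration=...)`, which a uiautomator2 device object provides.

```python
import random
import time
from typing import Callable, Iterable, List, Optional, Tuple


def random_swipe_params(
    width: int, height: int, rng: random.Random
) -> Tuple[int, int, int, int, float]:
    """Pick a bottom-to-top swipe with jittered start, end and duration."""
    fx = int(width * rng.uniform(0.40, 0.60))
    # A real finger rarely moves in a perfectly straight vertical line.
    tx = min(max(fx + rng.randint(-30, 30), 0), width - 1)
    fy = int(height * rng.uniform(0.70, 0.80))
    ty = int(height * rng.uniform(0.25, 0.35))
    duration = rng.uniform(0.3, 0.8)  # seconds
    return fx, fy, tx, ty, duration


def simulate_scroll_down(
    device, rng: random.Random, pause: Tuple[float, float] = (1.0, 2.5)
) -> None:
    """Scroll the list once, then wait a random interval."""
    width, height = device.window_size()
    fx, fy, tx, ty, duration = random_swipe_params(width, height, rng)
    device.swipe(fx, fy, tx, ty, duration=duration)
    time.sleep(rng.uniform(*pause))


def collect_titles(
    device,
    extract_titles: Callable[[object], Iterable[str]],
    max_products: int,
    max_idle_scrolls: int = 3,
    rng: Optional[random.Random] = None,
    pause: Tuple[float, float] = (1.0, 2.5),
) -> List[str]:
    """Scroll until max_products unique titles are seen or the list stops growing.

    extract_titles is a hypothetical callback that reads the titles currently
    on screen; its selectors depend entirely on the target app.
    """
    if rng is None:
        rng = random.Random()
    seen: dict = {}  # dict keeps insertion order and de-duplicates
    idle = 0
    while len(seen) < max_products and idle < max_idle_scrolls:
        before = len(seen)
        for title in extract_titles(device):
            if len(seen) >= max_products:
                break
            seen.setdefault(title, None)
        # Count consecutive scrolls that revealed nothing new.
        idle = idle + 1 if len(seen) == before else 0
        if len(seen) < max_products:
            simulate_scroll_down(device, rng, pause)
    return list(seen)
```

On a real device, `device` would be the object returned by `uiautomator2.connect()`. Passing the extractor in as a callback keeps the scrolling logic independent of any one app's layout, which is the same separation the adapter pattern aims for.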
2. Data value-added: quickly view analysis results in 1 minute
After collection finishes, you can use the following script to quickly summarize and visualize the data stored in SQLite. The code already handles the problem of Chinese characters being rendered as garbled text in charts.
💡 Tip: If you run into font problems, install a suitable Chinese font, or set `plt.rcParams['font.sans-serif']` to the name of a Chinese font that is already installed on your system.
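As a rough illustration of this kind of analysis, the sketch below loads the collected rows with pandas, computes per-keyword price statistics, and saves a histogram. The table and column names (`products`, `keyword`, `title`, `price`), the function names, and the output file name are assumptions, not the project's actual schema or script.

```python
import sqlite3

import pandas as pd


def load_products(conn: sqlite3.Connection) -> pd.DataFrame:
    """Read the assumed `products` table into a DataFrame."""
    return pd.read_sql_query("SELECT keyword, title, price FROM products", conn)


def price_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, median, min and max price per keyword (rows without a price are skipped)."""
    return (
        df.dropna(subset=["price"])
        .groupby("keyword")["price"]
        .agg(["count", "mean", "median", "min", "max"])
    )


def plot_price_distribution(df: pd.DataFrame, out_path: str = "price_distribution.png") -> str:
    """Save one overlaid price histogram per keyword and return the file path."""
    # Imported here so the statistics above work even without a plotting backend.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 4.5))
    for keyword, group in df.dropna(subset=["price"]).groupby("keyword"):
        ax.hist(group["price"], bins=20, alpha=0.6, label=str(keyword))
    ax.set_xlabel("Price")
    ax.set_ylabel("Number of products")
    ax.set_title("Price distribution by keyword")
    ax.legend()
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return out_path
```

A typical call would open the database with `sqlite3.connect(...)`, pass the connection to `load_products`, print `price_summary(df)`, and then call `plot_price_distribution(df)`.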
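One way to apply this tip is to pick the first Chinese-capable font that matplotlib can actually find, instead of hard-coding a single name. The sketch below assumes a candidate list of commonly seen font names; these names vary between systems, so treat the list and both function names as placeholders.

```python
from typing import Iterable, Optional, Sequence

# Common Chinese-capable font names on Windows, macOS and Linux.
# This list is an assumption; adjust it to what your system provides.
CANDIDATE_FONTS = (
    "SimHei",
    "Microsoft YaHei",
    "PingFang SC",
    "Arial Unicode MS",
    "Noto Sans CJK SC",
    "WenQuanYi Micro Hei",
)


def pick_font(
    installed: Iterable[str], candidates: Sequence[str] = CANDIDATE_FONTS
) -> Optional[str]:
    """Return the first candidate that is installed, or None."""
    available = set(installed)
    for name in candidates:
        if name in available:
            return name
    return None


def configure_chinese_font() -> Optional[str]:
    """Point matplotlib at an installed Chinese font, if one is found."""
    import matplotlib.pyplot as plt
    from matplotlib import font_manager

    installed = (entry.name for entry in font_manager.fontManager.ttflist)
    chosen = pick_font(installed)
    if chosen is not None:
        # Put the chosen font first so it wins over the defaults.
        plt.rcParams["font.sans-serif"] = [chosen] + list(plt.rcParams["font.sans-serif"])
        # Keep the minus sign readable with fonts that lack the Unicode minus glyph.
        plt.rcParams["axes.unicode_minus"] = False
    return chosen
```

If `configure_chinese_font()` returns `None`, no candidate was found and installing a font is the remaining option.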
3. Quick Deployment Guide
Environment preparation
- Hardware/Software: A Windows/macOS/Linux computer, an Android phone or emulator with USB debugging turned on (turned on in Developer Options).
- Python environment: Python 3.8 and above (3.9 - 3.11 recommended for better compatibility).
- Dependency installation: install `uiautomator2`, plus `pandas` and `matplotlib` for the analysis script. After you run `python -m uiautomator2 init`, an app called "ATX" is installed on the phone to listen for automation commands.
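Before connecting a device, it can help to confirm that the environment meets the requirements listed above. The following is a small, hypothetical pre-flight check; the module list reflects the libraries named in this article, and the function names are illustrative.

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple

# Libraries this article relies on; extend the tuple if your setup needs more.
REQUIRED_MODULES = ("uiautomator2", "pandas", "matplotlib")


def python_ok(
    version: Tuple[int, int] = sys.version_info[:2],
    minimum: Tuple[int, int] = (3, 8),
) -> bool:
    """True if the interpreter version is at least `minimum`."""
    return tuple(version) >= minimum


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the modules from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def environment_report() -> str:
    """One-line summary suitable for printing before a run."""
    problems = []
    if not python_ok():
        problems.append("Python 3.8 or newer is required")
    missing = missing_modules()
    if missing:
        problems.append("missing modules: " + ", ".join(missing))
    return "environment OK" if not problems else "; ".join(problems)
```

Calling `print(environment_report())` at the top of a run gives an early, readable failure instead of an `ImportError` halfway through collection.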
Run steps
- Connect the device: Connect the phone to the computer with a USB data cable. When "Allow USB debugging" pops up on the phone, tap Allow. If `adb` is available, you can run `adb devices` in a terminal to confirm that the device is recognized.
- Modify the configuration: Open `core_scraper.py`, find `custom_config` near the end of the file, and adjust it to your needs:
  - `app_package`: the package name of the target e-commerce app (for example, Taobao is `com.taobao.taobao`)
  - `category_keywords`: the list of product keywords you want to collect
  - `max_products_per_category`: the maximum number of products collected for each keyword
- Execute collection:
- View analysis results: After the collection is completed, run the following command to generate statistical reports and price distribution charts.
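Returning to the first step: the `adb devices` check can also be done from Python, which is convenient if you want the script to stop early when no authorized device is attached. This is a sketch with hypothetical function names; it assumes `adb` is on your `PATH` and relies on the usual `adb devices` output of a header line followed by one `serial<TAB>state` line per device.

```python
import subprocess
from typing import Dict, List


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map each device serial to its state ('device', 'unauthorized', 'offline', ...)."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_devices() -> List[str]:
    """Serials of devices that are connected and have accepted USB debugging."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return [
        serial
        for serial, state in parse_adb_devices(result.stdout).items()
        if state == "device"
    ]
```

A state of `unauthorized` usually means the "Allow USB debugging" prompt on the phone has not been accepted yet.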
4. Simple anti-automation tips (optional)
If you want to further reduce the risk of being detected by the platform, you can try these lightweight optimizations (some of which are already implemented in the code):
- Randomized sliding parameters: Random offsets are added to the sliding starting point, end point, and interval time to make the operation more like a real person.
- Occasionally simulate pauses or "mis-taps": Between calls to `_simulate_scroll_down`, randomly add an extra pause of 0.5 to 1.5 seconds.
- Modify the ATX service fingerprint: Some platforms detect ATX-related processes or package names. As an advanced option, `com.github.uiautomator` can be repackaged or have its resources modified.
- Control total collection duration and schedule: A single continuous collection session should not exceed 1 hour. It is better to split collection into several time periods, separated by sufficiently long random rest intervals.
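The pacing ideas in this list (an occasional extra pause, and sessions capped at one hour with random rests in between) can be sketched as below. The 0.5 to 1.5 second pause and the 60-minute cap come from the tips above; the pause probability, the 20 to 60 minute rest range, and the function names are assumptions chosen only for illustration.

```python
import random
from typing import List, Tuple


def maybe_extra_pause(
    rng: random.Random,
    probability: float = 0.2,  # assumed: pause after roughly 1 in 5 scrolls
    low: float = 0.5,
    high: float = 1.5,
) -> float:
    """Return an extra pause in seconds, or 0.0 when no pause is drawn."""
    if rng.random() < probability:
        return rng.uniform(low, high)
    return 0.0


def plan_sessions(
    total_minutes: float,
    rng: random.Random,
    max_session_minutes: float = 60.0,
    rest_range: Tuple[float, float] = (20.0, 60.0),  # assumed rest length, in minutes
) -> List[Tuple[str, float]]:
    """Split a collection budget into sessions of at most max_session_minutes,
    separated by random rests. Returns ("collect" | "rest", minutes) pairs."""
    plan: List[Tuple[str, float]] = []
    remaining = float(total_minutes)
    while remaining > 0:
        # Vary session length so the schedule is not perfectly regular.
        work = min(remaining, rng.uniform(0.5 * max_session_minutes, max_session_minutes))
        plan.append(("collect", work))
        remaining -= work
        if remaining > 0:
            plan.append(("rest", rng.uniform(*rest_range)))
    return plan
```

In a scroll loop, the value returned by `maybe_extra_pause(rng)` would be passed to `time.sleep()` after each swipe, while `plan_sessions` would be consulted once per run to decide when to stop and how long to wait before resuming.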
⚙️ Note: The solution in this article is only suitable for legal scenarios such as learning and research. Excessively frequent automated operations may still violate platform regulations. Please be sure to control collection behavior and respect platform rules.

