Use dict and set

Among Python's built-in data structures, lists and tuples are good at handling ordered sequences, but in actual development, we often need to quickly locate data based on a unique identifier**, batch deduplication, or efficiently determine whether a member exists. In these scenarios, dictionary (dict) and set (set) become irreplaceable golden partners. Their core secret is that they all use Hash Tables at the bottom layer, which allows many operations to be completed almost instantly.


Dictionary (dict) basics

A dictionary is a typical key-value mapping container. You can think of it as an index of a book: keywords (keys) correspond to page numbers (values). In other languages ​​it is also called a hash table, map or associative array.

Basic addition, deletion, modification and query

The syntax of a dictionary is very intuitive. It is wrapped in a pair of curly braces, with pairs insidekey: value, separated by commas.

# 创建空字典(注意:{}是空字典,不是空集合!)
empty_dict = {}
empty_dict = dict()  # 更推荐的显式创建方式

# 创建带初始值的字典
student_scores = {"Michael": 95, "Bob": 75, "Tracy": 85}

# 读取元素:直接通过键访问(键不存在会触发KeyError)
print(student_scores["Michael"])  # 输出: 95

# 添加或修改元素:直接赋值
student_scores["Adam"] = 67   # 新增键值对
student_scores["Bob"] = 80    # 修改已有键对应的值,后赋值的会覆盖前面的

# 删除元素
removed = student_scores.pop("Bob")  # 安全删除并返回对应的值,键不存在也会报错
del student_scores["Tracy"]          # 无返回值删除,同样遇到不存在的键会报错
student_scores.clear()              # 清空整个字典

The core of efficiency: Hash table principle

Why can dictionaries achieve near-constant-time query, insertion, and deletion speeds? It can be simply understood as three steps:

  1. For inputkeyExecute the built-inhash()Function, calculates a fixed-length "hash value" (equivalent to giving this key a unique fingerprint).
  2. Map this hash value to a position index of the internal array (just like looking up the index table through keywords and turning directly to the corresponding page).
  3. Directly locate the storage based on the indexvaluememory location.

Regardless of whether there are 10 elements or 100,000 elements in the dictionary, the entire process does not need to be traversed one by one, so the speed is hardly affected by the amount of data.

Commonly used advanced operations

# 重新构建测试数据
student_scores = {"Michael": 95, "Bob": 80, "Tracy": 85}

# ✅ 安全地检查键是否存在(比直接索引 + try/except 更优雅)
if "Michael" in student_scores:
    print("找到了Michael的成绩")

# ✅ 安全地获取值,避免KeyError(第二个参数为默认值)
score = student_scores.get("Thomas", -1)   # "Thomas"不存在,返回 -1
print(score)  # 输出: -1

# ✅ 获取字典的“视图对象”(实时反映字典变化,但可以随时转换成固定列表)
keys = student_scores.keys()      # 所有键的视图
values = student_scores.values()  # 所有值的视图
items = student_scores.items()    # 所有键值对的视图(每个元素是 (key, value) 元组)
print(list(items))  # 输出: [("Michael", 95), ("Bob", 80), ("Tracy", 85)]

# ✅ 字典推导式:快速生成或过滤字典
squares = {x: x**2 for x in range(5)}                 # {0:0, 1:1, 2:4, 3:9, 4:16}
high_scores = {k: v for k, v in student_scores.items() if v > 80}  # {"Tracy": 85}

Three core features that must be remembered

  1. Insertion order, but don’t rely entirely on it: Starting with Python 3.7, the language specification guarantees that dictionaries will preserve the insertion order of keys. But when you need compatibility with older versions or need to flexibly adjust the order (such as moving elements to the beginning), please usecollections.OrderedDict
  2. The key must be unique: If the same key is assigned a value repeatedly, the latter value will overwrite the previous one, and there are no duplicate keys in the dictionary.
  3. The key must be an immutable object: The hash table requires the hash value of the key to be stable, so mutable objects such as lists and ordinary collections cannot be used as dictionary keys. Numbers, strings, and tuples (and the elements within the tuple must also be immutable) are all qualified keys.

Collection (set) basics

A collection can be understood as a dictionary that only stores keys and no values. It also relies on hash tables at the bottom, so it is naturally suitable for removal and mathematical set operations.

Basic usage and operations

# 创建空集合(注意:必须用set(),{}是空字典!)
empty_set = set()

# 创建带初始值的集合(会自动去除重复元素)
num_set = {1, 2, 3, 2}          # 输出: {1, 2, 3}
num_set = set([3, 2, 1, 2])     # 可以从任何可迭代对象创建,同样会自动去重

# 添加元素
num_set.add(4)                  # 添加单个元素
num_set.update([5, 6, 6])       # 批量添加可迭代对象中的元素,自动去重

# 删除元素
num_set.remove(3)               # 删除元素,如果不存在会报 KeyError
num_set.discard(3)              # 安全的删除,就算元素不存在也不会报错
num_set.pop()                   # 随机删除并返回一个元素(集合是无序的)
num_set.clear()                 # 清空集合

# ✅ 数学意义上的集合运算(也支持对应的方法)
s1 = {1, 2, 3}
s2 = {2, 3, 4}

print(s1 & s2)   # 交集:两个集合都有的元素 → {2, 3}
print(s1 | s2)   # 并集:所有出现过的元素 → {1, 2, 3, 4}
print(s1 - s2)   # 差集:在s1中但不在s2中的元素 → {1}
print(s1 ^ s2)   # 对称差集:只出现在其中一个集合的元素 → {1, 4}

💡 Tips: The elements of the collection must also be immutable objects. You can put strings, numbers, and tuples into a collection, but not lists or another ordinary collection.

Three major characteristics of collections

  1. Disorder: Do not rely on the order of the elements in the collection, the results of each traversal may be different.
  2. Uniqueness: Automatically remove duplicate elements, which is one of its most practical features.
  3. Elements must be immutable: Like the keys of the dictionary, the elements of the set also require the hash value to be stable.

Key prefix: mutable and immutable objects

Whether it is a dictionary key or a collection element, "immutability" is required. To understand this, we must first distinguish between two types of data in Python.

Mutable object (Mutable)

After creation, you can modify the object's contents in situ, but the object's address will not change:

  • List, dictionary (dict), ordinary set (set)
lst = [3, 1, 2]
ref = lst
lst.sort()        # 原地排序,并不会产生一个新对象
print(lst is ref) # 输出: True,说明还是同一个对象

Immutable objects (Immutable)

After creation, the content of the object cannot be modified directly. Any "modification" operation will generate a new object:

  • String (str), number (int/float/bool), tuple (tuple), frozen set (frozenset)
s = "hello"
original = s
new_s = s.replace("h", "H")  # 生成一个新字符串,原字符串不变
print(s is original)         # 输出: True
print(new_s is original)     # 输出: False

🔑 **Why is it so important? ** If the keys of the dictionary change, its hash value will not match the storage location, resulting in no data being found at all, or even destroying the entire structure. Therefore, Python directly prohibits using mutable objects as keys or collection elements.


Practical application scenario suggestions

When to use dict?

  • Key-value mapping query: Check the name based on the student number, and get the parameters based on the name of the configuration item.
  • Data grouping statistics: For example, to count the number of occurrences of each word in a paragraph of text, useword_count[word] = word_count.get(word, 0) + 1Extremely efficient.
  • Implement simple caching: temporarily save the calculation results in the dictionary and retrieve them directly using the key next time, making full use of its near-instantaneous search speed.

When to use set?

  • Batch deduplication: Extract all unique IDs from a massive list of user IDs, one rowset(id_list)That's it.
  • Efficient member judgment: Checking whether an IP is in the blacklist, using a set is dozens to hundreds of times faster than using a list traversal.
  • Set logical operation: Quickly find the common friends of two people (intersection), the unique visitor added to a platform today (difference set), etc.

Performance experience (analogy with examples in life)

OperationListDictionary / Set
Determine whether the element existsSearch sequentially from beginning to end, which is very slow when there is a lot of dataDirectly locate through hashing, almost completed in an instant
Insert or delete intermediate elementsA large amount of data needs to be moved, which is expensiveCalculate the position directly, and the time consumption is basically the same
Memory usageOnly the element itself is stored, which saves spaceFor the efficiency of the hash table, additional space will be reserved, which is relatively "memory-eating"

Answers to high-frequency questions

**Q1: ​​Why can’t a list be used as a key of a dictionary or an element of a set? ** Because lists are mutable objects. Imagine that if lists are allowed to be used as keys, we first stored = {[1,2]: "A"}, and then accidentally modify the contents of this list, its hash value will change, and the originally stored value will no longer be found, which will also cause confusion in the internal structure. If you need to use a list-like sequence as a key, you can convert it to a tuple, since tuples are immutable:

# ✅ 元组可以
d = {(1, 2): "value"}

# ❌ 列表会引发 TypeError
# d = {[1, 2]: "value"}

**Q2: Since the Python 3.7+ dictionary already maintains insertion order, you still needOrderedDict? ** This depends on the scenario:

  • If you just want to make the traversal order consistent with the insertion order, ordinarydictIt's enough.
  • But if you need to move the recently inserted key to the front or last, or the code must be compatible with Python 3.6 and earlier, you must usecollections.OrderedDict

Mastering dictionaries and collections, you will have a powerful tool to handle most efficient query and deduplication scenarios in Python. As long as you remember that "keys/elements must be immutable", you can avoid common pitfalls and use them with ease.