如何在C/C++中实现一个简单的哈希表
引言
在C/C++中,哈希表是一种将键映射到值的数据结构。哈希表使用哈希函数来计算键的索引。根据哈希表的索引,您可以将值存储在适当的位置。
使用哈希表的好处是其非常快速的访问时间。通常情况下,它的时间复杂度(摊还时间复杂度)是常量 O(1) 的访问时间。
如果两个不同的键得到了相同的索引,你将需要使用其他数据结构(桶)来处理这些冲突。如果你选择一个非常好的哈希函数,发生冲突的可能性可以忽略不计。
C++标准模板库(STL)包含了std::unordered_map()数据结构。
在本文中,你将从头开始构建一个散列表,由以下组成:
- A hash function to map keys to values.
- A hash table data structure that supports insert, search, and delete operations.
- A data structure to account for a collision of keys.
选择一个哈希函数
首先,选择一个具有低碰撞几率的相对良好的哈希函数是第一步。然而,为了更好地说明哈希碰撞,本教程将应用一个较差的哈希函数。此有限示例还将仅使用字符串(或在C中是字符数组)。
#define CAPACITY 50000 // Size of the HashTable.
unsigned long hash_function(char* str)
{
unsigned long i = 0;
for (int j = 0; str[j]; j++)
i += str[j];
return i % CAPACITY;
}
运行此代码并测试不同的字符串以查找潜在的碰撞。例如,字符串”Hel”和”Cau”会发生碰撞,因为它们具有相同的ASCII值。
这段代码必须返回一个在哈希表容量范围内的数字,否则可能访问到未绑定的内存位置,从而导致错误。
定义哈希表数据结构
哈希表是由项组成的数组,每个项都是{键: 值}对。
首先,定义物品的结构:
// Defines the HashTable item.
typedef struct Ht_item
{
char* key;
char* value;
} Ht_item;
现在,哈希表有一个指向Ht_item的指针数组,因此它是一个双指针。
// Defines the HashTable.
typedef struct HashTable
{
// Contains an array of pointers to items.
Ht_item** items;
int size;
int count;
} HashTable;
您的哈希表需要使用count函数返回哈希表中的元素数量,并使用size函数返回哈希表的大小。
创建哈希表和哈希表项
接下来,创建用于分配内存和创建项目的函数。
通过为键和值分配内存来创建项目,并返回指向该项目的指针。
Ht_item* create_item(char* key, char* value)
{
// Creates a pointer to a new HashTable item.
Ht_item* item = (Ht_item*) malloc(sizeof(Ht_item));
item->key = (char*) malloc(strlen(key) + 1);
item->value = (char*) malloc(strlen(value) + 1);
strcpy(item->key, key);
strcpy(item->value, value);
return item;
}
通过分配内存并设置大小、计数和项目来创建表格。
HashTable* create_table(int size)
{
// Creates a new HashTable.
HashTable* table = (HashTable*) malloc(sizeof(HashTable));
table->size = size;
table->count = 0;
table->items = (Ht_item**) calloc(table->size, sizeof(Ht_item*));
for (int i = 0; i < table->size; i++)
table->items[i] = NULL;
return table;
}
前面的例子为包装结构HashTable分配内存,并将所有项目设置为NULL。
释放内存是C/C++的最佳实践。使用malloc()和calloc()在堆上释放已分配的内存。
编写函数,释放表项和整个表的内存空间。
void free_item(Ht_item* item)
{
// Frees an item.
free(item->key);
free(item->value);
free(item);
}
void free_table(HashTable* table)
{
// Frees the table.
for (int i = 0; i < table->size; i++)
{
Ht_item* item = table->items[i];
if (item != NULL)
free_item(item);
}
free(table->items);
free(table);
}
添加一个print_table()函数来显示每个项目的索引、键和值。
void print_table(HashTable* table)
{
printf("\nHash Table\n-------------------\n");
for (int i = 0; i < table->size; i++)
{
if (table->items[i])
{
printf("Index:%d, Key:%s, Value:%s\n", i, table->items[i] -> key, table->items[i]->value);
}
}
printf("-------------------\n\n");
}
这就是你定制哈希表的基本功能。现在你需要编写插入、搜索和删除函数。
插入哈希表
创建一个名为ht_insert()的函数,用于执行插入操作。
该函数接受一个HashTable指针,一个键和一个值作为参数。
void ht_insert(HashTable* table, char* key, char* value)
{
...
}
现在,ht_insert() 函数中涉及到一些步骤。
- Create the item based on the { key: value } pair.
- Compute the index based on the hash function.
- Check if the index is already occupied or not, by comparing the key.
If it is not occupied, you can directly insert it into index.
Otherwise, it is a collision, and you will need to handle it.
此教程将教你如何在创建初始模型后处理碰撞。
首先,创建该项。 ,
create_item(key, value)
然后,计算指数。
int index = hash_function(key);
当第一次插入钥匙时,该项必须为空。
// Creates the item.
Ht_item* item = create_item(key, value);
// Computes the index.
int index = hash_function(key);
Ht_item* current_item = table->items[index];
if (current_item == NULL)
{
// Key does not exist.
if (table->count == table->size)
{
// HashTable is full.
printf("Insert Error: Hash Table is full\n");
free_item(item);
return;
}
// Insert directly.
table->items[index] = item;
table->count++;
}
考虑这样一种情况,即 { key: value } 的键值对已经存在,因为相同的项已经插入到哈希表中。为了解决这个问题,代码必须将项的值更新为新值。
if (current_item == NULL)
{
...
}
else {
// Scenario 1: Update the value.
if (strcmp(current_item->key, key) == 0)
{
strcpy(table->items[index] -> value, value);
return;
}
}
考虑到处理碰撞的情况,为此添加了一个占位符。
void handle_collision(HashTable* table, Ht_item* item)
{
}
void ht_insert(HashTable* table, char* key, char* value)
{
...
if (current_item == NULL)
{
...
}
else {
// Scenario 1: Update the value.
if (strcmp(current_item->key, key) == 0)
{
...
}
else {
// Scenario 2: Handle the collision.
handle_collision(table, item);
return;
}
}
}
现在,你的ht_insert()函数已经完成。
在哈希表中搜索物品
创建一个函数ht_search(),检查键是否存在,如果存在,则返回相应的值。
这个函数接受一个HashTable指针和一个键作为参数。
char* ht_search(HashTable* table, char* key)
{
...
}
在哈希表中使用关键字搜索一个项目。如果在哈希表中找不到该项目,则返回NULL。
char* ht_search(HashTable* table, char* key)
{
// Searches for the key in the HashTable.
// Returns NULL if it doesn't exist.
int index = hash_function(key);
Ht_item* item = table->items[index];
// Provide only non-NULL values.
if (item != NULL)
{
if (strcmp(item->key, key) == 0)
return item->value;
}
return NULL;
}
添加一个print_search()函数来展示与关键词匹配的项目。
void print_search(HashTable* table, char* key)
{
char* val;
if ((val = ht_search(table, key)) == NULL)
{
printf("Key:%s does not exist\n", key);
return;
}
else {
printf("Key:%s, Value:%s\n", key, val);
}
}
现在,你的ht_search()函数已经完成。
处理碰撞事件
解决冲突有不同的方式。本教程将依靠一种叫做”分离链”的方法来解决冲突,它旨在为具有相同哈希索引的所有项目创建独立的链表。本教程中的实现将使用链表来创建这些链表。
每当发生碰撞时,相同索引上碰撞的额外项会被添加到溢出桶列表中,这样一来,您就不必删除哈希表上的任何现有记录。
由于链表在插入、查找和删除方面的时间复杂度为O(n),所以在发生碰撞的情况下,最坏情况下的访问时间也为O(n)。这种方法的优点是,如果你的哈希表容量较低,它是一个不错的选择。
实现溢出桶列表。
// Defines the LinkedList.
typedef struct LinkedList {
Ht_item* item;
struct LinkedList* next;
} LinkedList;;
LinkedList* allocate_list()
{
// Allocates memory for a LinkedList pointer.
LinkedList* list = (LinkedList*) malloc(sizeof(LinkedList));
return list;
}
LinkedList* linkedlist_insert(LinkedList* list, Ht_item* item)
{
// Inserts the item onto the LinkedList.
if (!list)
{
LinkedList* head = allocate_list();
head->item = item;
head->next = NULL;
list = head;
return list;
}
else if (list->next == NULL)
{
LinkedList* node = allocate_list();
node->item = item;
node->next = NULL;
list->next = node;
return list;
}
LinkedList* temp = list;
while (temp->next->next)
{
temp = temp->next;
}
LinkedList* node = allocate_list();
node->item = item;
node->next = NULL;
temp->next = node;
return list;
}
Ht_item* linkedlist_remove(LinkedList* list)
{
// Removes the head from the LinkedList.
// Returns the item of the popped element.
if (!list)
return NULL;
if (!list->next)
return NULL;
LinkedList* node = list->next;
LinkedList* temp = list;
temp->next = NULL;
list = node;
Ht_item* it = NULL;
memcpy(temp->item, it, sizeof(Ht_item));
free(temp->item->key);
free(temp->item->value);
free(temp->item);
free(temp);
return it;
}
void free_linkedlist(LinkedList* list)
{
LinkedList* temp = list;
while (list)
{
temp = list;
list = list->next;
free(temp->item->key);
free(temp->item->value);
free(temp->item);
free(temp);
}
}
现在,将这些溢出桶列表添加到哈希表中。每个项目都会有一个链表,所以整个表就是一个链表指针的数组。
typedef struct HashTable HashTable;
// Defines the HashTable.
struct HashTable
{
// Contains an array of pointers to items.
Ht_item** items;
LinkedList** overflow_buckets;
int size;
int count;
};
现在定义了溢出桶,添加函数来创建和删除它们。您还需要在创建表和释放表的函数中考虑它们。
LinkedList** create_overflow_buckets(HashTable* table)
{
// Create the overflow buckets; an array of LinkedLists.
LinkedList** buckets = (LinkedList**) calloc(table->size, sizeof(LinkedList*));
for (int i = 0; i < table->size; i++)
buckets[i] = NULL;
return buckets;
}
void free_overflow_buckets(HashTable* table)
{
// Free all the overflow bucket lists.
LinkedList** buckets = table->overflow_buckets;
for (int i = 0; i < table->size; i++)
free_linkedlist(buckets[i]);
free(buckets);
}
HashTable* create_table(int size)
{
// Creates a new HashTable.
HashTable* table = (HashTable*) malloc(sizeof(HashTable));
table->size = size;
table->count = 0;
table->items = (Ht_item**) calloc(table->size, sizeof(Ht_item*));
for (int i = 0; i < table->size; i++)
table->items[i] = NULL;
table->overflow_buckets = create_overflow_buckets(table);
return table;
}
void free_table(HashTable* table)
{
// Frees the table.
for (int i = 0; i < table->size; i++)
{
Ht_item* item = table->items[i];
if (item != NULL)
free_item(item);
}
// Free the overflow bucket lists and its items.
free_overflow_buckets(table);
free(table->items);
free(table);
}
如果该物品的溢出桶列表不存在,则创建一个列表并将该物品添加到其中。
更新 handle_collision() 函数以处理插入操作。
void handle_collision(HashTable* table, unsigned long index, Ht_item* item)
{
LinkedList* head = table->overflow_buckets[index];
if (head == NULL)
{
// Creates the list.
head = allocate_list();
head->item = item;
table->overflow_buckets[index] = head;
return;
}
else {
// Insert to the list.
table->overflow_buckets[index] = linkedlist_insert(head, item);
return;
}
}
而电话响了
void ht_insert(HashTable* table, char* key, char* value)
{
...
if (current_item == NULL)
{
...
}
else {
// Scenario 1: Update the value.
if (strcmp(current_item->key, key) == 0)
{
...
}
else {
// Scenario 2: Handle the collision.
handle_collision(table, index, item);
return;
}
}
}
现在,更新搜索方法以使用溢出桶。
char* ht_search(HashTable* table, char* key)
{
// Searches for the key in the HashTable.
// Returns NULL if it doesn't exist.
int index = hash_function(key);
Ht_item* item = table->items[index];
LinkedList* head = table->overflow_buckets[index];
// Provide only non-NULL values.
if (item != NULL)
{
if (strcmp(item->key, key) == 0)
return item->value;
if (head == NULL)
return NULL;
item = head->item;
head = head->next;
}
return NULL;
}
最后,现在在insert()和search()函数中处理了碰撞!
从哈希表中删除
让我们现在最后来看看删除功能:
void ht_delete(HashTable* table, char* key)
{
...
}
再次,这种方法与插入方法类似。
- 计算哈希索引并获取条目。
如果为空,则不执行任何操作。
否则,在比较键之后,如果该索引没有冲突链,则从表中移除该项。
如果存在冲突链,则移除该元素并相应地调整链接。
void ht_delete(HashTable* table, char* key)
{
// Deletes an item from the table.
int index = hash_function(key);
Ht_item* item = table->items[index];
LinkedList* head = table->overflow_buckets[index];
if (item == NULL)
{
// Does not exist.
return;
}
else {
if (head == NULL && strcmp(item->key, key) == 0)
{
// Collision chain does not exist.
// Remove the item.
// Set table index to NULL.
table->items[index] = NULL;
free_item(item);
table->count--;
return;
}
else if (head != NULL)
{
// Collision chain exists.
if (strcmp(item->key, key) == 0)
{
// Remove this item.
// Set the head of the list as the new item.
free_item(item);
LinkedList* node = head;
head = head->next;
node->next = NULL;
table->items[index] = create_item(node->item->key, node->item->value);
free_linkedlist(node);
table->overflow_buckets[index] = head;
return;
}
LinkedList* curr = head;
LinkedList* prev = NULL;
while (curr)
{
if (strcmp(curr->item->key, key) == 0)
{
if (prev == NULL)
{
// First element of the chain.
// Remove the chain.
free_linkedlist(head);
table->overflow_buckets[index] = NULL;
return;
}
else
{
// This is somewhere in the chain.
prev->next = curr->next;
curr->next = NULL;
free_linkedlist(curr);
table->overflow_buckets[index] = head;
return;
}
}
curr = curr->next;
prev = curr;
}
}
}
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define CAPACITY 50000 // Size of the HashTable.
unsigned long hash_function(char *str)
{
unsigned long i = 0;
for (int j = 0; str[j]; j++)
i += str[j];
return i % CAPACITY;
}
// Defines the HashTable item.
typedef struct Ht_item
{
char *key;
char *value;
} Ht_item;
// Defines the LinkedList.
typedef struct LinkedList
{
Ht_item *item;
LinkedList *next;
} LinkedList;
// Defines the HashTable.
typedef struct HashTable
{
// Contains an array of pointers to items.
Ht_item **items;
LinkedList **overflow_buckets;
int size;
int count;
} HashTable;
LinkedList *allocate_list()
{
// Allocates memory for a LinkedList pointer.
LinkedList *list = (LinkedList *)malloc(sizeof(LinkedList));
return list;
}
LinkedList *linkedlist_insert(LinkedList *list, Ht_item *item)
{
// Inserts the item onto the LinkedList.
if (!list)
{
LinkedList *head = allocate_list();
head->item = item;
head->next = NULL;
list = head;
return list;
}
else if (list->next == NULL)
{
LinkedList *node = allocate_list();
node->item = item;
node->next = NULL;
list->next = node;
return list;
}
LinkedList *temp = list;
while (temp->next->next)
{
temp = temp->next;
}
LinkedList *node = allocate_list();
node->item = item;
node->next = NULL;
temp->next = node;
return list;
}
Ht_item *linkedlist_remove(LinkedList *list)
{
// Removes the head from the LinkedList.
// Returns the item of the popped element.
if (!list)
return NULL;
if (!list->next)
return NULL;
LinkedList *node = list->next;
LinkedList *temp = list;
temp->next = NULL;
list = node;
Ht_item *it = NULL;
memcpy(temp->item, it, sizeof(Ht_item));
free(temp->item->key);
free(temp->item->value);
free(temp->item);
free(temp);
return it;
}
void free_linkedlist(LinkedList *list)
{
LinkedList *temp = list;
while (list)
{
temp = list;
list = list->next;
free(temp->item->key);
free(temp->item->value);
free(temp->item);
free(temp);
}
}
LinkedList **create_overflow_buckets(HashTable *table)
{
// Create the overflow buckets; an array of LinkedLists.
LinkedList **buckets = (LinkedList **)calloc(table->size, sizeof(LinkedList *));
for (int i = 0; i < table->size; i++)
buckets[i] = NULL;
return buckets;
}
void free_overflow_buckets(HashTable *table)
{
// Free all the overflow bucket lists.
LinkedList **buckets = table->overflow_buckets;
for (int i = 0; i < table->size; i++)
free_linkedlist(buckets[i]);
free(buckets);
}
Ht_item *create_item(char *key, char *value)
{
// Creates a pointer to a new HashTable item.
Ht_item *item = (Ht_item *)malloc(sizeof(Ht_item));
item->key = (char *)malloc(strlen(key) + 1);
item->value = (char *)malloc(strlen(value) + 1);
strcpy(item->key, key);
strcpy(item->value, value);
return item;
}
HashTable *create_table(int size)
{
// Creates a new HashTable.
HashTable *table = (HashTable *)malloc(sizeof(HashTable));
table->size = size;
table->count = 0;
table->items = (Ht_item **)calloc(table->size, sizeof(Ht_item *));
for (int i = 0; i < table->size; i++)
table->items[i] = NULL;
table->overflow_buckets = create_overflow_buckets(table);
return table;
}
void free_item(Ht_item *item)
{
// Frees an item.
free(item->key);
free(item->value);
free(item);
}
void free_table(HashTable *table)
{
// Frees the table.
for (int i = 0; i < table->size; i++)
{
Ht_item *item = table->items[i];
if (item != NULL)
free_item(item);
}
// Free the overflow bucket lists and its items.
free_overflow_buckets(table);
free(table->items);
free(table);
}
void handle_collision(HashTable *table, unsigned long index, Ht_item *item)
{
LinkedList *head = table->overflow_buckets[index];
if (head == NULL)
{
// Creates the list.
head = allocate_list();
head->item = item;
table->overflow_buckets[index] = head;
return;
}
else
{
// Insert to the list.
table->overflow_buckets[index] = linkedlist_insert(head, item);
return;
}
}
void ht_insert(HashTable *table, char *key, char *value)
{
// Creates the item.
Ht_item *item = create_item(key, value);
// Computes the index.
int index = hash_function(key);
Ht_item *current_item = table->items[index];
if (current_item == NULL)
{
// Key does not exist.
if (table->count == table->size)
{
// HashTable is full.
printf("Insert Error: Hash Table is full\n");
free_item(item);
return;
}
// Insert directly.
table->items[index] = item;
table->count++;
}
else
{
// Scenario 1: Update the value.
if (strcmp(current_item->key, key) == 0)
{
strcpy(table->items[index]->value, value);
return;
}
else
{
// Scenario 2: Handle the collision.
handle_collision(table, index, item);
return;
}
}
}
char *ht_search(HashTable *table, char *key)
{
// Searches for the key in the HashTable.
// Returns NULL if it doesn't exist.
int index = hash_function(key);
Ht_item *item = table->items[index];
LinkedList *head = table->overflow_buckets[index];
// Provide only non-NULL values.
if (item != NULL)
{
if (strcmp(item->key, key) == 0)
return item->value;
if (head == NULL)
return NULL;
item = head->item;
head = head->next;
}
return NULL;
}
void ht_delete(HashTable *table, char *key)
{
// Deletes an item from the table.
int index = hash_function(key);
Ht_item *item = table->items[index];
LinkedList *head = table->overflow_buckets[index];
if (item == NULL)
{
// Does not exist.
return;
}
else
{
if (head == NULL && strcmp(item->key, key) == 0)
{
// Collision chain does not exist.
// Remove the item.
// Set table index to NULL.
table->items[index] = NULL;
free_item(item);
table->count--;
return;
}
else if (head != NULL)
{
// Collision chain exists.
if (strcmp(item->key, key) == 0)
{
// Remove this item.
// Set the head of the list as the new item.
free_item(item);
LinkedList *node = head;
head = head->next;
node->next = NULL;
table->items[index] = create_item(node->item->key, node->item->value);
free_linkedlist(node);
table->overflow_buckets[index] = head;
return;
}
LinkedList *curr = head;
LinkedList *prev = NULL;
while (curr)
{
if (strcmp(curr->item->key, key) == 0)
{
if (prev == NULL)
{
// First element of the chain.
// Remove the chain.
free_linkedlist(head);
table->overflow_buckets[index] = NULL;
return;
}
else
{
// This is somewhere in the chain.
prev->next = curr->next;
curr->next = NULL;
free_linkedlist(curr);
table->overflow_buckets[index] = head;
return;
}
}
curr = curr->next;
prev = curr;
}
}
}
}
void print_search(HashTable *table, char *key)
{
char *val;
if ((val = ht_search(table, key)) == NULL)
{
printf("Key:%s does not exist\n", key);
return;
}
else
{
printf("Key:%s, Value:%s\n", key, val);
}
}
void print_table(HashTable *table)
{
printf("\nHash Table\n-------------------\n");
for (int i = 0; i < table -> size; i++)
{
if (table -> items[i])
{
printf("Index:%d, Key:%s, Value:%s\n", i, table -> items[i] -> key, table -> items[i] -> value);
}
}
printf("-------------------\n\n");
}
int main()
{
HashTable *ht = create_table(CAPACITY);
ht_insert(ht, (char *)"1", (char *)"First address");
ht_insert(ht, (char *)"2", (char *)"Second address");
ht_insert(ht, (char *)"Hel", (char *)"Third address");
ht_insert(ht, (char *)"Cau", (char *)"Fourth address");
print_search(ht, (char *)"1");
print_search(ht, (char *)"2");
print_search(ht, (char *)"3");
print_search(ht, (char *)"Hel");
print_search(ht, (char *)"Cau"); // Collision!
print_table(ht);
ht_delete(ht, (char *)"1");
ht_delete(ht, (char *)"Cau");
print_table(ht);
free_table(ht);
return 0;
}
这段代码将产生如下输出:
OutputKey:1, Value:First address
Key:2, Value:Second address
Key:3 does not exist
Key:Hel, Value:Third address
Key:Cau does not exist
Hash Table
-------------------
Index:49, Key:1, Value:First address
Index:50, Key:2, Value:Second address
Index:281, Key:Hel, Value:Third address
-------------------
Hash Table
-------------------
Index:50, Key:2, Value:Second address
Index:281, Key:Hel, Value:Third address
-------------------
ht_insert(),ht_search(),和ht_delete按照预期工作。
结论
在这篇文章中,您使用C/C++从头开始实现了一个哈希表。
尝试使用其他的碰撞处理算法和不同的哈希函数进行实验。继续通过更多的C++教程来学习。