C Programming Interview Questions: What is hashing in C programming?

Sunday, 6 November 2011

What is hashing in C programming?

To hash means to grind up, and that’s essentially what hashing is all about. The heart of a hashing algorithm
is a hash function that takes your nice, neat data and grinds it into some random-looking integer.

The idea behind hashing is that some data either has no inherent ordering (such as images) or is expensive
to compare (such as images). If the data has no inherent ordering, you can’t perform comparison searches.
If the data is expensive to compare, the number of comparisons used even by a binary search might be too
many. So instead of looking at the data themselves, you’ll condense (hash) the data to an integer (its hash
value) and keep all the data with the same hash value in the same place. This task is carried out by using the
hash value as an index into an array.
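As a rough illustration of condensing data to an integer and then to an array index, here is a minimal sketch for string keys. The function names djb2Hash and bucketIndex and the table size are assumptions made for this example; the multiply-by-33 scheme is the widely used djb2 string hash, not necessarily the hash function the listing in this article relies on.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_SIZE 1024  /* number of table slots; chosen only for illustration */

/* djb2-style string hash: start at 5381, then h = h * 33 + next byte. */
unsigned int djb2Hash(const char *s)
{
    unsigned int h = 5381;

    assert(s != NULL);
    while (*s != '\0')
        h = h * 33 + (unsigned char) *s++;
    return h;
}

/* Reduce the hash value to a valid array index in the range 0..HASH_SIZE-1. */
unsigned int bucketIndex(const char *s)
{
    return djb2Hash(s) % HASH_SIZE;
}
```

Two equal strings always produce the same index, so a lookup only has to examine the items stored at that one index.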

To search for an item, you simply hash it and look at all the data whose hash values match that of the data
you’re looking for. This technique greatly lessens the number of items you have to look at. If the parameters
are set up with care and enough storage is available for the hash table, the number of comparisons needed
to find an item can be made arbitrarily close to one. Listing III.6 shows a simple hashing algorithm. You can
combine this example with code at the end of this chapter to produce a working program.

One aspect that affects the efficiency of a hashing implementation is the hash function itself. Ideally, it should distribute keys evenly throughout the entire hash table to reduce the likelihood of collisions. A collision occurs when two different keys have the same hash value. There are two common ways to resolve this problem.

In “open addressing,” the collision is resolved by choosing another position in the hash table for the element inserted later. When the hash table is searched, if the entry is not found at its hashed position in the table, the search continues checking other positions until either the element is found or an empty position in the table is found.

The second method of resolving a hash collision is called “chaining.” In this method, a “bucket” or linked list holds all the elements whose keys hash to the same value. When the hash table is searched, the list must be searched linearly.
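Open addressing can be sketched as follows, using linear probing (trying the next slot, wrapping around at the end) as the rule for choosing another position. This is only one possible probing rule, and the names oaInsert, oaSearch, probeHash, and TABLE_SIZE are hypothetical, introduced just for this example. The sketch stores the caller's pointers without copying them, does not support deletion, and does not check for duplicate keys.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8  /* deliberately tiny so collisions are easy to provoke */

static const char *table[TABLE_SIZE];  /* NULL marks an empty slot */

/* Simple multiplicative string hash, used only for this illustration. */
static unsigned int probeHash(const char *s)
{
    unsigned int h = 0;

    while (*s != '\0')
        h = h * 31 + (unsigned char) *s++;
    return h;
}

/* Store key in the first free slot at or after its hashed position.
   Returns 0 on success, or -1 if every slot is occupied. */
int oaInsert(const char *key)
{
    unsigned int start, i;

    assert(key != NULL);
    start = probeHash(key) % TABLE_SIZE;
    for (i = 0; i < TABLE_SIZE; i++) {
        unsigned int pos = (start + i) % TABLE_SIZE;

        if (table[pos] == NULL) {
            table[pos] = key;
            return 0;
        }
    }
    return -1;
}

/* Probe from the hashed position until the key or an empty slot is found.
   Returns the stored string, or NULL if the key is not in the table. */
const char *oaSearch(const char *key)
{
    unsigned int start, i;

    assert(key != NULL);
    start = probeHash(key) % TABLE_SIZE;
    for (i = 0; i < TABLE_SIZE; i++) {
        unsigned int pos = (start + i) % TABLE_SIZE;

        if (table[pos] == NULL)
            return NULL;  /* an empty slot ends the search */
        if (strcmp(key, table[pos]) == 0)
            return table[pos];
    }
    return NULL;  /* table is full and the key is absent */
}
```

Because a search stops at the first empty slot, removing an entry by simply setting its slot to NULL would hide any keys that were placed after it; real implementations usually mark deleted slots specially.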

Listing III.6. A simple example of a hash algorithm.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "list.h"
#include "hash.h"

#define HASH_SIZE 1024

/* Each slot heads a linked list (chain) of the strings that hash to that index. */
static listnode_t *hashTable[HASH_SIZE];

/* Add s to the front of the chain for its hash value. */
void insert(const char *s)
{
    listnode_t *ele = newNode((void *) s);
    unsigned int h = hash(s) % HASH_SIZE;

    ele->next = hashTable[h];
    hashTable[h] = ele;
}

/* Print every non-empty chain, one line per table index. */
void print(void)
{
    int h;

    for (h = 0; h < HASH_SIZE; h++)
    {
        listnode_t *lp = hashTable[h];

        if (lp == NULL)
            continue;
        printf("[%d]", h);
        while (lp)
        {
            printf("\t'%s'", lp->u.str);
            lp = lp->next;
        }
        putchar('\n');
    }
}

/* Return the stored string equal to s, or NULL if s is not in the table. */
const char *search(const char *s)
{
    unsigned int h = hash(s) % HASH_SIZE;
    listnode_t *lp = hashTable[h];

    /* Only the chain at index h can contain s, so search it linearly. */
    while (lp)
    {
        if (strcmp(s, lp->u.str) == 0)
            return lp->u.str;
        lp = lp->next;
    }
    return NULL;
}
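The listing depends on list.h and hash.h, which are not reproduced here. The following is a minimal sketch of what list.h might provide, inferred only from how the listing uses listnode_t, newNode(), next, and u.str. The union's data member and the exit-on-allocation-failure behavior are assumptions, and the real header may differ. A hash() function taking a const char * and returning an unsigned integer would still have to come from hash.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type: a link plus a payload viewed as a pointer or a string. */
typedef struct listnode {
    struct listnode *next;
    union {
        void *data;
        const char *str;
    } u;
} listnode_t;

/* Allocate an unlinked node holding data; exits the program if memory runs out. */
listnode_t *newNode(void *data)
{
    listnode_t *node;

    assert(data != NULL);
    node = malloc(sizeof *node);
    if (node == NULL)
        exit(EXIT_FAILURE);
    node->next = NULL;
    node->u.data = data;
    return node;
}
```

Note that the node stores the caller's pointer rather than a copy, so each string passed to insert() must remain valid for as long as it stays in the table.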

Cross Reference:

III.4: What is the easiest searching method to use?
III.5: What is the quickest searching method to use?
III.8: How can I search for data in a linked list?
