A binary search, such as bsearch() performs, is much faster than a linear search. A hashing algorithm can
provide even faster searching. One particularly interesting and fast method for searching is to keep the data
in a “digital trie.” A digital trie offers the prospect of being able to search for an item in essentially a constant
amount of time, independent of how many items are in the data set.
provide even faster searching. One particularly interesting and fast method for searching is to keep the data
in a “digital trie.” A digital trie offers the prospect of being able to search for an item in essentially a constant
amount of time, independent of how many items are in the data set.
A digital trie combines aspects of binary searching, radix searching, and hashing. The term “digital trie” refers
to the data structure used to hold the items to be searched. It is a multilevel data structure that branches N
ways at each level (in the example that follows, each level branches from 0 to 16 ways). The subject of treelike
to the data structure used to hold the items to be searched. It is a multilevel data structure that branches N
ways at each level (in the example that follows, each level branches from 0 to 16 ways). The subject of treelike
data structures and searching is too broad to describe fully here, but a good book on data structures or
algorithms can teach you the concepts involved.
Listing III.5 shows a program implementing digital trie searching. You can combine this example with code
at the end of the chapter to produce a working program. The concept is not too hard. Suppose that you use
a hash function that maps to a full 32-bit integer. The hash value can also be considered to be a concatenation
of eight 4-bit hash values. You can use the first 4-bit hash value to index into a 16-entry hash table.
Naturally, there will be many collisions, with only 16 entries. Collisions are resolved by having the table entry
point to a second 16-entry hash table, in which the next 4-bit hash value is used as an index.
The tree of hash tables can be up to eight levels deep, after which you run out of hash values and have to search
through all the entries that collided. However, such a collision should be very rare because it occurs only when
all 32 bits of the hash value are identical, so most searches require only one comparison.
The binary searching aspect of the digital trie is that it is organized as a 16-way tree that is traversed to find
the data. The radix search aspect is that successive 4-bit chunks of the hash value are examined at each level
in the tree. The hashing aspect is that it is conceptually a hash table with 232 entries
An implementation of digital trie searching.
1: #include <stdlib.h>
2: #include <string.h>
3: #include “list.h”
4: #include “hash.h”
5:
6: /*
7: * NOTE: This code makes several assumptions about the
8: * compiler and machine it is run on. It assumes that
9: *
10: * 1. The value NULL consists of all “0” bits.
11: *
12: * If not, the calloc() call must be changed to
13: * explicitly initialize the pointers allocated.
14: *
15: * 2. An unsigned and a pointer are the same size.
16: *
17: * If not, the use of a union might be incorrect, because
18: * it is assumed that the least significant bit of the
19: * pointer and unsigned members of the union are the
20: * same bit.
21: *
22: * 3. The least significant bit of a valid pointer
23: * to an object allocated on the heap is always 0.
24: *
25: * If not, that bit can’t be used as a flag to determine
26: * what type of data the union really holds.
27: */
28:
29: /* number of bits examined at each level of the trie */
30: #define TRIE_BITS 4
31:
32: /* number of subtries at each level of the trie */
33: #define TRIE_FANOUT (1 << TRIE_BITS)
34:
35: /* mask to get lowest TRIE_BITS bits of the hash */
36: #define TRIE_MASK (TRIE_FANOUT - 1)
37:
38: /*
39: * A trie can be either a linked list of elements or
40: * a pointer to an array of TRIE_FANOUT tries. The num
41: * element is used to test whether the pointer is even
42: * or odd.
43: */
44: typedef union trie_u {
45: unsigned num;
46: listnode_t *list; /* if “num” is even */
47: union trie_u *node; /* if “num” is odd */
48: } trie_t;
49:
50: /*
51: * Inserts an element into a trie and returns the resulting
52: * new trie. For internal use by trieInsert() only.
53: */
54: static trie_t eleInsert(trie_t t, listnode_t *ele, unsigned h,
55: int depth)
56: {
57: /*
58: * If the trie is an array of tries, insert the
59: * element into the proper subtrie.
60: */
61: if (t.num & 1)
62: {
63: /*
64: * nxtNode is used to hold the pointer into
65: * the array. The reason for using a trie
66: * as a temporary instead of a pointer is
67: * it’s easier to remove the “odd” flag.
68: */
69: trie_t nxtNode = t;
70:
71: nxtNode.num &= ~1;
72: nxtNode.node += (h >> depth) & TRIE_MASK;
73: *nxtNode.node =
74: eleInsert(*nxtNode.node,
75: ele, h, depth + TRIE_BITS);
76: }
77: /*
78: * Since t wasn’t an array of tries, it must be a
79: * list of elements. If it is empty, just add this
80: * element.
81: */
82: else if (t.list == NULL)
83: t.list = ele;
84: /*
85: * Since the list is not empty, check whether the
86: * element belongs on this list or whether you should
31:
32: /* number of subtries at each level of the trie */
33: #define TRIE_FANOUT (1 << TRIE_BITS)
34:
35: /* mask to get lowest TRIE_BITS bits of the hash */
36: #define TRIE_MASK (TRIE_FANOUT - 1)
37:
38: /*
39: * A trie can be either a linked list of elements or
40: * a pointer to an array of TRIE_FANOUT tries. The num
41: * element is used to test whether the pointer is even
42: * or odd.
43: */
44: typedef union trie_u {
45: unsigned num;
46: listnode_t *list; /* if “num” is even */
47: union trie_u *node; /* if “num” is odd */
48: } trie_t;
49:
50: /*
51: * Inserts an element into a trie and returns the resulting
52: * new trie. For internal use by trieInsert() only.
53: */
54: static trie_t eleInsert(trie_t t, listnode_t *ele, unsigned h,
55: int depth)
56: {
57: /*
58: * If the trie is an array of tries, insert the
59: * element into the proper subtrie.
60: */
61: if (t.num & 1)
62: {
63: /*
64: * nxtNode is used to hold the pointer into
65: * the array. The reason for using a trie
66: * as a temporary instead of a pointer is
67: * it’s easier to remove the “odd” flag.
68: */
69: trie_t nxtNode = t;
70:
71: nxtNode.num &= ~1;
72: nxtNode.node += (h >> depth) & TRIE_MASK;
73: *nxtNode.node =
74: eleInsert(*nxtNode.node,
75: ele, h, depth + TRIE_BITS);
76: }
77: /*
78: * Since t wasn’t an array of tries, it must be a
79: * list of elements. If it is empty, just add this
80: * element.
81: */
82: else if (t.list == NULL)
83: t.list = ele;
84: /*
85: * Since the list is not empty, check whether the
86: * element belongs on this list or whether you should
87: * make several lists in an array of subtries.
88: */
89: else if (h == hash(t.list->u.str))
90: {
91: ele->next = t.list;
92: t.list = ele;
93: }
94: else
95: {
96: /*
97: * You’re making the list into an array or
98: * subtries. Save the current list, replace
99: * this entry with an array of TRIE_FANOUT
100: * subtries, and insert both the element and
101: * the list in the subtries.
102: */
103: listnode_t *lp = t.list;
104:
105: /*
106: * Calling calloc() rather than malloc()
107: * ensures that the elements are initialized
108: * to NULL.
109: */
110: t.node = (trie_t *) calloc(TRIE_FANOUT,
111: sizeof(trie_t));
112: t.num |= 1;
113: t = eleInsert(t, lp, hash(lp->u.str),
114: depth);
115: t = eleInsert(t, ele, h, depth);
116: }
117: return t;
118: }
119:
120: /*
121: * Finds an element in a trie and returns the resulting
122: * string, or NULL. For internal use by search() only.
123: */
124: static const char * eleSearch(trie_t t, const char * string,
125: unsigned h, int depth)
126: {
127: /*
128: * If the trie is an array of subtries, look for the
129: * element in the proper subtree.
130: */
131: if (t.num & 1)
132: {
133: trie_t nxtNode = t;
134: nxtNode.num &= ~1;
135: nxtNode.node += (h >> depth) & TRIE_MASK;
136: return eleSearch(*nxtNode.node,
137: string, h, depth + TRIE_BITS);
138: }
139: /*
140: * Otherwise, the trie is a list. Perform a linear
141: * search for the desired element.
142: */
143: else
88: */
89: else if (h == hash(t.list->u.str))
90: {
91: ele->next = t.list;
92: t.list = ele;
93: }
94: else
95: {
96: /*
97: * You’re making the list into an array or
98: * subtries. Save the current list, replace
99: * this entry with an array of TRIE_FANOUT
100: * subtries, and insert both the element and
101: * the list in the subtries.
102: */
103: listnode_t *lp = t.list;
104:
105: /*
106: * Calling calloc() rather than malloc()
107: * ensures that the elements are initialized
108: * to NULL.
109: */
110: t.node = (trie_t *) calloc(TRIE_FANOUT,
111: sizeof(trie_t));
112: t.num |= 1;
113: t = eleInsert(t, lp, hash(lp->u.str),
114: depth);
115: t = eleInsert(t, ele, h, depth);
116: }
117: return t;
118: }
119:
120: /*
121: * Finds an element in a trie and returns the resulting
122: * string, or NULL. For internal use by search() only.
123: */
124: static const char * eleSearch(trie_t t, const char * string,
125: unsigned h, int depth)
126: {
127: /*
128: * If the trie is an array of subtries, look for the
129: * element in the proper subtree.
130: */
131: if (t.num & 1)
132: {
133: trie_t nxtNode = t;
134: nxtNode.num &= ~1;
135: nxtNode.node += (h >> depth) & TRIE_MASK;
136: return eleSearch(*nxtNode.node,
137: string, h, depth + TRIE_BITS);
138: }
139: /*
140: * Otherwise, the trie is a list. Perform a linear
141: * search for the desired element.
142: */
143: else
144: {
145: listnode_t *lp = t.list;
146:
147: while (lp)
148: {
149: if (strcmp(lp->u.str, string) == 0)
150: return lp->u.str;
151: lp = lp->next;
152: }
153: }
154: return NULL;
155: }
156:
157: /* Test function to print the structure of a trie */
158: void triePrint(trie_t t, int depth)
159: {
160: if (t.num & 1)
161: {
162: int i;
163: trie_t nxtNode = t;
164: nxtNode.num &= ~1;
165: if (depth)
166: printf(“\n”);
167: for (i = 0; i < TRIE_FANOUT; i++)
168: {
169: if (nxtNode.node[i].num == 0)
170: continue;
171: printf(“%*s[%d]”, depth, “”, i);
172: triePrint(nxtNode.node[i], depth + 8);
173: }
174: }
175: else
176: {
177: listnode_t *lp = t.list;
178: while (lp)
179: {
180: printf(“\t’%s’”, lp->u.str);
181: lp = lp->next;
182: }
183: putchar(‘\n’);
184: }
185: }
186:
187: static trie_t t;
188:
189: void insert(const char *s)
190: {
191: t = eleInsert(t, newNode((void *) s), hash(s), 0);
192: }
193:
194: void print(void)
195: {
196: triePrint(t, 0);
197: }
145: listnode_t *lp = t.list;
146:
147: while (lp)
148: {
149: if (strcmp(lp->u.str, string) == 0)
150: return lp->u.str;
151: lp = lp->next;
152: }
153: }
154: return NULL;
155: }
156:
157: /* Test function to print the structure of a trie */
158: void triePrint(trie_t t, int depth)
159: {
160: if (t.num & 1)
161: {
162: int i;
163: trie_t nxtNode = t;
164: nxtNode.num &= ~1;
165: if (depth)
166: printf(“\n”);
167: for (i = 0; i < TRIE_FANOUT; i++)
168: {
169: if (nxtNode.node[i].num == 0)
170: continue;
171: printf(“%*s[%d]”, depth, “”, i);
172: triePrint(nxtNode.node[i], depth + 8);
173: }
174: }
175: else
176: {
177: listnode_t *lp = t.list;
178: while (lp)
179: {
180: printf(“\t’%s’”, lp->u.str);
181: lp = lp->next;
182: }
183: putchar(‘\n’);
184: }
185: }
186:
187: static trie_t t;
188:
189: void insert(const char *s)
190: {
191: t = eleInsert(t, newNode((void *) s), hash(s), 0);
192: }
193:
194: void print(void)
195: {
196: triePrint(t, 0);
197: }
198:
199: const char *search(const char *s)
200: {
201: return eleSearch(t, s, hash(s), 0);
202: }
199: const char *search(const char *s)
200: {
201: return eleSearch(t, s, hash(s), 0);
202: }
Cross Reference:
III.4: What is the easiest searching method to use?
III.6: What is hashing?
III.8: How can I search for data in a linked list?
No comments:
Post a Comment