Test 4 Review A: Questions

R4a.1.

Suppose we have a ten-bucket dictionary for storing integers, using a hash code that simply uses the integer. How will the dictionary appear internally after the following integers are inserted into it?

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47

R4a.2.

Suppose we have a five-bucket dictionary for storing strings; each string's hash code is simply the length of the string. How will the dictionary appear internally after the following strings are inserted into it?

ab, abbot, abdomen, abecedarian, abhors

R4a.3.

Describe step-by-step the process used to find the value associated with a given key in a dictionary.

R4a.4.

Suppose we know our hash function will give us an integer between 0 and MAX. We discussed how a dictionary would map this to a valid bucket using the remainder (modulus) operator ‘%’.

index = hash_code % num_buckets

The following is an alternative way to map integers to buckets.

index = int(num_buckets * (hash_code / (MAX + 1)))

Both map hash codes into valid buckets. Neglecting concerns about the computational efficiency, which is likely to work better, and why?

R4a.5.

Assuming scores.txt is a file of integers, with one integer on each line, complete the below Python fragment so that it computes and displays the average of the numbers in the file.

data = open('scores.txt')

R4a.6.

The file words.txt is a file containing several lines with one English word on each line. Complete the below Python fragment so that it displays all six-letter words beginning with q.

dictionary = open('words.txt')

R4a.7.

Suppose we have a file page.html encoded in HTML, such as the following.

<p><a href="http://www.cburch.com/cs/150/">CSCI 150</a> is taught at <a href="http://www.hendrix.edu/">Hendrix College.</p>

As you can see, there are some instances of “href="…"”. Complete the following program to display what is in each set of quotes following “href=”.

import re

html_file = open("page.html")

R4a.8.

Complete the following function to implement sequential search, returning the index within values where query may be found or −1 if query is not present.

def sequential_search(values, query):

R4a.9.

Suppose we were to use binary search to check whether 20 is in the following list.

0, 1, 3, 5, 8, 10, 11, 12, 15, 16, 22, 26, 31, 34, 35

List the integers from the list that binary search would examine, in the order they are examined.

R4a.10.

If we used binary search to search among 511 entries in a sorted list, what is the maximum number of entries that would have to be examined to conclude whether the query is in the list? What if we used sequential search? Explain why in each case.

R4a.11.

In your own words, describe how selection sort works.

R4a.12.

Complete the following function so that it uses selection sort to reorder the elements of its parameter list.

def selection_sort(data):

Test 4 Review A: Solutions

R4a.1.

(The order of the numbers within each bucket is not important.)

R4a.2.

(The order of the strings within each bucket is not important.)

R4a.3.

We first apply the hash function to the key to get a corresponding integer, and then we determine the remainder when this integer is divided by the dictionary's number of buckets. We use this index to determine which bucket to search. We step through each entry in the bucket to see whether the key is in that entry; when we find the key, we return the value associated with the key in that entry. If we do not find the key, we raise an error.

R4a.4.

The first technique (finding the remainder) is likely to work better. The second technique will tend to map hash codes which are close to each other into the same bucket. For example, if we have a number of values whose hash codes tend to be small (between 0 and 100), then everything would map into index 0.

[If being able to iterate through keys in order of their hash code is useful, the second technique has some usefulness. However, this isn't very likely to be desirable.]

R4a.5.

line_count = 0

total = 0

for line in data:

    line_count += 1

    total += float(line.rstrip())

print('Average is {0}'.format(total / line_count))

R4a.6.

for line in dictionary:

    word = line.rstrip()

    if len(word) == 6 and word[0] == 'q':

        print(word)

R4a.7.

html_text = html_file.read()

hrefs = re.findall('href="[^"]*"', html_text)

for match in hrefs:

    print(match[6:-1])

R4a.8.

def sequential_search(values, query):

    for i in range(0, len(values)):

        if values[i] == query:

            return i

    return -1

R4a.9.

12, 26, 16, 22.

lo hi mid check

0 14 7 12

8 14 11 26

8 10 9 16

10 10 10 22

lo	hi	mid	check
0	14	7	12
8	14	11	26
8	10	9	16
10	10	10	22

R4a.10.

Binary search would examine at most 9 entries: Initially, we'd have a range of 511 entries. After the first examination, the range would be halved to at most 255 entries. The range would decrease to at most 127 after the second, to 63 after the third, to 31 after the fourth, to 15 after the fifth, to 7 after the sixth, to 3 after the seventh, to 1 after the eighth, so the ninth would either hit the query or we'd conclude that the query is not included.

Sequential search might require 512 entries, since if the query is greater than the largest entry it would end up examining every single entry once.

R4a.11.

Given a list to sort, we first determine where the smallest value is in the list; we do this starting with the assumption that the first value (index 0) is the smallest, but then we go through each subsequent index to see whether the value at that index is less than the value we were thinking is smallest — and when it is, we update our assumed smallest to be the element at that index. Once locating the smallest element, and we swap it into the first location (index 0) in the list.

Then we repeat the process to find the smallest element in the second and subsequent locations of the list (indices 1 and higher); we swap that into the second location (index 1).

We continue repeating this process starting from the third location; and again for the fourth; and so on, until the list segment for which we are determining the smallest contains just one element.

R4a.12.

def selection_sort(data):

    for i in range(len(data) - 1):

        minval = data[i]

        mini = i

        for j in range(i + 1, len(data)):

            if data[j] < minval:

                minval = data[j]

                mini = j

        data[mini] = data[i]

        data[i] = minval