# COMPSCI 186 Study Guide - Midterm Guide: Sorting Algorithm, Good Luck!!, Foreign Key

University of California, Berkeley

CS 186 Intro to Database Systems, Fall 2012, Prof. Michael J. Franklin

MIDTERM I - Questions

This is a closed book examination – but you are allowed one 8.5” x 11” sheet of notes (double

sided). You should answer as many questions as possible. Partial credit will be given where

appropriate. There are 100 points in all. You should read all of the questions before starting the

exam, as some of the questions are substantially more time-consuming than others.

Write all of your answers on the SEPARATE ANSWER SHEET. We will be grading only the

answer sheets. You must put your CS 186 Class account on the answer sheet (Question 0).

GOOD LUCK!!!!

Question 1 – Sorting/Hashing [6 parts, 24 points total]

You have recently started a new company to help people sort their data. Currently, your

servers have 4KB disk blocks, and have 800KB of memory available for sorting.

Your pricing plan is as follows:

You charge $1 for every I/O request that gets performed during sorting.

You do not charge to store data on disk!

You charge only for sorting (including the cost of writing your sorted output.)

a) [3 points] Initially, your company only attracted only small users. Your first user wanted

to sort a file that was 640KB in size. How much did your company earn?

b) [3 points] As news of your company’s awesome sorting service spreads, you begin

attracting larger and larger customers. Your next customer had a huge file of 1200 KB. How

much did you earn?

c) [5 points] Using the optimization of tournament sort/heapsort which produces sorted

runs of average size 2B in Pass 0, how much would you earn with the 1200KB file?

d) [5 points] Mr. X wants the biggest bang for his buck. What is the size in KB of the biggest

file he could sort for $100,000 using the original sorting algorithm (runs of size B in Pass 0)?

UC Berkeley now decides to use your service. The Chancellor of Berkeley wishes to know

the distribution of the hometowns of all the students. Since he doesn’t care about any form

of ordering, Berkeley decides to hash the students into groups.

Each student tuple consists of (SID, name, gpa, major, hometown, address, photo). Due to

the extra fields, the size of each student tuple is about 10KB. The number of students in

Berkeley is 50,000. Being your alma matter, you give them a special server that has 100KB

disk blocks and 101 buffer (memory) pages that can hold 100KB of data each.

e) [4 points] How many times do we have to run the Partitioning stage of hashing to hash

the students, assuming all the partitions end up being the same length?

f) [4 points] What will be the size in KB of the average partition at the start of the ReHash

stage?

CS 186 Midterm I September 25, 2012 Page 2 of 7

Question 2 – Schema Refinement and Normalization [6 parts, 22 points total]

Consider the relation R with attributes A B C D E F

and with the functional dependencies: {A → BF, B → F, CD → E, DE → F }

a) [4 points] Give the attribute closure of CD, also written as CD+. (In other words, given

just CD, what can we derive?)

b) [4 points] There exists a single candidate key (i.e., minimal superkey) for R. What is it?

c) [3 points] Consider the decomposition of relation R into tables: ABF, BF, CDE, DEF. For

each of the following, indicate on the answer sheet if it is True or False.

i. All relations in the decomposition are in BCNF.

ii. The decomposition is Lossless Join.

iii The decomposition is dependency preserving.

d) [3 points] Consider the decomposition of relation R into tables: ACE, BDF. For each of

the following, indicate on the answer sheet if it is True or False.

i. All relations in the decomposition are in BCNF.

ii. The decomposition is Lossless Join.

iii The decomposition is dependency preserving.

e) [4 points] Consider the decomposition of relation R into two tables: ABC, DEF.

Which dependency or dependencies causes this decomposition to violate BCNF? (0 or

more answers may be correct)

(A) A → BF

(B) B → F

(C) CD → E

(D) DE → F

f) [4 points] Perform a BCNF decomposition, considering the functional dependencies as

shown from left to right. Show the final decomposed schema on the answer sheet.

## Document Summary

Cs 186 intro to database systems, fall 2012, prof. michael j. franklin. This is a closed book examination but you are allowed one 8. 5 x 11 sheet of notes (double sided). You should answer as many questions as possible. You should read all of the questions before starting the exam, as some of the questions are substantially more time-consuming than others. Write all of your answers on the separate answer sheet. We will be grading only the answer sheets. You must put your cs 186 class account on the answer sheet (question 0). Question 1 sorting/hashing [6 parts, 24 points total] You have recently started a new company to help people sort their data. Currently, your servers have 4kb disk blocks, and have 800kb of memory available for sorting. You charge for every i/o request that gets performed during sorting. You do not charge to store data on disk!