CPSC 310: Database Systems /
CPSC 603: Database Systems and Applications
Fall 2007
Project: Personal Database Application (PDA)

Source: Arthur Keller, CMPS 180, University of California Santa Cruz, Winter 2002, http://www.cse.ucsc.edu/classes/cmps180/Winter02/. Adapted for TAMU CS Department.

[Overview] [Logistics] [Part 1] [Part 2] [Part 3] [Part 4] [Part 5]


Your CPSC 310/603 programming project will be to build a substantial database application for a real-world domain of your choosing. You will design a schema on paper. Then, you will create an actual database using MySQL. You will populate the database, write interactive queries and modifications on the database, and develop programs that manipulate the database.

Part 1: Choose a project and design an E/R diagram. Due Tue, 9/18/07.

Part 2: Relational design. Due Tue, 10/02/07.

Part 3: Create database in MySQL. Due Tue, 10/23/07. (*new due date*)

Part 4: Queries, updates, and indexes. Due Thu, 11/08/07. (*even newer due date*)

Part 5: Embedded SQL. Due Tue, 12/4/07. (*newest due date*)

The project is worth 35% of your course grade, broken down like this: Parts 1, 2 and 3 are worth 5% each. Parts 4 and 5 are worth 10% each.

Back to beginning


Back to beginning

Part 1: E/R Diagram

Due: Tuesday, Sep 18, 2007, 12:45 PM.
What to turn in:
Your first step is to identify the domain you would like to manage with your database, and to construct an Entity-Relationship diagram for the database. We suggest that you pick an application that you will enjoy working with, since you'll be stuck with it for the whole semester! It's especially nice if you pick an application where you can populate your database using real, as opposed to fabricated, data. As the project progresses, you'll end up creating two actual databases, a small one (10's of entities/relationships) and a large one (1000's or 10,000's of entities/relationships). Many students find that it only makes sense for their small database to contain realistic data, while the large one is made up of synthetic (computer-generated fake) data. If you have an application where you can get a large amount of real data, all the better, but it's not necessary.

Try to pick an application with a schema that is relatively substantial, but not too enormous. For example, your E/R design should have in the range of six or so entity sets, and a similar number of relationship sets. This is a ballpark figure only - shooting for somewhere between 5 and 10 is fine - you'll sense if your design is too simple or too complex. You should certainly include different kinds of relationships (e.g., many-one, many-many) and different kinds of data (strings, integers, etc.), but your application need not necessarily require advanced features such as weak entity sets, "is-a" relationships, or roles.

  1. Write a short description (no more than one page) of the database application you propose to work with throughout the course. Your description should be brief, relatively complete and clear. If there are any unique or particularly difficult aspects of your proposed application, please point them out. This part is worth 1 point and your description will be graded on suitability and conciseness.
  2. Specify an E/R diagram for your proposed database. Don't forget to underline key attributes for entity sets and include arrowheads indicating the multiplicity of relationship sets. If there are weak entity sets or "is-a" relationships, make sure to notate them appropriately. This part is worth 4 points. Points will be subtracted for poor or wrong design, wrong shapes or arrows, missing or wrong key attributes, wrong relationships, not complex enough, etc.
If you're having trouble thinking of an application, take a look at any Web shopping site. They all have a similar theme: products, customers, orders, shopping baskets, etc., and typically make for an interesting and appropriately sized application. If you're still having trouble, or if you're unsure whether your proposed application is appropriate, please feel free to consult with the TA. In fact, to alleviate problems, we are encouraging all students to visit office hours to consult with the TA on your chosen application and E/R design. Coming up with a good design now will pay off greatly as the project progresses!!!


Back to beginning

Part 2: Relational Schema

Due: Tuesday, Oct. 2, 2007, 12:45 PM.

What to turn in:

In this second part of the project, you will produce a relational schema from the entity- relationship diagram you came up with in Part 1.
  1. Please attach a copy of your E/R diagram from Part 1. If you would like to make changes to your original E/R diagram at this point (due to staff feedback or any other reason), you may do so. The new E/R design will be used as a basis for grading part 2.

  2. Using the method for translating an E/R diagram to relations, produce a set of relations for your database design. As usual, please be sure to capitalize key attributes in your relations.

  3. For each relation in your schema, specify a set of completely nontrivial functional dependencies for the relation. Any functional dependencies that actually hold in the real- world scenario that you're modeling should be specified, or should follow from the specified dependencies. Don't worry if you find that some of your relations have no nontrivial functional dependencies.

  4. Is each relation in your schema in Boyce-Codd Normal Form (BCNF) with respect to the functional dependencies you specified? If not, decompose the relation into smaller relations so that each relation is in BCNF. Be sure to capitalize key attributes in your new relations. Don't worry if you don't have any BCNF violations - many PDAs will not have any.

  5. Are there any nontrivial multivalued dependencies that hold on any of the relations in your schema? (You needn't consider MVD's that are also functional dependencies.) If so, specify the multivalued dependencies, then decompose the relations into smaller ones so that each one is in Fourth Normal Form (4NF). Be sure to underline key attributes in your new relations. Don't worry if you don't have any 4NF violations - most PDAs will not have any.

  6. Now that you've decomposed your relations as far as possible, are there any relations that could be combined without introducing redundancy (i.e., without creating BCNF or 4NF violations)? If so, combine them.

  7. Is there anything you still don't like about the schema (e.g., attribute names, relation structure, etc.)? If so, modify the relational schema to something you prefer. You will be working with this schema quite a bit, so it's worth spending some time now to make sure you're happy with it.

Back to beginning

Part 3: Creating your Database in MySQL

Due: Tuesday, Oct. 23, 2007, 12:45 PM. (*new due date*)

What to turn in:

In this part of the project, you will create a relational schema for your PDA in the MySQL database system, and you will populate the tables in your database with initial data sets.
  1. Familiarize yourself with the MySQL relational DBMS by reading the on-line documentation (referenced above under logistics), logging into MySQL, trying some of the examples in the document, and experimenting with the various commands. You don't need to turn anything in for this part.

  2. Create relations for your PDA based on your final relational schema from Part 2. Use the CREATE TABLE command to specify each relation, its attributes and attribute types. If you have an attribute that represents a date and/or time, you may want to look at this page.

  3. Turn in a script log showing a MySQL session in which your relations are created successfully, as described next. For each relation in your PDA, create an execution script file and a few (approximately 5-10) records of "realistic" data. Then execute the script file from the mysql command line using the \T option. Turn in a listing showing the contents of the files you created, the successful loading of the data into MySQL, and the execution of "SELECT *" commands to show the contents of each relation.

  4. Write a program in any programming language you like that creates large files of records for each of your PDA relations. If you have available real data for your PDA, then your program will need to transform the data into files of records conforming to your PDA schema and to MySQL's load format. The rest of you will need to write a program to fabricate data: your program will generate either random or nonrandom (e.g., sequential) records conforming to your schema. Note that it is both fine and expected for your data values, strings especially, to be meaningless gibberish. The point of generating large amounts of data is so that you can experiment with a database of realistic size, rather than the small "toy" databases often used in classes. The data you generate and load should be on the order of: If your application naturally includes relations that are expected to be relatively small (e.g., schools within a university), then it is fine to use some small relations, but please ensure that you have relations of the sizes prescribed above as well. When writing a program to fabricate data, there are two important points to keep in mind:

Back to beginning

Part 4: Queries, Updates, and Indexes

Due: Thursday, Nov. 8, 2007, 12:45 PM (*even newer due date*)

What to turn in:

In this part of the project, you will issue SQL queries and updates against your PDA database, and you will experiment with the use of indexes. Since you will be modifying your data as part of this assignment, we strongly suggest that you adopt a routine for getting repeated "fresh" starts with MySQL. You need to save and keep all SQL commands such as creating/dropping database, creating tables, inserting real data and/or synthetic data, querying tuples, updating tuples, etc., in script files. If needed, you just can run these script files again in order to re-setup your database.

(a) Queries and Updates

Please note:

(b) Indexes

In Part 4 (a), you may have discovered that some queries run very slowly over your large database. As discussed in class, an important technique for improving the performance of queries is to create indexes. An index on an attribute A of relation R allows the database to quickly find all tuples in R with a given value for attribute A (which is useful when evaluating selection or join conditions involving attribute A). An index can be created on any attribute of any relation, or on several attributes combined. See this page for more information on how to create an index in MySQL.

  1. Create at least three useful indexes for your PDA. Run your queries from part 4(a) on your large database with the indexes and without the indexes. Turn in a script showing your commands to create indexes, and showing the relative times of query execution with and without indexes. Here too, please truncate any large query results.

  2. You can use an execution script file that creates your indexes to ease your testing. Use the \T command (Set outfile [to_outfile]. Append everything into given outfile) to call your queries script file both before and after you create the indexes. Surround the calls to execute the queries files with lines that execute time functions in MySQL; so you can measure elapsed time. You can then easily calculate elapsed time manually and add it to your script log as a comment. You can note your added comments by enclosing them with /*...*/. For an added challenge you can try to get your execution script file to calculate the elapsed time and display it for you.
Please note:

Back to beginning

Part 5: Using Java and Embedded SQL

Due: Tuesday, Dec. 4, 12:45 PM (*newest due date*)

What to turn in

Five-minute DEMO (5 pts)
You will show the following things to TA in a lab on the second floor in HRBB building.

In this part of the project, you will interact with your PDA database from an external program. Your task is to build a moderately user-friendly interactive application program front end to your PDA using the Java programming language. Your program should consist of a continuous loop in which: You should include both queries and modifications among your options. The user should not be aware of the database schema and is not to simply enter SQL commands. Both input and output should be in a format more convenient and pleasing than raw interactive SQL. As in Part 4, please include some "interesting" queries or modifications, i.e., operations that require some of the more complex SQL constructs such as subqueries, aggregates, set operators, etc. As a general example, if your PDA is a campus applicant database, then your interface might include in its menu a number of useful queries on the database, with some queries performing statistical analysis requiring multiple levels of grouping, and other queries being simpler.

Your application code should interact with the database using the JDBC call-level interface for Java programs. You can refer to following links.

We are not expecting anything particularly fancy in terms of the interface itself. For example, in Java a menu printed via print function is fine. Also, handling of SQL errors can be quite simple. You can write a routine that just prints the error message from MySQL, or model your error handler after a sample program.

Back to beginning