Assignment 2 FAQs
- For assignment 2, all the query questions say to "retrieve all customers/tours/etc.". Is it sufficient to retrieve just the primary keys, e.g. PID's, TID's?
It is sufficient to retrieve PID, TID, etc. The point is to get the correct tuples from your queries.
- For q1: "List all customers (pid) and their city address, grouped by city."
Is it asking to SELECT PID, Address.CITY only? Or does it want more information from the address?
Displaying City is good enough.
- For q8: For each of the highest paid tour guides, return the number of tours each guide has worked (use the label numTours).
Do we retrieve TourGuides that was paid the most for a single Tour or the total amount money a TourGuide got from all the Tours that they've guided?
Retrieve TourGuides that was paid the most for a single Tour.
- Does the relational algebra need to be identical to the SQL queries, or can we use a relational algebra expression that is equivalent to our SQL but uses different operators?
Yes, you can just use equivalent relational algebra expression since not all SQL operators can be directly translated into relational algebra operators. Sometimes you need work around to get the same results.
- How do I express HAVING clause in relational algebra (i.e. SELECT ID FROM R GROUP BY ID HAVING COUNT(*)>2)?
You could use relational algebra aggregation operator (gamma) to get aggregated function and assign (use arrow) it to a name c, filter out (sigma_c>2) the correct tuples, and then project (Pi) out the desired attributes.
- For q4, it says to list the customers that have participated in at least 2 tours. Does that mean we're supposed to check that they participated in 2+ "Past" tours, or do we include the "In-Progress" and "Future" tours in the count too?
Include all the tours, not only "Past" tours.
- For part I, How do I know whether I should retain the duplicates or not in the results for SQL queries?
Unless it's specified in the question (q7 and q10), please makes sure no duplicates in the results.
- How do I retain the duplicates in relational algebra expression?
You just need to specify you are using bag/multiset semantic of the relational algebra, as default is set semantic.
- For SQL queries, I'm supposed to submit a txt file. How will you be running these queries? Is it okay to just have them listed in the txt file for you to read or are you going to run the txt file.
I'm going to run your script directly, so please make sure it's runnable on db2 (i.e. include db connections, etc.) Non executable queries will be deducted at least half of the marks. It doesn't matter what your script is called, as long as it's a ASCII text type file, you can run it as db2 -n -t -f queries.txt on db2.