数据库系统概念复习题及答案

2024-08-15 来源：爱站旅游

导读数据库系统概念复习题及答案

第一题：

考虑用二元联系（图1）对三元联系（图2）的表示：

1) 分别给出E，A，B，C，RA，RB 和RC 的一个实例，这些实例不对应A， B，C 和R 的任何实例；

2) 更改图1 中的ER 图，引入适当的约束以确保满足约束的E，A，B，C， RA，RB 和RC 的任何实例都对应于A，B，C 和R 的一个实例； 3) 更改以上的转化以表示在三元联系上的全参与约束； 4) 以上表示要求为E创建一个主码属性，试问如何将E处理为弱实体集，以便不需要主码？解：

1）令 E = {e1, e2}, A = {a1, a2}, B = {b1}, C = {c1}, RA = {(e1, a1), (e2, a2)},Rb={(e1,b1)},

Rc={(e1,c1)}；

可以看出，由于元组（e2,a2）的原因，不存在任何实例对应于E,Ra,Rb,Rc

2）如下图所示：通过引入E 和关系 Ra ， Rb ， Rc之间的全部参与的约束条件，以便

在 E 中的每个元组都和A ，B，C有关系。

3）假设A全部参与关系R，则在A和Ra之间引入全部参与约束

将 E看作弱实体集，而将Ra,Rb,Rc看作标志联系集。如下图所示

第二题：

分别判断下列图中G1和G2是否互模拟（bisimulation），并说明理由

解: （１）在图中标出各点的状态,我们构造关系S={(P0,Q0),(P1,Q1),(P2,Q1),(P3,Q2),(P4,Q3)}

可知G2可以模拟G1,下面我们讨论

S+1={( Q0, P0),(Q1, P1),(Q1, P2),(Q2, P3),(Q3,P4)}

是否可模拟,在G2中Q0有一个a变换可对应到G1中2个变换,即(Q1,P1)∈S-1, (Q1,P2)∈S-1。但Q1有两个变换b,c，而在Ｇ１中公存在只有b或只有c的状态点，可知Ｇ１和Ｇ２不能互模拟。

（２）如图，标出各状态点，构造有关系

S={(P0,Q0),(P1,Q1),(P1,Q2),(P2,Q3),(P2,Q4), (P3,Q5)},

可知其中Ｇ１中的点均可由G2中的点模拟，下面我们考虑

S+1={( Q0, P0),(Q1, P1),(Q2, P1),(Q3, P2),(Q4,P2),(Q5,P3)}，

可知同样其中Ｇ２中的点均可由Ｇ１中的点模拟，所以Ｇ１和Ｇ２互模拟。

第三题：

设有下列嵌套关系模式：

Emp = (Ename, childrenSet setof(Childern), SkillsSet setof(Skills)) Childern = (Cname, birthday) Birthday = (day, month, year)

Skills = (Stype, examsSet setoff(Exams)) Exams = (year,city)

1) 试给出嵌套关系模式的XML DTD 表示，假定数据库中包含表emp(Emp) 2) 试用SQL:1999 写出下列查询：

a) 列出所有有一个孩子的生日在三月的员工的姓名；

b) 列出所有在城市”Xi’an”参加过技能种类(Stype)为”Typing”考试(Exams) 的员工的姓名(Ename)；

c) 列出关系Emp 中的所有技能种类(Stype)； 3) 试用Xquery 重写2)中的所有查询; 解:

1) ] >

2) a――－ select ename from Emp as e, unnest(e.ChildrenSet) as c(Children) where ‘March’ in ( select birthday.month from c)

b――― select ename from Emp as e, e.SkillsSet as s, s.ExamsSet as x where s.stype=’typing’ and x.city=’Xi’’an’

c――― select distinct s.stype from Emp as e , e.SkillsSet as s 3) a. for $e in /db/emp

let $empname := $e/@ename

where $e/children/birthday/month=”March” return $empname b. for $e in /db/emp let $empname := $e/@ename where $e/skills/stype=”typing” and $e/skills/exams/city=”Xi’an” return $empname c. for $e in /db/emp return $e/skills/stype 第四题：

For the DTD, XML, and XQUERY given below, answer the questions listed next.

The Document Type Definition myfriend.dtd: < !ELEMENT myfriends (person*)>

< !ELEMENT person (id, name?, cell-phone*, children?)> < !ELEMENT children (child*)> < !ELEMENT child (name,toys*)> < !ELEMENT name ( #PCDATA)> < !ELEMENT toys ( #PCDATA)> < !ELEMENT id ( #PCDATA)> ... ] < !ELEMENT employees (emp*)>

< !ELEMENT emp (id, work-phone, (contact|address)> < !ELEMENT address (city,zip,street)> < !ELEMENT id ( #PCDATA)>]

< !ELEMENT contact ( #PCDATA)> ... ] The XML Document friends.xml: 1

``jack''

2222 2

3333

c2 t1 c2 ...

The XML Document employees.xml:

9999 ``me''

8888

c z s

The XQUERY expression:

FOR $outer in (friends.xml)//person, LET $child := $outer/children

WHERE ($outer/cellphone > 2000 ) RETURN $outer/id

FOR $inner IN (employees.xml)/employees/emp[id=$outer/id] RETURN {

$outer/cellphone $child/child

$inner/workphone $inner/address/city }

1) List the XML output that the XQUERY expression would generate when applied to the given XML input documents.

2) Design a relational schema to store the two given XML data files.

3) Design an object-relational schema to store the two given XML data files. 4) List the SQL query that you would generate to execute the given XQUERY expression on your relational database. State what final computations would remain to be done by the XQUERY processor beyond executing your SQL statement, if any. 解:

2222 9999 2

3333

c2 t1 c2 8888 c

2) person(pid, cellphone, name) child(cid, parentid, name) toy(tid, cid, toy_name) emp(pid, workphone, contact, city, zip, street) 3) person(pid, name, cellphoneSet MultiSet(cellphones), ChildSet MultiSet(children)) cellphones(cellphone)

children(name, toySet MultiSet(toys)) toys(toyname)

emp(pid, workphone, contact, city, zip, street) 4) select person.cellphone,

array( select child.name child.toy form child

where child.parentid=person.pid) as child_array,

emp.workphone, emp.city from person, child, emp

where person.pid=emp.pid AND person.cellphone>2000 第五题:

Suppose you have to represent the information about parts. Each part has a name (unique),and a textual description. Parts may be simple or complex. A simple part has a color but no children subparts. A complex part has a number of children subparts (which can be simple or complex), each of which may be repeated. (E.g., a car has 4 wheels.) You can assume that each part can be a child subpart of at most one other part (so each part, together with its subparts, can be viewed as a tree). Do not assume any fixed number of levels of part composition.

1) Define the schema of XML documents containing part information using DTDs.

2) Give an example of a document instance which is valid under the DTDs. 3) Write the following queries in XQuery: (a) find the names of all the yellow parts.

(b) find all the parts that have at least 5 distinct children subparts.

(c) find all the parts containing a descendant subpart named “spoke\" and not containing a descendant subpart named “tire\". Answer: ANSWER:

1）

name ID #REQUIRED>

]> 2)

part1 blue part2 part3 yellow 3)

(a) for $p in /parts//part

where $p[color=”yellow”] return $p/@name (b) for $p in /parts//part

where count($p/subpartinfo)>=5 return $ p/@name (c) for $p in /parts//part

Where ($p//@name=”sopke”)and(not($p//@name=”tire”)) Return $p

第六题:

Consider these XML elements for the 'prefix' and 'infix' application of a binary function, here add, to its two variable arguments, here x and y:

add x y

x add y

Complete the following XSLT template - by just filling in the seven versions of \"___\" - for the (XML-to-XML) transformation of 'prefix' applications into 'infix' applications:

<_infix___>

<_larg_> __ __ ______

Could this transformation be 'inverted' - mapping 'infix' applications to 'prefix' applications - without information loss (write in only \"yes\" or \"no\" here)? Answer: No 第七题:

Consider the relation PARTS, which has Part# as hash key and which includes records with the following Part# values: 2369, 3760, 5046, 4871, 5659, 2222, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 3157, 6975, 4981, 9208.The hash function uses 8 buckets, numbered 0 to 7. Each bucket is one disk block and holds two records.

1) Load these records into the file in the given order using the hash function h(K)=K mod 8. Calculate the average number of block accesses for a random retrieval on Part#.

2) Now load the records into expandable hash files based on linear hashing. Start with a single disk block, using the hash function h0= K mod 2º, and show how the file grows and how the hash functions change as the records are inserted. Assume that blocks are split whenever an overflow occurs, and show the value of n at each stage. 解: 1)

平均查找代价：(8＋6＊2＋3＋3＋3）/ 17＝1.71 2)

第八题:

设关系r1(A，B，C)，r2(C，D，E)有如下特性：r1 有200 000 个元组，r2 有 45 000 个元组，一块中可容纳25 个r1 元组或30 个r2 元组。试估算以下每一种策略计算r1|><|r2 所需存取的块数： 1) 嵌套循环连接 2) 块嵌套循环连接 3) 归并连接 4) 散列连接

解：r1需要8000个块，r2需要1500个块。假设有一个存储器有M页。如果M>8000，那么使用平坦嵌套循环，通过1500＋8000次磁盘存取就可以很容易的完成连接操作。因此我们只考虑M<=8000的情况。 A．嵌套循环连接：

使用r1作为外关系，我们需要进行200 000×1500＋8000＝300，008，000次磁盘存

取。如果r2是外关系，那么我们需要45 000×8 000＋1 500＝360 001 500次磁盘存取。 B．块嵌套循环连接：

如果r1是外关系，我们需要8000/(M2)×1500＋8000次磁盘存取，如果r2是外关系，我们需要1500/(M-2)×8000＋1500次磁盘存取。

C.归并连接

假设r1和r2最初没有按连接关键字进行排序，那么总的排序加上输出的耗费为Bs＝1500（2 logM-1(1500/M)＋1）＋8000（2 logM-1(8000/M)＋1）次磁盘存取。假设具有相同连接属性值的所有员组装入内存中，那么总的耗费是Bs＋1500＋8000次磁盘存取。 d.散列连接

我们假设不发生溢出。因为r2比较小，所以我们用r2作为创建关系，用r1作为探针关系。如果M>1500，那么就不需要进行递归分割，于是耗费为3（1500＋8000）＝28 500次磁盘存取，否则耗费为2（1500＋8000）logM-1(1500/M)+2＋1500＋8000次磁盘存取。

第九题:

Consider a hash-join of two relations R and S having B(R) = 1000 and B(S) = 500. The values in R and S are skewed such that the hash function assigns three times as many tuples to even-numbered hash buckets as to odd-numbered buckets.

1) How much memory would be required to perform the join in two passes? 2) What is the performance of the hash-join given the skewed hashing? 3) How would the performance of using the hash-join compare to using a sorted-merge algorithm? 解：

1。散列连接要用两趟完成，则需要递归划分，对关系s的划分所需趟数估计为

logM1(b(s))1，所以有500(2log99500/1001)8502logM15001， M=8.9。

对关系r进行划分所需趟数估计为logM1(b(r))1，且2logM110001，M=11。因为散列算法要求内存满足小的操作对象，所以需要8.9*4KB=35.6KB的内存。

2. 增加分区的个数，使得每个分区的大小（包括该分区上的散列索引在内）小于内存容量。 3.基于散列的算法使用一个散列函数将操作对象分割到桶中，然后操作被分别应用到桶和桶对上能被内存所容纳。基于排序归并连接的算法可对大于内存的关系进行排序，可知在关系以排序的情况下，归并连接是比较可取的。散列和排序在某种意义下是对偶的，因为能用散列实现的连接也可用排序来实现，反之亦然。基于散列的算法常常优于基于排序的算法，我们假设内存能容纳100个块，则用散列连接对S划分为5个划分，则代价为3（1000+500）=4500次块传输，用排序归并对R的排序需1000(2log991000/1001)2000次快传输，对S的排序需500(2log99500/1001)850次块传输，把排序写回磁盘需要1500次块

传输，归并步骤还需1500次块传输以读回数据，因此归并排序总代价为5850次块传输。第十题:

设关系r1(A，B，C)，r2(C，D，E)和r3(E，F)，其主码分别为A，C，E。假设r1 有1500 个元组，r2 有2500 个元组，r3 有1000 个元组。 1) 试估计r1|><|r2|><|r3 的大小；

2) 给出一个有效计算r1|><|r2|><|r3 的策略

答：1）因为连接具有结合律和交换性，所以不管我们怎样连接r1,r2和r3，最终连接r1，r2和r3得到的结果都是一样的。因此，我们只考虑基于（（r1 r2）r3）连接策略下的大小。因为C为r2的关键字，所以连接r1和r2产生至多包含1500个元组的关系。同样，把前面得到的结果和r3进行连接，将产生至多包含1500个元组的关系，因为E为r3的关键字。因此，最终关系最多包含有1500个元组。

2）计算这个连接的一个有效的策略是为关系r2上的属性C和关系r3上的属性E创建索引。然后对于r1中的每个元组，我们按照下面锝方法作：

A．使用在C上创建的索引，在r2中查找最多一个元组，这个元组与r1中的C匹配。 B．使用在E上创建的索引，在r3中查找最多一个元组，这个元组与r2中的E值匹配。

第十一题:

Assume that the following relation is fragmented horizontally by plant-number: employee (name, address, salary, plant-number)

Assume that each fragment is stored at the corresponding plant site. Describe a good processing strategy for the following queries issued at the San Jose site. 1) Find all employees at the New York plant.

2) Find the highest-paid employee at New York, Boston, Toronto, respectively.

3) Find the average salary of all employees.

答：1）a.纽约节点发送查询 ; b.让纽约节点返回查询结果。

2）a.分别向New York, Boston, Toronto发送查询最高工资员工的请求； b.在所有的节点上计算查询； c.向San Jose返回结果。

3）a.向所有的节点发送查询平局工资的请求； b.各个节点将计算结果返回到San Jose；

c.在San Jose对各个节点返回的结果进一步求平均； d.返回计算出的结果。

第十二题:

什么是可恢复调度？为什么要求调度的可恢复性？存在要求允许出现不可恢复调度的情况吗？说明理由。

答：假设在一个调度中，Tj读取了Ti写入的数据，Ti在提交前发生故障，我们必须中止Tj

以保证事务地原子性。若Tj在Ti出现故障后是可中止的，那么我们就称该调度是可恢复调度。可恢复调度应满足：对于每个事务Ti和Tj，如果Tj读取了由Ti所写的数据项，则Ti先于Tj提交。第十三题:

State if the following statements are true or false. Please write “TRUE” or

“FALSE” is the space provided.

a) For any schedule S1 and any serial schedule S2, if S1 is conflict equivalent to S2, then S1 is conflict serializable. ANSWER: true

b) For any schedules S1 and S2, if S1 and S2 are conflict serializable, then S1 and S2 are conflict equivalent. ANSWER: false

c) All serializable schedules are recoverable. ANSWER: false

d) All recoverable schedules are serializable. ANSWER: false

e) All schedules produced by a strict two-phase locking mechanism are strict schedules. ANSWER: true

f) All serial schedules are ACR (avoid cascading rollback). ANSWER: true

g) Any schedule that can be produced by a two-phase locking mechanism can also be produced by a validation mechanism. ANSWER: false

h) Any schedule that can be produced by a validation mechanism can also be produced by a two-phase locking mechanism. ANSWER: true

i) Any schedule that can be produced by a two-phase locking mechanism with shared (read) and exclusive (write) locks, can also be produced by a two-phase locking mechanism with exclusive locks only. ANSWER: true

j) Say we run a schedule S that is not conflict serializable on a database D, where a set C of consistency constraints have been defined. Initially, before S runs, all constraints in C hold, and we know that all transactions in S preserve consistency. After S runs, for sure at least one constraint in C will no longer hold.

ANSWER: true

第十四题:

假设一个应用是采用嵌入式SQL 编写的，其中80%的时间花在运行SQL 代码上，20%的时间花在运行主语言代码上。如果只对SQL 代码实施了并行，那么可以期望得到多大的加速比？说明理由。

答：由于不能被并行话的部分占总运行时间的20%，所以可获得的加速比最多不会超过5。

第十五题:

假设一个系统运行三种类型的事务：A 类事务以50/s 的速度运行，B 类事务以100/s 的速度运行，C 类事务以200/s 的速度运行。并假设系统所处理的事务中A、B、C 三类事务所占比例分别为30%，30%，40%。

1) 如果A、B、C 三类事务之间互不干扰，系统的平均事务吞吐量是多少？

2) 什么因素会使不同类型事务之间产生相互干扰，导致计算出的平均事务吞吐量不准确？

3) 如果不同类型事务之间相互干扰的因素非常复杂，那么用什么方法可以得到比较准确的平均事务吞吐量？

答：1）假设系统中有100个事务，那么系统中A类和B类事务就各有30个，C类事务有40个。那么仅仅执行A类事务所用的时间是0.6秒，仅执行B类事务的时间是0.3秒，C类事务的时间是0.2秒。假设各个事务之间互不干扰，在串行的系统下那么执行这100个事务所用的总时间是0.6+0.3+0.2=1.1秒，也就是说系统得平均事务吞吐量是每秒100/1.1=91个事务；如果是并行的系统，那么时间是0.6秒作完，系统的平均事务吞吐量为100/0.6=167个事务

2）引起事务之间干扰的最重要的原因之一是封锁竞争。在前面的例子中，假设事务A和事务B都是更新事务，而事务C是查询事务。由于处理器和磁盘之间的速度不匹配，很可能会出现下面的情况：A类型的一个事务持有一个“热”数据项的锁，并且在等待将其写道磁盘中来完成操作，在在这个时候B类或C类一个事务正在等待事务A持有的封锁。在这种情况下，一些CPU循环就被浪费了。因此，观察到的事物吞吐量会比计算出来的吞吐量要小。

相反，如果A类型的事务和B类型的事务是大量消耗磁盘资源的事务，而C类型事务时大量消耗CPU资源的事务，那么观察到的事物吞吐量将会比计算到吞吐量大。

封锁竞争也会导致死锁，在这种情况下一些事务将不得不被终止。事务的终止和重启将会导致观察到的吞吐量比计算出来的吞吐量要小。

数据结构大小的限制，事务管理器事务记录函数花费时间的变化情况等因素都会导致观察到的吞吐量和计算出来的吞吐量之间的不同。

3）如果不同类型事务之间的相互干扰因素非常复杂，那么我们可以采用性能模拟的办法对系统得吞吐量进行测试。首先需要建立模型，然后再模型上进行各种实验，可以通过改变不同的实验环境来估算出系统得平均吞吐量。

第十六题:

对于下列每一种任务，哪一种并行形式(查询间并行、操作间并行、操作内并行)可能是最关键的？说明理由。

1) 提高一个执行许多小的查询的系统吞吐量；

2) 在磁盘和处理器数目都很大的情况下，提高一个执行少量大的查询的系统吞吐量；

答：查询间并行指的是：不同的查询或不同的事务彼此并行地执行。

操作内并行指的是：我们可以通过并行的执行每一个运算，如排序，选择，投影，连

接等，来加速一个查询速度。操作间并行指的是：我们可以通过并行地执行每一个查询表达式中地多个不同的运算，

来加快一个查询的处理速度。

通过上面的介绍，我们可以知道，对于a查询间并行；对于b，操作内并行。

第十七题:

Consider a relation T(A, B) that contains 10000 records partitioned onto 5 disks according to the following strategies:

􀁺 Round robin.

􀁺 Hash-partition based on hash function (A mod 5)

􀁺 Assume the 5 disks corresponding to has values 0, 1, …, 4 contain 3000, 1500, 1500, 2000, 2000 tuples respectively.

􀁺 Range-partition based on vector on B [20, 40, 60, 80]

􀁺 Assume the disk responsible for <20 has 1000 records, the one for [20, 40) has 2000 records, and the other disks in this order have 2000, 2000, 3000 records, respectively.

􀁺 No index is created. Assume processing one tuple takes 1ms for any query.

What are the costs of processing the following queries using each of the above strategies?

1) select * from T

2) select * from T where A = 20

3) select * from T where 20 < A < 30 4) select * from T where 70 < B < 85 Round robin Hash-partiton Range-partition 1 2 3 3 2 2 3 3 3 2 3 3 4 2 3 3

第十八题:

Consider the following log where the DPT represents the Dirty Page Table and TT represents the Transaction Table

Answer the following questions (using the ARIES-like algorithm we studied in

class):

1) What is the smallest LSN accessed during the Analysis phase. 2) Fill in the contents of the Dirty Page Table and the Transaction Table at the end of the Analysis phase. (you may not need all the space we give you) PageID 1 3 5 8 RecLSN 10 40 50 70 XID T1 Status abort LastLSN 90

3) At which LSN does the Redo phase begin?

4) What entries (specify only LSNs) do get undone as part of the Undo phase? 解:

1)the smallest LSN accessed during the Anaylsis phase is 10 2)see above 3)10 4)40, 10 第十九题:

Consider the following natural language description: John Smith is the author of \"My Life as a Bug\". John Smith's agent is Mary Jones. John Smith is 35 years old.

The son of Bob Smith is John Smith.

Represent the information in this description as RDF triples . Use the RDF graph format to represent the triples, with labeled ovals and labeled directed arcs. That is, you do not need to use XML syntax to describe the RDF statements. Use sensible label names, but your labels do not need to be in URI syntax. 解：

第二十题:

Consider the following RDF document using the XML syntax. Draw the

equivalent graph. For convenience, you may use QNames for your node and edge labels.

解：

第二十一题:

Translate the following RDF Graph into the RDF/XML syntax. For node ands arcs labeled with QNames, assume the standard prefixes apply. For other names, assume they are all local to the document you are writing.

Answer:

xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\">

Peter Jackson

Elijah Wood

The Fellowship of the Ring

Ian Mckellan

因篇幅问题不能全部显示，请点此查看更多更全内容

查看全文

全部栏目

数据库系统概念复习题及答案