Theses





2011

[Cheng, 2011]

Author(s): Carson Ka Shing Cheng.

Title: Cognitive modeling of sentence meaning acquisition using a hybrid connectionist computational model inspired by Cognitive Grammar.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2011.

Abstract: A novel connectionist architecture of artificial neural networks is presented to model the assignment of meaning to test sentences on the basis of learning from relevantly sparse input. Training and testing sentences are generated from simple recursive grammars, and once trained, the architecture successfully processes thousands of sentences containing deeply embedded clauses, thereby experimentally showing that the architecture exhibits partial semantic and strong systematicities — two properties that humans also satisfy. The architecture's novelty derives, in part, from analyzing language meaning on the basis of Cognitive semantics (Langacker, 2008), and the concept of affirmative stimulus meaning (Quine, 1960). The architecture demonstrates one possible way of providing a connectionist processing model of Cognitive semantics. The architecture is argued to be oriented towards increasing neurobiological and psychological plausibility, and to be capable of providing an explanation of the aforementioned systematicity properties in humans.
[Ding, 2011]

Author(s): Yan Ding.

Title: Peer-to-peer 3D/multi-view video streaming.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2011.

Abstract: The recent advances in stereoscopic video capture, compression and display have made 3D video a visually appealing and cost-affordable technology. More sophisticated multi-view videos have also been demonstrated. Yet their remarkably increased data volume poses greater challenges to the conventional client/server systems. The stringent synchronization demands from different views further complicate the system design. In this thesis, we present an initial attempt toward efficient streaming of 3D videos over peer-to-peer networks. We show that the inherent multi-stream nature of 3D video makes playback synchronization more difficult. We address this with a 2-stream buffer, together with a novel segment scheduling scheme. We further extend our system to support multi-view video with view diversity and dynamics. We have evaluated our system under different end-system and network configurations with typical stereo video streams. The simulation results demonstrate the superiority of our system in terms of scalability, streaming quality and dealing with view dynamics.
[Jiang, 2011]

Author(s): Bin Jiang.

Title: Summarizing Certainty in Uncertain Data.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, May 2011.

Abstract: Uncertain data has been rapidly accumulated in many important applications, such as sensor networks, market analysis, social networks, and so on. Analyzing large collections of uncertain data has become an essential task. Generally, uncertainty means the lack of certainty due to having limited knowledge of the data being examined. An uncertain object cannot be described exactly in one state. Instead, it has more than one possible representation. Therefore, we model an uncertain data set as a set of uncertain objects, each of which has a set of instances, in a domain consisting of multiple attributes. In this thesis, we put emphasis on summarizing certainty in uncertain data. We systematically identify three types of uncertainty, namely, value uncertainty, membership uncertainty, and relationship uncertainty in the levels of objects, instances, and domains of uncertain data. In particular, we develop techniques for clustering uncertain objects to summarize objects, detecting outlying instances to summarize instances, and learning domain orders to summarize domains. Technically, we combine statistical analysis and data mining techniques to investigate uncertain data. We develop efficient and scalable algorithms to tackle the computational challenges of large uncertain data sets. We also conduct comprehensive empirical studies on real and synthetic data sets to verify the effectiveness of the proposed summarization techniques and the efficiency of our algorithms.
[Litus, 2011]

Author(s): Yaroslav Litus.

Title: Using spatial embeddedness and physical embodiment for computation in multi-robot systems.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, January 2011.

Abstract: This thesis contributes to the understanding of computational capabilities of multi-robot systems by viewing them as devices in which mechanical and electronic components jointly perform computation. We show that a multi-robot system can use physical embodiment and spatial embeddedness as computational resources for two types of problems: (i) a continuous optimization problem of achieving optimal joint robot team configuration, and (ii) a combinatorial problem of sorting robots. In the continuous problem domain we describe a general approach for developing decentralized distributed gradient descent optimization algorithms for teams of embodied agents that need to rearrange their configuration over space and time to approach some optimal and initially unknown configuration. We provide examples of the application of this general method by solving two non-trivial problems of multi-robot coordination: energy-efficient single point and multiple point rendezvous. In the combinatorial problem domain we demonstrate a multi-robot system controller that sorts a team of robots by means of its continuous movement dynamics. Between-robot rank comparisons suggested by traditional discrete state sorting algorithms are avoided by coupling neighbors in the order in a Brockett double bracket flow system.
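For context on the closing sentence (general background on the named flow, not material taken from the thesis): the double bracket flow introduced by Brockett has the standard form

\[
\dot{H}(t) \;=\; \bigl[\,H(t),\,[\,H(t),\,N\,]\,\bigr], \qquad [A,B] = AB - BA,
\]

where H(t) is a symmetric matrix and N is a fixed diagonal matrix; for a suitable choice of N the flow converges to a diagonal matrix whose entries appear in an order determined by N, which is why such flows can be used to sort. How the thesis couples neighbouring robots into a system of this form is not specified in the abstract.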
[Wang, 2011]

Author(s): Chunhao Wang.

Title: Computational Study on Bidimensionality Theory Based Algorithms.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 2011.

Abstract: Bidimensionality theory provides a general framework for developing subexponential fixed parameter algorithms for NP-hard problems. A representative example of such algorithms is the one for the longest path problem in planar graphs. The largest grid minor and the branch-decomposition of a graph play an important role in bidimensionality theory. The best known approximation algorithm for computing the largest grid minor of a planar graph has the approximation ratio 3. In this thesis, we report a computational study on a branch-decomposition based algorithm for the longest path problem in planar graphs. We implement the 3-approximation algorithm for computing large grid minors. We also design and implement an exact algorithm for computing the largest cylinder minor in planar graphs to evaluate the practical performance of the 3-approximation algorithm. The results show that the bidimensional framework is practical for the longest path problem in planar graphs.
[Wighton, 2011]

Author(s): Paul Wighton.

Title: Towards automated skin lesion diagnosis.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, June 2011.

Abstract: Melanoma, the deadliest form of skin cancer, must be diagnosed early in order to be treated effectively. Automated Skin Lesion Diagnosis (ASLD) attempts to accomplish this using digital dermoscopic images. This thesis investigates several areas in which ASLD can be improved. Typically, the ASLD pipeline consists of 5 stages: 1) image acquisition, 2) artifact detection, 3) lesion segmentation, 4) feature extraction and 5) classification. The main focus of the thesis is the development of two probabilistic models which are sufficiently general to perform several key tasks in the ASLD pipeline, including: artifact detection, lesion segmentation and feature extraction. We then show how all parameters of these two models can be inferred automatically using supervised learning and a set of examples. Additionally, we present methods to: 1) evaluate the perception of texture in images of dermoscopic skin lesions, 2) calibrate acquired digital dermoscopy images for color, lighting and chromatic aberration, and 3) digitally remove detected occluding artifacts. Our general probabilistic models detect occluding hair and segment lesions comparably to other, less general, methods. Perceptually, we conclude that the textural information in skin lesions exists independently of color. Calibrating for colour and lighting, we achieve results consistent with previous work; calibrating for chromatic aberration, we are able to reduce distortions by 47%. Furthermore, our method to digitally remove occluding artifacts outperforms previous work.
[Zhou, 2011]

Author(s): Bin Zhou.

Title: Keyword search on large-scale data: from relational and graph data to OLAP infrastructure.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, May 2011.

Abstract: In recent years, the great success of Web search engines has shown the simplicity and the power of keyword search on billions of textual pages on the Web. In addition to textual pages, there also exist vast collections of structured, semi-structured, and unstructured data in various applications that contain abundant text information. Due to the simplicity and the power of keyword search, it is natural to extend keyword search to retrieve information from large-scale structured, semi-structured, and unstructured data. In this thesis, we study a class of important challenging problems centered on keyword search on large-scale data. We propose various techniques for different types of important data sources, including relational tables, graphs, and search logs. Specifically, for relational tables, we show that, while searching individual tuples using keywords is useful, in some scenarios, it may not find any tuples since keywords may be distributed in different tuples. To enhance the capability of the keyword search technique on relational tables, we develop the aggregate keyword search method which finds aggregate groups of tuples jointly matching a set of query keywords. For graphs, we indicate that keyword queries are often ambiguous. Thus, developing efficient and effective query suggestion techniques is crucial to provide satisfactory user search experience. We extend the query suggestion technique in Web search to help users conduct keyword search on graphs. For search logs, we study various types of keyword search applications in Web search engines, and conclude that all of those applications are related to several novel mining functions on search logs. We build a universal OLAP infrastructure on search logs which supports scalable online query suggestion. The proposed techniques have several desirable characteristics which are useful in different application scenarios. We evaluate our approaches on a broad range of real data sets and synthetic data sets and demonstrate that the techniques can achieve high performance. We also provide an overview of the keyword search problem on large-scale data, survey the literature in the field, and discuss some potential future research directions.



2010

[Chen, 2010]

Author(s): Jiyi Chen.

Title: Automated Load Curve Data Cleansing in Power Systems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2010.

Abstract: Load curve data refers to power consumption recorded by meters at certain time intervals at delivery points or end user points, and contains vital information for day-to-day operations, system analysis, system visualization, system reliability performance, energy saving and adequacy in system planning. It is unavoidable that load curves contain corrupted data and missing data due to various random failure factors in meters and transfer processes. In this thesis, nonparametric smoothing techniques are proposed to model the load curve data and detect corrupted data. An adapted multiplicative model is built to correct corrupted data and fill in missing data. In implementation, an incremental training procedure is proposed to enhance the performance. Experimental results on real BCTC (British Columbia Transmission Corporation) load curve data demonstrate the effectiveness of the presented solution.
[Lin, 2010]

Author(s): Zhenhua Lin.

Title: Mining Discriminative Items in Multiple Data Streams.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 2010.

Abstract: How can we maintain a dynamic profile capturing a user's reading interest against the common interest? What are the queries that have been asked 1,000 times more frequently to a search engine from users in Asia than in North America? What are the keywords (or tags) that are 1,000 times more frequent in the blog stream on computer games than in the blog stream on Hollywood movies? To answer such interesting questions, we need to find discriminative items in multiple data streams. Each data source, such as Web search queries in a region and blog postings on a topic, can be modeled as a data stream due to the fast growing volume of the source. Motivated by the extensive applications, in this thesis, we study the problem of mining discriminative items in multiple data streams. We show that, to exactly find all discriminative items in stream S1 against stream S2 by one scan, the space lower bound is Ω(|Σ| log n1/|Σ|) where Σ is the alphabet of items and n1 is the current size of S1. To tackle the space challenge, we develop three heuristic algorithms that can achieve high precision and recall using sub-linear space and sub-linear processing time per item with respect to |Σ|. The complexity of all algorithms is independent of the sizes of the two streams. An extensive empirical study using both real data sets and synthetic data sets verifies our design.
[Ly, 2010]

Author(s): Cong Ly.

Title: Latency Reduction in Online Multiplayer Games Using Detour Routing.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 2010.

Abstract: Long network latency negatively impacts the performance of online multiplayer games. In this thesis, we propose a novel approach to reduce the network latency in online gaming. Our approach employs application level detour routing in which game-state update messages between two players can be forwarded through other intermediate relay nodes in order to reduce network latency. We present results from an extensive measurement study to show the potential benefits of detour routing in online games. We also present the design of a complete system to achieve the potential, which is called Indirect Relay System (IRS). The experimental and simulation results show that IRS: (i) significantly reduces end-to-end round-trip times (RTTs) among players, (ii) increases the number of peers a player can connect to while maintaining good gaming quality, (iii) imposes negligible network and processing overheads, and (iv) improves gaming quality and player performance.
[Wawerla, 2010]

Author(s): Jens Wawerla.

Title: Task-Switching for Self-Sufficient Robots.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, May 2010.

Abstract: Task-switching enables a system to make its own decisions about which task to perform. It is therefore a key ability for any truly autonomous system. Common task-switching methods range from computationally expensive planning methods to often suboptimal, minimalistic heuristics.

This thesis takes a bio-inspired approach, motivated by the fact that animals successfully make task-switching decisions on a daily basis. The field of behavioural ecology provides a vast literature on animal task-switching. Generally these works are descriptive models of animal behaviour: either models fitted to data from observed animal behaviour, or theoretically optimal models of how animals ought to behave.

But what is needed in robotics are methods that generate behaviour based on the information available to the robot (through sensing). Furthermore, these methods have to take the physical limitations (velocity, acceleration, storage capacity, etc.) of the robot into account.

This thesis takes inspiration from descriptive behavioural ecology models and proposes a situated and embodied task-switching method suitable for mobile robots. To evaluate the quality of the decisions, an objective function is needed. Reproductive success is commonly used in biology; here, economic success is used. We illustrate the applicability of the proposed methods on Toda's FE robot. The decisions this robot faces are (1) when to work and when to refuel and (2) where to work or refuel respectively. Both decision types are essential to any autonomous, mobile robot. The proposed task-switching methods are based on Optimal Foraging Theory, in particular on rate-maximization and the Marginal-Value Theorem.
[Yang, 2010]

Author(s): Weilong Yang.

Title: Learning Transferable Distance Functions for Human Action Recognition and Detection.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2010.

Abstract: In this thesis, we address an important topic in computer vision, human action recognition and detection. In particular, we focus on a special scenario where only a single clip is available for training for each action category. This is a very natural scenario in many real-world applications, such as video search and intelligent video surveillance. We present a transfer learning technique called transferable distance function learning and apply it in human action recognition and detection. This learning algorithm aims to extract generic knowledge from previous training sets, and apply this knowledge to videos of new actions without further learning. It is experimentally demonstrated that the proposed algorithm can improve the accuracy of single clip action recognition and detection. Based on the learned transferable distance function, we further propose a cascade structure which can significantly improve the efficiency of an action detection system.
[Zhang, 2010]

Author(s): Qiang Zhang.

Title: Embedding Parallel Bit Stream Technology into Expat.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 2010.

Abstract: Parallel bit stream technology is a novel approach to interpret byte stream data and exploit data level parallelism by employing single-instruction multiple-data (SIMD) instructions. Parabix, an XML parser embedded with parallel bit stream technology, performs much better than traditional XML parsers that process XML documents in byte-at-a-time fashion. The project attempts to enhance the performance of Expat, a traditional XML parser, by embedding parallel bit stream technology into Expat. Most byte-at-a-time loops are identified and then replaced with bit stream operations. A performance case study is conducted by comparing the performance result between the original Expat and the parallel bit stream version.



2009

[Bastani, 2009]

Author(s): Behnam Bastani.

Title: Spectral Analysis of Output Devices, from Printing to Predicting.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, June 2009.

Abstract: The focus of this thesis is to develop and introduce algorithms that extend traditional colour reproduction from three dimensions to higher dimensions in order to minimize metamerism. The thesis introduces models that can accurately predict interactions between the primaries for non-linear output devices in spectral colour space. Experiments were designed and performed to aid in understanding how optimized the spectral characteristics of existing printer inks and display primaries are, and how the inks and primaries should be designed so that the accuracy of the reproduction is optimized. The time and space computational complexity of the reproduction algorithms grows exponentially with the number of input dimensions. The algorithms for finding the best combinations of inks or primaries matching a given input reflectance become more challenging when the inks interact with each other non-linearly, as is usually the case in printers. A number of different methods are introduced in this thesis to handle gamut mapping and the colour reproduction process in higher dimensions. An ink-separation algorithm is introduced to find the ink combination yielding a chosen gamut-mapped spectral reflectance. Experiments with real inks for spectral colour reproduction were performed to compare the results of the reproduction against trichromatic colour reproduction on a 9-ink printer system. Finally, a new application of reflectance analysis in higher dimensions is introduced.
[Belli, 2009]

Author(s): Fernando Belli.

Title: Improving Access to Data in Legacy Health Information Systems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2009.

Abstract: In today's world, fast and on-time access to information has become a necessity due to the increasing demand to make informed decisions based not only on estimations and historical information but also on current data. Decision makers expect to have access to the information in a timely way. This is a particular problem in the Fraser Health Authority where data needed by decision-makers is not easily accessible due to lack of functionality in their legacy Health Information Systems (HIS). Our solution involves extracting the data from the HIS into a SQL database every 15 minutes; this database is then used to create a set of XML files. These XML files are processed and displayed to the end users using a client-side browser application in a secure way. This solution adheres to the policies of the Fraser Health Authority, is scalable for read-only access, and enables end-users to access current information.
[Choi, 2009]

Author(s): Yongchul (Kenneth) Choi.

Title: Spatial OLAP Query Engine: Processing Aggregate Queries on Spatial OLAP Data.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 2009.

Abstract: A spatial OLAP can be characterised as a practical union of OLAP analysis and geographic mapping. A spatial OLAP query has a spatial confinement along with the conventional non-spatial predicate. An existing framework we opt for is to convert a spatial OLAP query into a set of queries for a general-purpose ROLAP engine. However, little has been done at the query optimization level, once the queries are submitted to the query engine. This thesis introduces three query engines on an experimental MOLAP system. The first is the implementation of the framework in the MOLAP context. The second increases the efficiency by adopting a novel merging technique to screen out many useless queries. The third does all aggregation on the fly, which outperforms the first two query engines by a wide margin under many circumstances. Detailed experimental performance data are presented, using a real-life database with a third of a million spatial objects.
[Chuang, 2009]

Author(s): Johnson Chuang.

Title: Energy Aware Colour Mapping for Visualization.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 2009.

Abstract: We present a design technique for colours that lower the energy consumption of the display device. Our approach relies on a screen space variant energy model. Guided by perceptual principles, we present three variations of our approach for finding low energy, distinguishable, iso-lightness colours. The first is based on a set of discrete user-named (categorical) colours, which are ordered according to energy consumption. The second optimizes for colours in the continuous CIELAB colour space. The third is hybrid, optimizing for colours in select CIELAB colour subspaces that are associated with colour names. We quantitatively compare our colours with a traditional choice of colours, demonstrating that approximately 45 percent of the display energy is saved. The colour sets are applied to 2D visualization of nominal data and volume rendering of 3D scalar fields. A new colour blending method for volume rendering which preserves hues further improves colour distinguishability.
[Farahbod, 2009]

Author(s): Roozbeh Farahbod.

Title: CoreASM: An Extensible Modeling Framework & Tool Environment for High-level Design and Analysis of Distributed Systems.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, May 2009.

Abstract: Model-based systems engineering naturally requires abstract executable specifications to facilitate simulation and testing in early stages of the system design process. Abstraction and formalization provide effective instruments for establishing critical system requirements by precisely modeling the system prior to construction so that one can analyze and reason about specification and design choices and better understand their implications. There are many approaches to formal modeling of software and hardware systems. Abstract State Machines, or ASMs, are well known for their versatility in computational and mathematical modeling of complex distributed systems with an orientation toward practical applications. They offer a good compromise between declarative, functional and operational views towards modeling of systems. The emphasis on freedom of abstraction in ASMs leads to intuitive yet accurate descriptions of the dynamic properties of systems. Since ASMs are in principle executable, the resulting models are validatable and possibly falsifiable by experiment. Finally, the well-defined notion of step-wise refinement in ASMs bridges the gap between abstract models and their final implementations. There is a variety of tools and executable languages available for ASMs, each coming with their own strengths and limitations. Building on these experiences, this work puts forward the design and development of an extensible and executable ASM language and tool architecture, called CoreASM, emphasizing freedom of experimentation and design exploration in the early phases of the software development process. CoreASM aims at preserving the very idea of ASM modeling—the design of accurate abstract models at the level of abstraction determined by the application domain, while encouraging rapid prototyping of such abstract models for testing and design space exploration. In addition, the extensible language and tool architecture of CoreASM facilitates integration of domain specific concepts and special-purpose tools into its language and modeling environment. CoreASM has been applied in a broad scope of R&D projects, spanning maritime surveillance, situation analysis, and computational criminology. In light of these applications, we argue that the design and implementation of CoreASM accomplishes its goals; it not only preserves the desirable characteristics of abstract mathematical models, such as conciseness, simplicity and intelligibility, but it also adheres to the methodological guidelines and best practices for ASM modeling.
[Finkbeiner, 2009]

Author(s): Bernhard Finkbeiner.

Title: Fast Computed Tomography and Volume Rendering Using the Body-Centered Cubic Lattice.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 2009.

Abstract: Two main tasks in the field of volumetric image processing are acquisition and visualization of 3D data. The main challenge is to reduce processing costs, while maintaining high accuracy. To achieve these goals for volume rendering (visualization), we demonstrate that non-separable box splines for body-centered cubic (BCC) lattices can be adapted to fast evaluation on graphics hardware. Thus, the BCC lattice can be used for interactive volume rendering leading to better image quality than comparable methods. Leveraging this result, we study volumetric reconstruction methods based on the Expectation Maximization (EM) algorithm. We show the equivalence of the standard implementation of the EM-based reconstruction with an implementation based on hardware-accelerated volume rendering for nearest-neighbor interpolation to achieve fast reconstruction times. Accuracy is improved by adapting the EM algorithm for the BCC lattice, leading to superior accuracy, more compact data representation, and better noise reduction compared to the Cartesian one.
[Giabbanelli, 2009]

Author(s): Philippe J. Giabbanelli.

Title: Self-improving immunization policies for complex networks.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 2009.

Abstract: Viruses are objects of research in both computer networks and epidemiology. The two types of viruses are very close in terms of modeling: in both cases, the spread of viruses can be modeled as a broadcast in a graph with a decentralized scheme. In order to design efficient immunization strategies, we have to be aware of the properties of the graphs in which viruses spread. Thus, we first review the properties found in many real-world graphs, such as small-world and scale-free, and the deterministic models that exhibit them. As a virus is an independent entity, the modeling should take into consideration parameters related to agents, such as their heuristics and their memory. We perform a 2^k factorial design to identify the contribution of the parameters of the agents and the properties of the topology. To benefit from the potential of agents to immunize dynamic networks, we specify a multi-agent system: the agents observe their environment, exchange their knowledge with minimal communication cost and fast consensus, and thus have a model of the dynamics that allows them to cope with changes. We present an algebraic framework that allows such exchange of knowledge while providing rigorous characterization. Among future works, we discuss deterministic models for scale-free networks from vertex contractions, and reconfiguration of agent communications.
[Guo, 2009]

Author(s): Zhenshan Guo.

Title: Partial Aggregation and Query Processing of OLAP Cubes.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2009.

Abstract: This work presents a novel and flexible PPA (Partial Pre-Aggregation) construction and query processing technique in OLAP (On-Line Analytical Processing) applications - SplitCube, which greatly reduces the cube size, shortens the cube building time and maintains acceptable query performance at the same time. Furthermore, we devise two enhanced query processing techniques. They can further improve query performance or reduce cube building time while keeping query response time at an acceptable level. The result analysis provides more insight into the cube construction and query processing procedures and illustrates the advantages and disadvantages of each algorithm. Finally, we give guidelines on how to choose the right algorithm in different use cases.
[Haffari, 2009]

Author(s): Gholamreza Haffari.

Title: Machine Learning approaches for dealing with limited bilingual training data in statistical machine translation.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 2009.

Abstract: Statistical Machine Translation (SMT) models learn how to translate by examining a bilingual parallel corpus containing sentences aligned with their human-produced translations. However, high quality translation output is dependent on the availability of massive amounts of parallel text in the source and target languages. There are a large number of languages that are considered low-density, either because the population speaking the language is not very large, or even if millions of people speak the language, insufficient online resources are available in that language. This thesis covers machine learning approaches for dealing with such situations in statistical machine translation where the amount of available bilingual data is limited. The problem of learning from insufficient labeled training data has been dealt with in the machine learning community under two general frameworks: (i) Semi-supervised Learning, and (ii) Active Learning. The complex nature of the machine translation task poses severe challenges to most of the algorithms developed in the machine learning community for these two learning scenarios. In this thesis, I develop semi-supervised learning as well as active learning algorithms to deal with the shortage of bilingual training data for the statistical machine translation task. This dissertation provides two approaches, unified in what is called the bootstrapping framework, to this problem.
[Hsu, 2009]

Author(s): Cheng-Hsin Hsu.

Title: Efficient Mobile Multimedia Streaming.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, November 2009.

Abstract: Modern mobile devices have evolved into small computers that can render multimedia streaming content anywhere and anytime. These devices can extend the viewing time of users and provide more business opportunities for service providers. Mobile devices, however, make a challenging platform for providing high-quality multimedia services. The goal of this thesis is to identify these challenges from various aspects, and propose efficient and systematic solutions to solve them. In particular, we study mobile video broadcast networks in which a base station concurrently transmits multiple video streams over a shared air medium to many mobile devices. We propose algorithms to optimize various quality-of-service metrics, including streaming quality, bandwidth efficiency, energy saving, and channel switching delay. We analytically analyze the proposed algorithms, and we evaluate them using numerical methods and simulations. In addition, we implement the algorithms in a real testbed to show their practicality and efficiency. Our analytical, simulation, and experimental results indicate that the proposed algorithms can: (i) maximize energy saving of mobile devices, (ii) maximize bandwidth efficiency of the wireless network, (iii) minimize channel switching delays on mobile devices, and (iv) efficiently support heterogeneous mobile devices. Last, we give network operators guidelines on choosing solutions suitable for their mobile broadcast networks, which allow them to provide millions of mobile users much better viewing experiences, attract more subscribers, and thus increase the revenues.
[Huang, 2009]

Author(s): Jiawei Huang.

Title: Saliency Detection and Feature Matching for Image Trimming and Tracking in Active Video.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2009.

Abstract: We develop a new automatic Object of Interest detection method for image trimming and a novel tracking technique in active videos. Both applications consist of salient region detection and feature matching. We deploy a color-saliency weighted Probability-of-Boundary (cPoB) map to detect salient regions. Scale Space Image Pyramid (SSIP) feature matching is proposed for image trimming. An image pyramid is created to imitate the view point change for stable keypoint selection. Successive Classification Maximum Similarities (SCMS) feature matching is used for tracking. A strong classifier trained by AdaBoost is utilized for keypoint classification and subsequent Linear Programming rejects outliers. The object-centered property of Active Video is highly beneficial because it captures the essence of Human Visual Attention and facilitates self-initialization in tracking. Experiments demonstrate the importance of saliency detection and feature matching and confirm that our approach can automatically detect salient regions in images and track reliably in videos.
[Karimi, 2009]

Author(s): Mohammad Mahdi Karimi.

Title: Minimum Cost Homomorphisms to Digraphs.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, March 2009.

Abstract: For digraphs D and H, a homomorphism of D to H is a mapping ƒ: V(D) → V(H) such that uv ∈ A(D) implies ƒ(u)ƒ(v) ∈ A(H). Suppose D and H are two digraphs, and c_i(u), u ∈ V(D), i ∈ V(H), are nonnegative integer costs. The cost of the homomorphism ƒ of D to H is Σ_{u ∈ V(D)} c_{ƒ(u)}(u). The minimum cost homomorphism problem for a fixed digraph H, denoted by MinHOM(H), asks whether or not an input digraph D, with nonnegative integer costs c_i(u), u ∈ V(D), i ∈ V(H), admits a homomorphism ƒ to H and, if it admits one, to find a homomorphism of minimum cost. Our interest is in proving a dichotomy for the minimum cost homomorphism problem: we would like to prove that for each digraph H, MinHOM(H) is polynomial-time solvable, or NP-hard. Gutin, Rafiey, and Yeo conjectured that such a classification exists: MinHOM(H) is polynomial time solvable if H admits a k-Min-Max ordering for some k ≥ 1, and it is NP-hard otherwise. For undirected graphs, the complexity of the problem is well understood; for digraphs, the situation appears to be more complex, and only partial results are known. In this thesis, we seek to verify this conjecture for 'large' classes of digraphs including reflexive digraphs, locally in-semicomplete digraphs, as well as some classes of particular interest such as quasi-transitive digraphs. For all classes, we exhibit a forbidden induced subgraph characterization of digraphs with k-Min-Max ordering; our characterizations imply a polynomial time test for the existence of a k-Min-Max ordering. Given these characterizations, we show that for a digraph H which does not admit a k-Min-Max ordering, the minimum cost homomorphism problem is NP-hard. This leads us to a full dichotomy classification of the complexity of minimum cost homomorphism problems for the aforementioned classes of digraphs.
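For readability, the definition given in the abstract can be restated in standard notation (this is only a transcription of the abstract's own definition, not additional material from the thesis):

\[
f : V(D) \to V(H), \qquad uv \in A(D) \;\Rightarrow\; f(u)f(v) \in A(H),
\qquad \operatorname{cost}(f) \;=\; \sum_{u \in V(D)} c_{f(u)}(u),
\]

so MinHOM(H) asks, for an input digraph D with costs c_i(u), whether a homomorphism f of D to H exists and, if so, for one minimizing cost(f).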
[Liu, 2009a]

Author(s): Yi Liu.

Title: Video Streaming over Cooperative Wireless Networks.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2009.

Abstract: We study the problem of broadcasting video streams over a Wireless Metropolitan Area Network (WMAN) to many mobile devices. We propose a cooperative network in which several elected mobile devices share received video data over a Wireless Local Area Network (WLAN). The proposed system significantly reduces the energy consumption and the channel switching delay concurrently. We design a distributed leader election algorithm for the cooperative system and analytically show that the proposed system outperforms current systems in terms of energy consumption and channel switching delay. Our experimental results from a real mobile video streaming testbed show that the proposed cooperative system is promising because it achieves high energy saving, significantly reduces channel switching delay and uniformly distributes load on all mobile devices. Furthermore, we complement our empirical evaluation with a trace driven simulator to rigorously show the viability of the proposed cooperative system.
[Liu, 2009b]

Author(s): Yudong Liu.

Title: Semantic Role Labeling using Lexicalized Tree Adjoining Grammars.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 2009.

Abstract: The predicate-argument structure (PAS) of a natural language sentence is a useful representation that can be used for a deeper analysis of the underlying meaning of the sentence or directly used in various natural language processing (NLP) applications. The task of semantic role labeling (SRL) is to identify the predicate-argument structures and label the relations between the predicate and each of its arguments. Researchers have been studying SRL as a machine learning problem in the past six years, after large-scale semantically annotated corpora such as FrameNet and PropBank were released to the research community. Lexicalized Tree Adjoining Grammars (LTAGs), a tree rewriting formalism, are often a convenient representation for capturing locality of predicate-argument relations. Our work in this thesis is focused on the development and learning of the state of the art discriminative SRL systems with LTAGs. Our contributions to this field include: We apply to the SRL task a variant of the LTAG formalism called LTAG-spinal and the associated LTAG-spinal Treebank (the formalism and the Treebank were created by Libin Shen). Predicate-argument relations that are either implicit or absent from the original Penn Treebank are made explicit and accessible in the LTAG-spinal Treebank, which we show to be a useful resource for SRL. We propose the use of the LTAGs as an important additional source of features for the SRL task. Our experiments show that, compared with the best-known set of features that are used in state of the art SRL systems, LTAG-based features can improve SRL performance significantly. We treat multiple LTAG derivation trees as latent features for SRL and introduce a novel learning framework – Latent Support Vector Machines (LSVMs) to the SRL task using these latent features. This method significantly outperforms state of the art SRL systems. In addition, we adapt an SRL framework to a real-world ternary relation extraction task in the biomedical domain. Our experiments show that the use of SRL related features significantly improves performance over the system using only shallow word-based features.
[Ma, 2009]

Author(s): William Pak Tun Ma.

Title: Motion Estimation for Functional Medical Imaging Studies Using a Stereo Video Head Pose Tracking System.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2009.

Abstract: Patient motion is unavoidable during long medical imaging scan times. In particular, motion artifacts in functional and molecular brain imaging (e.g., dynamic positron emission tomography in dPET) are known to corrupt the data leading to inaccurate analysis and diagnosis. Most existing motion correction solutions either rely on attaching external markers or on data-driven image registration algorithms. In this work, we propose a new motion correction approach. It alleviates the need for inconvenient external markers and relaxes the dependence on the fragile similarity metrics that are generally incapable of capturing the complex spatio-temporal tracer dynamics in dPET. We develop a hybrid, multi-sensor method that uses a marker-free video tracker, along with image-based registration. The balance between the two is automatically adapted to confide in the more certain measurement. Our quantitative results demonstrate improved motion estimation and kinetic parameter extraction when using our hybrid method.
[Nguyen, 2009]

Author(s): Nhi Hoang Nguyen.

Title: A hybrid approach to segmenting hair in dermoscopic images using a universal kernel.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 2009.

Abstract: Hair occlusion often causes automated melanoma diagnostic systems to fail. We present a new method to segment hair in dermoscopic images. First, all possible dark and light hairs are amplified without prejudice with a universal matched filtering kernel. We then process the filter response with a novel tracing algorithm to get a raw hair mask. This raw mask is skeletonized to contain only the centerlines of all the possible hairs. Then the centerlines are verified by applying a model checker on the response and the original images. If a centerline indeed corresponds to a hair, the hair is reconstructed; otherwise it is rejected. The result is a clean hair mask which can be used to disocclude hair. Application on real dermoscopic images yields good results for thick hair of varying colours. The algorithm also performs well on skin images with a mixture of both dark and light hair.
[Shi, 2009]

Author(s): Lilong Shi.

Title: Novel Colour Constancy Algorithms for Digital Colour Imagery.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 2009.

Abstract: Colour constancy algorithms differ in their derivation, implementation, performance and assumptions. The focus of the research presented in this thesis is to discover colour constancy solutions to recover surface colours, or equivalently, to estimate the illumination of a single light source in a given scene. Several colour constancy models will be proposed. These methods have different methodologies and constraints. For example, a method can be constrained by a particular surface material model, by a blackbody radiation light source, by the dichromatic model, or by the spatial variation of the illumination and the reflectance. The methods to be discussed include, for instance, a method of identifying achromatic surfaces, which can then be used as known references for estimating the scene illumination. A second method examines the colour of human skin and its dependence on its hemoglobin content, melanin content, and the illuminating light. The corresponding basis of these three factors can be represented linearly in logarithm space, where the colour of the light can then be estimated. A third method uses the fact that the colours reflected by an inhomogeneous dielectric material lie on a plane spanned by the colour of the specular component reflected from the air-surface interface and the colour reflected from the body of the material. Once these planes are detected by a Hough transform, their intersection line represents the scene illumination. A fourth method is based on the independence and difference in the rate of spatial variation of the luminance and the surface reflectance in a given scene, from which image features can be separated via non-negative matrix factorization to reveal the true surface reflectance. A fifth method is based on learning the correspondence between an image's colour content and its illumination via thin-plate-spline interpolation so that the chromaticity of the light can be calculated. Finally, a quaternion-based curvature measure approach is developed that can be used as a complement to colour constancy methods that use information from spatial edges. In this thesis, these various methods are proposed to overcome drawbacks in existing approaches for better performance and improved robustness and efficiency.
[Tan, 2009]

Author(s): Yan Tan.

Title: Improving mouse pointing with eye-gaze targeting: application in radiology.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, October 2009.

Abstract: In current radiology workstations, a scroll mouse is typically used as the primary input device for navigating image slices and conducting operations on an image. Radiological analysis and diagnosis rely on careful observation and annotation of medical images. During analysis of 3D MRI and CT volumes, thousands of mouse clicks are performed every day, which can cause wrist fatigue. This thesis presents a dynamic Control-to-Display (C-D) gain mouse movement method, controlled by an eye-gaze tracker as the target predictor. By adjusting the C-D gain according to the distance to the target, the target width in motor space is effectively enlarged, thus reducing the index of difficulty of the mouse movement. Results indicate that using eye-gaze to predict the target position, the dynamic C-D gain method can improve pointing performance and increase the accuracy over traditional mouse movement.
[Tien, 2009]

Author(s): Geoffrey Tien.

Title: Building Interactive Eyegaze Menus for Surgery.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 2009.

Abstract: A real-time hands-free eyegaze menu selection interface was implemented using a commercially available eyetracking system, with selections activated by eyegaze fixations and glances on menu widgets. A pilot study tested three different spatial layouts of the menu widgets and employed a highly accurate two-stage selection mechanism. Improvements based on the pilot results were incorporated into a second revision of the interface with a more streamlined selection mechanism which allowed us to test users' selection accuracy. Another study was conducted on the revised interface; it received a positive response from our participants and achieved faster selection while maintaining high selection accuracy. A software framework was created for building eyetracking applications which provides basic eyetracking and interaction features in a modular format. The framework is also expandable to include more features by adding customized modules. Two new applications built using the framework were evaluated to demonstrate its flexibility.
[Valadkhan, 2009]

Author(s): Payam Valadkhan.

Title: Extremal Oriented Graphs and the Erdős-Hajnal Conjecture.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 2009.

Abstract: For a family L (finite or infinite) of oriented graphs, a new parameter called the compressibility number of L and denoted by z(L) is defined. The main motivation is the application of this parameter in a special case of Turan-type extremal problems for digraphs, in which it replaces the role of the chromatic number in the classic extremal problems. Determining this parameter, in the most explicit possible form, for oriented graphs with bounded oriented and/or acyclic chromatic number (planar graphs in particular) leads us to the infamous Erdős-Hajnal conjecture.



2008

[Bian, 2008]

Author(s): Zhengbing Bian.

Title: Algorithms for Wavelength Assignment and Call Control in Optical Networks.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 2008.

Abstract: Routing and channel assignment is a fundamental problem in computer/communication networks. In wavelength division multiplexing (WDM) optical networks, the problem is called routing and wavelength assignment or routing and path coloring (RPC) problem: given a set of connection requests, find a routing path to connect each request and assign each path a wavelength channel (often called a color) subject to certain constraints. One constraint is the distinct channel assignment: the colors (channels) of the paths in the same optical fiber must be distinct. Another common constraint is the channel continuity: a path is assigned a single color. When a path may be assigned different colors on different fibers, the RPC problem is known as the routing and call control (RCC) problem. When the routing paths are given as part of the problem input, the RPC and RCC problems are called the path coloring and call control problems, respectively. Major optimization goals for the above problems include minimizing the number of colors for realizing a given set of requests and maximizing the number of accommodated requests using a given number of colors. Those optimization problems are NP-hard for most network topologies, even for simple networks like rings and trees of depth one. In this thesis, we make the following contributions: (1) We give better approximation algorithms which use at most 3L (L is the maximum number of paths in a fiber) colors for the minimum path coloring problem in trees of rings. The 3L upper bound is tight since there are instances requiring 3L colors. We also give better approximation algorithms for the maximum RPC problem in rings. (2) We develop better algorithms for the minimum and maximum RPC problems on multi-fiber networks. (3) We develop better algorithms for the call control problem on simple topologies. (4) We develop carving-decomposition based exact algorithms for the maximum edge-disjoint paths problem in general topologies. We develop and implement tools for computing optimal branch/carving decompositions of planar graphs to provide a base for the branch/carving-decomposition based algorithms. These tools are of independent interest.
[Cheng, 2008a]

Author(s): Simon Xin Cheng.

Title: Locating Relay Nodes for P2P VOIP Applications: A PlanetLab-based Experimental Study.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2008.

Abstract: In Peer-to-Peer VoIP applications, we can use relay nodes to detour default IP paths for recovering from low quality sessions (LQSs). Locating quality relay nodes (QRNs) rapidly and inexpensively, however, is a very challenging task. In this project, we present a PlanetLab-based experimental study on locating relay nodes. We collect the 1-hop QRNs of all LQSs in a network snapshot and evaluate a series of state-of-the-art QRN selection approaches. The evaluated schemes are either ineffective or involve excessive probing. We observe that LQSs with high location similarity tend to share QRNs. Inspired by this, we propose a geographic landmark (GLM) based system that uses delays from peers to landmarks and geographic constraints to select QRNs. Evaluations demonstrate that GLM outperforms the existing schemes in terms of cost efficiency.
[Cheng, 2008b]

Author(s): Xu Cheng.

Title: On the Characteristics and Enhancement of Internet Short Video Sharing.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2008.

Abstract: In this thesis, using long-term data traces, we present an in-depth and systematic measurement study on the characteristics of YouTube, the most successful site providing a new generation of short video sharing service. We find that YouTube videos have noticeable differences compared with traditional videos, making it difficult to use conventional strategies (e.g., peer-to-peer) to reduce the server workload. However, the social network presented among YouTube videos opens new opportunities. We design a novel social network based peer-to-peer short video sharing system, in which peers are responsible for re-distributing the videos that they have cached. We address a series of key design issues to realize the system, including an efficient indexing scheme, a bi-layer overlay leveraging social networking, a source rate allocation and a pre-fetching strategy to guarantee the playback quality. We perform extensive simulations, which show that the system greatly reduces the server workload and improves the playback quality.
[Colak, 2008]

Author(s): Recep Colak.

Title: Towards Finding The Complete Modulome: Density Constrained Biclustering.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 2008.

Abstract: Large-scale gene expression experiments and interaction networks have become major data sources for discovery in systems biology. In several types of interaction networks, as is widely established, active modules, i.e. functional, simultaneously active groups of genes, are best encoded as highly interconnected regions that are co-expressed and show significant changes in an accompanying set of gene expression experiments. Accordingly, inferring an organism's active modulome, the entirety of active modules, translates to identifying these dense and co-expressed regions, which is NP-hard. We provide a novel algorithm, DCB-Miner, that addresses the corresponding computationally hard problem by means of a carefully designed search strategy, which has been specifically adapted to the topological peculiarities of protein interaction networks. Our algorithm outperforms all prior related approaches on standard datasets from H. sapiens and S. cerevisiae in a Gene Ontology-based competition and finds modules that convey particularly interesting novel biological meaning.
[Fulek, 2008]

Author(s): Radoslav Fulek.

Title: Intersecting convex sets by rays.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 2008.

Abstract: What is the smallest number tau = tau_d(n) such that for any collection C of n pairwise disjoint compact convex sets in R^d, there is a point such that any ray (half-line) emanating from it meets at most tau sets of the collection? In this thesis we show an upper and several lower bounds on the value tau_d(n), and thereby we completely answer the above question for R^2, and partially for higher dimensions. We show the order of magnitude for an analog of tau_2(n) for collections of fat sets with bounded diameter. We conclude the thesis with some algorithmic solutions for finding a point p that minimizes the maximum number of sets in C we are able to intersect by a ray emanating from p in the plane, and for finding a point that basically witnesses our upper bound on tau_d(n) in any dimension. However, the latter works only for restricted sets of objects.
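One way to write the quantity described in the opening question as a formula (a hedged reading of the abstract's wording; the notation below is ours, not necessarily the thesis's):

\[
\tau_d(n) \;=\; \max_{\mathcal{C}} \;\min_{p \in \mathbb{R}^d} \;\max_{\rho}\;
\bigl|\{\, C \in \mathcal{C} : \rho \cap C \neq \emptyset \,\}\bigr|,
\]

where \(\mathcal{C}\) ranges over collections of n pairwise disjoint compact convex sets in \(\mathbb{R}^d\) and \(\rho\) ranges over rays emanating from the point p.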
[Lau, 2008]

Author(s): Man Ki Lau.

Title: . Energy-preserving Maintenance of k-centers on Large Wireless Sensor Networks.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 2008.

Abstract: Recently, large wireless sensor networks have been used in many applications. Analyzing data detected by numerous sensors is one of the prominent issues in these applications. However, the power consumption of sensors is the major bottleneck of wireless sensor network lifetime. Energy-preserving data collection on large sensor networks becomes an important problem. In this thesis, we focus on continuously maintaining k-centers of sensor readings in a large sensor network. The goal is to preserve energy in sensors while the quality of the k-centers is retained. We also want to distribute the clustering task among the sensors, so that raw data and many intermediate results do not need to be transmitted to the server. We propose the reading reporting tree as the data collection and analysis framework in large sensor networks. We also introduce a uniform sampling method, a reporting threshold method, and a lazy approach to achieve a good-quality approximation of the k-centers.
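
The abstract does not detail the reading reporting tree protocol itself; as a point of reference, here is a minimal Python sketch of the classic greedy farthest-first heuristic for the k-center objective, the kind of clustering such a framework would maintain over collected readings. The function names and sample data are illustrative, not from the thesis.

    import math

    def greedy_k_centers(readings, k):
        """Farthest-first traversal: a classic 2-approximation for the
        k-center objective (minimize the maximum distance of any reading
        to its nearest center)."""
        def dist(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

        centers = [readings[0]]                 # arbitrary first center
        while len(centers) < k:
            # pick the reading farthest from its nearest current center
            farthest = max(readings, key=lambda r: min(dist(r, c) for c in centers))
            centers.append(farthest)
        return centers

    # Example: cluster simulated sensor readings into k = 2 centers.
    readings = [(20.1, 0.4), (20.3, 0.5), (35.0, 0.9), (34.7, 1.0)]
    print(greedy_k_centers(readings, k=2))

In a sensor setting, the same greedy step could run over sampled or thresholded readings forwarded up the reporting tree rather than over the full data set.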
[Schell, 2008]

Author(s): David George Schell.

Title: . Matrix Partitions of Digraphs.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 2008.

Abstract: The matrix partition problem has been of recent interest in graph theory. Matrix partitions generalize the study of graph colourings and homomorphisms. Many well-known graph partition problems can be stated in terms of matrices. For example, skew partitions, split partitions, homogeneous sets, clique-cutsets, stable-cutsets and k-colourings can all be modeled as matrix partitions. For each matrix partition problem there is an equivalent trigraph H-colouring problem. We show a 'dichotomy' for the class of list H-colouring problems where H is a so-called trigraph path. For each trigraph path H we show that the list H-colouring problem is either NP-complete or polynomial time solvable. For each trigraph path H we associate a digraph H-minus such that the list (H-minus)-colouring problem is polynomial time solvable if the list H-colouring problem is polynomial time solvable, and is NP-complete otherwise.
[Stacho, 2008]

Author(s): Juraj Stacho.

Title: . Complexity of Generalized Colourings of Chordal Graphs.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, April 2008.

Abstract: The generalized graph colouring problem (GCOL) for a fixed integer k, and fixed classes of graphs P1,…, Pk (usually describing some common graph properties), is to decide, for a given graph G, whether the vertex set of G can be partitioned into sets V1,…, Vk such that, for each i, the induced subgraph of G on Vi belongs to Pi. It can be seen that GCOL generalizes many natural colouring and partitioning problems on graphs. In this thesis, we focus on generalized colouring problems in chordal graphs. The structure of chordal graphs is known to allow solving many difficult combinatorial problems, such as the graph colouring, maximum clique and others, in polynomial, and in many cases in linear time. Our study of generalized colouring problems focuses on those problems in which the sets Pi are characterized by a single forbidden induced subgraph. We show that for k=2, all such problems where the forbidden graphs have at most three vertices are polynomial time solvable in chordal graphs, whereas it is known that almost all of them are NP-complete in general. On the other hand, we show infinite families of such problems which are NP-complete in chordal graphs. By combining a polynomial algorithm and an NP-completeness proof, we answer a question of Broersma, Fomin, Nešetřil and Woeginger about the complexity of the so-called subcolouring problem on chordal graphs. Additionally, we explain how some of these results generalize to particular subclasses of chordal graphs, and we show a complete forbidden subgraph characterization for the so-called monopolar partitions of chordal graphs. Finally, in the last part of the thesis, we focus on a different type of colouring problem – injective colouring. We describe several algorithmic and (in-)approximability results for injective colourings in the class of chordal graphs and its subclasses. In the process, we correct a result of Agnarsson et al. on inapproximability of the chromatic number of the square of a split graph.
[Xu, 2008]

Author(s): Yabo Xu.

Title: . New Models and Techniques on Privacy-Preserving Information Sharing.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 2008.

Abstract: Due to the wide deployment of the Internet and information technology, the ever-growing privacy concern has become a major obstacle to information sharing. This thesis work thus centres on developing new models and techniques to deal with emerging privacy issues in various contexts of information sharing and exchange. Specifically, along with the main theme, this thesis work can be divided into three research problems, summarized as follows. The first problem is privacy-preserving data mining spanning multiple private data sources. The goal of this research is to enable the same computation as if the data were collected in a central place, while preserving the privacy of the participating sites. This problem has been studied in the context of classification with multiple private data sources integrated with join semantics. The second problem is privacy-preserving data publishing. This research aims to address the scenario where a data owner wishes to publish the data while preserving individual privacy. This topic has been extensively studied in the context of relational data, but much less is known for transaction data. We propose one way to address this issue in this thesis. The third problem is privacy-enhancing online personalized service. This research starts from an end user's point of view, and studies how to submit a piece of personal data in exchange for a service without compromising individual privacy. Our contribution on this topic is a framework under which individual users can strike a balance between service quality and privacy protection.



2007

[Bagheri, 2007]

Author(s): Majid Bagheri.

Title: . Efficient k-Coverage Algorithms for Wireless Sensor Networks and Their Applications to Early Detection of Forest Fires.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2007.

Abstract: Achieving k-coverage in wireless sensor networks has been shown before to be NP-hard. We propose an efficient approximation algorithm which achieves a solution of size within a logarithmic factor of the optimal. A key feature of our algorithm is that it can be implemented in a distributed manner with local information and low message complexity. We design and implement a fully distributed version of our algorithm. Simulation results show that our distributed algorithm converges faster and consumes much less energy than previous algorithms. We use our algorithms in designing a wireless sensor network for early detection of forest fires. Our design is based on the Fire Weather Index (FWI) System developed by the Canadian Forest Service. Our experimental results show the efficiency and accuracy of the proposed system.
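
The approximation algorithm itself is not reproduced in the abstract; the sketch below only illustrates the generic greedy set-multicover idea that yields a logarithmic approximation factor for k-coverage, assuming each candidate sensor is described by the set of field points it covers. The names and data are illustrative only.

    def greedy_k_coverage(points, sensors, k):
        """Greedy set-multicover sketch: repeatedly activate the sensor that
        reduces the remaining coverage deficit the most, until every point
        is covered by at least k active sensors."""
        deficit = {p: k for p in points}        # how many more covers each point needs
        active = []
        remaining = dict(sensors)               # sensor id -> set of covered points
        while any(d > 0 for d in deficit.values()) and remaining:
            # gain of a sensor = number of still-deficient points it covers
            best = max(remaining, key=lambda s: sum(1 for p in remaining[s] if deficit[p] > 0))
            gain = sum(1 for p in remaining[best] if deficit[p] > 0)
            if gain == 0:
                break                           # no sensor helps; k-coverage infeasible
            for p in remaining[best]:
                if deficit[p] > 0:
                    deficit[p] -= 1
            active.append(best)
            del remaining[best]
        return active

    points = ['a', 'b', 'c']
    sensors = {'s1': {'a', 'b'}, 's2': {'b', 'c'}, 's3': {'a', 'c'}, 's4': {'a', 'b', 'c'}}
    print(greedy_k_coverage(points, sensors, k=2))

A distributed variant, as in the thesis, would make this greedy choice locally from neighbourhood information rather than over the global sensor set.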
[Best, 2007]

Author(s): Micah J. Best.

Title: . Graphs with Monotone Connected Mixed Search Number of at Most Two.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2007.

Abstract: Graph searching is used to model a variety of problems and has close connections to variations of path-decomposition. This work explores Monotone Connected Mixed Search. Metaphorically, we consider this problem in terms of searchers exploring a network of tunnels and rooms to locate an opponent. In one turn this opponent moves arbitrarily fast while the searchers may only move to adjacent rooms. The objective is, given an arbitrary graph, to determine the minimum number of searchers for which there exists a valid series of moves that searches the graph. We show that the family of graphs requiring at most k searchers is closed under graph contraction. We exploit the close ties between the contraction ordering and the minor ordering to produce a number of structural decomposition techniques and show that there are 172 obstructions in the contraction order for the set of graphs requiring at most two searchers.
[Chen, 2007]

Author(s): Liang Chen.

Title: . Solving Linear Systems of Equations over Cyclotomic Fields.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2007.

Abstract: Let A ∈ Q[z]^(n×n) be a matrix of polynomials and b ∈ Q[z]^n be a vector of polynomials. Let m(z) = Φ_k(z) be the k-th cyclotomic polynomial. We want to find the solution vector x ∈ Q[z]^n such that the equation Ax ≡ b (mod m(z)) holds. One may obtain x using Gaussian elimination; however, it is inefficient because of the large rational numbers that appear in the coefficients of the polynomials in the matrix during the elimination. In this thesis, we present two modular algorithms, namely Chinese remaindering and linear p-adic lifting. We have implemented both algorithms in Maple and have determined the time complexity of both algorithms. We present timing comparison tables on two sets of data: firstly, systems with randomly generated coefficients, and secondly, real systems given to us by Vahid Dabbaghian which arise from computational group theory. The results show that both of our algorithms are much faster than Gaussian elimination.
[Fung, 2007]

Author(s): Benjamin C. M. Fung.

Title: . Privacy-Preserving Data Publishing.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, May 2007.

Abstract: The success of data mining relies on the availability of high quality data. To ensure quality data mining, effective information sharing between organizations becomes a vital requirement in today's society. Since data mining often involves person-specific and sensitive information like medical records, the public has expressed a deep concern about their privacy. Privacy-preserving data publishing is a study of eliminating privacy threats while, at the same time, preserving useful information in the released data for data mining. It is different from the study of privacy-preserving data mining which performs some actual data mining task. This thesis identifies a collection of privacy threats in real-life data publishing, and presents a unified solution to address these threats.
[Gattani, 2007]

Author(s): Akshay Kishore Gattani.

Title: . Automated Natural Language Headline Generation Using Discriminative Machine Learning Models.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2007.

Abstract: Headline or short summary generation is an important problem in text summarization and has several practical applications. First, we present a discriminative learning framework and a rich feature set for the headline generation task. Second, we present a novel BLEU-based scheme for the evaluation of headline generation models, which does not require human-produced references. We achieve this by building a test corpus using the Google News service. We propose two stacked log-linear models, one for headline word selection (content selection) and one for ordering words into a grammatical and coherent headline (headline synthesis). For decoding, a beam search algorithm is used that combines the two log-linear models to produce a list of the k best human-readable headlines from a news story. Systematic training and experimental results on the Google News test dataset demonstrate the success and effectiveness of our approach.
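
The trained log-linear models are not given here; the sketch below only illustrates how a beam search decoder can combine a content-selection score and a synthesis (ordering) score to build a k-best list of headlines. The two toy scoring functions stand in for the learned models and are pure assumptions.

    import heapq

    def beam_search(story_words, select_score, order_score, beam_width=5, max_len=8):
        """Keep the best partial headlines at each step; extend each with a
        candidate word and rescore with the combined log-linear objective."""
        beam = [(0.0, [])]                       # (score, headline so far)
        for _ in range(max_len):
            candidates = []
            for score, headline in beam:
                for w in story_words:
                    if w in headline:
                        continue                 # use each story word at most once
                    new = headline + [w]
                    new_score = score + select_score(w) + order_score(headline, w)
                    candidates.append((new_score, new))
            if not candidates:
                break
            beam = heapq.nlargest(beam_width, candidates, key=lambda x: x[0])
        return beam                              # k-best (score, headline) pairs

    # Toy scorers standing in for the trained content-selection and synthesis models.
    select = lambda w: {'markets': 1.0, 'fall': 0.9, 'the': 0.1}.get(w, 0.2)
    order = lambda h, w: 0.5 if not h or len(w) < len(h[-1]) + 3 else 0.0
    for s, h in beam_search(['the', 'markets', 'fall', 'sharply'], select, order, beam_width=3, max_len=3):
        print(round(s, 2), ' '.join(h))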
[Glen, 2007]

Author(s): Edward Glen.

Title: . jViz.Rna - A Tool for Visual Comparison and Analysis of RNA Secondary Structures.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2007.

Abstract: RNA is a single-stranded biomolecule which can fold back on itself, forming hydrogen bonds. Many projects attempt to predict the two-dimensional structure that will form from a given nucleotide sequence. Visualization of these predictions helps researchers to better understand the behaviour and results of their prediction algorithms. jViz.Rna is a tool designed to assist in the analysis of structure predictions with three unique features: analysis of multiple structures using seven different visualization methods, dual graphs of RNA structures which highlight the topology of the RNA, and finally, the ability to overlay a predicted structure on top of the native structure of a given RNA in all visualizations apart from dual graphs, which use a different comparison method. jViz.Rna is available through http://jviz.cs.sfu.ca. It has been successfully employed on Linux, Mac OS 10.4, and Windows with Java 1.5 or greater.
[Haas, 2007]

Author(s): Wolfgang Haas.

Title: . A Dynamic Resource Scheduling Framework applied to Random Datasets in the Search and Rescue Domain.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 2007.

Abstract: Dynamic scheduling refers to a class of scheduling problems in which dynamic events, such as the delaying of a task, occur throughout execution. We develop a framework for dynamic resource scheduling implemented in Java with a random problem generator, a dynamic simulator and a scheduler. The problem generator is used to generate benchmark datasets that are read by the simulator, whose purpose is to notify the scheduler of dynamic events when they occur. We perform a case study on the CoastWatch problem, which is an oversubscribed dynamic resource scheduling problem in which we assign unit resources to tasks subject to temporal and precedence constraints. Tabu search is implemented as a uniform platform to test various heuristics and neighbourhoods. We evaluate their performance on the generated benchmark dataset and also measure schedule disruption.
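
As a companion to the description above, here is a generic tabu search skeleton applied to a toy task-ordering problem; the CoastWatch cost function, neighbourhoods and dynamic events from the thesis are not modelled, and all names below are illustrative.

    def tabu_search(initial, neighbours, cost, iterations=200, tenure=10):
        """Generic tabu search skeleton: always move to the best non-tabu
        neighbour, and remember recent solutions to escape local optima."""
        current, best = initial, initial
        tabu = []                                 # keys of recently visited solutions
        for _ in range(iterations):
            candidates = [n for n in neighbours(current) if tuple(n) not in tabu]
            if not candidates:
                break
            current = min(candidates, key=cost)
            tabu.append(tuple(current))
            if len(tabu) > tenure:
                tabu.pop(0)                       # fixed-length tabu list
            if cost(current) < cost(best):
                best = current
        return best

    # Toy instance: order 5 tasks to minimize total "lateness" (due date 8 each).
    durations = [4, 2, 7, 1, 3]
    def cost(order):
        t, late = 0, 0
        for task in order:
            t += durations[task]
            late += max(0, t - 8)
        return late
    def neighbours(order):
        out = []
        for i in range(len(order) - 1):           # swap adjacent tasks
            n = list(order); n[i], n[i + 1] = n[i + 1], n[i]
            out.append(n)
        return out
    print(tabu_search(list(range(5)), neighbours, cost))

Swapping in different neighbourhoods and cost functions, as the thesis does, only requires replacing the two helper functions.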
[Harati, 2007]

Author(s): Masoud Harati.

Title: . IGAUNA (Improved Global Sequence Alignment Using Non-Exact Anchors).

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 2007.

Abstract: With the sequencing of the entire genome for many species, bioinformaticians are increasingly relying on efficient whole genome alignment tools exhibiting both high sensitivity and specificity. We introduce and analyze IGAUNA (Improved Global Alignment Using Non-exact Anchors), a new global alignment algorithm based on GAUNA which, in comparison with other state-of-the-art algorithms, almost always produces as good or better alignments with high sensitivity and specificity using less time and space. While those tools find either exact or close-to-exact matches as anchors, IGAUNA makes use of suffix trees to find both types of anchors depending on the instance: exact anchors for very similar sequences, and otherwise non-exact anchors that are obtained from a complicated set of techniques. In particular, IGAUNA can rapidly align sequences containing millions of bases on a standard PC, where other similar programs are currently incapable of accomplishing such a task.
[Huang, 2007]

Author(s): Xiaorong Huang.

Title: . Fitting Protein Chains to Lattice Using Integer Programming Approach.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 2007.

Abstract: The Fitting Protein chains to Lattice (FPL) problem can be formulated as follows. Given the 3D coordinates of all C-alpha atoms in a protein fold, find the optimal lattice approximation (self-avoiding path in the lattice) of the fold. The FPL problem is proved to be NP-complete for the cubic lattice with side length 3.8 angstroms when the coordinate root mean-square deviation (cRMS) is used as the similarity measure between the original and approximated fold. We design three Integer Programming (IP) formulations for the FPL problem, and develop a series of algorithms which combine dynamic programming and backtracking techniques aiming to reduce the search space, while guaranteeing to find optimal solutions. Experiments show that optimal lattice approximations in cubic lattices with side length 3.8 angstroms using the cRMS measure can be found in feasible time by ILOG CPLEX 9.1 for all proteins in a randomly selected group of proteins (the longest of length 1014 residues).
[Li, 2007]

Author(s): Xiaoxing Ginger Li.

Title: . Towards Expression-Invariant Face Recognition using Multiple Adaptive Attributes.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2007.

Abstract: The performance of most existing face recognition systems suffers in the presence of facial expressions. Unfortunately, there is not yet a satisfactory solution. Therefore, the main focus of this thesis is on expression-invariant face recognition algorithms. In this thesis, we first propose a 2D face recognition algorithm that separately models geometry and texture information in a face image. The effect of expression is removed from each of these two attributes independently. We then re-combine them to construct a robust face identifier. Then, we extend our algorithm to recognize 3D faces using multiple geometric attributes in a face mesh, taking advantage of the invariance of 3D geometry under pose and illumination changes. In order to adapt to expression variations, training is performed for each geometric attribute as well as for the weighting scheme for combining multiple attributes. Using our proposed algorithm, the recognition rate exceeds 96% on the challenging GavaDB database.
[Luo, 2007]

Author(s): Wei Luo.

Title: . Mind Change Optimal Learning: Theory and Applications.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, October 2007.

Abstract: Learning theories play as significant a role in machine learning as computability and complexity theories do in software engineering. Gold's language learning paradigm is one cornerstone of modern learning theories. The aim of this thesis is to establish an inductive principle in Gold's language learning paradigm to guide the design of machine learning algorithms. We follow the common practice of using the number of mind changes to measure the complexity of Gold's language learning problems, and study efficient learning with respect to mind changes. Our starting point is the idea that a learner that is efficient with respect to mind changes minimizes mind changes not only globally in the entire learning problem, but also locally in subproblems after receiving some evidence. Formalizing this idea leads to the notion of mind change optimality. We characterize mind change complexity of language collections with Cantor's classic concept of accumulation order. We show that the characteristic property of mind change optimal learners is that they output conjectures (languages) with maximal accumulation order. Therefore, we obtain an inductive principle in Gold's language learning paradigm based on the simple topological concept of accumulation order. We illustrate the theory by describing strongly mind change optimal learners for various problems such as identifying linear subspaces, one-variable patterns, fixed-length patterns, and Bayes net structure. The new inductive principle enables the analysis of the practical problem of learning Bayes net structure in the rich theoretical framework of Gold's learning paradigm. Applying the inductive principle of mind change optimality leads to a unique fastest mind change optimal Bayes net learner. This learner conjectures a graph if it is a unique minimal 'independence map', and outputs 'no guess' otherwise. As exact implementation of the fastest mind change optimal learner for learning Bayes net structure is NP-hard, mind change optimality can be approximated with a hybrid criterion for learning Bayes net structure. The criterion combines search based on a scoring function with information from statistical tests. We show how to adapt local search algorithms to incorporate the new criterion. Simulation studies provide evidence that one such new algorithm leads to improved structures on small to medium samples.
[Ma, 2007]

Author(s): George Zi Sheng Ma.

Title: . Model Checking Support for CoreASM: Model Checking Distributed Abstract State Machines Using SPIN.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 2007.

Abstract: We present an approach to model checking Abstract State Machines, in the context of a larger project called CoreASM, which aims to provide a comprehensive and extensible tool environment for the design, validation, and verification of systems using the Abstract State Machine (ASM) formal methodology. Model checking is an automated and efficient formal verification technique that allows us to algorithmically prove properties about state transition systems. This thesis describes the design and implementation of model checking support for CoreASM, thereby enabling formal verification of ASMs. We specify extensions to CoreASM required to support model checking, as well as present a novel procedure for transforming CoreASM specifications into Promela models, which can be checked by the Spin model checker. We also present the results of applying our ASM model checking tool to several non-trivial software specifications.
[Rova, 2007]

Author(s): Andrew Rova.

Title: . Eigen-Css Shape Matching and Recognizing Fish in Underwater Video.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2007.

Abstract: This thesis presents work on shape matching and object recognition. First, we describe Eigen-CSS, a faster and more accurate approach to representing and matching the curvature scale space (CSS) features of shape silhouette contours. Phase-correlated marginal-sum features and PCA eigenspace decomposition via SVD differentiate our technique from earlier work. Next, we describe a deformable template object recognition method for classifying fish species in underwater video. The efficient combination of shape contexts with larger-scale spatial structure information allows acceptable estimation of point correspondences between template and test images despite missing or inaccurate edge information. Fast distance transforms and tree-structured dynamic programming allow the efficient computation of globally optimal correspondences, and multi-class support vector machines (SVMs) are used for classification. The two methods, Eigen-CSS shape matching and deformable template matching followed by texture-based recognition, are contrasted as complementary techniques that respectively suit the unique characteristics of two substantially different computer vision problems.
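
The marginal-sum CSS features themselves are not reproduced here; the sketch below only illustrates the eigenspace step: build a PCA basis with an SVD, project feature vectors into it, and match a query by nearest neighbour in the reduced space. The toy random features are placeholders for real shape descriptors.

    import numpy as np

    def build_eigenspace(features, dim):
        """PCA via SVD: centre the feature matrix and keep the top `dim`
        right singular vectors as the eigenspace basis."""
        X = np.asarray(features, dtype=float)
        mean = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
        return mean, vt[:dim]                     # basis is dim x n_features

    def project(x, mean, basis):
        return basis @ (np.asarray(x, dtype=float) - mean)

    def match(query, database, mean, basis):
        """Index of the database shape closest to the query in the
        reduced eigenspace (Euclidean distance)."""
        q = project(query, mean, basis)
        coords = [project(d, mean, basis) for d in database]
        return int(np.argmin([np.linalg.norm(q - c) for c in coords]))

    # Toy feature vectors standing in for marginal-sum CSS features.
    db = np.random.rand(10, 32)
    mean, basis = build_eigenspace(db, dim=4)
    print(match(db[3] + 0.01 * np.random.rand(32), db, mean, basis))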
[Tisdall, 2007]

Author(s): Matthew Dylan Tisdall.

Title: . Development and Validation of Algorithms for MRI Signal Component Estimation.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 2007.

Abstract: The MRI analysis pipeline consists of a data-acquisition stage defined by a protocol, an estimation stage defined by a function, and an analysis stage – normally performed by a radiologist. MRI data is acquired as a 3D or 4D grid of complex-valued measurements. In some protocols more than one set of measurements is fused into a vector of complex values. However, radiologists normally desire a real-valued 3D or 4D dataset representing a feature of interest. To convert from the measurements to the real-valued feature, an estimator must be applied. This thesis studies the development and evaluation of estimators. We approach the problem not as one of general image processing, but as one specific to MRI and based in the physics of the measurement process. The estimators proposed are based on the physics of MRI and protocols used clinically. We also show how estimators can be evaluated by testing suitability for radiological tasks. We present statistical models for protocols and features of interest that arise in MRI. Since the models contain nuisance parameters, many estimators are available from the statistical theory. Additionally, we consider how adding a constraint of regularity in the phase coordinate of the complex data affects the estimators. We demonstrate how phase regularity can be integrated into the model using estimation with local models and avoiding a costly unwrapping step. To choose among the variety of estimators available for a model, we suggest task-based quality metrics. In particular, for estimators whose output is destined to be viewed by a radiologist, we demonstrate human observer studies and models of human perception that can quantify the quality of an estimator. For features of interest that are analyzed quantitatively, we study the trade-offs between bias and variance that are available. We find that choosing an estimator specific to the feature of interest and protocol can produce substantially improved output. Additionally, we find that our human observer results are not predicted by SNR, challenging the use of SNR for quantifying estimator suitability. We conclude that MRI-specific estimation and evaluation provide substantial advantages over general-purpose approaches.
[Tsang, 2007]

Author(s): Herbert H. Tsang.

Title: . SARNA-Predict: A Permutation-based Simulated Annealing Algorithm for RNA Secondary Structure Prediction.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 2007.

Abstract: This dissertation describes and presents SARNA-Predict, a novel algorithm for Ribonucleic Acid (RNA) secondary structure prediction based on Simulated Annealing (SA). SA is known to be effective in solving many different types of minimization problems and for finding the global minimum in the solution space. Based on free energy minimization techniques, SARNA-Predict heuristically searches for the structure with a free energy close to the minimum free energy ΔG for a strand of RNA, within given constraints. Furthermore, SARNA-Predict has also been extended to predict RNA secondary structures with pseudoknots. Although dynamic programming algorithms are guaranteed to give the minimum free energy structure, the lowest free energy structure is not always the correct native structure. This is mostly due to imperfections in the currently available thermodynamic models. Since SARNA-Predict can incorporate different thermodynamic models (INN-HB, efn2 and HotKnots) during the free energy evaluation, this feature makes SARNA-Predict superior to other algorithms such as mfold, which can only predict pseudoknot-free structures and cannot readily be extended to use other thermodynamic models. SARNA-Predict encodes RNA secondary structures as a permutation of helices that are pre-computed. A novel swap mutation operator and different annealing schedules were incorporated into this algorithm for RNA secondary structure prediction. An evaluation of the performance of the new algorithm in terms of prediction accuracy is made via comparison with several state-of-the-art prediction algorithms. We measured the sensitivity and specificity of nine prediction algorithms. Four of these are dynamic programming algorithms: mfold, Pseudoknot (pknotsRE), NUPACK, and pknotsRG-mfe. The other five are heuristic algorithms: P-RnaPredict, SARNA-Predict, HotKnots, ILM, and STAR. The prediction accuracy of the new algorithm was verified against known native structures. Experiments on thirty-three individual known structures from eleven RNA classes (tRNA, viral RNA, anti-genomic HDV, telomerase RNA, tmRNA, rRNA, RNaseP, 5S rRNA, Group I intron 23S rRNA, Group I intron 16S rRNA, and 16S rRNA) were performed. The results presented in this dissertation demonstrate that SARNA-Predict can outperform other state-of-the-art algorithms in terms of prediction accuracy.
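
As a rough illustration of the search strategy (not of the thermodynamic models), here is a minimal simulated annealing loop over a permutation of pre-computed helices with a swap mutation and a geometric cooling schedule; the energy function below is a stub, since decoding a permutation into a structure and evaluating ΔG are beyond a sketch.

    import math, random

    def simulated_annealing(n_helices, energy, t0=10.0, cooling=0.995, steps=20000):
        """Anneal over permutations of helix indices: propose a random swap,
        accept downhill moves always and uphill moves with Boltzmann probability."""
        perm = list(range(n_helices))
        random.shuffle(perm)
        best, best_e, cur_e, t = perm[:], energy(perm), energy(perm), t0
        for _ in range(steps):
            i, j = random.sample(range(n_helices), 2)
            perm[i], perm[j] = perm[j], perm[i]          # swap mutation
            e = energy(perm)
            if e <= cur_e or random.random() < math.exp((cur_e - e) / t):
                cur_e = e
                if e < best_e:
                    best, best_e = perm[:], e
            else:
                perm[i], perm[j] = perm[j], perm[i]      # undo rejected swap
            t *= cooling                                  # annealing schedule
        return best, best_e

    # Stub energy: pretend earlier (lower-index) helices are more stabilising.
    energy = lambda perm: sum(pos * h for pos, h in enumerate(perm))
    print(simulated_annealing(8, energy)[1])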
[Villecroze, 2007]

Author(s): Susan Villecroze.

Title: . Qualitative Study of User Annotations in Satellite Scheduling.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2007.

Abstract: Past research in scheduling has focused on algorithmic issues and has not addressed many important human-computer interaction issues. For tasks that require a higher level of abstraction and decision-making, annotation tools could provide an aid. This study investigated how people used annotations to solve problems presented on printed schedules. A user study involving 5 participants was conducted. Participants were presented with a pre-computed satellite schedule and given a practical problem to solve. Video observations, interview answers, and markings on the schedule and source documents provided data for analysis. Results show that while making trade-offs on different priorities, every participant used and benefited from the use of annotations. Participants did not always use specific annotations because of the connotations of the annotation appearance. The results suggest that support is needed for marking priority changes, deleted activities, interesting regions, and adding text on the schedule.
[Wang, 2007]

Author(s): Dan Wang.

Title: . Continuous Data Collection in Wireless Sensor Networks.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, January 2007.

Abstract: Recently, it has come to be generally believed by academia and industry alike that the sensor network will play a key role in extending the reachability of the next-generation Internet. A key characteristic of this network is that no single node in the network is powerful enough to perform the assigned tasks. An application should be served via the cooperation of several nodes or even the entire network. The network serves as an information base, and is data driven, as opposed to a provider of point-to-point connections. The main challenge of this network is organizing huge amounts of information, including information storage, searching and retrieval, especially in a continuous way. There are many specific and interrelated problems. We list a few examples. First, data accuracy: the correctness of the sensor network in representing the properties of the sensor field. Second, data search and retrieval delay: while low delay is always preferred, various applications have different delay constraints. Third, overhead: low transmission overhead is often the main consideration in system design, as it is directly related to the usage of energy, the most severely limited resource for sensors. In this thesis, we first discuss load-balanced sensor coverage, which provides lower-layer support for long-run sensor data collection. We then concentrate on how to balance the parameters in data collection of the sensor networks, so that the user queries and applications can be satisfied with reasonable delay and low overhead. Based on different application specifics, we try to use fewer sensors and fewer transmissions by exploiting historical and topological information, coding techniques, and data distribution information. Our analysis and experimental results show that our architecture and algorithms provide both theoretical and practical insights for sensor network design and deployment.
[Ye, 2007]

Author(s): Jiang Ye.

Title: . A Review of Artificial Intelligence Techniques Applied to Protein Structure Prediction.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 2007.

Abstract: Protein structure prediction (PSP) is a significant, yet difficult problem that attracts attention from both the biology and computing worlds. The problem is to predict protein native structure from the primary sequence using computational means. It remains largely unsolved due to the fact that no comprehensive theory of protein folding is available and a global search in the conformational space is intractable. This is why AI techniques have been effective in tackling some aspects of this problem. This survey report reviews biologically inspired AI techniques that have been applied to the PSP problem. We focus on evolutionary computation and artificial neural networks (ANNs). Evolutionary computation is used as a population-based search technique, mainly in the ab initio prediction approach. ANNs are most successful in secondary structure prediction by learning meaningful relations between primary sequence and secondary structures from datasets. The report also reviews a new generative encoding scheme, L-systems, to capture protein structure on lattice models.
[Zeng, 2007]

Author(s): Xinghuo Zeng.

Title: . Efficient Maximal Frequent Itemset Mining By Pattern-Aware Dynamic Scheduling.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 2007.

Abstract: While frequent pattern mining is fundamental for many data mining tasks, mining maximal frequent itemsets efficiently is important in both theory and applications of frequent itemset mining. The fundamental challenge is how to search a large space of item combinations. Most of the existing methods search an enumeration tree of item combinations in a depth-first manner. In this thesis, we develop a new technique for more efficient maximal frequent itemset mining. Different from the classical depth-first search, our method uses a novel probing and reordering search method. It uses the patterns found so far to schedule its future search so that many search subspaces can be pruned. Three optimization techniques, namely reduced counting, pattern expansion and head growth, are developed to improve the performance. As indicated by a systematic empirical study using the benchmark data sets, our new approach clearly outperforms FPMax*, currently the fastest maximal frequent itemset mining algorithm.
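
The probing-and-reordering search and its pruning rules are not spelled out in the abstract; for orientation, here is a small reference sketch that enumerates frequent itemsets level-wise and then filters for maximality, which is the baseline behaviour any maximal-itemset miner must reproduce. The toy transactions are illustrative.

    from itertools import combinations

    def frequent_itemsets(transactions, min_support):
        """Level-wise (Apriori-style) enumeration of all frequent itemsets."""
        items = sorted({i for t in transactions for i in t})
        frequent, size = [], 1
        current = [frozenset([i]) for i in items]
        while current:
            counted = [(s, sum(1 for t in transactions if s <= t)) for s in current]
            level = [s for s, c in counted if c >= min_support]
            frequent.extend(level)
            size += 1
            # candidate generation: unions of frequent sets from the previous level
            current = list({a | b for a, b in combinations(level, 2) if len(a | b) == size})
        return frequent

    def maximal(frequent):
        """Keep only itemsets with no frequent proper superset."""
        return [s for s in frequent if not any(s < t for t in frequent)]

    data = [frozenset(t) for t in [{'a','b','c'}, {'a','b'}, {'a','c'}, {'b','c'}, {'a','b','c'}]]
    print(maximal(frequent_itemsets(data, min_support=3)))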
[Zhou, 2007]

Author(s): Bin Zhou.

Title: . Mining page farms and its application in link spam detection.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 2007.

Abstract: Ranking pages on the Web is an essential task in Web search. For a Web page p, what other pages are the major contributors to the ranking score of p? How are the contributions made? Understanding the general relations of Web pages and their environments is important, with interesting applications such as Web spam detection. In this thesis, we study the novel problem of page farm mining and its application in link spam detection. A page farm is the set of Web pages contributing to (a major portion of) the PageRank score of a target page. We show that extracting page farms is computationally expensive, and propose heuristic methods. We propose the concept of link spamicity based on page farms to evaluate the degree to which a Web page is link spam. Using a real sample of more than 3 million Web pages, we analyze the statistics of page farms. We examine the effectiveness of our spamicity-based link spam detection methods using a newly available real data set of spam pages. The empirical study results strongly indicate that our methods are effective.
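
A minimal way to make the notion of "contribution to a PageRank score" concrete is to recompute PageRank with a candidate page removed and measure the change at the target, as sketched below on a toy web graph; this only illustrates the quantity being approximated, not the thesis's extraction heuristics.

    def pagerank(graph, d=0.85, iters=100):
        """Plain power-iteration PageRank on a dict: page -> list of out-links."""
        pages = list(graph)
        n = len(pages)
        pr = {p: 1.0 / n for p in pages}
        for _ in range(iters):
            new = {p: (1 - d) / n for p in pages}
            for p, links in graph.items():
                share = pr[p] / len(links) if links else pr[p] / n
                targets = links if links else pages       # dangling page: spread evenly
                for q in targets:
                    new[q] += d * share
            pr = new
        return pr

    def contribution(graph, page, target):
        """How much of `target`'s PageRank is contributed by `page`:
        drop `page` from the graph and measure the score change."""
        reduced = {p: [q for q in links if q != page]
                   for p, links in graph.items() if p != page}
        return pagerank(graph)[target] - pagerank(reduced)[target]

    web = {'t': ['a'], 'a': ['t'], 'b': ['t'], 'c': ['t', 'b']}
    print(round(contribution(web, 'b', 't'), 4))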



2006

[Bavarian, 2006]

Author(s): Maryam Bavarian.

Title: . Design and Analysis of Biological Sequences using Constraint Handling Rules.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2006.

Abstract: The need for processing biological information is rapidly growing, owing to the masses of new information in digital form being produced at this time. Old methodologies for processing it can no longer keep up with this rate of growth. We present a novel methodology for solving an important bioinformatics problem, which has been proved to be computationally hard: that of finding an RNA sequence which folds into a given structure. Previous solutions to this problem divide the whole structure into smaller substructures and apply techniques to solve it for the smaller parts, which makes them slow when working with longer RNAs. We prove that by using a set of simple CHR rules we are able to solve this problem and obtain an approximate but still useful solution more efficiently. We expect the results we present to be applicable, among other things, to in vitro genetics and to drug design.
[Cao, 2006]

Author(s): Xiao Rui Cao.

Title: . A Framework for Benchmarking Haptic Systems.

Master's Final Project Thesis, School of Computing Science, Simon Fraser University, April 2006.

Abstract: As more and more haptic rendering algorithms are developed, the need for evaluation and comparison is becoming more pressing. However, evaluating and comparing haptic rendering algorithms presents two challenges. First, haptic systems provide bidirectional communication between humans and the computer. Therefore, the outputs of such systems are highly reliant on human inputs, which are hard to reproduce consistently. Second, haptic systems are real-time systems. Testing real-time systems itself is difficult because of their timing constraints. Our solution to these challenges is to build a simulation-based tool where human inputs and the real haptic device are replaced by input simulation models, and a haptic device simulator, respectively. The main purpose of this tool is to provide repeatable inputs for haptic systems testing and enable qualitative evaluation among them. The tool also provides replicable test cases for haptic system debugging and regression tests.
[Clements, 2006]

Author(s): Andrew Clements.

Title: . Minimum Ratio Contours For Meshes.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 2006.

Abstract: We present a novel minimum ratio contour (MRC) algorithm for discretely optimizing contours on the surface of triangle meshes. We compute the contour having the minimal ratio between a numerator and a denominator energy. The numerator energy measures the bending and salience (feature adaptation) of a contour, while the denominator energy measures contour length. Given an initial contour, the optimal contour within a prescribed search domain is sought. The search domain is modeled by a weighted acyclic edge graph, where nodes in the graph correspond to directed edges in the mesh. The acyclicity of this graph allows for an efficient computation of the MRC. To further improve the result, the algorithm may be run on a refined mesh to allow for smoother contours that can cut across mesh faces. Results are demonstrated for postprocessing in mesh segmentation. We also speculate on possible global optimization methods for computing an MRC.
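
The mesh-specific graph construction is not reproduced here; the sketch below illustrates the standard parametric trick for minimum-ratio optimization on a DAG, which the acyclicity of the edge graph makes applicable: binary-search the ratio λ and test whether some path has total (numerator minus λ times denominator) below zero. The tiny graph and weight names are assumptions.

    def min_ratio_path(nodes, edges, source, sink, tol=1e-7):
        """Minimum ratio (sum of numerator weights / sum of denominator weights)
        over source->sink paths in a DAG, via binary search on the ratio."""
        def best_path_cost(lam):
            # shortest path with edge cost num - lam * den; nodes are in topological order
            dist = {v: float('inf') for v in nodes}
            dist[source] = 0.0
            for u in nodes:
                if dist[u] == float('inf'):
                    continue
                for (a, b, num, den) in edges:
                    if a == u:
                        dist[b] = min(dist[b], dist[u] + num - lam * den)
            return dist[sink]

        lo, hi = 0.0, max(num / den for (_, _, num, den) in edges)
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if best_path_cost(mid) < 0:        # a path with ratio below mid exists
                hi = mid
            else:
                lo = mid
        return hi

    # Tiny DAG: edges as (tail, head, bending-cost numerator, length denominator).
    nodes = ['s', 'a', 'b', 't']               # already topologically ordered
    edges = [('s', 'a', 3, 1), ('a', 't', 3, 1), ('s', 'b', 5, 2), ('b', 't', 1, 2)]
    print(round(min_ratio_path(nodes, edges, 's', 't'), 4))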
[G. Fouron, 2006]

Author(s): Anne G. Fouron.

Title: . Development of Bayesian Network Models for Obstructive Sleep Apnea Syndrome Assessment.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 2006.

Abstract: Bayesian belief networks have been used for diagnosis in some medical domains, and in this thesis we provide a methodology for creating Bayesian networks to predict Obstructive Sleep Apnea Syndrome severity. We build three Bayesian network topologies: one by knowledge engineering, one in the Naïve Bayes configuration, and a third created using results of the Naïve Bayes network. All networks are trained on data from 652 patients referred for an overnight polysomnogram. Data is derived from multiple data sources and includes a mix of continuous and discrete variables. We investigate the impact of different topologies, of discretizing continuous variables, of adding nodes with large amounts of missing values, and of removing nodes from networks. Results show that performance is dependent on the interaction between topology and discretization. Node removal increases sensitivity while node addition decreases it.
[Hormozdiari, 2006]

Author(s): Fereydoun Hormozdiari.

Title: . Protein Protein Interaction Network Comparison and Emulation.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2006.

Abstract: The (asymptotic) degree distributions of the best-known scale-free network models are all similar and are independent of the seed graph used. Hence it has been tempting to assume that networks generated by these models are similar in general. In this thesis it is shown that several key topological features of such networks depend heavily on the specific model and seed graph used. Furthermore, it is shown that, starting with the right seed graph, the duplication model captures many topological features of publicly available PPI networks very well.
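
For concreteness, here is one common form of the duplication model: repeatedly copy a randomly chosen node, keep each of its edges independently with probability p, and occasionally link the copy back to the original. The exact variant and parameters used in the thesis may differ; p, q and the seed graph below are illustrative.

    import random

    def duplication_model(seed_edges, target_n, p=0.3, q=0.1, rng=random.Random(0)):
        """Grow a graph from a seed by node duplication with edge retention."""
        adj = {}
        for u, v in seed_edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        while len(adj) < target_n:
            template = rng.choice(list(adj))
            new = len(adj)                               # fresh node id
            adj[new] = set()
            for nbr in list(adj[template]):
                if rng.random() < p:                     # retain each edge with prob p
                    adj[new].add(nbr)
                    adj[nbr].add(new)
            if rng.random() < q:                         # sometimes link copy to original
                adj[new].add(template)
                adj[template].add(new)
        return adj

    seed = [(0, 1), (1, 2), (2, 0)]                      # triangle as seed graph
    g = duplication_model(seed, target_n=50)
    print(sorted(len(nbrs) for nbrs in g.values())[-5:]) # a peek at the largest degrees

Running the generator with different seed graphs is enough to observe the seed dependence the thesis examines.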
[Ishida, 2006]

Author(s): Mayu Ishida.

Title: . Reasoning about Actions: A Model-Theoretic Approach.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2006.

Abstract: A knowledge-based agent reasons with its knowledge and answers queries while performing various tasks. We consider the case where we describe the agent's knowledge in a propositional fragment of the situation calculus and queries in a fragment of ID-logic, the extension of first-order logic with inductive definitions. This fragment of ID-logic is exactly as expressive as the alternation-free μ-calculus. We formulate the agent's reasoning process as the following question: does the representation T of the agent's knowledge logically entail the query φ (i.e., T ⊨ φ)? We provide an efficient algorithm for this task, using a model-theoretic approach: we construct from T a canonical model F_T of the agent's knowledge and ask whether F_T satisfies φ. Using this approach, the agent can answer the query in time linear with respect to both the size of T and the size of φ.
[Jain, 2006]

Author(s): Varun Jain.

Title: . Robust Correspondence and Retrieval of Articulated Shapes.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 2006.

Abstract: We consider the problem of shape correspondence and retrieval. Although our focus is on articulated shapes, the methods developed are applicable to any shape specified as a contour, in the 2D case, or a surface mesh, in 3D. We propose separate methods for 2D and 3D shape correspondence and retrieval, but the basic idea for both is to characterize shapes using intrinsic measures, defined by geodesic distances between points, to achieve robustness against bending in articulated shapes. In 2D, we design a local, geodesic-based shape descriptor, inspired by the well-known shape context for image correspondence. For 3D shapes, we first transform them into the spectral domain based on geodesic affinities to normalize bending and other common geometric transformations and compute correspondence and retrieval in the new domain. Various techniques to ensure robustness of results and efficiency are proposed. We present numerous experimental results to demonstrate the effectiveness of our approaches.
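
The sketch below illustrates the 3D pipeline's spectral step in miniature: approximate geodesic distances with Dijkstra on the mesh edge graph, convert them to a Gaussian affinity matrix, and embed vertices using the leading eigenvectors. The toy "mesh", input format and kernel width are assumptions, not the thesis's implementation.

    import heapq
    import numpy as np

    def geodesic_distances(n_vertices, edges):
        """All-pairs approximate geodesics: Dijkstra over the weighted edge graph."""
        adj = {v: [] for v in range(n_vertices)}
        for u, v, w in edges:
            adj[u].append((v, w))
            adj[v].append((u, w))
        D = np.full((n_vertices, n_vertices), np.inf)
        for s in range(n_vertices):
            D[s, s] = 0.0
            heap = [(0.0, s)]
            while heap:
                d, u = heapq.heappop(heap)
                if d > D[s, u]:
                    continue
                for v, w in adj[u]:
                    if d + w < D[s, v]:
                        D[s, v] = d + w
                        heapq.heappush(heap, (d + w, v))
        return D

    def spectral_embedding(D, dim=3, sigma=1.0):
        """Embed vertices with the top eigenvectors of a geodesic affinity matrix."""
        A = np.exp(-(D ** 2) / (2 * sigma ** 2))
        vals, vecs = np.linalg.eigh(A)
        order = np.argsort(vals)[::-1][:dim]             # largest eigenvalues first
        return vecs[:, order] * np.sqrt(np.abs(vals[order]))

    # Toy "mesh": a path of 4 vertices with unit-length edges.
    edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
    print(spectral_embedding(geodesic_distances(4, edges)).shape)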
[Jiang, 2006a]

Author(s): Hao Jiang.

Title: . Successive Convexification for Consistent Labeling.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 2006.

Abstract: In this thesis, a novel successive convexification scheme is proposed for solving consistent labeling problems with convex regularization terms. Many computer vision problems can be modeled as such consistent labeling problems. The main optimization term, the labeling cost, however, is typically non-convex, which makes the problem difficult. As well, the large search space, i.e., formally the large label set, makes such applications thorny and inefficient to solve using traditional schemes. The proposed scheme successively convexifies the labeling cost surfaces by replacing them with their lower convex hulls, each time starting from the original cost surfaces but within shrinking trust regions, with a careful scheme for choosing new search regions. This enables the new scheme to solve a sequence of much easier convex programming problems and almost always find the correct labeling. The proposed scheme can be applied to labeling problems with any convex regularization terms. In particular, problems with L1-norm regularization terms can be solved with sequential linear programming; and problems with L2-norm regularization terms with sequential quadratic programming. To zero in on the targets in the search space, the method uses a set of basis labels to approximate the cost surface for each site, and this essentially decouples the size of the relaxed convex problem from the number of labels. The proposed scheme also has other useful properties making it well-suited to very large label-set problems, e.g. searching within an entire image. The proposed successive convexification scheme has been applied to many challenging computer vision problems: the task of robustly locating objects in cluttered environments, dense motion estimation with occlusion inference, appearance-adaptive object tracking with boundary refinement, and finally the challenging problem of human posture and action detection both in still images and in video. Compared with traditional methods, the proposed scheme is shown to have a clear advantage in these applications.
[Jiang, 2006b]

Author(s): Yuelong Jiang.

Title: . Finding Interesting Rules From Large Data Sets.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 2006.

Abstract: A main goal in data mining is finding interesting rules in data, which may help the user act to her advantage. In order to explore really interesting rules, numerous measures of interestingness and corresponding constraints have been developed based on the structure of rules and the statistical information in the data. Given measures and their constraints in a large data set, one challenging task is to design efficient algorithms, by analyzing the properties of the measure constraints, that quickly find all qualified rules. Another important aspect is the consideration of domain knowledge from the user: many statistically significant rules are not interesting once the background information is given. In that situation, subjective measures, which depend on the class of users and their domain knowledge, become more pivotal for finding interesting rules. In this work, we develop a class of methods to address the efficiency and effectiveness issues in finding interesting rules. We first introduce a novel approach to speed up rule mining by pruning the search space for a wide class of measure constraints that were previously hard to exploit. Then, we discuss two new models of subjective measures, which consider not only the real interestingness of the rules from the user's standpoint, but also the efficiency of the algorithms. Extensive experiments show that the models are more tractable and scalable than existing approaches.
[Kwiatkowska, 2006]

Author(s): Bogumila (Mila) Kwiatkowska.

Title: . Integrating Knowledge-Driven and Data-Driven Approaches in the Derivation of Clinical Prediction Rules.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, November 2006.

Abstract: Clinical prediction rules play an important role in medical practice. They expedite diagnosis and treatment for the serious cases and limit unnecessary tests for low-probability cases. However, the creation process for prediction rules is costly, lengthy, and involves several steps: initial clinical trials, rule generation and refinement, validation, and evaluation in clinical settings. With the current development of efficient data mining algorithms and growing accessibility to a vast amount of medical data, the creation of clinical rules can be supported by automated or semi-automated rule induction from the existing data sources. A data-driven method based on the reuse of previously collected medical records and clinical trial statistics is very cost-effective; however, it requires well-defined and intelligent methods for data, information, and knowledge integration. This thesis presents a new framework for the integration of domain knowledge into purely data-driven techniques for the derivation of clinical prediction rules. We concentrate on two aspects: knowledge representation for the predictors and prediction rules, and knowledge-based evaluation for the automatically induced models. We propose a new integrative framework, a semio-fuzzy approach that has its theoretical foundations in semiotics and fuzzy logic. Semiotics provides representation for the measurements and interpretation of the medical predictors. Fuzzy logic provides explicit representation for the imprecision of the measurements and prediction rules. The integrative framework is applied to the construction of a knowledge repository for existing facts and rules, detection of medical outliers, handling missing values, handling imbalanced data, and feature selection. Several machine learning techniques are considered, based on model comprehensibility, interpretability, and practical utility in clinical settings. This new semio-fuzzy framework is applied towards the creation of prediction rules for the diagnosis of obstructive sleep apnea, a serious and under-diagnosed respiratory disorder, and tested on heterogeneous clinical data sets. The induced decision trees and logistic regression models are evaluated in the context of the existing clinical prediction rules published in the medical literature. We describe how the induced rules may confirm, contradict, and expand the expert-created rules.
[Lai, 2006]

Author(s): Lily Yi-Ting Lai.

Title: . Influential Marketing: A New Direct Marketing Strategy Addressing the Existence of Voluntary Buyers.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 2006.

Abstract: The traditional direct marketing paradigm implicitly assumes that there is no possibility of a customer purchasing the product unless he receives the direct promotion. In real business environments, however, there are 'voluntary buyers' who will make the purchase even without marketing contact. While no direct promotion is needed for voluntary buyers, the traditional response-driven paradigm tends to target such customers. In this thesis, the traditional paradigm is examined in detail. We argue that it cannot maximize the net profit. Therefore, we introduce a new direct marketing strategy, called 'influential marketing'. To achieve the maximum net profit, influential marketing targets only the customers who can be positively influenced by the campaign. Nevertheless, targeting such customers is not a trivial task. We present a novel and practical solution to this problem which requires no major changes to standard practices. The evaluation of our approach on real data provides promising results.
[Li, 2006]

Author(s): John Yung San Li.

Title: . Nonobtuse Meshes with Guaranteed Angle Bounds.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2006.

Abstract: High-quality mesh representations of 3D objects are useful in many applications ranging from computer graphics to mathematical simulation. We present a novel method to generate triangular meshes with a guaranteed face angle bound of [30 degrees, 90 degrees]. Our strategy is to first remesh a 2-manifold, open or closed, mesh into a rough approximate mesh that respects the proposed angle bounds. This is achieved by a novel extension to the Marching Cubes algorithm. Next, we perform an iterative constrained optimization, along with constrained Laplacian smoothing, to arrive at a close approximation of the input mesh. A constrained mesh decimation algorithm is then carried out to produce a hierarchy of coarse meshes where the angle bounds are maintained. We demonstrate the quality of our work through several examples.
[Lin, 2006]

Author(s): Zhi Hao Lin.

Title: . An Examination on Convergence Time of the Iterated Prisoner's Dilemma Game Using the Pavlov Strategy in the Different Types of the Interaction Graphs.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 2006.

Abstract: We embed the Iterated Prisoner's Dilemma (IPD) problem into a graph system. Each vertex in the graph corresponds to a player in the IPD game. At each round, an edge in the graph is picked at random, and the two players connected by this edge play one round of the PD game using the Pavlov strategy. After that, another edge is chosen and the corresponding players play again. This cycle repeats until all of the players in the system choose cooperation as their strategy. We call this state the 'convergence state', and the number of rounds needed to reach it the 'convergence time'. Our interest is the relationship between the sizes of various types of graphs and their convergence time. This project explores this relationship through computer simulations, and also provides observations on other related issues.
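
Because Pavlov (win-stay, lose-shift) simplifies to "play C next if the two players just matched each other, D if they mismatched", the described process is easy to simulate; the sketch below counts rounds to all-cooperate on a small cycle graph. The payoff values are implicit in that simplification, and the example graph is illustrative.

    import random

    def convergence_time(edges, n_players, rng=random.Random(1), max_rounds=10**6):
        """Count rounds until every player cooperates. Pavlov update: after a
        game the cooperator who was exploited shifts to D, a winner stays put,
        and mutual defectors both shift to C, i.e. matched -> C, mismatched -> D."""
        action = [rng.choice('CD') for _ in range(n_players)]
        rounds = 0
        while 'D' in action and rounds < max_rounds:
            u, v = rng.choice(edges)                     # one PD game on a random edge
            outcome = 'C' if action[u] == action[v] else 'D'
            action[u] = action[v] = outcome
            rounds += 1
        return rounds

    # Example: a 6-cycle interaction graph.
    n = 6
    cycle = [(i, (i + 1) % n) for i in range(n)]
    print(convergence_time(cycle, n))

Averaging this count over many random seeds and graph sizes reproduces the kind of convergence-time curves the project studies.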
[Liu, 2006]

Author(s): Daphne Hao Liu.

Title: . A Consistency-Based System for Knowledge Base Merging.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2006.

Abstract: The ability to change one's beliefs consistently is essential for sound reasoning in a world where the new information one acquires may invalidate or augment one's current beliefs. Belief revision is the process wherein an agent modifies its beliefs to incorporate the new information received, and knowledge base merging the process wherein the agent is given two or more knowledge bases to merge. We present a binary decision diagram (BDD)-based implementation of Delgrande and Schaub's consistency-based belief change framework. Our system focuses on knowledge base merging with the possible incorporation of integrity constraints, using a BDD solver for consistency checking. We show that the result of merging finite knowledge bases can be represented as a finite formula, and that merging can be streamlined algorithmically by restricting attention to a subset of the vocabulary of the propositional formulas involved. Experimental results and comparisons with related systems are also given.
[Lu, 2006]

Author(s): Cheng Lu.

Title: . Removing Shadows from Color Images.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 2006.

Abstract: This thesis is concerned with the derivation of a shadow-free image representation. We propose several methods of automatically detecting and removing shadows in a color image. Our methods stem from the illuminant invariant theory which requires a camera calibration step to find the direction of changes of illumination. In our work, instead of a camera calibration we aim at finding this direction from evidence in the colour image itself. Specifically, we recognize that producing a 1-d projection in the correct invariant direction, which is orthogonal to the direction of changes of illumination, will result in a 1-d distribution of pixel values that have smaller entropy than projecting in the wrong direction. Hence we seek that projection which minimizes entropy, and from there go on to remove shadows from the color images. To be able to develop an effective description of the entropy-minimization task, we go over to the quadratic entropy, rather than Shannon's definition. Replacing the observed pixels with a kernel density probability distribution, the quadratic entropy can be written as a very simple formulation, and can be evaluated using the efficient Fast Gauss Transform. The entropy, written in this embodiment, is less sensitive to quantization than the usual definition. The resulting algorithm is quite reliable, and the shadow removal step produces good shadow-free colour image results whenever strong shadow edges are present in the image. In almost every case studied, entropy has a strong minimum for the invariant direction, revealing a new property of image formation. To recover a 3-d, full colour shadow-free image representation, we define a thresholding operation to identify the shadow edge. Edges are in-painted across the shadow edge, and re-integrating yields a colour image, equal to the original save for the fact that it is shadow-free. Shadow detection per se is an important step in image analysis. We propose a method of detecting not just strong shadow edges but indeed entire shadow regions in the image taken under ambient light, given extra information from a flash image registered with the first. We argue that the difference in a log domain of the flash image and the ambient image gives a very simple feature space consisting of two components: one in an illuminant-change 3-vector direction, and one along the gray axis. This space provides excellent separation of the shadow and nonshadow areas. We also propose a method for efficient ambient illuminant estimation using the flash image. We verify that the chromaticities corresponding to illuminants with different temperatures fall into different color temperature groups along a line on a plane in the log geometric-mean chromaticity space. Remarkably, our algorithm is truly practical as it can estimate the color of the ambient light even without any prior knowledge about surface reflectance, flash light, or camera sensors. In addition, we propose a novel white balance method which uses the white patch under the estimated illuminant as reference white color for balancing images. Finally, for consumer-grade digital cameras, due to the different illumination conditions, the flash and non-flash images usually have different camera settings when they are taken. We propose a method which can parametrically adjust the two images so as to compensate for the difference in camera settings. The difference between compensated images reflects only the difference in illumination.
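
The core of the entropy-minimization step can be sketched compactly: project 2-d log-chromaticity pixels onto candidate directions and keep the direction whose 1-d projection has minimal quadratic entropy under a Gaussian kernel density. The direct O(N^2) kernel sum below replaces the Fast Gauss Transform used in the thesis, and the synthetic pixels are only for illustration.

    import numpy as np

    def quadratic_entropy(samples, sigma=0.05):
        """Renyi quadratic entropy of a 1-d sample under a Gaussian kernel
        density: H2 = -log( (1/N^2) * sum_ij G_{sigma*sqrt(2)}(x_i - x_j) )."""
        x = np.asarray(samples)
        diff = x[:, None] - x[None, :]
        k = np.exp(-diff ** 2 / (4 * sigma ** 2)) / (2 * sigma * np.sqrt(np.pi))
        return -np.log(k.mean())

    def invariant_direction(log_chroma, n_angles=180):
        """Try projection angles in [0, 180) degrees and return the one whose
        1-d projection of the 2-d log-chromaticity pixels has minimal entropy."""
        best_theta, best_h = None, np.inf
        for theta in np.linspace(0, np.pi, n_angles, endpoint=False):
            direction = np.array([np.cos(theta), np.sin(theta)])
            h = quadratic_entropy(log_chroma @ direction)
            if h < best_h:
                best_theta, best_h = theta, h
        return np.degrees(best_theta)

    # Toy data: pixels spread along a 30-degree "illumination change" line, so
    # the entropy-minimizing (invariant) projection should be near 120 degrees.
    rng = np.random.default_rng(0)
    t = rng.uniform(-1, 1, 500)
    pts = np.c_[t * np.cos(np.radians(30)), t * np.sin(np.radians(30))] + 0.01 * rng.normal(size=(500, 2))
    print(round(invariant_direction(pts), 1))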
[Mahabadi, 2006]

Author(s): Ladan A. Mahabadi.

Title: . A Pseudorandom Generator Construction based on Randomness Extractors and Combinatorial Designs.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2006.

Abstract: Nisan and Wigderson, in their seminal work, introduced a new (conditional) pseudorandom generator construction which has since been used extensively in complexity theory and has led to extensive further research. Impagliazzo and Wigderson (1997), and Sudan, Trevisan, and Vadhan (2001) have shown how this construction can be utilized to prove conditional derandomization results under weaker hardness assumptions. We study the construction of pseudorandom generators, and use an observation of Sudan et al. to recast the Impagliazzo-Wigderson construction in terms of weak sources of randomness; such a source is a distribution on binary strings that is 'random' in the sense of having high 'entropy'. We then use an efficient algorithm of Gabizon et al. to extract almost all of the randomness present, obtaining a pseudorandom generator that stretches O(n) bits to Ω(n2^n) bits.
[Memon, 2006]

Author(s): Mashaal Anwar Memon.

Title: . Specification Language Design Concepts: Aggregation and Extensibility in CoreASM.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2006.

Abstract: Abstract State Machines (ASMs) are a proven methodology for the precise high-level specification of formal requirements in early phases of software design. Many extensions to ASMs have been proposed and used widely, including Distributed ASMs, Turbo ASMs, Gurevich's partial updates, and syntactically convenient rule forms. This, coupled with the fact that ASMs do not bind the user to any predetermined data types or operators, allows for extreme flexibility in exploration of the problem space. Striving to provide this same level of freedom with executable ASMs, the CoreASM engine and language have been designed with syntactic and semantic extensibility in mind. We formally specify extensibility mechanisms that allow for language augmentation with arbitrary data structures supporting simultaneous incremental modification, new operators, and additional language syntax. Our work is a major step toward providing an environment suitable for both further experimentation with ASMs and for the machine-aided creation of robust software specifications.
[Pekerskaya, 2006]

Author(s): Irina Pekerskaya.

Title: . Mining Changing Regions From Access Constrained Data Sets: A Cluster-Embedded Decision Tree Approach.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 2006.

Abstract: Change detection is important in many applications. Most of the existing methods have to use at least one of the original data sets to detect changing regions. However, in some important applications, due to data access constraints such as privacy concerns and limited online availability of data, the original data may not be available for change detection. In this work, we tackle the problem by proposing a simple yet effective model-based approach. In the model construction phase, the original data sets are summarized using novel cluster-embedded decision trees as concise models. Once the models are built, the original data is not accessed again. In the change detection phase, to compare any two data sets, we compare the two corresponding cluster-embedded decision trees. Our systematic experimental results on both real and synthetic data sets show that our approach can detect changes accurately and effectively.
[Shen, 2006]

Author(s): Wei Shen.

Title: . BGP Route Flap Damping Algorithms.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 2006.

Abstract: Route flap damping (RFD) is a mechanism used in Border Gateway Protocol (BGP) to prevent persistent routing oscillations in the Internet. It plays an important role in maintaining the stability of the Internet routing system. RFD works by suppressing routes that flap persistently. Several existing algorithms address the issue of identifying and penalizing route flaps. In this thesis, we compare three such algorithms: original RFD, selective RFD, and RFD+. We implement these algorithms in ns-2 and evaluate their performance. We also propose two possible improvements to the RFD algorithms.
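
As background for the algorithms compared above, the basic RFD penalty mechanism can be sketched as follows. The parameter values are common illustrative defaults, not values taken from the thesis, and the thesis's ns-2 implementation is of course far more detailed.

    import math

    class RouteFlapDamper:
        """Minimal sketch of a route flap damping penalty mechanism.

        The numeric parameters are illustrative defaults, not values from the thesis.
        """
        def __init__(self, penalty_per_flap=1000.0, suppress_limit=2000.0,
                     reuse_limit=750.0, half_life=900.0):
            self.penalty = 0.0
            self.last_update = 0.0
            self.suppressed = False
            self.penalty_per_flap = penalty_per_flap
            self.suppress_limit = suppress_limit
            self.reuse_limit = reuse_limit
            self.half_life = half_life

        def _decay(self, now):
            # The penalty decays exponentially between events.
            dt = now - self.last_update
            self.penalty *= math.exp(-math.log(2.0) * dt / self.half_life)
            self.last_update = now

        def on_flap(self, now):
            self._decay(now)
            self.penalty += self.penalty_per_flap
            if self.penalty >= self.suppress_limit:
                self.suppressed = True

        def usable(self, now):
            self._decay(now)
            if self.suppressed and self.penalty < self.reuse_limit:
                self.suppressed = False
            return not self.suppressed
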
[Su, 2006]

Author(s): Ming (Mike) Su.

Title: . Using Abstract State Machines to Model a Graphical User Interface System.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2006.

Abstract: A graphical user interface (GUI) system is a visual tool that lets users operate computer applications. In software engineering, verifying that the functions of a GUI system meet users' expectations is an important goal. System modeling provides an opportunity to verify the functionality of the system before implementing it. In this thesis, we model the GUI system of the CoreASM language debugger based on the abstract state machine (ASM) paradigm, and give a formal specification of the GUI system. This GUI system model provides a formal mathematical foundation to specify the architecture and functional form of the GUI system and the interactive actions between the users and the computer application (the CoreASM engine). The design approach in this work incorporates both object-oriented and task-oriented approaches. A process of level-wise refinement is used to solve particular design problems.
[Tang, 2006]

Author(s): Calvin Tang.

Title: . Model Checking Abstract State Machines with Answer Set Programming.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2006.

Abstract: Answer Set Programming (ASP) is a logic programming paradigm that has been shown as a useful tool in various application areas due to its expressive modelling language. These application areas include Bounded Model Checking (BMC). BMC is a verification technique that is recognized for its strong ability of finding errors in computer systems. To apply BMC, a system needs to be modelled in a formal specification language, such as the widely used formalism of Abstract State Machines (ASMs). In this thesis, we present BMC of ASMs based on ASP. We show how to translate an ASM and a temporal property into a logic program and solve the BMC problem for the ASM by computing an answer set for the logic program. Experimental results for our method using the answer set solvers SMODELS and CMODELS are also given.
[Voll, 2006]

Author(s): Kimberly Voll.

Title: . A Methodology of Error Detection: Improving Speech Recognition in Radiology.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, April 2006.

Abstract: Automated speech recognition (ASR) in radiology report dictation demands highly accurate and robust recognition software. Despite vendor claims, current implementations are sub-optimal, leading to poor accuracy, and time and money wasted on proofreading. Thus, other methods must be considered for increasing the reliability and performance of ASR before it is a viable alternative to human transcription. One such method is post-ASR error detection, used to recover from the inaccuracy of speech recognition. This thesis proposes that detecting and highlighting errors, or areas of low confidence, in a machine-transcribed report allows the radiologist to proofread more efficiently. This, in turn, restores the benefits of ASR in radiology, including efficient report handling and resource utilization. To this end, an objective classification of error-detection methods for ASR is established. Under this classification, a new theory of error detection in ASR is derived from the hybrid application of multiple error-detection heuristics. This theory is contingent upon the type of recognition errors and the complementary coverage of the heuristics. Inspired by these principles, a hybrid error-detection application is developed as proof of concept. The algorithm relies on four separate artificial-intelligence heuristics together covering semantic, syntactic, and structural error types, and developed with the help of 2700 anonymised reports obtained from a local radiology clinic. Two heuristics involve statistical modeling: pointwise mutual information and co-occurrence analysis. The remaining two are non-statistical techniques: a property-based, constraint-handling-rules grammar, and a conceptual distance metric relying on the ontological knowledge in the Unified Medical Language System. When the hybrid algorithm is applied to thirty real-world radiology reports, the results are encouraging: up to a 24% increase in the recall performance and an 8% increase in the precision performance over the best single technique. In addition, the resulting algorithm is efficient and modular. Also investigated is the development necessary to turn the hybrid algorithm into a real-world application suitable for clinical deployment. Finally, as part of an investigation of future directions for this research, the greater context of these contributions is demonstrated, including two applications of the hybrid method in cognitive science and machine learning.
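
One of the statistical heuristics named above, pointwise mutual information, can be sketched as follows. The report-level co-occurrence counting is an assumption made for illustration; the thesis's actual feature design is not reproduced here.

    import math
    from collections import Counter
    from itertools import combinations

    def pmi_table(reports):
        """Pointwise mutual information for word pairs co-occurring in a report.

        `reports` is a list of token lists; a low PMI for a pair of nearby words could
        be flagged as a likely recognition error. Illustrative sketch only.
        """
        word_counts, pair_counts, n_reports = Counter(), Counter(), len(reports)
        for tokens in reports:
            vocab = set(tokens)
            word_counts.update(vocab)
            pair_counts.update(frozenset(p) for p in combinations(sorted(vocab), 2))
        pmi = {}
        for pair, c in pair_counts.items():
            w1, w2 = tuple(pair)
            p_xy = c / n_reports
            p_x = word_counts[w1] / n_reports
            p_y = word_counts[w2] / n_reports
            pmi[pair] = math.log(p_xy / (p_x * p_y))
        return pmi
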
[Wang, 2006]

Author(s): Yang-Wendy Wang.

Title: . Sentence Ordering For Multi-Document Summarization in Response to Multiple Queries.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, October 2006.

Abstract: The growing access to large amounts of text data opens more opportunities in information processing. Given a list of complex questions and a set of relevant documents, the task of producing an informative and coherent summary of those documents in response to the questions has attracted a great deal of attention recently. However, the problem of organizing information for summarization so that the generated summary is coherent has received relatively little attention. Several approaches have been proposed for sentence ordering in single-document and generic multiple-document summarization, but no single method has been found to address sentence ordering in query-based summarization. In this thesis, we propose and implement an algorithm that combines constraints from query order and topical relatedness in human-produced summaries of multiple documents in response to multiple questions. To test the effectiveness of the constraints, we construct a new query-based corpus from the human-produced summaries for the Document Understanding Conference (DUC) 2006 evaluation. We then conduct experiments, using an automatic evaluation method based on Kendall's Tau, to evaluate and compare the effectiveness of our approaches against others. Our results show that both query order and topical relatedness improve the ordering performance when compared to a baseline method, and that a combination of these two constraints achieves even better results.
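
The automatic evaluation metric mentioned above, Kendall's Tau over sentence orderings, can be computed as in this small sketch (the sentence identifiers are hypothetical).

    from itertools import combinations

    def kendall_tau(reference_order, system_order):
        """Kendall's tau between two orderings of the same sentences.

        Both arguments are lists of sentence identifiers; tau = 1 means identical
        order, tau = -1 means reversed order. Illustrative sketch only.
        """
        position = {sid: i for i, sid in enumerate(system_order)}
        n = len(reference_order)
        concordant = discordant = 0
        for a, b in combinations(reference_order, 2):
            # (a, b) appear in this order in the reference; check the system order.
            if position[a] < position[b]:
                concordant += 1
            else:
                discordant += 1
        return (concordant - discordant) / (n * (n - 1) / 2)

    # e.g. kendall_tau(['s1', 's2', 's3', 's4'], ['s2', 's1', 's3', 's4']) ~= 0.67
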
[Xie, 2006]

Author(s): Wing Xie.

Title: . Obstructions to Trigraph Homomorphisms.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2006.

Abstract: Many graph partition problems seek a partition into parts with certain internal constraints on each part, and similar external constraints between the parts. Such problems have been traditionally modeled using matrices, as the so-called M-partition problems. More recently, they have also been modeled as trigraph homomorphism problems. This thesis consists of two parts. In the first part, we survey the literature dealing with both general and restricted versions of these problems. Most existing results attempt to classify these problems as NP-complete or polynomial time solvable. In the second part of the thesis, we investigate which of these problems can be characterized by a finite set of forbidden induced subgraphs. We develop new tools and use them to find all such partition problems with up to five parts. We also observe that these problems are automatically polynomial time solvable.
[Zhang, 2006]

Author(s): Yinan Zhang.

Title: . Spatial Interference Reduction for Multi-robot Systems Using Rational And Team-based Aggression.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 2006.

Abstract: Robots in a team with no centralized control, performing a transportation task in a confined environment, frequently interfere with each other. Previous work has shown that stereotyped robot-robot competition, inspired by aggressive displays in animals, can be used to effectively reduce such interference and improve overall system performance. Two principled approaches to determining aggression for robots are described in this thesis. The first, global investment, is based on the concept of `economical investment'. The second, team-based aggression, extends and improves upon the economical investment scheme by increasing the coordination between robots. Simulation results show that both approaches improve system efficiency compared to setting robots' aggression at random, and that team-based aggression provides the best performance yet observed. The thesis also introduces a new scheme for implementing aggression functions using a simple network model. Further, the effects of interference-reduction methods over a range of population sizes are studied.


Jump to year: 2011 >> 2010 >>
2009 >> 2008 >> 2007 >> 2006 >> 2005 >> 2004 >> 2003 >> 2002 >> 2001 >> 2000 >>
1999 >> 1998 >> 1997 >> 1996 >> 1995 >> 1994 >> 1993 >> 1992 >> 1991 >> 1990 >>
1989 >> 1988 >> 1987 >> 1986 >> 1985 >> 1984 >> 1983 >> 1982 >> 1981 >>

2005

[Bastani, 2005]

Author(s): Behnam Bastani.

Title: . Analysis of Colour Display Characteristics.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 2005.

Abstract: Predicting colours across multiple display devices requires implementation of device characterization, gamut mapping, and perceptual models. This thesis studies the characteristics of CRT and LCD monitors and of projectors. It compares existing models and introduces a new model that improves on existing calibration algorithms. Gamut mapping assigns a mapping between two different colour spaces. Previously, the focus of gamut mapping has been between monitor and printer, which have quite different gamut shapes. Implementations and results of existing models are compared, and a new model is introduced whose output images are as good as those of the best available models but which runs in less time. DLP projectors, which are based on a different technology, require a more complex calibration algorithm. A new approach for calibrating DLP projectors is introduced, with significantly better performance in predicting RGB data given tristimulus values. Finally, a new calibration method using Support Vector Regression is introduced.
[Birke, 2005]

Author(s): Julia Birke.

Title: . A Clustering Approach for the Unsupervised Recognition of Nonliteral Language.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2005.

Abstract: In this thesis we present TroFi, a system for separating literal and nonliteral usages of verbs through unsupervised statistical word-sense disambiguation and clustering techniques. TroFi distinguishes itself by redefining the types of nonliteral language handled and by depending purely on sentential context rather than selectional constraint violations and paths in semantic hierarchies. TroFi uses literal and nonliteral seed sets acquired and cleaned without human supervision to bootstrap learning. We adapt a word-sense disambiguation algorithm to our task and add learners, a voting schema, SuperTags, and additional context. Detailed experiments on hand-annotated data and the introduction of active learning and iterative augmentation allow us to build the TroFi Example Base, an expandable resource of literal/nonliteral usage clusters for the NLP community. We also describe some other possible applications of TroFi and the TroFi Example Base. Our basic algorithm outperforms the baseline by 24.4%. Adding active learning increases this to over 35%.
[Chen, 2005]

Author(s): Hao (Leo) Chen.

Title: . User Clustering and Traffic Prediction in a Trunked Radio System.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, February 2005.

Abstract: Traditional statistical analysis of network data is often employed to determine traffic distribution, to summarize users' behavior patterns, or to predict future network traffic. Mining of network data may be used to discover hidden user groups, to detect payment fraud, or to identify network abnormalities. In our research, we combine traditional traffic analysis with data mining techniques. We analyze three months of continuous network log data from a deployed public safety trunked radio network. After data cleaning and traffic extraction, we identify clusters of talk groups by applying the AutoClass tool and the K-means algorithm to users' behavior patterns, represented by the hourly number of calls. We propose a traffic prediction model by applying the classical SARIMA models to the clusters of users. The predicted network traffic agrees with the collected traffic data, and the proposed cluster-based prediction approach performs well compared to prediction based on the aggregate traffic.
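
A minimal sketch of the clustering step, assuming talk groups are described by their average calls per hour of day and using scikit-learn's KMeans as a stand-in for the AutoClass/K-means analysis in the thesis; the feature normalization and number of clusters are illustrative choices.

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_talk_groups(hourly_calls, n_clusters=3):
        """Cluster talk groups by their average hourly calling pattern.

        hourly_calls : dict mapping talk-group id -> array of shape (24,) holding the
        mean number of calls for each hour of the day. Features and n_clusters are
        illustrative, not taken from the thesis.
        """
        ids = sorted(hourly_calls)
        X = np.vstack([hourly_calls[g] for g in ids])
        # Normalize each profile so clusters reflect the shape of the daily pattern,
        # not the absolute traffic volume.
        X = X / np.maximum(X.sum(axis=1, keepdims=True), 1e-9)
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
        return dict(zip(ids, labels))
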
[Ghaffari, 2005]

Author(s): Roozbeh Ghaffari.

Title: . Snake Contours in Three-Dimensions From Colour Stereo Image Pairs.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2005.

Abstract: Snakes (active contour models) are extended to segment regions of interest on the surface of 3D objects. Stereo images taken with calibrated cameras are used as input. For this method the depth map for the whole image does not need to be computed. Instead, 3D external forces are designed to keep the contour on the surface of the object while moving it toward the desired boundaries. Color information is used to improve the ability of snakes in detecting the boundaries, in contrast to the majority of previous methods which are based on intensity information alone. The proposed method produces 3D contours on the surface of the object with coordinates in physical units, e.g. millimeters. These contours can be used to view the structure and dimensions of any distinguishable region on the surface of an object. Examples include oral lesions and skin diseases. The whole process requires minimal human interaction; however, user input can be used to improve segmentation.
[Leung, 2005]

Author(s): Chi Hang (Philip) Leung.

Title: . Understanding, Interpreting and Querying Web Statistical Tables.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2005.

Abstract: Extraction of information from tables published on the Web is made less complicated by the easy identification of the text inside each table cell. In this thesis, we propose, and have implemented, a scheme which not only understands the contents of a statistical table, but is also able to convert them into a multidimensional database which can then be fed into an off-the-shelf system for querying and data integration. By carefully interpreting the intention of the table author via the visual cues embedded in the HTML text, together with the layout conventions of multidimensional database modelling techniques, our system can successfully classify the keywords into semantically distinct dimension hierarchies, without any domain-specific knowledge or machine learning. Experiments on a set of real-life statistical tables have confirmed the validity of this approach.
[Liu, 2005]

Author(s): Jingyu Liu.

Title: . Lexicon Caching in Full-Text Databases.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 2005.

Abstract: Caching is a widely used technique to leverage the access time difference between two adjacent levels of storage in the computer memory hierarchy, e.g., cells in main memory ↔ cells in the CPU cache, and blocks on disk ↔ pages in main memory. In a database system especially, buffer management is an important layer for keeping hot-spot data in main memory so as to minimize slow disk I/O and thus improve system performance. In this thesis, we present a term-based method to cache lexicon terms in full-text databases, which aims at reducing the size of the lexicon that must be kept in memory while providing good performance for finding the requested terms. We empirically show that, under the assumption of a Zipf-like term access distribution, given the same amount of main memory, our term-based caching method achieves a much higher hit ratio and much faster response time than the traditional page-based buffering methods used in database systems.
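
A rough way to reproduce the kind of experiment described above is to simulate a Zipf-like access stream against a small term cache and measure the hit ratio. The LRU replacement policy and all parameter values below are illustrative assumptions, not the thesis's caching scheme.

    import random
    from collections import OrderedDict

    def zipf_hit_ratio(n_terms=100000, cache_size=1000, n_requests=200000, s=1.0, seed=0):
        """Estimate the hit ratio of a term-level LRU cache under Zipf-like accesses.

        The LRU policy and all parameter values are illustrative; the thesis compares
        a term-based cache against page-based database buffering.
        """
        rng = random.Random(seed)
        weights = [1.0 / (rank ** s) for rank in range(1, n_terms + 1)]
        requests = rng.choices(range(n_terms), weights=weights, k=n_requests)
        cache, hits = OrderedDict(), 0
        for term in requests:
            if term in cache:
                hits += 1
                cache.move_to_end(term)
            else:
                cache[term] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)   # evict the least recently used term
        return hits / n_requests
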
[Lu, 2005]

Author(s): Ye Lu.

Title: . Automatic Object Extraction and Reconstruction in Active Video.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, January 2005.

Abstract: A new method of video object extraction is proposed to accurately obtain the object of interest from actively acquired videos. Traditional video object extraction techniques often operate under the assumption of homogeneous object motion and extract various parts of the video that are motion consistent as objects. In contrast, the proposed active video object extraction (AVOE) paradigm assumes that the object of interest is being actively tracked by a camera moving in 3D and classifies the possible motions of the camera that result in the 2D motion patterns as recovered from 2D image sequences. Consequently, the AVOE method is able to extract only objects of interest from active videos while ignoring other less important objects. We formalize the AVOE process using notions from Gestalt psychology. We define a new Gestalt factor called ``shift and hold'' which acts as a bridge between 2D Gestalt groupings and 3D object perception. We also propose a novel cooperative method for efficient dense 2D motion estimation as part of the AVOE framework. Using motion fields recovered from successive frames of the video, we propose a core algorithm to perform 2D object extraction. In addition, we also propose a linear programming based boundary adjustment algorithm that takes into account the strength and orientation of candidate boundary pixels to refine object outlines extracted by the core algorithm. More effective indexing and retrieval techniques can be devised if the extracted objects are not limited only to their 2D views but can be intelligently integrated to form 3D object models. In this way, objects can be searched and retrieved using their 3D shapes in addition to the 2D image based features. In order to address this need for 3D object models, we also describe the design and implementation of an active video object extraction and 3D reconstruction system as part of this thesis.
[Mandryk, 2005]

Author(s): R.L. Mandryk.

Title: . Modeling User Emotion in Interactive Play Environments: A Fuzzy Physiological Approach.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 2005.

Abstract: Researchers are integrating emerging technologies into interactive play environments, and established game markets continue to expand, yet evaluating play environments is challenging. While task performance metrics are commonly used to analyse productivity systems objectively and quantitatively, with play systems it is the quality of the experience, not the performance of the participant, that matters. This research presents three experiments that examine users' physiological signals to continuously model user emotion during interaction with play technologies. Modeled emotions are powerful because they capture usability and playability, account for user emotion, are quantitative and objective, and can be represented continuously. In Experiment One we explored how physiological signals respond to interaction with play technologies. We collected a variety of physiological measures while observing participants playing a computer game in four difficulty conditions, providing a basis for experimental exploration of this domain. In Experiment Two we investigated how physiological signals differ between play conditions, and how physiological signals co-vary with subjective reports. A different physiological response was observed when playing a computer game against a collocated friend versus a computer. When normalized, the physiological results mirrored subjective reports. In Experiment Three we developed a method for modeling emotion using physiological data. A fuzzy logic model transformed four physiological signals into arousal and valence. A second fuzzy logic model transformed arousal and valence into five emotions: boredom, challenge, excitement, frustration, and fun. The modeled emotions' means were evaluated with test data and exhibited the same trends as the reported emotions for fun, boredom, and excitement; the modeled emotions also revealed differences between the three play conditions, while differences between the reported emotions were not significant. Mean emotion modeled from physiological data fills a knowledge gap for objective and quantitative evaluation of entertainment technologies. Using our technique, user emotion can be analyzed over an entire experience, revealing variance within and between conditions. This continuous representation has a high evaluative bandwidth, and is important because it is the process, not the outcome, of playing that determines success. The continuous representation of modeled emotion is a powerful evaluative tool that, when combined with other approaches, forms a robust method for evaluating user interaction with play technologies.
[Meynert, 2005]

Author(s): Alison Maria Meynert.

Title: . Common Evidence Network: An Integrated Approach to Investigating Gene Relationships.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2005.

Abstract: A common evidence network is a data structure that integrates evidence for relationships between genes from disparate data sources and across data types. It is an undirected weighted graph where nodes represent genes and edge weights are quantitative measures of confidence in the evidence linking two genes. We describe methods for producing edge weights for two evidence types: literature co-citation and similarity of Gene Ontology annotations. A tool was developed for identifying genes across multiple databases and consolidating selected annotations. Using gene synonym lists obtained from this tool, we extracted co-citations of genes from annotated biomedical abstracts as evidence. We developed a novel approach to interpreting the similarity of Gene Ontology terms annotated to genes. The method produces a score that quantitatively describes the similarity of Gene Ontology term annotations between two genes. We tested both methods on a set of genes sharing a common sequence feature.
[Song, 2005]

Author(s): Jiaqing Song.

Title: . Modeling and Performance Analysis of Public Safety Wireless Networks.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2005.

Abstract: Public safety wireless networks (PSWNs) play a vital role in operations of emergency agencies such as police and fire departments. In this thesis, we describe analysis and modeling of traffic data collected from the Emergency Communications for Southwestern British Columbia (E-Comm) PSWN. We analyze network and agency call traffic and find that lognormal distribution and exponential distribution are adequate for modeling call holding time and call inter-arrival time, respectively. We also describe a newly developed wide area radio network simulator, named WarnSim. We use WarnSim simulations to validate the proposed traffic model, evaluate the performance of the E-Comm network, and predict network performance in cases of traffic increase. We also provide recommendations on allocating E-Comm network resources to deal with the increased traffic volume.
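
A minimal sketch of the modelling step described above, fitting a lognormal distribution to call holding times and an exponential distribution to call inter-arrival times with scipy; the function and variable names are hypothetical.

    from scipy import stats

    def fit_call_models(holding_times, interarrival_times):
        """Fit the distributions the abstract found adequate for the E-Comm traffic.

        Returns the fitted lognormal parameters for call holding time and the
        exponential rate for call inter-arrival time. A sketch of the modelling
        step only, not the thesis code.
        """
        shape, _, scale = stats.lognorm.fit(holding_times, floc=0.0)
        _, ia_scale = stats.expon.fit(interarrival_times, floc=0.0)
        return {"holding": {"sigma": shape, "median": scale},
                "interarrival_rate": 1.0 / ia_scale}

    # Example with synthetic data:
    # holding = stats.lognorm.rvs(0.8, scale=20.0, size=5000)   # seconds
    # gaps = stats.expon.rvs(scale=2.5, size=5000)              # seconds
    # print(fit_call_models(holding, gaps))
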
[Zhang, 2005]

Author(s): Zhong Zhang.

Title: . Applications of Visibility Space in Polygon Search Problems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2005.

Abstract: This thesis investigates several problems related to searching a polygonal area for intruders. The mutual visibility between an arbitrary pair of points on the boundary of a polygon is an important piece of information we can make use of when searching a polygon. We extensively employ the visibility diagram that represents mutual visibility information for each pair of boundary points. We first investigate the two-guard room search problem, where two guards cooperate in finding an intruder by maintaining mutual visibility. In terms of the visibility diagram we characterize the class of searchable rooms in a concise way. We also find all doors in a polygon, if any, such that the resultant rooms are searchable by two guards. The second problem we tackle in this thesis is the polygon search problem by a boundary 1-searcher, who moves along the boundary of a polygon and can see the points on the beam from a flashlight. We identify the patterns that make a polygon non-searchable. The third problem we investigate is to search a polygon with one hole by two boundary 1-searchers. We solve this problem by extending the visibility diagram.
[Zuluaga, 2005]

Author(s): Mauricio Zuluaga.

Title: . RAGE AMONGST THE MACHINES: A biological approach to reducing spatial interference in multi-robot systems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 2005.

Abstract: Interference is a common problem for animals, and can be characterized as a competition for different types of resources such as food, mates or territory. In the case of multi-robot systems, similar problems arise. Many species have evolved aggressive displays as a more efficient alternative to physical combat to solve conflicts over resources. This thesis considers a transportation task in which a team of robots with no centralized control frequently interfere with each other. This thesis describes two new, principled approaches to selecting an aggression level, based on a robot's investment in a task. The methods are economically rational. Simulation experiments in an office-type environment and a smaller-scale real world implementation show that under some special circumstances, the methods are able to significantly improve system performance compared to a similar competition with a random outcome.


Jump to year: 2011 >> 2010 >>
2009 >> 2008 >> 2007 >> 2006 >> 2005 >> 2004 >> 2003 >> 2002 >> 2001 >> 2000 >>
1999 >> 1998 >> 1997 >> 1996 >> 1995 >> 1994 >> 1993 >> 1992 >> 1991 >> 1990 >>
1989 >> 1988 >> 1987 >> 1986 >> 1985 >> 1984 >> 1983 >> 1982 >> 1981 >>

2004

[Anderson, 2004]

Author(s): Darryl Anderson.

Title: . Image Object Search Combining Colour With Gabor Wavelet Shape Descriptors.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2004.

Abstract: An image and object search and retrieval algorithm is devised that combines colour and spatial information. Spatial characteristics are described in terms of Wiskott's jets formulation, based on a set of Gabor wavelet functions at varying scales, orientations and locations. Colour information is first converted to a form more impervious to illumination colour change, reduced to 2D, and encoded in a histogram. The histogram, which is based on a new stretched chromaticity space for which all bins are populated, is resized and compressed by way of a DCT. An image database is devised by replicating JPEG images by a set of transforms that include resizing, various cropping attacks, JPEG quality changes, aspect ratio alteration, and reducing colour to greyscale. Correlation of the complete encode vector is used as the similarity measure. For both searches with the original image as probe within the complete dataset, and with the altered images as probes with the original dataset, the grayscale, the stretched, and the resized images had near-perfect results. The most formidable challenge was found to be images that were cropped both horizontally as well as vertically. The algorithm's ability to identify objects, as opposed to just images, is also tested. In searching for images in a set of 5 classifications, the jets were found to contribute most analytic power when objects with distinctive spatial characteristics were the target.
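
The colour half of the descriptor can be sketched roughly as follows: build a 2-d chromaticity histogram, compress it with a 2-d DCT, and compare encode vectors by correlation. The simple r,g chromaticity used here stands in for the thesis's stretched chromaticity space, and the Gabor-jet shape component is omitted.

    import numpy as np
    from scipy.fft import dctn

    def colour_descriptor(rgb_image, bins=32, keep=8):
        """Chromaticity-histogram descriptor compressed with a 2-d DCT.

        rgb_image : float array of shape (H, W, 3). The plain r,g chromaticity used
        here is only a stand-in for the thesis's stretched chromaticity space.
        """
        rgb = rgb_image.reshape(-1, 3).astype(float) + 1e-9
        chroma = rgb[:, :2] / rgb.sum(axis=1, keepdims=True)       # (r, g) in [0, 1]
        hist, _, _ = np.histogram2d(chroma[:, 0], chroma[:, 1],
                                    bins=bins, range=[[0, 1], [0, 1]])
        hist /= hist.sum()
        coeffs = dctn(hist, norm='ortho')[:keep, :keep]            # low-frequency block
        return coeffs.ravel()

    def similarity(desc_a, desc_b):
        # Correlation of the two encode vectors, as in the abstract.
        return float(np.corrcoef(desc_a, desc_b)[0, 1])
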
[Evangelista, 2004]

Author(s): Eric Evangelista.

Title: . A Knowledge-Level View of Consistent Query Answers.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 2004.

Abstract: Of the numerous formal approaches that deal with database inconsistencies with respect to integrity constraints, all share the view that such constraints are statements about the world the database models. An alternative perspective, however, considers constraints as statements about the knowledge the database has of its domain. The result of this shift in perspective allows us to regard integrity constraint violations as a fragment of the incomplete knowledge the system has of the world. We can then query the possibly inconsistent database for consistent query answers. We address the above considerations with an epistemic query language KL, where the possible ways to repair a database that violates its integrity constraints are characterized by a set of possible worlds, an epistemic state e_C. This culminates in a situation where only consistent information is known. We ascertain this by querying e_C with KL, providing a knowledge-level formalization of consistent query answers. At the outset, we show that KL is an adequate language for querying databases by specifying a class of admissible formulas for which the set of answers to such queries is safe and domain independent. After formulating database dependencies in KL, we prove that they are members of this class. A Prolog-like sound and complete query evaluator, cqa, for admissible KL formulas is presented. Finally, we completely characterize what is known in e_C with a set of first-order sentences.
[Farahbod, 2004]

Author(s): Roozbeh Farahbod.

Title: . Extending and Refining an Abstract Operational Semantics of the Web Services Architecture for the Business Process Execution Language.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 2004.

Abstract: The Business Process Execution Language for Web Services (BPEL) is a forthcoming industrial standard for automated business processes, proposed by the OASIS (Organization for the Advancement of Structured Information Standards) Web Services BPEL Technical Committee. BPEL is a service orchestration language which extends the underlying Web services interaction model and enables Web services to support long running business transactions. We formally define an abstract operational semantics for BPEL based on the abstract state machine (ASM) paradigm. Specifically, we model the dynamic properties of the key language constructs through the construction of a BPEL Abstract Machine in terms of partially ordered runs of distributed real-time ASMs. The goal of our work is to provide a well-defined semantic foundation for establishing the key language attributes by eliminating deficiencies hidden in the informal language definition. This work combines two well-defined ASM refinement techniques to complement our previous efforts on the core model of the BPEL Abstract Machine. First, we elaborate the core model with regard to structural and behavioural aspects to make it more robust and flexible for further refinements. Specifically, we formalize the process execution model of BPEL and its decomposition into execution lifecycles of BPEL activities. We also introduce an agent interaction model to facilitate the interaction between different Distributed Abstract State Machine (DASM) agents of the BPEL Abstract Machine. We then extend the core model through two consecutive refinement steps to include data handling and one of the most controversial issues in BPEL, fault and compensation handling. The resulting abstract machine model provides a comprehensive formalization of the BPEL dynamic semantics and the underlying Web services architecture.
[Fong, 2004]

Author(s): Philip W. L. Fong.

Title: . Proof Linking: A Modular Verification Architecture for Mobile Code Systems.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, January 2004.

Abstract: This dissertation presents a critical rethinking of the Java bytecode verification architecture from the perspective of a software engineer. In existing commercial implementations of the Java Virtual Machine, there is a tight coupling between the dynamic linking process and the bytecode verifier. This leads to delocalized and interleaving program plans, making the verifier difficult to maintain and comprehend. A modular mobile code verification architecture, called Proof Linking, is proposed. By establishing explicit verification interfaces in the form of proof obligations and commitments, and by careful scheduling of linking events, Proof Linking supports the construction of bytecode verifier as a separate engineering component, fully decoupled from Java's dynamic linking process. This turns out to have two additional benefits: (1) Modularization enables distributed verification protocols, in which part of the verification burden can be safely offloaded to remote sites; (2) Alternative static analyses can now be integrated into Java's dynamic linking process with ease, thereby making it convenient to extend the protection mechanism of Java. These benefits make Proof Linking a competitive verification architecture for mobile code systems. A prototype of the Proof Linking Architecture has been implemented in an open source Java Virtual Machine, the Aegis VM (http://aegisvm.sourceforge.net). On the theoretical side, the soundness of Proof Linking was captured in three correctness conditions: Safety, Monotonicity and Completion. Java instantiations of Proof Linking with increasing complexity have been shown to satisfy all the three correctness conditions. The correctness proof had been formally verified by the PVS proof checker.
[Grbavec, 2004]

Author(s): Ann Marie Grbavec.

Title: . Second-Order Generalization in Neural Networks.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, October 2004.

Abstract: One of the main strengths of neural networks is their ability to generalize beyond training data. However, recent research suggests that certain types of generalization, which humans appear to perform readily, are problematic for traditional neural networks. This thesis examines the foundations of such claims and considers possibilities for resolving several issues they raise. These forms of generalization have attracted attention in large part due to their implications about the role of symbol-processing in cognition. They have been shown to be beyond the scope of the types of neural networks that have been considered to offer an alternative to classically symbolic representations and rules: back-propagating multilayer perceptrons and simple recurrent networks. Interestingly, claims have been made that many classically symbolic machine learning techniques also fail to generalize in this way. The tasks in question can be described, broadly, as generalizing relations to novel items. Previous formulations of the research problem have offered various specific but somewhat inconsistent criteria for characterizing this type of generalization. In this thesis, an analysis of previous formulations reveals how they are limited by unacknowledged assumptions about the role of representation, task form, and learning in problem equivalence. A framework for specifying a generalization task is introduced to support a more lucid discussion of these issues. Applying the framework to sample tasks reveals an underlying distinction between ways in which a generalization may be reached. Based on the results of the analysis, a more unified view of the problem space is presented and related sub-problems are identified. Ways in which winner-take-all networks could play a role in the solution of two problem classes are outlined. Winner-take-all networks are considered to be more biologically plausible than the back-propagating networks which have been considered previously and unsuccessfully for the solution of such problems.
[Hwang, 2004]

Author(s): Cho Yee Joey Hwang.

Title: . A Theoretical Comparison of Resolution Proof Systems for CSP Algorithms.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2004.

Abstract: Many problems from a variety of applications such as graph coloring and circuit design can be modelled as constraint satisfaction problems (CSPs). This provides strong motivation to develop effective algorithms for CSPs. In this thesis, we study two resolution-based proof systems, NG-RES and C-RES, for finite-domain CSPs which have a close connection to common CSP algorithms. We give an almost complete characterization of the relative power among the systems and their restricted tree-like variants. We demonstrate an exponential separation between NG-RES and C-RES, improving on the previous super-polynomial separation, and present other new separations and simulations. We also show that most of the separations are nearly optimal. One immediate consequence of our results is that simple backtracking with 2-way branching is exponentially more powerful than simple backtracking with d-way branching.
[Letourneau, 2004]

Author(s): Michael J. Letourneau.

Title: . Heuristics for Generating Additive Spanners.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2004.

Abstract: Given an undirected and unweighted graph G, the subgraph S is an additive spanner of G with delay d if the distance between any two vertices in S is no more than d greater than their distance in G. It is known that the problem of finding additive spanners of arbitrary graphs for any fixed value of d with a minimum number of edges is NP-hard. Additive spanners are used as substructures for communication networks which are subject to design constraints such as minimizing the number of connections in the network, or permitting only a maximum number of connections at any one node. In this thesis, we consider the problem of constructing good additive spanners. We say that a spanner is ``good'' if it contains few edges, but not necessarily a minimum number of them. We present several algorithms which, given a graph G and a delay parameter d as input, produce a graph S which is an additive spanner of G with delay d. We evaluate each of these algorithms experimentally over a large set of input graphs, and for a series of delay values. We compare the spanners produced by each algorithm against each other, as well as against spanners produced by the best-known constructions for those graph classes with known additive spanner constructions. We highlight several algorithms which consistently produce spanners which are good with respect to the spanners produced by the other algorithms, and which are nearly as good as or, in some cases, better than the spanners produced by the constructions. Finally, we conclude with a discussion of future algorithmic approaches to the construction of additive spanners, as well as a list of possible applications for additive spanners beyond the realm of communication networks.
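
The additive spanner property defined above can be checked directly from its definition with all-pairs BFS distances, as in this brute-force sketch (useful only for small graphs).

    from collections import deque

    def bfs_distances(adj, source):
        """Unweighted shortest-path distances from `source` in adjacency-list graph `adj`."""
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    def is_additive_spanner(graph_adj, spanner_adj, d):
        """Check that spanner distances exceed graph distances by at most the delay d.

        Both graphs are dicts mapping vertex -> iterable of neighbours over the same
        vertex set; this brute-force check only illustrates the definition.
        """
        for u in graph_adj:
            dg = bfs_distances(graph_adj, u)
            ds = bfs_distances(spanner_adj, u)
            for v, duv in dg.items():
                if ds.get(v, float('inf')) > duv + d:
                    return False
        return True
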
[Lin, 2004]

Author(s): Zhiwen Lin.

Title: . Near-Optimal Heuristic Solutions for Truncated Harmonic Windows Scheduling and Harmonic Group Windows Scheduling.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, October 2004.

Abstract: Dividing a video into many segments and then broadcasting each segment periodically has proved to be an efficient and cost-effective way of providing near Video-on-Demand services. Some of the known broadcasting schemes, such as Fixed-Delay Pagoda Broadcasting (FDPB), adopt the fixed-delay policy, which requires the user to wait for a fixed time before watching a video. Our first broadcasting scheme, the Generalized Fixed-Delay Pagoda Broadcasting (GFDPB), based on the fixed-delay policy, improves Bar-Noy et al.'s greedy algorithm for the Harmonic Windows Scheduling Problem. GFDPB achieves the lowest maximum waiting time among all the known protocols using segments of equal duration and channels of equal bandwidth. In addition, its performance is very close to the theoretical optimum. Second, we define the Harmonic Group Windows Scheduling (HGWS) problem and present a new broadcasting scheme to solve it, Harmonic Page-set Broadcasting (HPB), which provides the lowest average waiting time of all currently known protocols by using the fewest channels for a given server bandwidth. Finally, we present a hybrid broadcasting scheme, Preloading Page-Set Broadcasting (PPSB), which compromises between the average waiting time and the maximum waiting time of HPB. While still providing the shortest average waiting time of all known protocols using segments of equal duration and channels of equal bandwidth, PPSB achieves a much shorter maximum waiting time than HPB. Furthermore, PPSB provides a very desirable trade-off between the average waiting time and the maximum waiting time for a given server bandwidth, while guaranteeing that its maximum waiting time is only 1/3 longer than its average waiting time.
[Mukherjee, 2004]

Author(s): Kaustav Mukherjee.

Title: . Application of the Gabriel Graph to Instance Based Learning Algorithms.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2004.

Abstract: Instance based learning (IBL) algorithms attempt to classify a new unseen instance (test data) based on some "proximal neighbour" rule, i.e., by taking a majority vote among the class labels of its proximal instances in the training or reference dataset. The k nearest neighbours (k-NN) are commonly used as the proximal neighbours. We study the use of the state-of-the-art approximate technique for k-NN search with the best known IBL algorithms. The results are impressive; a substantial speed-up in computation is achieved, and on average the accuracy of classification is preserved. Geometric proximity graphs, especially the Gabriel graph (a subgraph of the well-known Delaunay triangulation), provide an elegant algorithmic alternative to k-NN based IBL algorithms. The main reason for this is that the Gabriel graph preserves the original nearest neighbour decision boundary between data points of different classes very well. However, computing the Gabriel graph of a dataset in practice is prohibitively expensive. Extending the idea of approximate k-NN search to approximate Gabriel neighbours search, it becomes feasible to compute the latter. We thin (reduce) the original reference dataset by computing its approximate Gabriel graph and use it both independently as a classifier and as an input to the IBL algorithms. We achieve excellent empirical results in terms of classification accuracy, reference set storage reduction and, consequently, query response time. The Gabriel thinning algorithm coupled with the IBL algorithms consistently outperforms the IBL algorithms used alone with respect to storage requirements, while maintaining similar accuracy levels.
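
For reference, the exact Gabriel graph mentioned above can be computed by brute force straight from its definition, as in the O(n^3) sketch below; the thesis's contribution is precisely to avoid this cost via approximate neighbour search.

    import numpy as np

    def gabriel_edges(points):
        """Brute-force Gabriel graph of a point set (rows of `points`).

        p and q are joined iff no other point r lies strictly inside the ball having
        segment pq as its diameter, i.e. |pr|^2 + |qr|^2 >= |pq|^2 for every r.
        """
        pts = np.asarray(points, dtype=float)
        n = len(pts)
        edges = []
        for i in range(n):
            for j in range(i + 1, n):
                d_ij = np.sum((pts[i] - pts[j]) ** 2)
                ok = True
                for k in range(n):
                    if k in (i, j):
                        continue
                    if (np.sum((pts[i] - pts[k]) ** 2)
                            + np.sum((pts[j] - pts[k]) ** 2)) < d_ij:
                        ok = False
                        break
                if ok:
                    edges.append((i, j))
        return edges
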
[Storjohann, 2004]

Author(s): Rasmus Storjohann.

Title: . Growing brains in silico: Integrating biochemistry, genetics and neural activity in neurodevelopmental simulations.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2004.

Abstract: Biologists' understanding of the roles of genetics, biochemistry and activity in neural function is rapidly improving. All three interact in complex ways during development, recovery from injury and in learning and memory. The software system NeuroGene was written to simulate neurodevelopmental processes. Simulated neurons develop within a 3D environment. Protein diffusion, decay and receptor-ligand binding are simulated. Simulations are controlled by genetic information encoded using a novel programming language mimicking the control mechanisms of biological genes. Simulated genes may be regulated by protein concentrations, neural activity and cellular morphology. Genes control protein production, changes in cell morphology and neural properties, including learning. We successfully simulate the formation of topographic projection from the retina to the tectum. We propose a novel model of topography based on simulated growth cones. We also simulate activity-dependent refinement, through which diffuse connections are modified until each retinal cell connects to only a few target cells.
[Tory, 2004]

Author(s): Melanie Tory.

Title: . Combining 2D and 3D Views for Visualization of Spatial Data.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, July 2004.

Abstract: This research compares two-dimensional (2D), three-dimensional (3D), and 2D/3D combination displays (orientation icon, ExoVis, and in-place) for visualization of 3D spatial data. Both 2D and 3D views can be valuable for different reasons. 3D views can provide an overview of a 3D space, illustrate the 3D shape of objects, and support 3D navigation. 2D views can reduce occlusion of specific parts, show undistorted angles and distances, and enable precise positioning and navigation. Combining 2D and 3D views is valuable when benefits of 2D and 3D are both relevant to the task. First, three 2D/3D combination displays were compared in terms of physical integration of views, occlusion, deformation, flexibility, screen space requirements, and viewing angles. Orientation icons (i.e., 2D and 3D views separated into different windows) offered high flexibility, non-oblique viewing, and low occlusion and deformation, but required substantial screen space and had poor integration of 2D and 3D views. In-place displays (i.e., clip and cutting planes) were the opposite. ExoVis displays (i.e., 2D views surrounding a 3D view in the same scene) had better integration than orientation icons, but greater flexibility and less occlusion and deformation than in-place displays. A theory describing when orientation icon, ExoVis, and in-place displays would be useful was then developed, and experiments that compared 2D displays, 3D displays, and 2D/3D combinations for mental registration, relative positioning, orientation, and volume of interest tasks were performed. In-place supported the easiest mental registration of 2D and 3D views, followed by ExoVis, and lastly orientation icon displays. 3D displays were effective for approximate navigation and positioning when appropriate cues (e.g., shadows) were present, but were not effective for precise navigation and positioning except in specific circumstances (e.g., with good viewing angles). For precise tasks, orientation icon and ExoVis displays were better than 2D or 3D displays alone. These displays had as good or better performance, inspired higher confidence, and allowed natural, integrated navigation. In-place displays were not effective for 3D orientation because they forced users to frequently switch back and forth between dimensions. Major factors contributing to display preference and usability were task characteristics, personal strategy, orientation cues, spatial proximity of views that were used together, occlusion, oblique viewing of 2D views, and methods used to interact with the display. Results of this thesis can be used to guide designers to choose the most appropriate display technique for a given task.
[Vajihollahi, 2004]

Author(s): Mona Vajihollahi.

Title: . High Level Specification and Validation of the Business Process Execution Language for Web Services.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2004.

Abstract: The Business Process Execution Language for Web Services (BPEL) is an XML-based formal language for the design of networking protocols for automated business processes. Originally introduced by leading e-business vendors, including IBM and Microsoft, BPEL is now a forthcoming industrial standard as work on the language continues at OASIS (Organization for the Advancement of Structured Information Standards, www.oasis-open.org) within the technical committee on the Web Services Business Process Execution Language (WSBPEL TC). We formally define an abstract executable semantics for the language in terms of a distributed abstract state machine (DASM). The DASM paradigm has proven to be a feasible, yet robust, approach for modeling architectural and programming languages and has been used as the basis for industrial standardization before. The goal of this work is to support the design and standardization of BPEL by eliminating weak points in the language definition and validating key system attributes experimentally. The necessity of formalization in the standardization process is well recognized by the OASIS WSBPEL TC and is formulated as one of the basic issues by the technical committee: "There is a need for formalism. It will allow us to not only reason about the current specification and related issues, but also uncover issues that would otherwise go unnoticed. Empirical deduction is not sufficient." (Issue #42, WSBPEL Issue List, WSBPEL TC at OASIS.) We take a hierarchical refinement approach to model the language. Starting from an abstract ground model of the core attributes of the language, we perform step-wise refinements, obtaining a hierarchy of ground models at different levels of abstraction which leads to the final executable model. The executable model is then used together with a graphical visualization tool to experimentally validate the key attributes of the language through simulation of abstract machine runs.
[Wu, 2004a]

Author(s): Shufang Wu.

Title: . Practical lossless broadcasting schemes for variable bit rate videos in video-on-demand service.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 2004.

Abstract: Broadcasting in video-on-demand services scales well in transporting popular videos. A large number of broadcasting schemes have been proposed to minimize the server bandwidth for a given waiting time, only a small number of which deal with the variable bit rate (VBR) videos that are used in practice. We propose three new lossless VBR video broadcasting schemes to improve on existing schemes. Backward Segmentation using Equal Bandwidth and Prefix-Caching (BSEB-PC) segments a video backwards from its end until a relatively small part is left, which is cached by users in advance. All segments except the initial one are repeatedly broadcast on given standard channels of equal bandwidth. Experiments with real videos show that this scheme achieves a server bandwidth requirement within 1% of the optimum. Forward Segmentation using Equal Bandwidth (FSEB) addresses the issue of minimizing server bandwidth using channels of equal bandwidth, given an upper bound on initial user wait time and user bandwidth. Experimental results with real videos show that it achieves both better server bandwidth and better user bandwidth than any other known scheme in the same deployment environment (i.e., number of server broadcasting channels, the maximum number of channels from which a user can receive concurrently, and user wait time). General Frame Level Segmentation (GFLS) deals with the situation where there are frames in the video that require future reference frames for decoding. This situation is common in practice, but has not been considered in the literature.
[Wu, 2004b]

Author(s): Xiaojing Wu.

Title: . Efficient Java Interface Invocation Using IZone.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 2004.

Abstract: This thesis addresses the problem of improving the efficiency of interface invocation in the Java Virtual Machine (JVM). In current JVM implementations, interface method invocation is not as efficient as virtual method invocation, because of the need to support multiple interface inheritance in Java. This leads to the mistaken impression that Java interface invocation is inherently inefficient. This thesis shows that, with a proper implementation, the performance of interface invocation can be substantially improved. A new approach, IZone-based interface invocation, is proposed in this thesis. The IZone is a new data structure associated with an interface type in the method area. It is composed of several implementation lookup areas, one for each subclass of the interface. The IZone is populated as subclasses are loaded and resolved, ensuring that the lookup areas within it are arranged in the resolution order of the corresponding subclasses. A fully constructed IZone contains pointers to the different subclasses' implementations of the interface methods. Within a lookup area, the pointers are arranged according to the corresponding methods' declaration order in the interface. By taking advantage of class resolution orders and method declaration orders, the IZone provides quick access to the implementation of interface methods. As the experimental results demonstrate, with moderate space overhead, IZone-based interface invocation is the fastest approach after lightweight optimizations, and the second fastest after heavyweight optimizations.
[Zahariev, 2004]

Author(s): Manuel Zahariev.

Title: . A (Acronyms).

Ph.D. Thesis, School of Computing Science, Simon Fraser University, June 2004.

Abstract: Acronyms are a significant, and the most dynamic, area of the lexicon of many languages. Building automated acronym systems poses two problems: acquisition and disambiguation. Acronym acquisition is based on the identification of anaphoric or cataphoric expressions which introduce the meaning of an acronym in text; acronym disambiguation is a word sense disambiguation task, with expansions of an acronym being its possible senses. It is proposed here that acronyms are universal phenomena, occurring in all languages with a written form, and that their formation is governed by linguistic preferences, based on regularities at the character, phoneme, word and phrase levels. A universal explanatory theory of acronyms is presented, which rests on a set of testable hypotheses, and is manifested through a set of violable, ordered rules. The theory is developed based on examples from fifteen languages, with six different writing systems. A dynamic programming algorithm is implemented based on the explanatory theory of acronyms. The algorithm is evaluated on lists of acronym-expansion pairs in Russian, Spanish, Danish, German, English, French, Italian, Dutch, Portuguese, Finnish, and Swedish and achieves excellent performance. A two-pass greedy algorithm for automatic acronym acquisition is designed, which results in good performance for specific domains. A hybrid machine learning algorithm, using features generated through dynamic programming acronym-expansion matching, is proposed and results in good performance on noisy, parsed, newspaper text. A machine learning algorithm for acronym sense disambiguation is presented, which is trained and evaluated automatically on information downloaded following search engine lookup. The algorithm achieves good performance on deciding whether an acronym occurs with a certain sense in a given context, and good accuracy when picking the correct sense for an acronym in a given context. All algorithms presented allow for efficient, readily usable implementations that can be included as components in larger natural language frameworks. Technologies developed have applicability beyond acronym acquisition and disambiguation, to aspects of the more general problems of anaphora resolution and word sense disambiguation, within information extraction or natural language understanding systems.
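
The dynamic-programming matching mentioned above can be illustrated with a toy alignment between acronym letters and the initial letters of a candidate expansion; the thesis's actual scoring rules and linguistic preferences are considerably richer than this sketch.

```python
# Toy dynamic-programming matcher for acronym/expansion pairs: score how
# many acronym letters can be matched, in order, to word-initial letters.

def dp_match(acronym, expansion):
    letters = acronym.lower()
    initials = [w[0].lower() for w in expansion.split()]
    n, m = len(letters), len(initials)
    # best[i][j] = max acronym letters of letters[:i] explained by initials[:j]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i][j - 1], best[i - 1][j])
            if letters[i - 1] == initials[j - 1]:
                best[i][j] = max(best[i][j], best[i - 1][j - 1] + 1)
    return best[n][m] / n      # fraction of acronym letters explained

print(dp_match("OASIS",
               "Organization for the Advancement of Structured Information Standards"))
# prints 1.0: every acronym letter aligns with a word initial, in order
```
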
[Zhang, 2004]

Author(s): Yingjian Zhang.

Title: . Prediction of Financial Time Series with Hidden Markov Models.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 2004.

Abstract: In this thesis, we develop an extension of the Hidden Markov Model (HMM) that addresses two of the most important challenges of financial time series modeling: non-stationarity and non-linearity. Specifically, we extend the HMM to include a novel exponentially weighted Expectation-Maximization (EM) algorithm to handle these two challenges. We show that this extension allows the HMM algorithm to model not only sequence data but also dynamic financial time series. We show that the update rules for the HMM parameters can be written in the form of exponential moving averages of the model variables, so that we can take advantage of existing technical analysis techniques. We further propose a double weighted EM algorithm that is able to adjust training sensitivity automatically. Convergence results for the proposed algorithms are proved using techniques from the EM Theorem. Experimental results show that our models consistently beat the S&P 500 Index over five 400-day testing periods from 1994 to 2002, including both bull and bear markets. Our models also consistently outperform the top 5 S&P 500 mutual funds in terms of the Sharpe Ratio.
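
As a hedged sketch of the exponentially weighted idea only (the thesis's concrete EM update rules for the HMM parameters are not reproduced here), the update below keeps a running estimate as an exponential moving average, so recent observations dominate older ones; the decay rate eta is illustrative.

```python
# Exponential-moving-average style parameter update: a hedged illustration
# of why such updates track non-stationary data, not the thesis's algorithm.

import numpy as np

def ewma_update(old_estimate, new_sufficient_stat, eta=0.05):
    """Blend a new sufficient statistic into the running estimate."""
    return (1.0 - eta) * old_estimate + eta * new_sufficient_stat

# Example: tracking the mean of a drifting return series with a regime change.
rng = np.random.default_rng(0)
returns = np.concatenate([rng.normal(0.001, 0.01, 200),
                          rng.normal(-0.002, 0.01, 200)])
mean_est = 0.0
for r in returns:
    mean_est = ewma_update(mean_est, r)
print(round(mean_est, 4))   # close to the most recent regime's mean
```
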



2003

[Bian, 2003]

Author(s): Zhengbing Bian.

Title: . Wavelength Assignment Algorithms for WDM Optical Networks.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2003.

Abstract: The explosive growth of the Internet and bandwidth-intensive applications such as video-on-demand and multimedia conferencing require high-bandwidth networks. The current high-speed electronic networks cannot provide such capacity. Optical networks offer much higher bandwidth than traditional networks. When employed with the wavelength division multiplexing (WDM) technology, they can provide the huge bandwidth needed. The tree of rings is a popular topology which can often be found in WDM networks. In this thesis, we first study wavelength assignment (WA) algorithms for trees of rings. A tree of rings is a graph obtained by interconnecting rings in a tree structure such that any two rings intersect in at most one node and any two nodes are connected by exactly two edge-disjoint paths. The WA problem is the following: given a set of paths on a graph, assign wavelengths to the paths such that any two paths sharing a common edge are assigned different wavelengths and the number of wavelengths used is minimized. The WA problem on trees of rings is known to be NP-hard. A trivial lower bound on the number of wavelengths is the maximum number L of paths on any link. In this thesis, we propose a greedy approximation algorithm which uses at most 3L wavelengths on a tree of rings with node degree at most 8. This improves the previous 4L upper bound. Our algorithm uses at most 4L wavelengths for a tree of rings with node degree greater than 8. We also show that 3L is a lower bound for some instances of the WA problem on trees of rings. In addition, we show that our algorithm achieves approximation ratios of 2 1/16 and 2 3/13 for trees of rings with node degrees at most 4 and 6, respectively. Optical switches, which keep the data stream transmitted in optical form from source to destination to eliminate the electro-optic conversion bottleneck at intermediate nodes, are key devices in realizing the huge bandwidth of optical networks. One of the common ways to build large optical switches is to use directional couplers (DCs). However, DCs suffer from an intrinsic crosstalk problem. In this thesis, we study the nonblocking properties of Benes networks and Banyan-type networks with extra stages under crosstalk constraints.
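
To make the wavelength-assignment constraint concrete, here is a first-fit sketch over an arbitrary set of paths; it only illustrates the conflict rule (paths sharing a link need different wavelengths) and is not the thesis's algorithm for trees of rings, which is what achieves the 3L bound.

```python
# First-fit wavelength assignment: paths that share a link must receive
# different wavelengths.  A generic illustration of the WA constraint only.

def first_fit_wa(paths):
    """paths: list of sets of links (each link given as an edge tuple)."""
    assignment = {}
    for i, links in enumerate(paths):
        used = {assignment[j] for j in assignment if paths[j] & links}
        w = 0
        while w in used:        # smallest wavelength not used by a conflict
            w += 1
        assignment[i] = w
    return assignment

paths = [ {("a", "b"), ("b", "c")},
          {("b", "c"), ("c", "d")},
          {("c", "d"), ("d", "a")} ]
print(first_fit_wa(paths))      # e.g. {0: 0, 1: 1, 2: 0}
```
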
[Lapierre, 2003]

Author(s): Soleil Lapierre.

Title: . An Investigation of Fourier Domain Fluid Simulation.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2003.

Abstract: Motivated by the reduced rendering cost of the Fourier Volume Rendering method, we construct a Navier-Stokes fluid flow simulation that operates entirely in the frequency domain. We show results from a practical implementation and compare with Jos Stam's spatial domain and FFT-based simulations. We break down the simulation pipeline into its major components and evaluate the cost of each component in the spatial domain and the frequency domain. We present an analytical as well as an experimental analysis, and we use our experimental results to identify the most efficient simulation pipelines for all grid sizes. We conclude that frequency domain implementation of the Navier-Stokes flow equations for Euler modeling schemes is prohibitively expensive in terms of compute time. We also conclude that of the solutions we evaluated, Stam's semi-Lagrangian simulation pipelines are the best choices for real-time applications such as video games, where perfect physical accuracy is not required.
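
A minimal sketch of why the frequency domain is attractive for one pipeline component, the diffusion step, which becomes a per-frequency multiplication; the thesis's full pipeline (advection, projection, rendering) is larger than this, and the grid size and constants below are made up.

```python
# Frequency-domain diffusion: damping each Fourier mode by exp(-nu*k^2*dt)
# diffuses the field without any spatial stencil.  Illustrative only.

import numpy as np

def diffuse_fft(field, nu, dt):
    n = field.shape[0]                        # assume a square n x n grid
    k = np.fft.fftfreq(n) * 2.0 * np.pi       # angular frequencies
    kx, ky = np.meshgrid(k, k, indexing="ij")
    damp = np.exp(-nu * (kx**2 + ky**2) * dt)
    return np.real(np.fft.ifft2(np.fft.fft2(field) * damp))

field = np.zeros((64, 64))
field[32, 32] = 1.0                           # a point of dye
smoothed = diffuse_fft(field, nu=0.5, dt=1.0)
print(smoothed.max() < 1.0)                   # True: the spike has spread out
```
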
[Law, 2003]

Author(s): Benjamin Law.

Title: . Eye Movements in a Virtual Laparoscopic Training Environment.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2003.

Abstract: Computer simulations are a promising alternative to traditional surgery training methods, especially in minimally-invasive procedures such as laparoscopic surgery. In minimally-invasive surgery (MIS), eye-hand coordination is important because the surgical site is viewed on a monitor with limited depth cues and a restricted field of view. Thus, the emphasis has been on training tools that effectively prepare novices for the eye-hand coordination difficulties in MIS. How people interact with the system is of interest to those developing surgical training tools. This thesis compares the eye movements of novice and experienced subjects who performed a task in a virtual laparoscopic training environment. Eye movements in skilled performance were found to differ from those of less experienced subjects. In addition, the eye movements of novices after training became more similar to those of the experienced subjects.
[Lee, 2003]

Author(s): Christina Yi-Chien Lee.

Title: . Capturing Users' Perceptions Of Clickability.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2003.

Abstract: There are many clickable items on computer screens. Each clickable item is a visual representation of a command indicating its functionality. The problem is that designers have their own conceptual models, which means visual representations of commands are usually very different from one designer to another. In addition, users do not always perceive visual cues the way that designers intended. In fact, two different users may interpret the same visual cue in completely opposite ways. A theory of how users know where to click would ease communication from designers to users. The theory of affordance, defined differently by Gibson and Norman, offers the possibility of such a theory. Gibson (1977) defined affordance as the possible actions available in the environment to an animal. Norman (1988) defined affordance as appearance suggesting possible uses of the object. While Gibson was referring to the physical environment, Norman was referring to the mental model. Whereas Gibson's affordance is independent of individuals, Norman's affordance may be dependent on an individual's experience. Designers who are aware of Norman's definition have applied it in their designs. However, this does not guarantee that users perceive commands even though they are visible. On the other hand, users may perceive commands even though the design has not met any design guideline. This is because users have some expectations of where and how commands should be represented. To resolve the gap between users and designers, we are looking for a theory of clickability, which includes but goes beyond the theory of affordance. We first observed users' behaviour while performing specified tasks on real applications. These results were ambiguous. Therefore we developed simple abstract screens, apart from any real application. In these abstract screens, cues were tested separately. Based on the current data, intentions and context direct users' responses. In particular, command location is the most powerful factor of all. In conclusion, many closely interrelated factors besides affordance are involved in the design. The theory of clickability must include all of them.
[Lewis, 2003]

Author(s): Benjamin C. Lewis.

Title: . Design Trade-offs for Inclined LEO Satellite Networks.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2003.

Abstract: We study inclined low earth orbit (LEO) satellite networks and examine the effects that the various parameters in the design of these networks have on inter-satellite communications. The performance of a network can be measured in different ways; we focus on satellite-to-satellite path lengths, link lengths, the time derivative of link lengths, and the angular velocity requirements for antennas used for links between neighbouring orbital planes. In particular, we show that for networks at a given altitude, the most important design parameter determining path lengths is related to the skew of the network. The other relevant factors are the inclination of the orbits, the number of orbits, and the number of satellites per orbit; we determine the relative importance of these factors, and of the interactions between them. We also show that if the goal is to minimize the angular velocities of satellite antennas used for inter-satellite communications in these networks, the effect of any of these design parameters depends very strongly on the values of all the other parameters.
[Mitchell, 2003]

Author(s): G. Daryn Mitchell.

Title: . Orientation on Tabletop Displays.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, October 2003.

Abstract: Tabletop computer displays suffer from an orientation problem. Solutions based on various approaches have been implemented, but there remain unanswered questions about the ideal solution. Within a context of interacting with documents on tabletop displays, requirements for orientation control were found by examining literature on how people use paper documents and what manipulations they perform. It was determined that control must provide quick, either-handed, low-attention manipulation of individual objects on the display. An evaluation of existing interaction techniques for rotation was performed in light of these criteria. The trade-off between two desirable characteristics, integral translation and rotation, and direct input, was examined. An evaluation of mouse-based techniques confirmed that integral input has the potential to be faster than the established sequential-manipulation techniques due to the time saved by overlapping manipulation of different degrees of freedom. For a single task combining translation and rotation, the separable technique took twice as long as two separate tasks of the same translation or rotation, whereas the integral techniques took less time for the combined action than for the two separate actions. However, the integral mouse-based techniques were too slow in their actual manipulation, with the scroll wheel taking four times as long to rotate as the separable technique, and the new drag technique not being used in an integral manner. It was concluded that currently available technology, such as the mouse and single-point touch-sensitive overlays, is inadequate. Acceptable tabletop orientation control will be dependent on the maturation of newer technologies based on tangible interfaces or possibly multi-finger input. Until then, interaction techniques for manipulating documents on tabletop displays should be chosen according to their ability to concurrently control position and orientation, such as three degree-of-freedom input devices on digitizing tablets.
[Moise, 2003]

Author(s): Adrian Moise.

Title: . Designing better user interfaces for radiology workstations.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 2003.

Abstract: Since the 1980s, radiologists have started to interpret digital radiographs using modern computer systems, a process known as softcopy reading. For softcopy reading, Hanging Protocols are used to automatically arrange images for interpretation upon opening a case, thus minimizing the need for physicians to manipulate images. We have developed a strategy, called HP++, which extends current hanging protocols with support for 'scenario-based' interpretation, matching the radiologist's workflow and ensuring a chronological presentation of information. We hypothesized that HP++ significantly reduces off-image eye fixations, the interpretation time, and the frequency and complexity of user input. We validated our hypothesis with inexpensive usability studies based on an abstraction of the radiologist's task, transferred to novice subjects. For a radiology look-alike task, we compared the performance of 20 graduate students using our HP++ based interaction technique with their performance using a conventional interaction technique. We observed a 15% reduction in the average interpretation time using the staged approach, with one-third fewer interpretation errors, two-thirds fewer mouse clicks, and over 65% less eye gaze over the workstation controls. User satisfaction with the staged interface was significantly higher than with the traditional interface. Preliminary external validation of these results with physician subjects indicates that our usability results transfer to radiology softcopy reading. We conclude that designing radiology workstations with support for HP++ can improve the performance of workstation users.
[Orchard, 2003]

Author(s): Jeffery J. Orchard.

Title: . Simultaneous Registration and Activation Detection: Overcoming Activation-Induced Registration Errors in Functional MRI.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, April 2003.

Abstract: In the processing of functional magnetic resonance imaging (fMRI) data, motion correction is typically performed before activation detection. However, on high-field MR scanners (3 T and higher), the strength of the blood oxygen level dependent (BOLD) signal can cause registration algorithms to produce motion estimates that have stimulus-correlated errors. Motion compensation using these biased motion estimates can result in both false-positive and false-negative regions of activation. By formulating the registration and activation detection problems into a single least-squares problem, both the motion estimates and activation map can be solved for simultaneously. However, the solution is not unique and an additional constraint is used to find a solution that is appropriate. This constrained optimization problem can be solved efficiently, and two equivalent methods are proposed and demonstrated on both simulated and in vivo datasets.
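
A deliberately simplified, hypothetical illustration of the single least-squares idea follows: motion-related regressors and the stimulus regressor are stacked into one design matrix and all coefficients are estimated together, rather than registering first and detecting activation second. The thesis's actual formulation, and the additional constraint it uses to pick an appropriate solution, are not captured here; the data and regressors are synthetic.

```python
# Joint estimation sketch: one least-squares fit over a design matrix that
# contains both nuisance (motion-like) regressors and the stimulus regressor.

import numpy as np

rng = np.random.default_rng(1)
T = 120
stimulus = (np.arange(T) // 10) % 2          # boxcar on/off paradigm
motion = rng.normal(0, 1, (T, 3))            # hypothetical motion regressors

# Synthetic voxel: activation + motion-correlated component + noise
signal = 2.0 * stimulus + motion @ np.array([0.5, -0.3, 0.1]) \
         + rng.normal(0, 0.5, T)

design = np.column_stack([np.ones(T), stimulus, motion])
coeffs, *_ = np.linalg.lstsq(design, signal, rcond=None)
print(round(coeffs[1], 2))   # jointly estimated activation amplitude (~2.0)
```
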
[She, 2003]

Author(s): Rong She.

Title: . Association-Rule-Based Prediction of Outer Membrane Proteins.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2003.

Abstract: A class of medically important disease-causing bacteria (collectively known as Gram-negative bacteria) has been shown to have a rather distinct cell structure, in which there exists an extra ``outer'' membrane in addition to the ``inner'' membrane that is present in the cells of most other organisms. Proteins resident in this outer membrane (outer membrane proteins) are of primary research interest since such proteins are exposed on the surface of these bacterial cells and so are prioritized targets for drug development. Determination of the biological patterns that discriminate outer membrane proteins from non-outer membrane proteins could also provide insights into the biology of this important class of proteins. To date, it remains difficult to predict outer membrane proteins with high precision. Existing protein localization prediction algorithms either do not predict outer membrane proteins at all, or they simply concentrate on the overall accuracy or recall when identifying outer membrane proteins. However, as the study of a potential drug or vaccine takes a great amount of time and effort in the laboratory, it is more appropriate that priority be given to achieving high precision in outer membrane protein prediction. In this thesis, we address the problem of protein localization classification with the performance measured mainly by the precision of outer membrane protein prediction. We apply the technique of association-rule-based classification and propose several important optimization techniques in order to speed up the rule-mining process. In addition, we introduce the framework of building classifiers with multiple levels, which we call the refined classifier, in order to further improve the classification performance on top of the single-level classifier. Our experimental results show that our algorithms are efficient and produce high precision while maintaining the corresponding recall at a good level. Also, the idea of refined classification indeed improves the performance of the final classifier. Furthermore, our classification rules turn out to be very helpful to biologists in improving their understanding of the functions and structures of outer membrane proteins.
[Wang, 2003]

Author(s): Yong Wang.

Title: . Routing Algorithms for Ring Networks.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 2003.

Abstract: In this thesis, we study routing problems on ring networks. The ring is a popular topology for communication networks and has attracted much research attention. A communication application on a ring network can be regarded as a set of connection requests, each of which is represented by a set of nodes to be connected in the ring network. To implement a communication application, we need to develop a routing algorithm to find a path connecting all the nodes involved in each connection request. One of the most important optimization problems for communication on ring networks is to develop a routing algorithm such that the maximum congestion (i.e., the maximum number of paths that use any single link in the ring) is minimized. This problem can be formulated as the Minimum Congestion Hypergraph Embedding in a Cycle (MCHEC) problem, with the set of connection requests represented by a hypergraph. A special case of the MCHEC problem, in which each connection request involves exactly two nodes, is known as the Minimum Congestion Graph Embedding in a Cycle problem. A more general case, in which connection requests may have non-uniform bandwidth requirements, is known as the Minimum Congestion Weighted Hypergraph Embedding in a Cycle problem. The Minimum Congestion Graph Embedding in a Cycle problem is solvable in polynomial time, and the other problems are NP-hard. In this thesis, we focus on the MCHEC problem and propose efficient algorithms in three categories. In the first category is a 1.8-approximation algorithm that improves on the previous 2-approximation algorithms. In the second category is an algorithm that computes optimal solutions for the MCHEC problem. This algorithm runs in polynomial time for subproblems with constant maximum congestion, and is more efficient in terms of time complexity than the previous algorithm that solves the same problem. The third category contains two heuristic approaches. According to our simulation results, both heuristics have lower time complexities and better practical performance than a well-known heuristic.
[Zhan, 2003]

Author(s): Meihua Zhan.

Title: . Reliable Multicast Extension To IEEE 802.11 In Ad-hoc Networks.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2003.

Abstract: The IEEE 802.11 standard uses channel reservation schemes and acknowledgements (ACKs) to provide reliable Medium Access Control (MAC) layer unicast services. In contrast, 802.11 does not guarantee the reliability of MAC layer broadcast and multicast transmissions. This lack of reliability extends to ad hoc routing protocols, such as DSR and AODV, which depend on broadcast packets to exchange routing information among nodes. In this thesis, we introduce an efficient and reliable MAC layer multicast/broadcast protocol called SAM (Sequential Acknowledgement Multicast). We use simulations to investigate the efficiency and reliability of SAM and to compare the performance of SAM to that of other broadcast/multicast protocols. The foundation of SAM is that the multicast receivers send back Clear to Send (CTS) and ACK frames in a predefined order to avoid collisions at the transmitter. During the retransmission phase of the protocol, the sender only needs to retransmit to nodes that failed to send back an ACK. This method releases those nodes which have been unnecessarily forced to keep silent in a multi-hop ad hoc network. SAM is efficient, reliable, and easy to implement. Most importantly, it is compatible with the IEEE 802.11 standard.
[Zhang, 2003]

Author(s): Xiang Zhang.

Title: . A Top-Down Approach for Mining Most Specific Frequent Patterns in Biological Sequence Data.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 2003.

Abstract: The emergence of automated high-throughput sequencing technologies has resulted in a huge increase of the amount of DNA and protein sequences available in public databases. A promising approach for mining such biological sequence data is mining frequent subsequences. One way to limit the number of patterns discovered is to determine only the most specific frequent subsequences which subsume a large number of more general patterns. In the biological domain, a wealth of knowledge on the relationships between the symbols of the underlying alphabets (in particular, amino-acids) of the sequences has been acquired, which can be represented in concept graphs. Using such concept graphs, much longer frequent patterns can be discovered which are more meaningful from a biological point of view. In this paper, we introduce the problem of mining most specific frequent patterns in biological data in the presence of concept graphs. While the well-known methods for frequent sequence mining typically follow the paradigm of bottom-up pattern generation, we present a novel top-down method (ToMMS) for mining such patterns. ToMMS (1) always generates more specific patterns before more general ones and (2) performs only minimal generalizations of infrequent candidate sequences. Due to these properties, the number of patterns generated and tested is minimized. Our experimental results demonstrate that ToMMS clearly outperforms state-of-the-art methods from the bioinformatics community as well as from the data mining community for reasonably low minimum support thresholds.



2002

[Afshar, 2002]

Author(s): Ramin Afshar.

Title: . Mining Frequent Max and Closed Sequential Patterns.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 2002.

Abstract: Although frequent sequential pattern mining plays an important role in many data mining tasks, it often generates a large number of sequential patterns, which reduces its efficiency and effectiveness. For many applications, mining all the frequent sequential patterns is not necessary, and mining frequent Max or Closed sequential patterns provides the same amount of information. Compared to frequent sequential pattern mining, frequent Max or Closed sequential pattern mining generates fewer patterns, and therefore improves the efficiency and effectiveness of these tasks. This thesis first gives a formal definition of the frequent Max and Closed sequential pattern mining problems, and then proposes two efficient programs, MaxSequence and ClosedSequence, to solve them. Finally, it compares the results and performance of these programs with two brute-force programs designed to solve the same problems.
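
The definitions can be illustrated by a small post-processing sketch over a set of frequent sequential patterns and their supports: a pattern is maximal if no frequent super-sequence exists, and closed if no super-sequence has the same support. Note that the thesis's MaxSequence and ClosedSequence programs mine such patterns directly rather than filtering them afterwards.

```python
# Filter maximal and closed patterns from a pattern -> support dictionary.

def is_subseq(a, b):
    """True if sequence a is a (not necessarily contiguous) subsequence of b."""
    it = iter(b)
    return all(x in it for x in a)

def closed_and_max(frequent):
    closed, maximal = [], []
    for p, sup in frequent.items():
        supers = [q for q in frequent if q != p and is_subseq(p, q)]
        if not supers:
            maximal.append(p)
        if not any(frequent[q] == sup for q in supers):
            closed.append(p)
    return closed, maximal

frequent = {("a",): 4, ("a", "b"): 4, ("a", "b", "c"): 2}
print(closed_and_max(frequent))
# closed: ('a','b') and ('a','b','c'); maximal: ('a','b','c') only
```
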
[Chen, 2002a]

Author(s): Cecilia Chao Chen.

Title: . Illumination Invariant Image Enhancement.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2002.

Abstract: The grayscale illumination invariant image is formed by projecting 2D log-chromaticity coordinates onto a 1D direction orthogonal to lighting change. The lighting-change direction is determined by a calibration of the imaging camera. This invariant is very useful for many computer vision problems, such as the removal of shadows from images, which in turn helps with image retrieval, object recognition, and so on. However, the invariant image quality, i.e., whether the shadow is completely attenuated and how smooth the transition across the shadow boundary is, is affected by many issues. Some of these are: natural lights only approximate theoretical Planckian lights; cameras with real broad-band sensors scatter the lighting-change direction; the quantization process produces different intensity patterns on the two sides of shadow boundaries; and most images output from cameras are nonlinear images, so that the chromaticity values are not linearly related to the original color signals sensed by the camera sensors. Previous research has shown that invariant images can be formed from real images with only approximately Planckian lights, and the spectral sharpening method can be used to transform broad-band camera sensors to narrower ones. However, little has been done on dealing with nonlinear images. In this thesis, I propose two optimization routines that establish a 3x3 matrix to apply to the camera sensors that can best improve invariant image quality. Observing that sharpened sensors can form better invariant images, I initialize the optimizations with a spectral sharpening transform matrix. One optimizer minimizes the difference of invariant intensities formed from the same surfaces under different lightings, and the other directly minimizes the gradient difference of the lighting-change directions formed from chromaticity data. Experimental results on both synthesized and real images show that the optimization routines converge and the resulting matrix does improve the invariant. To reduce the effect of quantization, a shadow detection and a smoothing process are carried out on the improved invariant images to diminish the difference between the two sides of shadow boundaries, thus further enhancing the invariant images. For nonlinear images, I construct a linearization model and determine the parameters for a particular camera with function fitting. The fitted function is then used to convert nonlinear images taken with this camera to linearized images. Finally, the above optimization routines are carried out on linearized images to form invariant images. Experiments show that invariant images derived from linearized images achieve better performance in shadow removal than those formed from nonlinear images directly.
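
A small sketch of the invariant-image construction described at the start of this abstract: log-chromaticity coordinates are projected onto the direction orthogonal to the lighting-change direction. The angle used below is arbitrary; in practice it comes from camera calibration or from optimizations such as those proposed in the thesis.

```python
# Grayscale illumination-invariant image: project 2-D log-chromaticity onto
# the direction orthogonal to the (assumed) lighting-change direction.

import numpy as np

def invariant_image(rgb, lighting_angle_rad):
    rgb = np.clip(rgb.astype(float), 1e-6, None)
    # log-chromaticity with the geometric mean as reference
    geo_mean = rgb.prod(axis=-1, keepdims=True) ** (1.0 / 3.0)
    log_chroma = np.log(rgb / geo_mean)        # lies in a 2-D subspace
    # an orthonormal basis of that 2-D subspace
    u = np.array([1, -1, 0]) / np.sqrt(2)
    v = np.array([1, 1, -2]) / np.sqrt(6)
    chi = np.stack([log_chroma @ u, log_chroma @ v], axis=-1)
    # project onto the direction orthogonal to the lighting-change direction
    ortho = np.array([-np.sin(lighting_angle_rad), np.cos(lighting_angle_rad)])
    return chi @ ortho

img = np.random.default_rng(2).uniform(10, 255, (4, 4, 3))
print(invariant_image(img, lighting_angle_rad=0.6).shape)   # (4, 4) grayscale
```
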
[Chen, 2002b]

Author(s): Justin Jiong Chen.

Title: . A Design and Implementation of Index-Fabric Algorithm.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, February 2002.

Abstract: Index Fabric is an indexing algorithm specially designed for optimizing query processing on semistructured data such as XML. Compared with other recent developments in XML indexing techniques, Index Fabric promises performance, flexibility and efficiency. One difficulty facing all XML indexing techniques for query optimization is the fact that XML data is both data and document; here a document is treated as data together with its internal processing structure. Therefore XML indexing needs to index not only the data but also the structure path that comes with the data. Index Fabric solves the problem elegantly by converting and merging both the data and its structure path into an intermediate string format and indexing it efficiently. The key to its performance and efficiency is a special data structure known as the Patricia trie, which is highly optimized for indexing long and complex keys. The goal of this project is to design and implement the Index Fabric system. The idea is to follow the original Index Fabric description to build a system and also study its performance. There are a number of reasons for implementing Index Fabric. Firstly, as an XML indexing algorithm, Index Fabric optimizes XML querying speed significantly and it has the flexibility required for optimizing XML query processing. Secondly, Index Fabric is rather new and its implementation details are not yet known, so this implementation can serve as a test bed for future algorithms. Thirdly, we shall test the idea of speeding up the overall performance of Index Fabric by running a main-memory version.
[Fung, 2002]

Author(s): Benjamin C. M. Fung.

Title: . Hierarchical Document Clustering Using Frequent Itemsets.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 2002.

Abstract: Most state-of-the-art document clustering methods are modifications of traditional clustering algorithms that were originally designed for data tuples in relational or transactional databases. However, they become impractical in real-world document clustering, which requires special handling for high dimensionality, high volume, and ease of browsing. Furthermore, incorrect estimation of the number of clusters often yields poor clustering accuracy. In this thesis, we propose to use the notion of frequent itemsets, which comes from association rule mining, for document clustering. The intuition of our clustering criterion is that there exist some common words, called frequent itemsets, for each cluster. We use such words to cluster documents, and a hierarchical topic tree is then constructed from the clusters. Since we use frequent itemsets as a preliminary step, the dimensionality of each document is drastically reduced, which in turn increases efficiency and scalability.
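
The clustering intuition can be shown with a toy example: word sets that occur in many documents act as cluster labels. The thesis's method additionally builds the hierarchical topic tree and handles overlapping assignments, which this sketch omits.

```python
# Toy frequent-itemset clustering: frequent word pairs become cluster labels,
# and documents are assigned to the labels they contain.

from itertools import combinations

docs = [
    {"data", "mining", "pattern"},
    {"data", "mining", "cluster"},
    {"web", "search", "engine"},
    {"web", "search", "index"},
]
min_support = 2

# count word pairs across documents
pairs = {}
for d in docs:
    for pair in combinations(sorted(d), 2):
        pairs[pair] = pairs.get(pair, 0) + 1
frequent = [p for p, c in pairs.items() if c >= min_support]

# assign each document to the frequent itemsets it contains
clusters = {p: [i for i, d in enumerate(docs) if set(p) <= d] for p in frequent}
print(clusters)   # {('data', 'mining'): [0, 1], ('search', 'web'): [2, 3]}
```
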
[Gallinger, 2002]

Author(s): Ioulia Julie Gallinger.

Title: . Effect of gradient quantization on quality of volume-rendered images.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, October 2002.

Abstract: Improvements to available computation technology and memory storage have led to increases in the sizes of volume datasets. A dataset with 512 samples in each dimension requires 128 megabytes of storage at 8 bits per sample for the tissue density information. In the field of scientific visualisation, the data is rendered for display, and the gradient, or surface normal, at each voxel is used to shade the surfaces, and may also be used to determine the tissue opacity. The gradient has three components, typically stored as floating point values using 96 bits. For a 256x256x145 dataset, 108.75 Mb are required for the gradient information. In order to visualise this dataset, the opacities (36.25 Mb total), gradients (108.75 Mb total), and colours (108.75 Mb total) must be computed. Lack of disk and memory storage and long rendering times become prohibitive to efficient use of such data, and data compression becomes essential. The aim of this project is an investigation of the effect of gradient compression on the quality of the rendered images, using several methods of gradient storage, extended to vary the compression rates. Images of several datasets are rendered using ray-casting. The resulting images are compared against a master image, generated with uncompressed gradients. Several image comparison metrics are used. Two of the datasets are synthetically created, and two come from CT and MRI data. Significant reduction in the number of bits per normal, as compared to the 96 bits, is found to be possible. Several methods and bit ranges are found promising. The choice of a bit range and method depends on the situation at hand. Considerations of available storage, available time and quality requirements are necessary. Whether a dataset is to be rendered repeatedly, or different datasets are to be displayed each time, may also affect the final decision, as it is possible to realise greater reduction in necessary storage, while enjoying acceptable quality, with some pre-processing for repeatedly-rendered datasets.
[Kroon, 2002]

Author(s): Frederick W. Kroon.

Title: . Linguistic Variation In Information Retrieval Using Query Reformulation.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 2002.

Abstract: As time progresses, the amount of written documentation available electronically is increasing dramatically. The field of information retrieval examines automated ways to wade through the reams of electronically available information efficiently. However, many attempts at providing such a tool have been stymied by various problems relating to the processing of natural languages. The focus of this paper is one such problem: the problem of linguistic variation. Linguistic variation is defined as the ways in which two pieces of text can differ at the surface level, while having the same (or similar) meaning. An overview of the problem of linguistic variation as it relates to the various subfields of information retrieval is presented. Various types of linguistic variation are examined, along with prior approaches to dealing with each. One possible approach to dealing with the problem of linguistic variation is to allow users to construct more complex rules that account for linguistic structure. While this approach may be suitable for those with linguistic training, it is not feasible for casual users. What is needed for such an approach to work for casual users is a system that will construct such rules from simple examples. The Variant Rule Generator (VRG) is a system designed to cope with linguistic variation by generating text retrieval rules that cover all variants from a single text string. VRG essentially defines ``phrase equivalence classes'' into which query phrases can fall. VRG extracts content words from the query phrase using extraction templates that are associated with each phrase equivalence class. Finally, a rule covering the whole equivalence class, instantiated with the content words, is generated.
[Kulpinski, 2002]

Author(s): Dejan Kulpinski.

Title: . LLE and Isomap Analysis of Spectra and Colour Images.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 2002.

Abstract: Locally Linear Embedding (LLE) [2] and Isomap [1] techniques can be used to process and analyze high-dimensional data domains, such as semantics, images, and colour. These techniques allow creation of low-dimensional embeddings of the original data that are much easier to visualize and work with than the initial, high-dimensional data. In particular, the dimensionality of such embeddings is similar to that obtained by classical techniques used for dimensionality reduction, such as Principal Components Analysis (PCA). The goal of this thesis is to show how the above methods can be applied to the area of colour vision, in particular to images, chromaticity histograms and spectra. Using the Isomap technique, it was found that the dimensionality of the spectral reflectances could be as low as 3. For the chromaticity histograms, the 251 original histogram dimensions were transformed into a 5-6 dimensional space. The chromaticity histogram result is significantly better than the result obtained by PCA on the same data set. In addition to providing an estimate of the underlying dimensionality of the data, both the Isomap and LLE techniques were used to produce low-dimensional embeddings of the high-dimensional data for the purpose of data visualization. These low-dimensional embeddings were valuable in determining the non-linear relationships that existed among the members of the original data sets. For example, the relationship among colour histograms in the embedded space was based on both the chromaticity and image content, while the embedded spectra showed groupings based on the RGB values. The possible role of LLE and Isomap in image classification, spectra reconstruction and RGB-to-spectra mapping is also examined.
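
A brief sketch of such an analysis pipeline using scikit-learn's PCA, Isomap and LLE implementations; the synthetic "spectra" below are invented stand-ins for the measured reflectances and chromaticity histograms used in the thesis.

```python
# Compare PCA, Isomap and LLE embeddings of synthetic smooth spectra.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding

rng = np.random.default_rng(3)
wavelengths = np.linspace(400, 700, 31)
basis = np.stack([np.exp(-((wavelengths - c) / 60.0) ** 2)
                  for c in (450, 550, 650)])           # three broad bases
spectra = rng.uniform(0, 1, (200, 3)) @ basis          # 200 spectra, 31 samples

for name, model in [("PCA", PCA(n_components=3)),
                    ("Isomap", Isomap(n_components=3, n_neighbors=10)),
                    ("LLE", LocallyLinearEmbedding(n_components=3, n_neighbors=10))]:
    emb = model.fit_transform(spectra)
    print(name, emb.shape)                             # (200, 3) embeddings
```
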
[Ovans, 2002]

Author(s): Russell Ovans.

Title: . A Multiagent Solution to the Venue Equalization Problem.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 2002.

Abstract: Within audio engineering the problem of developing systems that automatically equalize themselves has been an active area of research. Traditional approaches have centered on the synthesis of a digital filter that inverts the impulse response of the room, but this technique does not scale to large acoustic spaces measured at multiple listening locations. In this thesis we view the problem as heuristic combinatorial optimization of a search space defined by the settings of conventional 1/3-octave graphic equalizers. The methodology used to develop this system is that of modeling the problem as a multiagent constraint optimization system; i.e., a society of autonomous rational agents that sense and control their environment while seeking to maximize utility. Each agent uses a microphone to sense the frequency response at a given location, and adjusts the equalizer settings in order to maximize its utility by finding a 'good' response curve. Because there are multiple agents each trying to adjust the same set of equalizers, a coordination mechanism is needed to ensure that the system converges to stable states that are fair (maximizes a social welfare function) and efficient (no utility is wasted). Previous multiagent systems research has considered several models of coordination, including economics, dynamical systems, social insects, and AI planning. This thesis proposes a family of novel mechanisms inspired by economic theories of taxation and voting devised for the allocation of public goods. A set of experiments conducted on random problem instances in a simulated acoustic environment compared the efficiency of these mechanisms. The one found to perform best at maximizing the sum of agent utility while minimizing the amount of equalization is a simple aggregate voting system whereby agents submit bids, comprised of changes in utility as a function of discrete alternatives to the status quo, to a centralized Auctioneer. Coupled with a heuristic equalizer control strategy that seeks to find the best match between the Auctioneer's social choice and the equalizers' preferences, this algorithm is shown to rapidly converge in O(n) iterations and room measurements, where n is the number of equalizers. Utilizing one of these public goods mechanisms, a prototype device was built that successfully equalized a sound system in a real-world setting, and resulted in dramatic improvements in listener quality.
[Pei, 2002]

Author(s): Jian Pei.

Title: . Pattern-growth Methods for Frequent Pattern Mining.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, June 2002.

Abstract: Mining frequent patterns from large databases plays an essential role in many data mining tasks and has broad applications. Most of the previously proposed methods adopt apriori-like candidate-generation-and-test approaches. However, those methods may encounter serious challenges when mining datasets with prolific patterns and/or long patterns. In this work, we develop a class of novel and efficient pattern-growth methods for mining various frequent patterns from large databases. Pattern-growth methods adopt a divide-and-conquer approach to decompose both the mining tasks and the databases. Then, they use a pattern fragment growth method to avoid the costly candidate-generation-and-test processing completely. Moreover, effective data structures are proposed to compress crucial information about frequent patterns and avoid expensive, repeated database scans. A comprehensive performance study shows that pattern-growth methods, FP-growth and H-mine, are efficient and scalable. They are faster than some recently reported new frequent pattern mining methods. Interestingly, pattern-growth methods are not only efficient, but also effective. With pattern-growth methods, many interesting patterns can also be mined efficiently, such as patterns with some tough non-anti-monotonic constraints and sequential patterns. These techniques have strong implications for many other data mining tasks.
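
The pattern-growth recursion itself is compact enough to sketch: pick a frequent item, report the grown pattern, project the database onto the transactions containing that item, and recurse, with no candidate generation. FP-growth and H-mine add compact data structures (the FP-tree and H-struct) on top of this recursion; the toy version below works directly on lists.

```python
# Minimal pattern-growth (divide-and-conquer) frequent itemset miner.

from collections import Counter

def pattern_growth(transactions, min_sup, prefix=()):
    counts = Counter(item for t in transactions for item in set(t))
    for item, sup in counts.items():
        if sup < min_sup:
            continue
        pattern = prefix + (item,)
        yield pattern, sup
        # project: keep only items "after" the current one to avoid duplicates
        projected = [[i for i in t if i > item] for t in transactions if item in t]
        yield from pattern_growth([t for t in projected if t], min_sup, pattern)

db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
print(sorted(pattern_growth(db, min_sup=2)))
# [(('a',), 3), (('a','b'), 2), (('a','c'), 2), (('b',), 3), (('b','c'), 2), (('c',), 3)]
```
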
[Shu, 2002]

Author(s): Keith Shu.

Title: . Digital Fashion: Wearing your heart on your sleeve.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 2002.

Abstract: Clothes and fashion open a secondary, i.e. non-verbal, communication channel that allows individuals to make connections with each other. Our work proposes the concept of Digital Fashion, which uses technology to connect people in close proximity by enhancing a visual secondary communication channel. Non-private profile information known as shared knowledge is communicated via an online poll system that activates poll questions periodically. Our Digital Fashion implementation utilizes a handheld computer connected to a wireless radio network to drive a public wearable display worn on the user's body. The wearable display system consists of flexible electroluminescent wire and a light controller responsive to wireless communications. Shared knowledge is stored as answers to poll questions. As poll questions become active, the wearable display changes colours to reflect the wearer's answer to the active poll question. Several user studies were conducted to evaluate this technology with data from field observations, questionnaires and interviews. The results from our user studies revealed that technology could play a useful role within social settings if designed appropriately. We found the choice of fashion as a secondary communication channel very appropriate because of its unobtrusiveness. Our system did not appear to hinder interactions but rather helped to create richer social interactions.
[Stark, 2002]

Author(s): Paul Stark.

Title: . Fourier Volume Rendering of Irregular Data Sets.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 2002.

Abstract: Examining irregularly sampled data sets usually requires gridding that data set. However, examination of a data set at one particular resolution may not be adequate since either fine details will be lost, or coarse details will be obscured. In either case, the original data set has been lost. We present an algorithm to create a regularly sampled data set from an irregular one. This new data set is not only an approximation to the original, but allows the original points to be accurately recovered, while still remaining relatively small. This result is accompanied by an efficient `zooming' operation that allows the user to increase the resolution while gaining new details, all without re-gridding the data. The technique is presented in N-dimensions, but is particularly well suited to Fourier Volume Rendering, which is the fastest known method of direct volume rendering. Together, these techniques allow accurate and efficient, multi-resolution exploration of volume data.
[Su, 2002]

Author(s): Ming-Yen Thomas Su.

Title: . Item Selection by 'Hub-Authority' Profit Ranking.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 2002.

Abstract: A fundamental problem in business and other applications is ranking items with respect to some notion of profit based on historical transactions. The difficulty is that the profit of one item not only comes from its own sales, but also from its influence on the sales of other items, that is, the ``cross-selling effect''. In this thesis, we draw an analogy between this influence and the mutual reinforcement of hub/authority web pages, and we present a novel approach to the item-ranking problem based on this analogy. The idea is to rank items by profit in a way similar to ranking web pages by authority, while taking into account the cross-selling effect. We also address several issues unique to item ranking. This ranking approach can be applied to solve two selection problems. In the size-constrained selection, the maximum number of items that can be selected is fixed. In the cost-constrained selection, there is no maximum number of items to be selected, but there is some cost associated with the selection of each item. In both cases, the question is what items should be selected to maximize the profit. An empirical study shows that our approach finds profitable items in the presence of the cross-selling effect.
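
A hedged sketch of the hub/authority analogy: build an item-to-item link matrix from co-purchases and run HITS-style mutual reinforcement, so an item's score reflects the sales it appears to drive as well as its own. The thesis's profit-weighted formulation and its treatment of the item-ranking-specific issues differ from this toy version.

```python
# HITS-style mutual reinforcement on an item-to-item co-purchase matrix.

import numpy as np

transactions = [["bread", "butter"], ["bread", "butter", "jam"],
                ["bread", "jam"], ["milk"]]
items = sorted({i for t in transactions for i in t})
idx = {it: k for k, it in enumerate(items)}

# item-to-item "influence" links from co-purchases
A = np.zeros((len(items), len(items)))
for t in transactions:
    for a in t:
        for b in t:
            if a != b:
                A[idx[a], idx[b]] += 1

hub = np.ones(len(items))
auth = np.ones(len(items))
for _ in range(50):                      # power iteration, HITS style
    auth = A.T @ hub
    auth /= np.linalg.norm(auth)
    hub = A @ auth
    hub /= np.linalg.norm(hub)

print(dict(zip(items, np.round(auth, 2))))  # bread, butter and jam reinforce each other
```
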
[Swindells, 2002]

Author(s): Colin Edward Swindells.

Title: . Use That There! Pointing to Establish Device Identity.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 2002.

Abstract: Computing devices within current work and play environments are relatively static. As the number of networked devices grows, and as people and their devices become more dynamic, situations will commonly arise where users will wish to use that device there instead of navigating through traditional user interface widgets such as lists and trees. Our method of interacting with that device there is composed of two main parts: identification of a target device, and transfer of information to or from the target device. By decoupling these processes, we can explore the most effective way to support each part. This thesis describes our process for identifying devices through a pointing gesture using custom tags and a custom stylus called the gesturePen. Implementation details for this system are provided along with qualitative and quantitative results from a formal user study. The results of this work indicate that our gesturePen method is an effective method for device identification, and is well suited to dynamic computing environments that are envisioned to be commonplace in the near future.
[Walenstein, 2002]

Author(s): Andrew Walenstein.

Title: . Cognitive Support in Software Engineering Tools: A Distributed Cognition Framework.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, May 2002.

Abstract: Software development remains mentally challenging despite the continual advancement of training, techniques, and tools. Because completely automating software development is currently impossible, it makes sense to seriously consider how tools can improve the mental activities of developers apart from automating them away. Such mental assistance can be called ``cognitive support''. Understanding and developing cognitive support in software engineering tools is an important research issue but, unfortunately, at the moment our theoretical foundations for it are inadequately developed. Furthermore, much of the relevant research has occurred outside of the software engineering community, and is therefore not easily available to the researchers who typically develop software engineering tools. Tool evaluation, comparison, and development are consequently impaired. The present work introduces a theoretical framework intended to seed further systematic study of cognitive support in the field of software engineering tools. This theoretical framework, called RODS, imports ideas and methods from a field of cognitive science called ``distributed cognition''. The crucial concept in RODS is that cognitive support can be understood and explained in terms of the computational advantages that are conferred when cognition is redistributed between software developers and their tools and environment. The name RODS, in fact, comes from the four cognitive support principles the framework describes. With RODS in hand, it is possible to interpret good design in terms of how cognition is beneficially rearranged. To make such analyses fruitful, a cognitive modeling framework called HASTI is also proposed. The main purpose of HASTI is to provide an analysis of ways of modifying developer cognition using RODS. RODS and HASTI can be used to convert previously tacit design knowledge into explicit and reusable knowledge. RODS and HASTI are evaluated analytically by using them to reconstruct rationales for two exemplar reverse engineering tools. A preliminary field study was also conducted to determine their potential for being inexpensively applied in realistic tool development settings. These studies are used to draw implications for research in software engineering and, more broadly, for the design of computer tools in cognitive work domains.
[wen Hsiao, 2002]

Author(s): Janet Hui-wen Hsiao.

Title: . Dealing with Semantic Anomalies in a Connectionist Network for Word Prediction.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2002.

Abstract: Humans are able to recognize a grammatically correct but semantically anomalous sentence. On the task of predicting the range of possible next words in a sentence, given the current word as the input, however, many networks (e.g. Elman, 1990, 1993; Christiansen & Chater, 1994; Hadley et al, 2001) that have been proposed are capable of displaying a certain degree of systematicity, but fail in recognizing anomalous sentences. We believe that humans require both syntactic and semantic information to predict the category of the next word in a sentence. Based on an expansion of Hadley's model (Hadley et al, 2001), we present a competitive network, which employs two sub-networks that discern coarse-grained and fine-grained categories respectively, by being trained via different parameter settings. Hence, one of the sub-networks will have a greater capacity for recognizing the syntactic structure of the preceding words, while the other will have a greater capacity for recognizing the semantic structure of the preceding pattern of words. Also, we employ a mechanism to switch attention between the predictions from the two sub-networks, in order to make the global network more closely approximate human behavior. The results show that the network is capable of exhibiting strong systematicity, as defined by Hadley (Hadley, 1994a). In addition, it is able to predict in compliance with the semantic constraints implied in the training corpus, and deal with grammatically correct but semantically anomalous sentences. We can conclude that the network has provided a more realistic model for human behavior on the task of predicting the range of possible next words in a sentence.
[Yan, 2002]

Author(s): Edward Mingjun Yan.

Title: . Minimizing Bandwidth Requirement of Broadcasting Protocol in Video-on-Demand Services.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2002.

Abstract: In order to address the scalability problem of video-on-demand (VOD) systems, several periodic broadcast schemes have been proposed, which partition a video into segments and repetitively broadcast each segment on a separate channel. Most of the research literature focuses on minimizing the server broadcast bandwidth for a given waiting time. The most efficient broadcasting schemes currently available require the same bandwidth on the client side as on the server side. In reality, however, the client-side bandwidth requirement is often the limiting factor. We propose a new broadcast scheme, named Generalized Fibonacci Broadcasting (GFB), to address the issue of minimizing the client-side bandwidth requirement. GFB allows the client to download data from c (a positive integer that can be selected) concurrent broadcasting channels, each with a bandwidth of b/k (k > 0), where b (bits/sec) is the display rate. We demonstrate that, for realistic sets of parameters, GFB is the most efficient among the currently known broadcasting schemes with a client bandwidth limitation. Furthermore, it gives a VOD service provider great flexibility and simplicity in implementing VOD services based on current technologies.
[Yang, 2002]

Author(s): Yingchen Yang.

Title: . Web Table Mining And Database Discovery.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2002.

Abstract: Table mining, as a sub-field of Information Extraction (IE), is concerned with recognizing and extracting information from tables, which are embedded either in plain text or in HTML documents. While early studies of table mining focused on plain-text table mining, web table mining has lately received much attention. This is not only because the web is a popular medium for publishing information, but also because the table is a document structure in HTML in which a great deal of information is found. The focus of this thesis is to investigate how knowledge of the structural aspects of HTML tables may improve the effectiveness of web table mining, in terms of both increased accuracy and more specificity in the extracted information, which is in a database form, i.e. attribute-value pairs. We have made a systematic study of the structures of both simple and complex tables, and proposed some generic web table mining heuristics and algorithms that theoretically apply to any web site domain of discourse and have proved in practice to be successful in fulfilling web table mining tasks across multiple domains. Based on the analyses of the structures of web tables, the studies in this thesis show that our web table mining heuristics and algorithms perform more effectively than other current approaches, as measured by the IE evaluation metrics of precision and recall, in the whole web table mining process, including web table recognition, structure interpretation and attribute-value pair extraction. We have also developed heuristics to discover OLAP databases by mining multi-dimensional tables, which usually possess hierarchical attribute structures and therefore are much more complex than a relation as defined by a relational database.


Jump to year: 2011 >> 2010 >>
2009 >> 2008 >> 2007 >> 2006 >> 2005 >> 2004 >> 2003 >> 2002 >> 2001 >> 2000 >>
1999 >> 1998 >> 1997 >> 1996 >> 1995 >> 1994 >> 1993 >> 1992 >> 1991 >> 1990 >>
1989 >> 1988 >> 1987 >> 1986 >> 1985 >> 1984 >> 1983 >> 1982 >> 1981 >>

2001

[Au, 2001]

Author(s): James Ka Sin Au.

Title: . Object Segmentation and Tracking Using Video Locales.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 2001.

Abstract: The ability to automatically locate and track objects from videos has always been very important in traditional applications such as surveillance, robotics, and object recognition. With the proliferation of digital videos and online multimedia data, and the need for content-based multimedia encoding and retrieval, locating and tracking objects in digital videos becomes ever more important. This thesis presents a new technique based on feature localization for segmenting and tracking objects in videos. A video locale is a sequence of image feature locales that share similar features (color, texture, shape, and motion) in the spatio-temporal domain of videos. Image feature locales are grown from tiles (blocks of pixels) and can be non-disjoint and non-connected. Instead of using regions, the set of tiles belonging to the feature locale (called the envelope) is used to represent the locality of the feature. Intuitively, feature locales are significant feature blobs. To exploit the temporal redundancy in digital videos, two algorithms (intra-frame and inter-frame) are used to grow locales efficiently. Multiple motion tracking is achieved by tracking and performing tile-based dominant motion estimation for each locale separately (i.e., only member tiles are used); hence, the difficulty of multiple non-dominating motions is avoided, and using tiles as the base unit makes the method more robust to pixel-level noise. Furthermore, video locales that move together through time may be grouped together to approximate multi-feature video objects. Being at a higher feature level than pixels and more robust than regions, video locales are more suited for content-based video processing. How video locales may be used in content-based multimedia encoding and retrieval, such as outlined in MPEG-4 and MPEG-7, is discussed. Tests on natural videos have shown very good results.
[Blackstock, 2001]

Author(s): Michael A. Blackstock.

Title: . Markup for Transformation to Mobile Device Presentation Markups using XML.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 2001.

Abstract: This thesis provides some background on mobile devices and wireless networks, web infrastructure for mobile application development using HTML, and newer markups such as Wireless Markup Language (WML) and Voice XML. It describes a prototype XML application developed during our research called Mobility Markup (MM) that can be used for developing simple web applications for HTML, WML and VoiceXML from a single source. This could save development time by reducing the need for authors to learn user interface and mobile device-specific markups. MM is compared and contrasted with other approaches for the development of mobile web applications. While other approaches provide partial solutions that address web forms, conversion from HTML to WML, or application- specific solutions, MM illustrates that a more general solution is possible. MM separates application objects, purpose and presentation into sections within the markup and provides an abstraction for the purpose of an HTML web page, a WML deck or VoiceXML dialog to accomplish its goals. To study the feasibility of MM a simple example is described in detail. This example shows that an author can create general purpose content and convert it to useable HTML, WML and VoiceXML for virtually any application using XSLT transformations. Two additional feasibility prototypes follow to provide a framework for a future implementation. In addition to its original purpose, MM could have a wider range of applications including the facilitation of content localization, the development of content for the disabled, and possibly support for collaboration between people with different devices.
[Cong, 2001]

Author(s): Shi Cong.

Title: . Mining the Top-k Frequent Itemset with Minimum Length m.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 2001.

Abstract: With the explosive growth of data stored in electronic form, data mining has become essential in searching for nontrivial, implicit, previously unknown and potentially useful information in huge amounts of data. Association rule mining in large transactional databases is an important topic in the field of data mining. It takes as input a set of transactions, each of which is a list of items, and finds sets of objects that tend to associate with each other. The problem has been investigated intensively during the past few years, and it has been shown that the major computational task is to identify all of the frequent itemsets which satisfy a minimum support threshold, min-sup. Association rules can then be generated easily. In this work, I propose two frequent itemset mining algorithms, DIPT and FIPT, which find the top-k frequent itemsets with minimum length m in a transaction database. The novelty of these algorithms is that there is no need for users to specify a min-sup threshold. I compared these methods with other frequent itemset mining algorithms and showed that this approach presents greater advantages in terms of both flexibility and efficiency.
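As a point of reference only (this is not the DIPT or FIPT algorithm described above), the problem statement can be made concrete with a naive enumeration baseline in Python; all names and the toy data are illustrative, and the enumeration is exponential, so it is only suitable for tiny inputs:

    from itertools import combinations
    from collections import Counter

    def topk_frequent_itemsets(transactions, k, m):
        """Naive baseline: the k itemsets of length >= m with highest support.

        transactions: list of sets of items.  No min-sup threshold is needed,
        mirroring the problem setting above, but all candidates are enumerated.
        """
        items = sorted(set().union(*transactions))
        support = Counter()
        for size in range(m, len(items) + 1):
            for candidate in combinations(items, size):
                cset = set(candidate)
                count = sum(1 for t in transactions if cset <= t)
                if count:
                    support[candidate] = count
        return support.most_common(k)

    # Example: topk_frequent_itemsets([{"a","b","c"}, {"a","b"}, {"a","c"}], k=2, m=2)
    # returns [(("a", "b"), 2), (("a", "c"), 2)]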
[Huang, 2001]

Author(s): Haiming Huang.

Title: . Lossless Semantic Compression for Relational Databases.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 2001.

Abstract: With the widespread use of databases and data warehousing technologies, the amount of data stored in databases has increased tremendously. It has therefore become attractive to compress data in database systems. Relational data is quite different from text and multimedia data, because many semantic structures (e.g., data dependencies and correlations) exist within relational data. However, traditional compression methods, such as Lempel-Ziv, simply treat the input data as a large byte string and operate at the byte level. Thus, they fail to exploit the semantic structures in the relation. Based on the observation that sets of some attribute values occur frequently in a relational table, we propose a semantic compression technique that exploits frequent dependency patterns embedded in the relational table. One advantage of this approach is that compression/decompression is performed at the tuple level, which is desirable for integrating the compression technique into database systems. We show that it is hard to compute an optimal compression solution. Therefore, an iterative greedy compression framework is proposed to solve this problem. This work primarily focuses on the underlying component of the compression framework, that is, efficiently finding dependency patterns in relational data to optimize the compression ratio. The experimental results on several real-life datasets demonstrate the effectiveness of our approach, as well as its efficiency and scalability.
[Itskevitch, 2001]

Author(s): Julia Itskevitch.

Title: . Automatic Hierarchical E-mail Classification using Association Rules.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 2001.

Abstract: The explosive growth of on-line communication, in particular e-mail communication, makes it necessary to organize the information for faster and easier processing and searching. Storing e-mail messages into hierarchically organized folders, where each folder corresponds to a separate topic, has proven to be very useful. Previous approaches to this problem use Naive Bayes- or TF-IDF-style classifiers that are based on the unrealistic term independence assumption. These methods are also context-insensitive in that the meaning of words is taken to be independent of the presence or absence of other words in the same message. It has been shown that text classification methods that deviate from the independence assumption and capture context achieve higher accuracy. In this thesis, we address the problem of term dependence by building an associative classifier called Classification using Cohesion and Multiple Association Rules, or COMAR in short. The problem of context capturing is addressed by looking for phrases in message corpora. Both rules and phrases are generated using an efficient FP-growth-like approach. Since the number of rules and phrases produced can be very large, we propose two new measures, rule cohesion and phrase cohesion, that possess the anti-monotone property, which allows rule and phrase pruning to be pushed deep into the generation process. This approach to pattern pruning proves to be much more efficient than ``generate-and-prune'' methods. Both unstructured text attributes and semi-structured non-text attributes, such as senders and recipients, are used for the classification. The COMAR classification algorithm uses multiple rules to predict the several highest-probability topics for each message. Different feature selection and rule ranking methods are compared. Our studies show that the hierarchical associative classifier that utilizes phrases, multiple rules and deep rule pruning, and uses biased confidence or rule cohesion for rule ranking, achieves higher accuracy and is more efficient than other associative classifiers, and is also more accurate than Naive Bayes.
[Lam, 2001]

Author(s): Joyce Man Wing Lam.

Title: . Multi-dimensional Constrained Gradient Mining.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2001.

Abstract: Since their emergence, data cubes have been widely adopted by users as a tool to analyze data collected in a multi-dimensional way. However, users are often overwhelmed by the massive amount of data presented in a data cube; it is not uncommon for the number of cells to reach the thousands or even millions. This poses non-trivial challenges for locating pairs of cells having significant difference in measures within a data cube. Thus, it is crucial and practical to investigate the issue of efficient cell comparisons with complex measures based on user-specified constraints. This thesis introduces the concept of constrained gradients. It proposes two algorithms, All-Significant-Pairs and LiveSet-Driven, to mine constrained gradients in data cubes by extracting pairs of cells with significant difference in their measures. The two algorithms approach the same problem from different angles. Experimental results show that the LiveSet-Driven algorithm, which employs a group-processing technique, is more efficient. To complete our study, we also investigate the problem of constrained gradient mining in transaction databases. Motivated by cross-selling, constrained frequent pattern pairs with significant difference in their measures are mined in transaction databases. A Top-k FP-tree data structure is proposed to deal with complex measures in frequent pattern mining. Based on the FP-growth algorithm, two algorithms are presented to find valid constrained frequent pattern pairs.
[Lee, 2001]

Author(s): Tim Kam Lee.

Title: . Measuring Border Irregularity and Shape of Cutaneous Melanocytic Lesions.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, January 2001.

Abstract: Cutaneous melanocytic lesions, commonly known as moles, are mostly benign; however, some of them are malignant melanomas, the most fatal form of skin cancer. Because the survival rate of melanoma is inversely proportional to the thickness of the tumor, early detection is vital to the treatment process. Many dermatologists have advocated the development of computer-aided diagnosis systems for early detection of melanoma. One of the important clinical features differentiating benign nevi from malignant melanomas is the lesion border irregularity. There are two types of border irregularity: texture and structure irregularities. Texture irregularities are the small variations along the border, while structure irregularities are the global indentations and protrusions that may suggest either the unstable growth in a lesion or regression of a melanoma. An accurate measurement of structure irregularities is essential to detect the malignancy of melanoma. This thesis extends the classic curvature scale-space filtering technique to locate all structure irregular segments along a melanocytic lesion border. An area-based index, called irregularity index, is then computed for each segment. From the individual irregularity index, two important new measures, the most significant irregularity index and the overall irregularity index, are derived. These two indices describe the degree of irregularity along the lesion border . A double-blind user study is performed to compare the new measures with twenty experienced dermatologists' evaluations. Forty melanocytic lesion images were selected and their borders were extracted automatically after dark thick hairs were removed by a preprocessor called DullRazor. The overall irregularity index and the most significant irregularity index were calculated together with three other common shape descriptors. All computed measures and the dermatologists' evaluations were analysed statistically. The results showed that the overall irregularity index was the best predictor for the clinical evaluation, and both the overall irregularity index and the most significant irregularity index outperformed the other shape descriptors. The new method has great potential for computer-aided diagnosis systems.
[Li, 2001]

Author(s): Tianyi Li.

Title: . Web-Document Prediction and Presending Using Association Rule Sequential Classifiers.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 2001.

Abstract: An important data source for data mining is the web-log data that traces the user's web browsing actions. From the web logs, one can build prediction models that predict with high accuracy the user's next requests based on past behavior. To do this with the traditional classification and association rule methods will cause a number of serious problems due to the extremely large data size and the rich domain knowledge that must be applied. Most web log data are sequential in nature and exhibit the ``most recent–most important'' behavior. To overcome this difficulty, we examine two important dimensions of building prediction models, namely the type of antecedents of rules and the criterion for selecting prediction rules. This thesis proposes a better overall method for prediction model representation and refinement. We show empirically on realistic web log data that the proposed model dramatically outperforms previous ones. How to apply the learned prediction model to the task of presending web documents is also demonstrated.
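For orientation only, a first-order predictor that conditions on just the most recent request conveys the flavour of such prediction rules; the association-rule sequential classifiers studied in the thesis are considerably richer, and the sketch below (names and data illustrative) is merely a baseline:

    from collections import defaultdict, Counter

    def train_first_order(sessions):
        """Count next-page frequencies conditioned on the most recent page.

        sessions: list of page-request sequences (lists of URLs) from a web log.
        Returns a mapping: current page -> Counter of observed next pages.
        """
        model = defaultdict(Counter)
        for session in sessions:
            for current, nxt in zip(session, session[1:]):
                model[current][nxt] += 1
        return model

    def predict_next(model, current, top_n=3):
        """Return the top_n most frequently observed next requests."""
        return [page for page, _ in model[current].most_common(top_n)]

    # Example:
    # m = train_first_order([["/", "/a", "/b"], ["/", "/a", "/c"], ["/", "/a", "/b"]])
    # predict_next(m, "/a", top_n=1)  ->  ["/b"]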
[Liao, 2001]

Author(s): Nancy Yaqin Liao.

Title: . Fault-Tolerant Repeat Pattern Mining on Biological Data.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2001.

Abstract: With the development of biotechnology, more and more biological data is collected and available for analysis. One example is the GenBank and Proteins data from NCBI (National Center for Biotechnology Information). There is a huge amount of data available, including DNA sequences, RNA sequences and protein sequences of all different species, and not much is known about this data. How to extract the most interesting and informative patterns from that data, which may guide us to more discoveries, is an interesting task. In this thesis, we study an important problem in molecular biology, the tandem repeat finding problem. We refine the problem to finding the complete set of tandem repeat patterns, analyze the problem, investigate the properties that tandem repeat patterns have, and propose two algorithms to solve it. Interesting patterns found using our algorithms, TCD and LSD, are presented. Our performance studies show that our LSD algorithm is efficient and scalable.
[Mao, 2001]

Author(s): Runying Mao.

Title: . Adaptive-FP: An Efficient and Effective Method For Multi-Level Multi-Dimensional Frequent Pattern Mining.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2001.

Abstract: Real-life transaction databases usually contain both item information and dimension information. Moreover, taxonomies about items likely exist. Knowledge about multi-level multi-dimensional frequent patterns is interesting and useful. The classic frequent pattern mining algorithms based on a uniform minimum support, such as Apriori and FP-growth, either miss interesting patterns of low support or suffer from the bottleneck of itemset generation. Other frequent pattern mining algorithms, such as Adaptive Apriori, though they take various supports, focus mining at a single abstraction level. Furthermore, as an Apriori-based algorithm, Adaptive Apriori suffers from multiple database scans. In this thesis, we extend FP-growth to attack the problem of multi-level multi-dimensional frequent pattern mining. We call our algorithm Ada-FP, which stands for Adaptive FP-growth. The efficiency of our Ada-FP is guaranteed by the high scalability of FP-growth. To increase the effectiveness, our Ada-FP pushes various support constraints into the mining process. First, item taxonomy is explored. Our Ada-FP can discover both inter-level frequent patterns and intra-level frequent patterns. Second, in our Ada-FP, dimension information is taken into account. We show that our Ada-FP is more flexible at capturing desired knowledge than previous studies.
[Mortazavi-Asl, 2001]

Author(s): Behzad Mortazavi-Asl.

Title: . Discovering And Mining User Web-page Traversal Patterns.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2001.

Abstract: As the popularity of the WWW explodes, a massive amount of data is gathered by Web servers in the form of Web access logs. This is a rich source of information for understanding Web users' surfing behavior. Web Usage Mining, also known as Web Log Mining, is an application of data mining algorithms to Web access logs to find trends and regularities in Web users' traversal patterns. The results of Web Usage Mining have been used in improving Web site design, business and marketing decision support, user profiling, and Web server system performance. In this thesis we study the application of assisted exploration of OLAP data cubes and scalable sequential pattern mining algorithms to Web log analysis. In multidimensional OLAP analysis, standard statistical measures are applied to assist the user at each step in exploring the interesting parts of the cube. In addition, a scalable sequential pattern mining algorithm is developed to discover commonly traversed paths in large data sets. Our experimental and performance studies have demonstrated the effectiveness and efficiency of the algorithm in comparison to previously developed sequential pattern mining algorithms. In conclusion, some further research avenues in web usage mining are identified as well.
[Pinto, 2001]

Author(s): Helen Pinto.

Title: . Multi-dimensional Sequential Pattern Mining.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2001.

Abstract: With our recently developed sequential pattern mining algorithms, such as PrefixSpan, it is possible to mine sequential user-access patterns from Web-logs. While this information is very useful when redesigning web-sites for easier perusal and fewer network traffic bottlenecks, it would be much richer if we could incorporate multiple dimensions of information. For example, if you knew the referral site that users frequently come from, you might be able to determine what information on your own web-site is of interest to them, and enhance or separate this information as needed. Similarly, if you knew at what weekday and time certain access patterns frequently occur, you could ensure updated information is ready and available for these users. This thesis proposes and explores two different techniques, HYBRID and PSFP, to incorporate additional dimensions of information into the process of mining sequential patterns. It investigates the strengths and limitations of each approach. The HYBRID method first finds frequent dimension value combinations, and then mines sequential patterns from the set of sequences that satisfy each of these combinations. PSFP approaches the problem from the opposite direction. It mines the sequential patterns for the whole dataset only once (using PrefixSpan), and mines the corresponding frequent dimension patterns alongside each sequential pattern (using the existing association algorithm FP-growth). Experiments show that HYBRID is most effective at low support in datasets that are sparse with respect to dimension value combinations but dense with respect to the sequential patterns present. PSFP is the better alternative in every other case, including datasets that are dense with respect to both dimension value combinations and sequential items at low support.
[Ranaweera, 2001]

Author(s): Prabha Ranaweera.

Title: . Quorum Based Total Order Group Communication System.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2001.

Abstract: Applications designed for distributed systems are gaining popularity as they provide high availability and high performance at a relatively low cost. However, designing efficient, fault-tolerant and well-coordinated distributed applications is a difficult task. Group communication systems provide a powerful building block for the efficient design and implementation of distributed applications. The main services provided by a group communication system are the reliable ordered delivery of messages among the participants of the distributed system, and the handling of membership changes. Reliable ordered delivery of messages provides the guarantee that messages exchanged among the participants of the system are delivered to all the participants in the correct agreed order. The membership services provide the fault tolerance capabilities; they handle membership changes that can occur in the system, such as processor failures, recoveries and the joining of new processors to the system. In this thesis we design and implement a new group communication system that is based on the concept of quorum systems and coteries. The concept of quorum systems and coteries is widely used in the design of distributed algorithms as a tool that provides mutual exclusion and distributed coordination. We put forward the design and implementation of the Quorum Based Total Order (QBTO) group communication system as the subject of this thesis. It is a new group communication system that provides reliable total order message delivery services and membership services for fault tolerance, and that can be employed to support the efficient development of distributed applications. In the QBTO group communication protocol, totally ordered delivery of messages is achieved by the use of a globally unique sequence number assigned to each message. This globally unique sequence number is obtained with the use of a coterie imposed on the group. The properties of a coterie guarantee the uniqueness of the sequence number. As a result of the extensive research performed on this topic, we conclude that it is feasible to design an efficient group communication protocol which provides reliable total order message delivery services and membership services for distributed system development using the concept of quorum systems and coteries.
[Tang, 2001]

Author(s): Liu Tang.

Title: . Top-Down FP-Growth for Association Rule Mining.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2001.

Abstract: Association rule mining has attracted a great deal of interest in data mining research. Most previous studies adopt the Apriori heuristic and use a uniform minimum support threshold in pruning. In this paper, we propose a novel TD-FP-Growth (shorthand for Top-Down FP-Growth) family of algorithms to mine association rules more efficiently and effectively. The TD-FP-Growth algorithm for frequent pattern mining outperforms both the Apriori algorithm and the recently proposed FP-growth algorithm. The TD-FP-Growth algorithm for association rule mining explores association rule mining from two aspects: using multiple minimum supports for different classes, and pushing a new constraint, an acting constraint of confidence, into the mining process to reduce the search space and speed up the mining. Our performance study shows that the TD-FP-Growth algorithm family is the most efficient so far. The efficiency of TD-FP-Growth is achieved because: (1) it stores the database in a highly compressed tree structure; (2) the mining of the rules involves only the FP-tree and a few header tables; and (3) it introduces a new anti-monotone constraint, the acting constraint of confidence, and pushes it into the mining process.
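For background, the FP-tree and header tables that such algorithms operate over can be built with the standard two-pass construction sketched below; this shows only the tree construction, not the top-down mining or the confidence constraint described above, and the class and function names are illustrative:

    from collections import Counter, defaultdict

    class FPNode:
        def __init__(self, item, parent):
            self.item, self.parent, self.count = item, parent, 0
            self.children = {}

    def build_fp_tree(transactions, min_sup):
        """Return (root, header), where header maps each item to its tree nodes."""
        # Pass 1: count item frequencies and keep only the frequent items.
        freq = Counter(item for t in transactions for item in t)
        freq = {i: c for i, c in freq.items() if c >= min_sup}

        root = FPNode(None, None)
        header = defaultdict(list)

        # Pass 2: insert each transaction with items sorted by descending frequency.
        for t in transactions:
            items = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
            node = root
            for item in items:
                if item not in node.children:
                    child = FPNode(item, node)
                    node.children[item] = child
                    header[item].append(child)
                node = node.children[item]
                node.count += 1
        return root, header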
[Wang, 2001]

Author(s): Zhaoxia Wang.

Title: . Collaborative Filtering Using Error-Tolerant Fascicles.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2001.

Abstract: With the rapid growth of the Internet, information overload is becoming increasingly acute. Recently, collaborative filtering has enjoyed considerable commercial success in addressing the information overload problem and thus it becomes an active research area. In this thesis, we propose a new approach for collaborative filtering, which makes use of the concept of error-tolerant fascicle. An error-tolerant fascicle is defined as a set of records that share similar values among the majority but not all of a set of attributes. In this thesis, we define the new concept of error-tolerant fascicle, investigate the properties that the error-tolerant fascicles have, and propose an algorithm for mining the complete set of frequent and strong error-tolerant fascicles in the data. Furthermore, we also propose the method for applying error-tolerant fascicles to collaborative filtering. Our performance studies show that our ET-Projection algorithm is efficient and scalable, and using error-tolerant fascicles is a novel and effective way for collaborative filtering.
[Zhang, 2001]

Author(s): Haining Zhang.

Title: . Improving Performance on WWW Using Path-based Predictive Caching and Prefetching.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 2001.

Abstract: Caching and prefetching are well known strategies for improving the performance of Internet systems. The heart of a caching system is its page replacement policy, which selects the pages to be replaced in a proxy cache when a request arrives. By the same token, the essence of a prefetching algorithm lies in its ability to accurately predict future requests. In this paper, we present a method for caching variable-sized web objects using an n-gram based prediction of future web requests. Our method aims at mining a Markov model from past web document access patterns and using it to extend the well-known GDSF caching policies. In addition, we present a new method to integrate this caching algorithm with our n-gram based prefetching algorithm using the mined Markov model. We empirically show that the system performance is greatly improved using the integrated approach.
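For context, the GDSF (Greedy-Dual-Size-Frequency) replacement policy that this work extends is commonly defined by a priority of the form clock + frequency * cost / size; eviction removes the lowest-priority object and advances the clock to that priority. A minimal sketch, assuming unit retrieval cost and ignoring the n-gram prediction and prefetching components entirely:

    class GDSFCache:
        """Minimal Greedy-Dual-Size-Frequency cache (retrieval cost fixed at 1)."""

        def __init__(self, capacity):
            self.capacity = capacity   # total size budget, e.g. in bytes
            self.used = 0
            self.clock = 0.0           # inflation value
            self.entries = {}          # key -> [priority, frequency, size]

        def _priority(self, freq, size):
            return self.clock + freq * 1.0 / size

        def access(self, key, size):
            """Record an access; return True on a cache hit."""
            if key in self.entries:
                entry = self.entries[key]
                entry[1] += 1                                  # bump frequency
                entry[0] = self._priority(entry[1], entry[2])  # refresh priority
                return True
            # Miss: evict lowest-priority objects until the new object fits.
            while self.used + size > self.capacity and self.entries:
                victim = min(self.entries, key=lambda k: self.entries[k][0])
                self.clock = self.entries[victim][0]
                self.used -= self.entries[victim][2]
                del self.entries[victim]
            if size <= self.capacity:
                self.entries[key] = [self._priority(1, size), 1, size]
                self.used += size
            return False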


Jump to year: 2011 >> 2010 >>
2009 >> 2008 >> 2007 >> 2006 >> 2005 >> 2004 >> 2003 >> 2002 >> 2001 >> 2000 >>
1999 >> 1998 >> 1997 >> 1996 >> 1995 >> 1994 >> 1993 >> 1992 >> 1991 >> 1990 >>
1989 >> 1988 >> 1987 >> 1986 >> 1985 >> 1984 >> 1983 >> 1982 >> 1981 >>

2000

[Baker, 2000]

Author(s): Gregory G. Baker.

Title: . Caches in a Theoretical Model of Multicasting.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2000.

Abstract: Multicasting is the process by which a single node, using a sequence of calls, sends a message to a set of nodes in a communication network. The message is passed from the source, through intermediate nodes to the destinations. These intermediate nodes do not remember the message once it is passed on. There is a possibility of transmission failure with each call. A failure forces retransmission from the source. Some intermediate nodes can be designated as caches. Once they have received the message, caches will remember it. Thus, these nodes can also be used to retransmit the message if there is a failure. If the retransmission can be done from a cache node that is closer than the source, the total number of calls necessary will be decreased. The expected number of calls necessary to complete a multicast is examined. Two methods of counting the number of calls are presented. Upper and lower bounds on the expected number of calls are given. Placement of a given number of caches is examined in order to determine the locations which minimize the expected traffic. Optimal placements are given for small graphs and a heuristic is given which can be used to place caches in larger graphs. It is shown that determining if the caches can be placed so that the expected traffic does not exceed a given threshold is NP-complete in directed acyclic graphs.
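As a toy worked example only, consider a path of d hops in which every call fails independently with probability p and a failure forces retransmission from the nearest upstream cache (or from the source if there is none). Under these assumptions the path decomposes into independent segments between consecutive caches, and a segment of L hops needs an expected ((1-p)^(-L) - 1)/p calls, the standard cost of achieving L consecutive successes with restarts. This is a special case for intuition, not the thesis's general model or either of its two counting methods:

    def expected_calls_on_path(d, p, cache_positions=()):
        """Expected calls to push a message along a path of d hops.

        Assumes 0 <= p < 1, that each call fails independently with probability
        p, and that a failure forces retransmission from the nearest upstream
        cache (a hop index in cache_positions) or from the source.
        """
        if p <= 0:
            return float(d)
        q = 1.0 - p
        boundaries = [0] + sorted(cache_positions) + [d]
        total = 0.0
        for a, b in zip(boundaries, boundaries[1:]):
            length = b - a                      # segment between caches
            total += (q ** (-length) - 1) / p   # expected calls with restarts
        return total

    # Example: 6 hops with a 10% failure rate.
    # expected_calls_on_path(6, 0.1)        -> about 8.8 calls (no caches)
    # expected_calls_on_path(6, 0.1, (3,))  -> about 7.4 calls (cache at hop 3)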
[Bart, 2000]

Author(s): Bradley Bart.

Title: . Representations of and Strategies for Static Information, Noncooperative Games with Imperfect Information.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2000.

Abstract: In this thesis, we will examine a card game called MYST, a variant of the Parker Brothers' classic board game Clue. In MYST, a set of cards is divided uniformly among a set of players, and the remaining cards form a hidden pile. The goal of each player is to be the first to determine the contents of the hidden pile. On their turn, a player asks a question about the holdings of the other players, and, through a process of elimination, a player can determine the contents of the hidden pile. MYST is one of few static information games, wherein the position does not change during the course of the game. To do well, players need to reason about their opponents' holdings over the course of multiple turns, and therefore a sound representation of knowledge is required. MYST is an interesting game for AI because it ties elements of knowledge representation to game theory and game strategy. After informally introducing the essential elements of the game, we will offer a formal specification of the game in terms of first-order logic and the situation calculus developed by Levesque et al. Strategies will be discussed including: existence of a winning strategy, randomized strategies, and bluffing. Implementation of some strategies will be discussed.
[Chee, 2000]

Author(s): Sonny Han Seng Chee.

Title: . RecTree: A Linear Collaborative Filtering Algorithm.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 2000.

Abstract: With the ever-increasing amount of information available for our consumption, the problem of information overload is becoming increasingly acute. Automated techniques such as information retrieval (IR) and information filtering (IF), though useful, have proven to be inadequate. This is clearly evident to the casual user of Internet search engines (IR) and news clipping services (IF); a simple query and profile can result in the retrieval of hundreds of items or the delivery of dozens of news clippings into his mailbox. The user is still left with the tedious and time-consuming task of sorting through the mass of information and evaluating each item for its relevancy and quality. Collaborative filtering (CF) is a complementary technique to IR/IF that alleviates this problem by automating the sharing of human judgements of relevancy and quality. Collaborative filtering has recently enjoyed considerable commercial success and is the subject of active research. However, previous works have dealt with improving the accuracy of the algorithms and have largely ignored the problem of scalability. This thesis introduces a new algorithm, RecTree, that to the best of our knowledge is the first collaborative filtering algorithm that scales linearly with the size of the data set. RecTree is compared against the leading nearest-neighbour collaborative filter, CorrCF [RIS+94], and found to outperform CorrCF in execution time, accuracy and coverage. RecTree has good accuracy even when the item-rating density is low, a region of difficulty for all previously published nearest-neighbour collaborative filters that is commonly referred to as the sparsity problem. Our experimental and performance studies have demonstrated the effectiveness and efficiency of this new algorithm.
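For reference, the correlation-based nearest-neighbour prediction in the spirit of CorrCF [RIS+94], against which RecTree is compared, predicts a user's rating as that user's mean plus a Pearson-weighted average of other users' mean-centred ratings. A compact sketch of that baseline (not of RecTree itself), with illustrative names and the simplification that each neighbour's mean is taken over all of their ratings:

    from math import sqrt

    def pearson(ratings_a, ratings_b):
        """Pearson correlation over the items rated by both users."""
        common = set(ratings_a) & set(ratings_b)
        if len(common) < 2:
            return 0.0
        mean_a = sum(ratings_a[i] for i in common) / len(common)
        mean_b = sum(ratings_b[i] for i in common) / len(common)
        num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
        den = (sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common)) *
               sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common)))
        return num / den if den else 0.0

    def predict_rating(user, item, all_ratings):
        """Predict user's rating for item; all_ratings: user -> {item: rating}."""
        target = all_ratings[user]
        user_mean = sum(target.values()) / len(target)
        num = den = 0.0
        for other, ratings in all_ratings.items():
            if other == user or item not in ratings:
                continue
            w = pearson(target, ratings)
            other_mean = sum(ratings.values()) / len(ratings)
            num += w * (ratings[item] - other_mean)
            den += abs(w)
        if den == 0:
            return user_mean
        return user_mean + num / den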
[Dakic, 2000]

Author(s): Tamara Dakic.

Title: . On the Turnpike Problem.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 2000.

Abstract: The turnpike problem, also known as the partial digest problem, is the following: given a multiset Δ X of (n choose 2) positive numbers, does there exist a set X of n points such that Δ X is exactly the multiset of all positive pairwise differences of the elements of X? The complexity of the problem is not known. We write the turnpike problem as a 0-1 quadratic program. In order to solve a quadratic program, we relax it to a semidefinite program, which can be solved in polynomial time. We give three different formulations of the turnpike problem as a 0-1 quadratic program. For the first 0-1 quadratic program we introduce a sequence of semidefinite relaxations, similar to the sequence of semidefinite relaxations proposed by Lovász and Schrijver in their seminal paper ``Cones of matrices and set-functions and 0-1 optimization'' (SIAM Journal on Optimization 1, pp 166-190, 1990). Although a powerful tool, this method has not been used except in their original paper to develop a polynomial time algorithm for finding stable sets in perfect graphs. We give some theoretical results on these relaxations and show how they can be used to solve the turnpike problem in polynomial time for some classes of instances. These classes include the class of instances constructed by Zhang in his paper ``An exponential example for partial digest mapping algorithm'' (Tech Report, Computer Science Dept., Penn State University 1993) and the class of instances that have a unique solution, in which all the numbers in Δ X are different, and on which Skiena, Smith and Lemke's backtracking procedure, from their paper ``Reconstructing sets from interpoint distances'' (Proc. Sixth ACM Symp. Computational Geometry, pp 332 - 339, 1990), backtracks only a constant number of steps. Previously it was not known how to solve the former in polynomial time. We use our theoretical formulations to develop a polynomial time heuristic to solve general instances of the problem. We perform extensive numerical testing of our methods. To date we do not have an instance of the turnpike problem for which our methods do not yield a solution. The second 0-1 quadratic program formulation of the turnpike problem will be too large for practical purposes. We use association schemes and some other methods to reduce its size and obtain the third 0-1 quadratic program. We establish a connection between this relaxation and the first relaxation and show its limitations.
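The backtracking procedure of Skiena, Smith and Lemke referred to above can be stated compactly: fix 0 and the largest distance as the two endpoints, then repeatedly take the largest remaining distance, try placing a point at one of its two possible positions, and backtrack when the induced distances are not all available. A small sketch of that classical procedure (not of the semidefinite-relaxation approach developed in the thesis):

    from collections import Counter

    def turnpike(distances):
        """Try to reconstruct a point set X from its multiset of pairwise distances."""
        D = Counter(distances)
        width = max(D)
        D[width] -= 1
        if D[width] == 0:
            del D[width]
        X = [0, width]
        return sorted(X) if _place(D, X, width) else None

    def _place(D, X, width):
        if not D:
            return True
        y = max(D)
        for cand in (y, width - y):          # the two possible positions
            needed = Counter(abs(cand - x) for x in X)
            if all(D[d] >= c for d, c in needed.items()):
                for d, c in needed.items():  # consume the induced distances
                    D[d] -= c
                    if D[d] == 0:
                        del D[d]
                X.append(cand)
                if _place(D, X, width):
                    return True
                X.pop()                      # backtrack: restore state
                for d, c in needed.items():
                    D[d] += c
        return False

    # Example: turnpike([2, 2, 3, 3, 4, 5, 6, 7, 8, 10]) returns [0, 3, 6, 8, 10],
    # a mirror image of {0, 2, 4, 7, 10}, which has the same distance multiset.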
[Kuederle, 2000]

Author(s): Oliver Kuederle.

Title: . Presenting Image Sequences: A Detail-in-Context Approach.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 2000.

Abstract: Due to the continuous drop in computer hardware prices, the use of high-end computer systems has become attractive to many hospitals. Radiology departments are now facing the transition from the use of traditional light screens and photographic films to online medical imaging systems. These new systems offer several advantages over traditional methods: films are less likely to be lost, automatic anomaly detection can improve diagnosis, and high-speed networks allow for cross-site consultations (telemedicine). However, the use of desktop monitors severely limits the space in which medical images can be viewed. This applies particularly to Magnetic Resonance Imaging (MRI) where, frequently, up to eight films, each containing up to 20 images, are viewed simultaneously on a light screen. As a result, the screen real-estate problem inherent in desktop monitors becomes a serious restriction for the radiologists. In a previous requirements analysis, researchers suggested displaying selected images in full resolution with the surrounding images remaining on the screen although reduced in size (called detail-in-context). Based on feedback obtained from this previous research, and from our additional observations of radiologists in their workplace, we have extended the algorithm of this detail-in-context technique. We have implemented the extended technique in a software system that allows users to navigate sequences of images such as those found in MRI and in other types of image sequences such as in Meteorology, Video Editing, and Animation. To evaluate our system, we conducted a user study with university students. The detail-in-context technique was compared to an implementation of the thumbnail technique, which is utilized in many commercially available medical imaging systems. Results show that performance as well as user preference was similar for both display techniques. Our analysis of the computer logs recorded during the study, however, suggests that the detail-in-context technique can accommodate a variety of individual strategies and offers strong comparison capabilities. The thumbnail technique, on the other hand, strongly encourages a sequential examination of the images but allows for higher magnification factors. This research has implications for the selection of appropriate display techniques in many areas dealing with image sequences, including radiology.
[Lu, 2000]

Author(s): Xuebin Lu.

Title: . Fast Computation of Sparse Data Cubes and Its Applications.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2000.

Abstract: Data cube queries compute aggregates over database relations at a variety of granularities, and they constitute an important class of decision support queries. For large and sparse data cubes, many applications only compute aggregate functions over one or a set of attributes to find aggregate values above some threshold. These kinds of queries are called iceberg queries. Iceberg queries over all the combinations of the grouping attributes are called Iceberg-CUBEs. Iceberg-CUBEs have wide applications in data warehousing and data mining. An algorithm called BUC was recently proposed to compute Iceberg-CUBEs efficiently. On large and sparse data cubes, this algorithm outperforms all the other known CUBE computation algorithms by a large margin. However, the original algorithm and implementation suffer from two problems which prevent them from achieving optimal performance. In this thesis, we introduce two approaches to solve these two issues in the original BUC algorithm. Our study shows that the improved BUC algorithm reduces the memory requirement by as much as 50% while improving the performance by as much as 120%. To demonstrate the wide applicability of the BUC algorithm, we present an approach to adapt the BUC algorithm to mine multi-dimensional association rules. Our performance study shows that on large and sparse data cubes, the adapted BUC algorithm is significantly faster than its closest competitors.
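For background, the bottom-up pruning idea behind BUC (recurse into finer group-bys only while the monotone COUNT measure stays at or above the threshold) can be sketched as follows; this is the textbook idea only, not the improved algorithm or the association-rule adaptation developed in the thesis, and all names and data are illustrative:

    def buc(rows, dims, minsup, prefix=(), out=None):
        """Iceberg cube of COUNT >= minsup, computed bottom-up with pruning.

        rows: list of dicts mapping dimension name -> value.
        dims: ordered list of dimension names still available for grouping.
        Returns {group-by prefix: count} for every cell meeting the threshold.
        """
        if out is None:
            out = {}
        if len(rows) < minsup:      # monotone pruning: no finer cell can qualify
            return out
        out[prefix] = len(rows)
        for j, dim in enumerate(dims):
            groups = {}
            for r in rows:
                groups.setdefault(r[dim], []).append(r)
            for value, grp in groups.items():
                if len(grp) >= minsup:
                    buc(grp, dims[j + 1:], minsup, prefix + ((dim, value),), out)
        return out

    # Example:
    # rows = [{"city": "Vancouver", "item": "tea"},
    #         {"city": "Vancouver", "item": "coffee"},
    #         {"city": "Burnaby", "item": "tea"}]
    # buc(rows, ["city", "item"], minsup=2)
    # -> {(): 3, (("city", "Vancouver"),): 2, (("item", "tea"),): 2}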
[MacArthur, 2000]

Author(s): Jeffrey Douglas MacArthur.

Title: . Visualization of MIDI Encoded Music.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 2000.

Abstract: In our day-to-day lives we normally consider the human sensory faculties of sight and hearing as independent. Although the mechanisms involved in seeing and hearing can be very different, they do have aspects in common. When we hear music we can identify the individual instruments, and when we see we recognize distinct objects. The objects we see vary in colour and the notes of each instrument we hear vary in tone. Previous research has visualized music using a variety of different approaches such as animation, digital audio filtering, and MIDI encoding. When using sound waves, it is difficult to extract all of the different sources of sound and their individual activities, and this approach does not accurately convey many aspects of how people hear music, nor does it draw upon the similarities between sight and sound. MIDI stores the events that would create the sound instead of the sound itself. This allows for easy access to the pitch, velocity, instrument, and timing information involved in the playing of a piece of music. It is the goal of this thesis to investigate methods for mapping sound, in the form of music, to spatial symbols and positions. The implications of succeeding with this goal are important both artistically and educationally. By considering how we might interpret aural input using the sense of sight, we can increase the potential for understanding the input itself and perhaps even aid the hearing impaired in experiencing music in another manner. We map each of the parameters contained in a MIDI event to the screen in a way that reflects some of the natural mappings between seeing and hearing. Through these mappings we translate musical data into spatial and textural information and assemble the results into a QuickTime(TM) movie. The result is a visualization synchronized with the music. We have conducted a user study as a preliminary investigation into our mapping theories. This study provided a group of test subjects with several different visualizations, in the form of completed QuickTime(TM) movies, of the same musical piece. The subjects then rated which mappings most directly conveyed the musical activity and which were most aesthetically pleasing. The results have given us useful and encouraging feedback on the effectiveness of our initial mappings in conveying channel, pitch, and velocity information, as well as further directions to explore.
[O'Neill, 2000]

Author(s): Melissa Elizabeth O'Neill.

Title: . Version Stamps for Functional Arrays and Determinacy Checking: Two Applications of Ordered Lists for Advanced Programming Languages.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, 2000.

Abstract: This thesis describes the fat-elements method for providing functional arrays and the LR-tags method for determinacy checking. Although these topics may seem very different, they are actually closely linked: Both methods provide reproducibility in advanced programming languages and share many implementation details, such as tagging data using version stamps taken from an ordered list. The fat-elements method provides arrays as a true functional analogue of imperative arrays with the properties that functional programmers expect from data structures. It avoids many of the drawbacks of previous approaches to the problem, which typically sacrifice usability for performance or vice versa. The fat-elements method efficiently supports array algorithms from the imperative world by providing constant-time operations for single-threaded array use. Fully persistent array accesses may also be performed in constant amortized time if the algorithm satisfies a simple requirement for uniformity of access. For algorithms that do not access the array uniformly or single-threadedly, array reads or updates take at most O(log n) amortized time for an array of size n. The method is also space efficient – creating a new array version by updating a single array element requires constant amortized space. The LR-tags method is a technique for detecting indeterminacy in asynchronous parallel programs – such as those using nested parallelism on shared-memory MIMD machines – by checking Bernstein's conditions, which prevent determinacy races by avoiding write/write and read/write contention between parallel tasks. Enforcing such conditions at compile time is difficult for general parallel programs. Previous attempts to enforce the conditions at runtime have had non–constant-factor overheads, sometimes coupled with serial-execution requirements. The LR-tags method can check Bernstein's (or Steele's generalized) conditions at runtime while avoiding some of the drawbacks present in previous approaches to this problem. The method has constant-factor space and time overheads when run on a uniprocessor machine and runs in parallel on multiprocessor machines, whereas the best previous solution, CILK's Nondeterminator (a constant-space and nearly constant-time approach), requires serial execution. Both parts of the thesis include theoretical and practical material. Tests from implementations of both techniques indicate that the methods should be quite usable in practice.
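For orientation, Bernstein's conditions themselves are simple to state: two tasks that may run in parallel must have no write/write and no read/write overlap. A toy offline checker over recorded read and write sets is sketched below; it bears no resemblance to the constant-overhead online LR-tags method, and it assumes every listed task may run in parallel with every other:

    def bernstein_violations(tasks):
        """Report task pairs that violate Bernstein's conditions.

        tasks: dict task_name -> (reads, writes), where reads and writes are
        sets of memory locations (or variable names) touched by the task.
        Any write/write or read/write overlap is a potential determinacy race.
        """
        names = sorted(tasks)
        violations = []
        for i, a in enumerate(names):
            reads_a, writes_a = tasks[a]
            for b in names[i + 1:]:
                reads_b, writes_b = tasks[b]
                conflict = ((writes_a & writes_b) |
                            (writes_a & reads_b) |
                            (reads_a & writes_b))
                if conflict:
                    violations.append((a, b, conflict))
        return violations

    # Example:
    # bernstein_violations({"t1": ({"x"}, {"y"}), "t2": ({"y"}, {"z"})})
    # -> [("t1", "t2", {"y"})]   # t1 writes y while t2 reads y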
[Rochefort, 2000]

Author(s): Stephen Rochefort.

Title: . Logic Programming Applications in Educational Environments.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 2000.

Abstract: A dramatic increase in the number of people wishing to obtain higher education via distance education has resulted in the need for advancements in asynchronous learner support tools. Students who use distance delivery systems typically work asynchronously, meaning that they have little or no interaction with the instructional team and tend to progress through the course material at their own pace. During the learning process students face many challenges that, in a typical classroom, they could resolve by approaching a fellow student or the instructor. In distance delivery systems, there must be support mechanisms that assist the student through the challenging areas. This thesis presents the Logic Programming in Education for Asynchronous Learning Environment (LPed ALE), which is an integration of logic programming technologies, providing reasoning capabilities, into a Java-based delivery system. Through the use of multi-agent technologies, software agents have been implemented to provide automated, asynchronous learner-based tools. The tools that are provided include a natural language interface, communication mechanisms, question-answer interactions, resource indexing and searching, tutor assistance and quizzing mechanisms. The research for this thesis has resulted in three significant contributions. The first is the development of asynchronous learning support in distance education delivery systems, via the construction of LPed ALE. A second contribution has been the realization of a multi-agent system architecture for educational environments. This architecture examines the approach needed for integrating agent technologies within a standard application development. This has also led to the third contribution, which is the creation of a methodology for developing multi-agent systems for educational environments. This methodology identifies modifications to the traditional software development approaches required to incorporate software agent specification, design, implementation and testing.
[Scurtescu, 2000]

Author(s): Marius Scurtescu.

Title: . Java Program Representation and Manipulation in Prolog.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 2000.

Abstract: Traditionally, program source code is stored as plain text. In most cases one would use a simple text editor to write programs and then run them through a compiler to generate machine code. The compiler parses the text file based on the grammar of the programming language and then, using the parse tree, generates the machine code based on the semantics of the language. If the compiler encounters syntactic or semantic errors then those errors will be reported, the original source file has to be edited and fixed, and compilation started again. In order to catch most of the syntactic errors while editing, a new class of editors was created. These editors support you with features specific to the grammar of the language you are writing in. Examples of such support are syntax highlighting, identifier cross-referencing, source code navigation, code pretty-printing, etc. To be able to do this, the editor must have a parser for the grammar itself and knowledge about the language semantics. This leads to a situation where the source code, even though it is stored as plain text, is continuously parsed by the editor. So you have plain text on the outside and a parse tree on the inside. This project explores an alternative way of storing source code in order to facilitate better editor support and faster compilation. Java2Prolog is a Java language parser, written in Java, that generates a representation of a Java program as Prolog predicates. These Prolog predicates can later be queried using Prolog queries in order to detect possible semantic errors or to restructure/edit the program.
[Shoemaker, 2000]

Author(s): Garth B. D. Shoemaker.

Title: . Single Display Privacyware: Augmenting Public Displays with Private Information.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 2000.

Abstract: The traditional human-computer interaction model limits collaboration by restricting physical computer access to one user at a time. Single Display Groupware (SDG) research confronts the standard interaction model by examining how to best support groups of users interacting with a shared display. One problem that arises with the use of SDG systems is that they often do not adequately support the display of private information. Having the ability to keep information private can be useful for addressing issues such as the ``screen real-estate'' problem, and problems associated with awareness overload. With normal SDG systems, any information that is to be kept private by a user cannot be displayed on the shared display, as that display is visible to all users. Some researchers have addressed the privacy issue by developing techniques that allow small private displays to be networked with a large shared display. This technique is useful for many applications, but has limitations. For example, private information cannot be shown within the context of related public information, and users are required to constantly shift attention from the private display to the shared display. This dissertation introduces Single Display Privacyware (SDP), a new interaction model that extends the SDG model to include the display of private information on a shared display. Not only is public information shown on the shared display, but private information is also shown on the display, and is kept private by filtering the output of the display. This interaction model can be used to address the screen real-estate problem and the awareness overload problem, and also, unlike other solutions, allows private information to be shown within the public context of the shared display. A description of a prototype implementation of an SDP system is given, along with results of a user study performed to investigate users interacting with the system. The significance of SDP and conclusions regarding future research in the area are discussed.
[Sun, 2000]

Author(s): Yinlong Sun.

Title: . A Spectrum-Based Framework for Realistic Image Synthesis.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, July 2000.

Abstract: Realistic image synthesis provides principles and techniques for creating realistic imagery based on models of real-world objects and behaviors. It has widespread applications in 3D design, computer animation, and scientific visualization. While it is common to describe light and objects in terms of colors, this approach is not sufficiently accurate and cannot render many spectral phenomena such as interference and diffraction. Many researchers have explored spectral rendering and proposed several methods for spectral representation, but none satisfy all representation criteria such as accuracy, compactness and efficiency. Furthermore, previous studies have focused on distinct behaviors of natural phenomena but few on their commonality and generality, and it is difficult to combine existing algorithms to simulate complex processes. This thesis proposes solutions to these problems within a spectrum-based rendering framework. The pipeline begins by loading spectra from a database to specify light sources and objects, then generates a spectral image based on local and global illumination models, projects the spectral image into a CIE image, and finally converts the CIE image into an RGB image for display or a CIELab image for evaluation. In spite of omitting the light phase information, it is shown that this approach suffices to generate all optical effects important for realistic image synthesis. As components of the new framework, a heuristic method is proposed for deriving spectra from colors and an error metric is provided for evaluating synthesized images. The new spectral representation proposed in this thesis is called the composite model. Its key point is to decompose any spectrum into a smooth background and a collection of spikes. The smooth part can be represented by Fourier coefficients and a spike by its location and height. A re-sampling technique is proposed to improve performance. Based upon the characteristics of human perception, the sufficiency of a low-dimensional representation is shown analytically. This model improves upon existing methods with respect to accuracy, compactness and efficiency, and offers an effective vehicle for rendering optical effects involving spiky spectra. Using the proposed framework and composite spectral model, this thesis develops new illumination models for rendering a number of optical effects including dispersion, interference, diffraction, fluorescence, and volume absorption. The rendered images closely correspond to their real-world counterparts. Overall, this thesis improves realistic image synthesis by expanding its rendering capabilities. It may serve as a basis for a more sophisticated rendering environment for high-quality computer image generation.
[Tatu, 2000]

Author(s): Serban G. Tatu.

Title: . BIBP: A Bibliographic Protocol and System for Distributed Reference Linking to Document Metaservices on the Web.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 2000.

Abstract: Bibliographic Protocol (BIBP) is a proposed method for the creation and resolution of abstract reference links on the World-Wide Web. In this context, an abstract reference link is a bibliographic citation denoting the referenced document as an abstract entity, independent of any particular manifestation or service with respect to it. Resolution of such a link logically provides access to a document metaservice, which in turn may provide access to a selection of alternative sources, formats or further services with respect to the document. A distributed system based on BIBP is proposed, which links documents identified by Universal Serial Item Names (USINs) to document metaservers – Web services providing metadata about the documents. USINs are abstract identifiers that can be derived from standard bibliographic information; they can be used in conjunction with the BIBP scheme to form Uniform Resource Identifiers, which in turn can be embedded as hyperlinks in HTML documents. Client software resolves USINs to metadata retrieved from document metaservers chosen by the user. The document metaservers form a network that builds upon a large database of bibliographic information which is acquired in a piecemeal fashion by each metaserver. A prototype BIBP system has been implemented to illustrate the concepts presented in the thesis and to serve as a tool for the evaluation of technical issues.
[Zhang, 2000]

Author(s): Zhen Zhang.

Title: . An integrated prefetching and caching algorithm for web proxies using a correlation-based prediction model.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 2000.

Abstract: Reducing Web latency is one of the primary concerns of Internet research. Web caching and Web prefetching are two effective techniques to achieve this. However, most previous research addresses only one of these two techniques. We propose a novel model to integrate Web caching and Web prefetching. In this model, prefetching aggressiveness, replacement policy and increased network traffic caused by prefetching are all taken into account, instead of being modeled separately. The core of our integrated solution is an effective prediction model based on statistical correlation between Web documents. By utilizing the prediction power of the model, we develop an integrated prefetching and caching algorithm, Pre-GDSF. We conduct simulations with realistic Web logs to examine the effectiveness of our algorithm. We show the tradeoff between the latency reduction and the increased network traffic achieved by Pre-GDSF. We also show why prefetching is more effective for smaller caches than larger ones.
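As a toy stand-in for a correlation-based prediction model (the model used above and its integration with the replacement policy are more involved), one can estimate the probability that a document is requested within a short window after another and treat documents above a threshold as prefetch candidates; the window size and threshold below are illustrative:

    from collections import Counter, defaultdict

    def correlation_model(sessions, window=3):
        """Estimate P(B requested within `window` steps | A requested)."""
        count_a = Counter()
        count_ab = defaultdict(Counter)
        for session in sessions:
            for i, a in enumerate(session):
                count_a[a] += 1
                for b in set(session[i + 1:i + 1 + window]):
                    if b != a:
                        count_ab[a][b] += 1
        return {a: {b: c / count_a[a] for b, c in followers.items()}
                for a, followers in count_ab.items()}

    def prefetch_candidates(model, current_doc, threshold=0.5):
        """Documents worth prefetching after current_doc under the toy model."""
        followers = model.get(current_doc, {})
        ranked = sorted(followers.items(), key=lambda kv: -kv[1])
        return [doc for doc, prob in ranked if prob >= threshold]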



1999

[Barnard, 1999]

Author(s): Kobus Barnard.

Title: . Practical Colour Constancy.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 1999. The thesis does not require the colour plates, though clearly they are a part of it and are provided for download if desired.

Abstract: The image recorded by a camera depends on three factors: The physical content of the scene, the illumination incident on the scene, and the characteristics of the camera. This leads to a problem for many applications where the main interest is in the physical content of the scene. Consider, for example, a computer vision application which identifies objects by colour. If the colours of the objects in a database are specified for tungsten illumination (reddish), then object recognition can fail when the system is used under the very blue illumination of a clear sky. This is because the change in the illumination affects object colours far beyond the tolerance required for reasonable object recognition. Thus the illumination must be controlled, determined, or otherwise taken into account. The ability of a vision system to diminish, or in the ideal case, remove, the effect of the illumination, and therefore 'see' the physical scene more precisely, is called colour constancy. There is ample evidence that the human vision system exhibits some degree of colour constancy. Interest in human vision, as well as robotics and image reproduction applications, has led to much research into computational methods to achieve colour constancy. Much progress has been made, but most work has addressed the problem in the context of synthetic data and quite simple physical conditions. However, in order to serve the needs of the proposed applications, it is necessary to develop and test computational colour constancy algorithms for real image data. This practical development of computational colour constancy is the focus of this work. In order to study and to use computational models for real image data, it is necessary to develop a model of the physical characteristics of the vision system of interest. Specifically, we want to predict the camera response based on spectral input, and as part of this work I propose a new method for doing this. In addition, the spatial variation of the optical system, and its noise characteristics are considered. In summary, the many pragmatic difficulties encountered when computational colour constancy meets the real world demand an approach that embraces modeling the physical nature of the world, analysis of camera characteristics, and the use of real images for comprehensive testing and development of algorithms. These requirements for successfully applying colour constancy to the real world have driven the present work.
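
A minimal sketch of the camera-response prediction mentioned in the abstract: each channel's response is the incoming spectrum weighted by that channel's spectral sensitivity and integrated over wavelength. The Gaussian sensitivities and illuminants below are placeholder assumptions, not the fitted sensor model developed in the thesis.

    import numpy as np

    # Predicting (R, G, B) responses from spectral input: sensitivity times
    # illuminant times reflectance, summed over wavelength.  All curves here
    # are invented placeholders for illustration.

    wl = np.arange(400, 701, 10)                       # wavelengths, nm

    def gaussian(centre, width):
        return np.exp(-0.5 * ((wl - centre) / width) ** 2)

    sensitivities = np.stack([gaussian(600, 40),       # "red" channel
                              gaussian(540, 40),       # "green" channel
                              gaussian(460, 40)])      # "blue" channel

    def camera_response(illuminant, reflectance):
        """Sum sensitivity * illuminant * reflectance over the wavelength grid."""
        stimulus = illuminant * reflectance            # light reaching the camera
        dl = wl[1] - wl[0]
        return (sensitivities * stimulus).sum(axis=1) * dl

    if __name__ == "__main__":
        flat_light = np.ones_like(wl, dtype=float)
        bluish_sky = np.linspace(1.5, 0.5, wl.size)    # crude blue-heavy illuminant
        white_patch = np.ones_like(wl, dtype=float)
        print("under flat light:", camera_response(flat_light, white_patch))
        print("under blue sky:  ", camera_response(bluish_sky, white_patch))
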
[Carpendale, 1999]

Author(s): Marianne Sheelagh Therese Carpendale.

Title: . A Framework for Elastic Presentation Space.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, April 1999. The large size of this thesis can cause problems when printing. The author has made the thesis available as a set of smaller and more easily printable PostScript files (one per chapter).

Abstract: Since the advent of video display terminals as the primary interface to the computer, how to make the best use of the available screen space has been a fundamental issue in user interface design. The necessity for effective solutions to this problem is intensifying as the ability to produce visual data in greater volumes continues to outstrip the rate at which display technology is developing. Most research in this area has concentrated on specific applications, exploiting their underlying information structure to obtain reasonable displays. In this work we take a different approach, examining the display problem independent of the application. In particular, we divide the display problem into two components: representation and presentation. Representation is the act of creating a basic image that corresponds to the information such as creating a drawing of a graph. Presentation is the act of displaying this image, emphasizing and organizing areas of interest. For example, a map of Vancouver may be presented with one's route to work magnified to reveal street names. Since representation is inherently dependent on the information, this part of the problem is not considered. Instead we concentrate on presentation and assume the existence of a two-dimensional representation. Our research into the presentation problem has led to the development of a framework that describes a presentation space in which the adjustments and reorganizations are elastic in the sense that reverting to previous presentations is facilitated. Within this framework the approach is to map the representation onto a surface in three dimensions and use perspective projection to create the final display. Varying the surface provides control for magnification and organization of representation details. Use of the third dimension provides the possibility of making these presentation adjustments comprehensible. Our framework encompasses previous techniques and indicates a broad range of new ones. Existing presentation methods create displays that vary considerably visually and algorithmically. Our framework provides a way of relating seemingly distinct methods, facilitating the inclusion of more than one presentation method in a single interface. Furthermore, it supports extrapolation between the presentation methods it describes. Of particular interest are the presentation possibilities that exist in the ranges between distortion presentations and magnified insets, distortion presentations and a full-zooming environment, and distortion presentations and those that support the repositioning of separate views. Our elastic presentation space framework describes existing presentation methods, identifies new presentation variations, and provides methods for combining them. This removes some of the current difficulty around making a presentation choice, and allows a designer of new information visualizations to include a combination of presentation methods that best suit the needs of their application's information and tasks.
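
A tiny sketch of the perspective-projection idea described above: points of a flat representation are lifted onto a 3D surface (here a Gaussian bump centred on a focus) and projected back onto the base plane from an eye point above it, which magnifies the region near the focus while distant points barely move. The bump shape and viewing distance are arbitrary assumptions, not the framework's actual surfaces.

    import math

    # Lift a 2D point onto a bump surface and project it back to the plane
    # through an eye point above the focus; points near the focus spread out.
    # Bump height/radius and eye height are illustrative assumptions.

    def elastic_project(x, y, focus=(0.0, 0.0), bump_height=2.0,
                        bump_radius=1.0, eye_height=5.0):
        fx, fy = focus
        # height of the surface under (x, y): a Gaussian bump at the focus
        r2 = ((x - fx) ** 2 + (y - fy) ** 2) / (bump_radius ** 2)
        h = bump_height * math.exp(-r2)
        # project (x, y, h) onto the plane z = 0 through the eye at (fx, fy, eye_height)
        t = eye_height / (eye_height - h)
        return fx + t * (x - fx), fy + t * (y - fy)

    if __name__ == "__main__":
        for x in (0.1, 0.5, 1.0, 3.0):
            px, py = elastic_project(x, 0.0)
            print(f"({x:4.1f}, 0.0) -> ({px:5.2f}, {py:4.2f})")
        # points near the focus move outward (magnified); distant points barely move
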
[Chen, 1999]

Author(s): Qing Chen.

Title: . Mining Exceptions and Quantitative Association Rules in OLAP Data Cube.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1999.

Abstract: People nowadays are relying more and more on OLAP data to find business solutions. A typical OLAP data cube usually contains four to eight dimensions, with two to six hierarchical levels and tens to hundreds of categories for each dimension. It is often too large and has too many levels for users to browse it effectively. In this thesis we propose a system prototype which will guide users to efficiently explore exceptions in data cubes. It automatically computes the degree of exception for cube cells at different aggregation levels. When the user browses the cube, exceptional cells as well as interesting drill-down paths that will lead to lower level exceptions are highlighted according to their interestingness. Different statistical methods such as the log-linear model, an adapted linear model and Z-tests are used to compute the degree of exception. We present algorithms and address the issue of improving the performance on large data sets. Our study on exceptions leads to mining quantitative association rules. Traditionally, the events in association rules consist of only categorical items. If the rule involves a numerical attribute, the continuous values of that attribute are partitioned into numerical intervals that can be treated the same way as categorical values. This approach, however, is inaccurate and insufficient. We adopt a new definition of quantitative association rules based on statistical inference theory. The rules associate the characteristics defining a population subset with some interesting statistical behaviors of that subset. We present an algorithm for mining such rules based on data cubes. The thesis also introduces an efficient method for implementing boxplots in the DBMiner System. With the help of the information stored in the cube, we can find the median and quartiles of the back-end records without having to go through all the data, which greatly reduces the time needed to construct boxplots. Algorithms are given. Circumstances that may influence the performance and possible solutions are discussed.
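
A rough sketch of exception detection in one 2-D slice of a cube: each cell is compared with the value expected from its row and column means, and cells with large standardized residuals are flagged. This simple z-score variant only hints at the log-linear and adapted linear models used in the thesis; the data and threshold are invented.

    import numpy as np

    # Flag "exception" cells in a 2-D slice of a data cube: expected values
    # come from an additive row/column-mean model, and cells whose
    # standardized residual exceeds a threshold are reported.

    def exception_cells(slice2d, threshold=2.0):
        slice2d = np.asarray(slice2d, dtype=float)
        grand = slice2d.mean()
        row_eff = slice2d.mean(axis=1, keepdims=True) - grand
        col_eff = slice2d.mean(axis=0, keepdims=True) - grand
        expected = grand + row_eff + col_eff
        residual = slice2d - expected
        z = residual / (residual.std() + 1e-12)
        return [(i, j, round(z[i, j], 2))
                for i in range(slice2d.shape[0])
                for j in range(slice2d.shape[1])
                if abs(z[i, j]) > threshold]

    if __name__ == "__main__":
        sales = [[10, 12, 11, 13],
                 [ 9, 11, 10, 12],
                 [10, 30, 11, 12]]    # the 30 should stand out
        print(exception_cells(sales))  # [(2, 1, ...)]
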
[Demers, 1999]

Author(s): Nicolas P. Demers.

Title: . A Lexicalist Approach to Natural-language Database Front-Ends.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1999.

Abstract: Natural language interfaces (NLI's) to databases allow end-users to access information in databases by typing in commands and requests in a natural language. These commands and requests are then translated into some formal database language, most commonly SQL. NLI's suffer from the same linguistic and computational problems as other natural language processing systems, though the severity of some problems is reduced due to the restricted language used and the restricted domain of discourse. Another important issue for database interfaces concerns porting them to new domains or to other natural languages. This thesis introduces a lexicalist approach to database NLI's based on unification grammars, along with a small-scale example. A sample bilingual lexicon is proposed to demonstrate the feasibility of this approach. The solution proposed shows that a lexicalist approach is not only feasible but, for unambiguous words and expressions, can result in reasonable complexity and processing time. In the case of ambiguity the solution is less satisfactory: there are inevitable tradeoffs between correctness and low processing time on the one hand, and low complexity and ease of lexicon creation on the other. I will also propose an algorithm to semi-automatically generate a bilingual lexicon for unambiguous expressions. The solution is based on the concept of a bilingual text; in this case we will be looking at sets of English/SQL query pairs. These pairs are meant to establish syntactic/semantic relationships between English expressions and SQL expressions, providing the necessary building blocks for more complex queries.
[Edgar, 1999]

Author(s): John Edgar.

Title: . A System for Qualitative Decision Making Under Risk.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1999.

Abstract: Decision theory provides rules for determining an action that results in the optimal result for a decision. For decisions under risk an important strategy is to maximize expected value. This expected value may be monetary or a value determined by a utility function. The strategy of maximizing expected value may result in selecting, or avoiding, actions where the relative improbability of an outcome is outweighed by the magnitude of its desirability. The majority of qualitative approaches to decision selection have used ordinal scales to value the desirability and likelihood of outcomes, which precludes selecting the appropriate action in the situation mentioned above. This thesis presents an approach that uses qualitative representations of probability and desirability yet allows for comparisons of expected values on an interval scale. This is achieved by using a ``qualitative number'', which is an analogue of conventional scientific notation. A qualitative number differs from a real number represented in scientific notation in that the set of possible mantissae is limited and the base of the exponent is undefined. This results in an order of magnitude representation for values. Rules are given for manipulating qualitative numbers that allow qualitative representations of probability and desirability to be combined to produce expected values for decision problems. These expected values can be used to determine the optimal action for decisions where the relative improbability of an outcome is outweighed by the magnitude of its desirability. While this approach is limited because of the nature of qualitative numbers, in that decision problems may often result in ties, it does not appear to produce inappropriate results. An implementation of this approach is presented which allows a decision problem to be formulated and a result determined.
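
An illustrative sketch of order-of-magnitude arithmetic with a qualitative number stored as a (mantissa, exponent) pair over a small mantissa set. The mantissa set and combination rules below are simplified assumptions made for illustration, not the thesis's actual rules.

    # A qualitative number is kept as (mantissa, exponent) with mantissae drawn
    # from a small fixed set; multiplication combines mantissae and exponents,
    # addition lets the larger order of magnitude dominate.  These rules are a
    # simplified stand-in, not the thesis's definitions.

    MANTISSAE = (1, 2, 5)          # the limited set of allowed mantissae

    def normalize(m, e):
        """Snap a raw mantissa into the allowed set, carrying into the exponent."""
        while m >= 10:
            m /= 10.0
            e += 1
        while 0 < m < 1:
            m *= 10.0
            e -= 1
        snapped = min(MANTISSAE, key=lambda c: abs(c - m))
        return (snapped, e)

    def qmul(a, b):
        """Multiply two qualitative numbers: multiply mantissae, add exponents."""
        return normalize(a[0] * b[0], a[1] + b[1])

    def qadd(a, b):
        """Add two qualitative numbers: the larger order of magnitude dominates."""
        if a[1] == b[1]:
            return normalize(a[0] + b[0], a[1])
        return a if a[1] > b[1] else b

    if __name__ == "__main__":
        improbable = (1, -3)       # a roughly 10^-3 likelihood
        huge_loss  = (5,  5)       # a roughly 5*10^5 (dis)utility
        print("expected value ~", qmul(improbable, huge_loss))   # (5, 2): still large
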
[Grün, 1999]

Author(s): Gabrielle Assunta Grün.

Title: . The Maximum Distance Problem.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 1999.

Abstract: Efficient algorithms for temporal reasoning are essential in knowledge-based systems. This is central in many areas of Artificial Intelligence including scheduling, planning, plan recognition, and natural language understanding. As such, scalability is a crucial consideration in temporal reasoning. While reasoning in the interval algebra is NP-complete, reasoning in the less expressive point algebra is tractable. In this thesis, we explore an extension to the work of Gerevini and Schubert which is based on the point algebra. In their seminal framework, temporal relations are expressed as a directed acyclic graph partitioned into chains and supported by a metagraph data structure, where time points or events are represented by vertices, and directed edges are labelled with < or ≤. They are interested in fast algorithms for determining the strongest relation between two events. They begin by developing fast algorithms for the case where all points lie on a chain. In this thesis we are interested in a generalization of this, namely we consider the problem of finding the maximum ``distance'' between two vertices in a chain; this problem arises in real world applications such as course planning. We describe an O(n) time preprocessing algorithm for the maximum distance problem on chains. It allows queries for the maximum number of < edges between two vertices to be answered in O(1) time. This matches the performance of the algorithm of Gerevini and Schubert for determining the strongest relation holding between two vertices in a chain.
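
A small sketch of the chain case: one O(n) pass stores, for every vertex, the number of strict < edges from the start of the chain, after which the maximum number of < edges between two chain vertices is an O(1) difference of prefix counts. The edge encoding ('<' vs '<=') is an assumed input format, not the thesis's data structure.

    # O(n) preprocessing, O(1) queries for the maximum number of strict '<'
    # edges between two vertices on a chain.

    def preprocess(chain_edges):
        """chain_edges[i] is '<' or '<=' for the edge between vertex i and i+1."""
        prefix = [0]
        for label in chain_edges:
            prefix.append(prefix[-1] + (1 if label == '<' else 0))
        return prefix

    def max_distance(prefix, u, v):
        """Maximum number of '<' edges between chain vertices u <= v."""
        return prefix[v] - prefix[u]

    if __name__ == "__main__":
        edges = ['<', '<=', '<', '<', '<=']      # a chain on 6 vertices
        prefix = preprocess(edges)
        print(max_distance(prefix, 0, 5))        # 3 strict edges on the whole chain
        print(max_distance(prefix, 1, 4))        # 2 strict edges between v1 and v4
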
[Hepting, 1999]

Author(s): Daryl H. Hepting.

Title: . A New Paradigm for Exploration in Computer-Aided Visualization.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 1999.

Abstract: This dissertation examines how the computer can aid the creative human endeavour which is data visualization. That computers now critically aid many fields is apparent, as is evidenced by the breadth of contemporary research on this topic. Indeed, computers have contributed widely to the whole area of data comprehension, both in performing extensive computations and in producing visual representations of the results. Computers originally aided mathematicians who could both write the instructions necessary to direct the computer and interpret the resulting numbers. Even though modern computers include advanced graphical capabilities, many issues of access still remain: the users of data visualization software systems may not be experts in any computer-related field, yet they want to see visual representations of their data which allow them insight into their problems. For example, today's mathematicians who are generally expert in exploiting computational opportunities for experimentation may lack similar experience in opportunities for visual exploration. Of particular concern is how a computer-aided visualization tool can be designed to support the user's goal of obtaining insight. There are many visual representations for a given set of data, and different people may obtain insight from different visual representations. Selection of the ``best'' one for an individual can be exceedingly difficult, as the sheer number of possible representations may be staggering. Current software designs either recognize the possibility of overwhelming the individual and therefore employ some means of restricting the choices that the user is allowed to make, or the designs focus on providing only the raw materials necessary for constructing the representations, leaving the user unrestricted but potentially unaided in searching out the desired representation. The novel approach presented in this dissertation adapts a genetic algorithm to provide a means for an individual to search alternative visual representations in a systematic and manageable way. Any visual representation is a combination of elements, each selected from a different component. This approach encourages the individual's creativity without restricting available choices, and leaves the task of bookkeeping to the computer. A computer-aided visualization system which is driven by the unique preferences of each user has been developed. The efficacy of this system, cogito, is demonstrated through a software user study. From an initial specification of components and elements, the system provides a wide variety of visual representations. From within this range of available visual representations, the user pursues the goal of achieving insight by applying personal criteria for effectiveness to the iterative selection and evaluation of candidate representations.
[Koperski, 1999]

Author(s): Krzysztof Koperski.

Title: . A Progressive Refinement Approach to Spatial Data Mining.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, April 1999.

Abstract: Spatial data mining, i.e., mining knowledge from large amounts of spatial data, is a demanding field since huge amounts of spatial data have been collected in various applications, ranging from remote sensing to geographical information systems (GIS), computer cartography, environmental assessment and planning. The collected data far exceed people's ability to analyze them. Thus, new and efficient methods are needed to discover knowledge from large spatial databases. The goal of this thesis is to analyze methods for mining of spatial data, and to determine environments in which efficient spatial data mining methods can be implemented. In the spatial data mining process, we use (1) non-spatial properties of the spatial objects and (2) attributes, predicates and functions describing spatial relations between described objects and other features located in the spatial proximity of the described objects. The descriptions are generalized, transformed into predicates, and the discovered knowledge is presented using multiple levels of concepts. We introduce the concept of spatial association rules and present efficient algorithms for mining spatial associations and for the classification of objects stored in geographic information databases. A spatial association rule describes the implication of one or a set of features (or predicates) by another set of features in spatial databases. For example, the rule ``80% of the large cities in Canada are close to the Canada-U.S. border'' is a spatial association rule. A spatial classification process is a process that assigns a set of spatial objects into a number of given classes based on a set of spatial and non-spatial features (predicates). For example, the rule ``if a store is inside a large mall and is close to highways, then it brings high profits'' is a spatial classification rule. The developed algorithms are based on the progressive refinement approach. This approach allows for efficient discovery of knowledge in large spatial databases. A complete set of spatial association rules can be discovered, and accurate decision trees can be constructed, using the progressive refinement approach. Theoretical analysis and experimental results demonstrate the efficiency of the algorithms. The completeness of the set of discovered spatial association rules is shown through the theoretical analysis, and the experiments show that the proposed spatial classification algorithm allows for better accuracy of classification than the algorithm proposed in the previous work [EKS97]. An important proposed optimization technique used in the progressive refinement approach is that the search for patterns at high conceptual levels may apply efficient spatial computation algorithms at a relatively coarse resolution. Only the candidate spatial predicates that are worth detailed examination will be computed by refined spatial computation techniques. Such a multi-level approach saves computation effort because it is very expensive to perform detailed spatial computation for all the possible spatial relationships. The results of the research have been incorporated into the spatial data mining system prototype, GeoMiner. GeoMiner includes five spatial data mining modules: characterizer, comparator, associator, cluster analyzer, and classifier. The SAND (Spatial And Nonspatial Data) architecture has been applied in the modeling of spatial databases.
The GeoMiner system includes the spatial data cube construction module, the spatial on-line analytical processing (OLAP) module, and spatial data mining modules. A spatial data mining language, GMQL (Geo-Mining Query Language), is designed and implemented as an extension to Spatial SQL, for spatial data mining. Moreover, an interactive, user-friendly data mining interface has been constructed and tools have been implemented for visualization of discovered spatial knowledge.
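
A rough sketch of the progressive-refinement idea for a single close_to predicate: candidate pairs are found with a cheap, conservative bounding-box test, and only those candidates are re-checked with an exact (more expensive) distance computation. The object format and the 2.5-unit threshold are illustrative assumptions, not GeoMiner's implementation.

    import math

    # Coarse filter on bounding boxes (conservative: never misses a close pair),
    # followed by an exact check on the objects' sample points.

    def centre(bbox):
        x1, y1, x2, y2 = bbox
        return ((x1 + x2) / 2, (y1 + y2) / 2)

    def half_diag(bbox):
        x1, y1, x2, y2 = bbox
        return math.hypot(x2 - x1, y2 - y1) / 2

    def coarse_close(a, b, threshold):
        """Cheap filter: may over-approximate, never rejects a truly close pair."""
        return math.dist(centre(a["bbox"]), centre(b["bbox"])) <= \
            threshold + half_diag(a["bbox"]) + half_diag(b["bbox"])

    def exact_close(a, b, threshold):
        """Refined check: minimum distance between the objects' sample points."""
        return min(math.dist(p, q)
                   for p in a["points"] for q in b["points"]) <= threshold

    def close_pairs(objects, threshold):
        candidates = [(a, b) for i, a in enumerate(objects)
                      for b in objects[i + 1:] if coarse_close(a, b, threshold)]
        return [(a["name"], b["name"])
                for a, b in candidates if exact_close(a, b, threshold)]

    if __name__ == "__main__":
        objs = [
            {"name": "city",   "bbox": (0, 0, 2, 2),     "points": [(0, 0), (2, 2)]},
            {"name": "border", "bbox": (3, 0, 3, 4),     "points": [(3, 0), (3, 4)]},
            {"name": "town",   "bbox": (40, 40, 41, 41), "points": [(40, 40)]},
        ]
        print(close_pairs(objs, threshold=2.5))   # [('city', 'border')]
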
[Kubon, 1999]

Author(s): Petr Pp Kubon.

Title: . Information Packaging Revisited.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, March 1999.

Abstract: Informational partition, i.e., the partition of a sentence into given and new information, has been studied extensively in the literature, but no existing account has succeeded in formulating a formal theory of the matter, amenable to computational implementation. Arguably, the information-packaging theory of Enric Vallduvi has been the most promising. In his account the informational partition of a sentence is interpreted as an instruction to the hearers as to how their respective knowledge stores should be updated with the information carried by the sentence. This gives a precise, concrete meaning to the notions of given and new information. Although Vallduvi's approach represents an important step forward for theories of informational partition, it still has several drawbacks which reduce its overall impact. This dissertation identifies these drawbacks and addresses them in a constructive manner. First, Vallduvi argues that the interpretation of informational partition requires that the hearer's knowledge store be organized in a specific way, as a file in the sense of Heim. I show that this conclusion is unwarranted and propose a more general theory of informational partition which supports different models of the hearer's knowledge store. To demonstrate the viability of this approach, I give two realizations of the general theory—one where the knowledge store is organized by means of files, and one where it is organized by means of discourse representation structures of Kamp and Reyle. Second, Vallduvi's account is not fully formal because his instructions for interpreting informational partition rely on auxiliary operations which have never been formally described or implemented. I address this problem by giving the instructions a fully developed, algorithmic interpretation. Finally, Vallduvi argues for dividing ground (the sentence part which encodes given information) further into two parts—link and tail. I show that this move is flawed for several reasons and that the theory of informational partition working only with two primitive notions of ground and focus (the sentence part which encodes new information) is clearly preferable.
[Li, 1999]

Author(s): Jin Li.

Title: . Constructing classification trees with exception annotations for large datasets.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 1999.

Abstract: Classification is an important problem in data mining, which identifies essential features of different classes based on a set of training data and then classifies new instances into appropriate classes. Among many popular classification mining techniques, classification tree induction has attracted great attention in the database research community due to its high accuracy, understandability and efficiency. Traditional classification tree algorithms work well for small datasets. In data mining applications, however, very large datasets with several million records are common. With the large amount and wide diversity of data, conventional classification trees become inaccurate, bushy and inefficient, which limits the utilization of classification tree algorithms in decision-making. In this thesis, we propose a novel method to construct classification trees for large datasets, called Classification Trees with Exception Annotations, or CTEA for short. The new method is a two-step process. First, a classification tree is constructed to a certain depth, which describes the main trend of the whole dataset. This step results in a concise classification tree efficiently. The leaf nodes of the tree are marked by different categories. Second, for the majority leaf nodes, exception annotations are extracted from the data subset local to these nodes, which contributes to improved accuracy. Two algorithms, ENEA and BAPE, are proposed to find exception annotations efficiently. ENEA is based on information entropy, and induces a hybrid classification tree by implicitly growing an exception subtree. BAPE is based on the naive Bayesian property, and uses an iterative pairwise merging method to find a set of exception rules directly from each non-majority class label group. Experimental results show that classification trees with exception annotations are more accurate and more concise than those without. In addition, our method is integrated with data warehouse functionalities, thus achieving scalable, flexible, multi-dimensional, multi-level classification mining with high performance.
[Morin, 1999]

Author(s): Christian Morin.

Title: . Markup and Transformation of Academic Degree Requirements Using XML.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1999.

Abstract: Academic institutions typically publish annual calendars that describe the rules and regulations for their degree programs. Using Extensible Markup Language (XML), this thesis investigates the development of a lightweight markup scheme for degree requirements that both structures the text of the regulations and formally encodes its logic. This formal encoding is shown to have immediate benefit in identifying cross-reference errors and other inconsistencies that otherwise may go undetected. In addition, the formal encoding captures the logic of the requirements in a way that allows a number of useful secondary products to be derived by transformation. These include hypertext (web-based) versions of a traditional printed calendar, requirements checklists so that students and advisors can track progress through particular programs, and computer software that implement degree audit: the automated checking of degree requirements. Furthermore, a multi-year markup notation is developed so that successive annual versions of the requirements may be described as revisions of previous versions. Transformations on the multi-year markup allow differences to be highlighted both as an aid to the curriculum revision process and as an aid to the evaluation of students who begin a program under one calendar and propose to graduate under a different one. The markup system and a full set of transformation tools have been implemented and demonstrated in application to a significant portion of the Simon Fraser University calendar.
[Simpson, 1999]

Author(s): David John Simpson.

Title: . Space-Efficient Execution of Parallel Functional Programs.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, November 1999.

Abstract: A parallel program consists of a set of tasks, some of which can be executed at the same time. If at some time during the execution of the program there are more tasks ready to be executed than there are available processors, a scheduling algorithm is used to determine what tasks will be executed at that time and on what processors. The scheduling algorithm should attempt to minimize the amount of time and space required to execute the program. We present a scheduling algorithm with guaranteed bounds on the amount of time and space required for the parallel execution of programs. The algorithm is greedy, so its time requirement is no more than double that required by an optimal schedule. Furthermore, our algorithm is ``space-efficient'', because when the program is executed on P processors, the space bound is PM, where M is the worst-case amount of space required over all single-processor executions of the program. We model the execution of a parallel program by a directed acyclic graph (dag), where nodes represent tasks and edges represent dependencies between tasks. A task is ready to be executed once all of its parent tasks have been executed. Each memory block is allocated by a task, and (in our General Model) released by any one task in a set of tasks specific to that memory block. We define three models derived from the General Model. In the Nested Model, all parallelism in the dag must be nested. In the Deterministic Model, the ``release set'' associated with each memory block contains a single task. The Nested Deterministic Model combines both restrictions. We give a linear-time algorithm for finding M for programs in the Nested Deterministic Model, and show that finding M for programs in the other models is NP-hard. We also give a linear-time algorithm for bounding M within a factor of two for programs in the Nested Model, provided that reference counts are used for determining when memory can be released. Since the ultimate goal of our research is the space-efficient execution of parallel functional programs, we show how our model and algorithm can be applied to functional languages. For instance, call-by-need, commonly used in functional languages, does not fit our model, so we use call-by-value. We have implemented our theory in two Java packages. The Sports package implements the scheduling algorithm for nested-parallel programs and also can sequentially compute M for programs in the Nested Deterministic Model and a bound on M (within a factor of two) for programs in the Nested Model. Using this package, we have written a second package that implements our prototype parallel functional language Flip. Flip is the first control-parallel functional language with guaranteed good bounds on the time and space required for its parallel execution.
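
A minimal sketch of greedy scheduling of a task dag on P processors: at each step, up to P ready tasks are executed. Unit-time tasks and the tiny example dag are simplifying assumptions; the thesis's scheduler additionally chooses among ready tasks so as to bound space, which this sketch does not attempt.

    from collections import deque

    # Greedy level-by-level scheduling of a dag of unit-time tasks on P
    # processors; memory accounting is deliberately omitted in this sketch.

    def greedy_schedule(children, num_processors):
        """children: dict task -> list of tasks that depend on it."""
        indegree = {t: 0 for t in children}
        for deps in children.values():
            for d in deps:
                indegree[d] = indegree.get(d, 0) + 1
        ready = deque(t for t, deg in indegree.items() if deg == 0)
        steps = []
        while ready:
            running = [ready.popleft()
                       for _ in range(min(num_processors, len(ready)))]
            steps.append(running)
            for task in running:
                for child in children.get(task, []):
                    indegree[child] -= 1
                    if indegree[child] == 0:
                        ready.append(child)
        return steps

    if __name__ == "__main__":
        dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
        for step, tasks in enumerate(greedy_schedule(dag, num_processors=2)):
            print(f"time {step}: run {tasks}")
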
[Tang, 1999]

Author(s): Simon Tang.

Title: . Towards Real-Time Magnetic Resonance Teleradiology Over the Internet.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1999.

Abstract: Teleradiology involves transmitting high-resolution medical images from one remote site to another and displaying the images effectively so that radiologists can perform the proper diagnosis. Recent advances in medicine combined with those in technology have resulted in a vast array of medical imaging techniques, but due to the characteristics of these images, there are challenges to overcome in order to transmit these images efficiently. We describe in this thesis a prototype system that implements three image retrieval algorithms (Full-resolution, Scaled, and Pre-compressed) for retrieving magnetic resonance (MR) images that are stored on a web server. The MR images are displayed on a computer monitor instead of a traditional light screen. Our prototype system consists of a Java-based Image Viewer and a web server extension component called Image Servlet. The latter is responsible for transmitting the MR images to the Image Viewer. This thesis also investigates the feasibility of achieving real-time performance in retrieving and viewing MR images over the Internet using the HTTP/1.0 protocol. We have conducted experiments to measure the round trip time of retrieving the MR images. Based on the findings from the experiments, the three image retrieval algorithms are compared and a discussion of their appropriateness for teleradiology is presented.
[Vernooy, 1999]

Author(s): Matthew Vernooy.

Title: . An Examination of Probabilistic Value Ordering Heuristics for CSPs.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1999.

Abstract: Searching for solutions to constraint satisfaction problems (CSPs) is NP-hard in general. Heuristics for variable and value ordering have proven useful by guiding the search towards more fruitful areas of the search space and hence reducing the amount of time spent searching for solutions. Static ordering methods impart an ordering in advance of the search, and dynamic ordering methods use information about the state of the search to order values or variables during the search. A previous static value ordering heuristic guides the search by ordering values based on an estimate of the number of solutions to the problem. This thesis introduces a new approximation method for static value ordering that decomposes the CSP into a set of subproblems and shows that it performs better on average than a competing method of Dechter and Pearl [Dechter-Pearl-88]. We use a probabilistic approximation method as a unifying framework for comparing several static heuristics. We examine the accuracy of the approximations as well as their ability to impart an ordering of values that requires fewer backtracks on average in the search process than random. Our empirical results show that although there is a positive correlation between the approximations and the actual probabilities, the static value-ordering methods have marginal utility as value-ordering advice in practice. Bayesian networks are probabilistic inference structures which provide a way of representing and computing joint probabilities for random variables. We incorporate our decomposition heuristic into a Bayesian network representation that uses information about the current state of the search to advise on the next move. Our empirical results show that this dynamic value ordering heuristic is an improvement on the static methods for sparsely constrained CSPs and detects insoluble problem instances earlier in the search. However, as the problem density increases, this dynamic method suffers from deficiencies similar to those of the static heuristics and does not significantly improve search performance.
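
A small sketch of probability-style static value ordering for a binary CSP: each value of a variable is scored by the product, over its constrained neighbours, of the fraction of neighbour values compatible with it, a crude stand-in for an estimated solution count. The CSP encoding and the example are assumptions for illustration; the thesis's decomposition-based estimator is more refined.

    # Order the values of one variable by a rough "how much room do they leave
    # in the neighbouring domains" score.

    def value_order(var, domains, constraints):
        """constraints[(x, y)] is a predicate telling whether (vx, vy) is allowed."""
        def score(value):
            s = 1.0
            for (x, y), ok in constraints.items():
                if x == var:
                    other, allowed = y, [w for w in domains[y] if ok(value, w)]
                elif y == var:
                    other, allowed = x, [w for w in domains[x] if ok(w, value)]
                else:
                    continue
                s *= len(allowed) / len(domains[other])
            return s
        return sorted(domains[var], key=score, reverse=True)

    if __name__ == "__main__":
        domains = {"x": [1, 2, 3], "y": [1, 2, 3], "z": [1, 2, 3]}
        constraints = {("x", "y"): lambda a, b: a < b,
                       ("x", "z"): lambda a, b: a != b}
        # small x values leave more room in y, so they are tried first
        print(value_order("x", domains, constraints))   # [1, 2, 3]
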
[Wang, 1999]

Author(s): Wei Wang.

Title: . Predictive Modeling Based on Classification and Pattern Matching Methods.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 1999.

Abstract: Predictive modeling, i.e., predicting unknown values of certain attributes of interest based on the values of other attributes, is a major task in data mining. Predictive modeling has wide applications, including credit evaluation, sales promotion, financial forecasting, and market trend analysis. In this thesis, two predictive modeling methods are proposed. The first is a classification-based method which integrates attribute-oriented induction with the ID3 decision tree method. This method extracts prediction rules at multiple levels of abstraction and handles large data sets and continuous numerical values in a scalable way. Since the number of distinct values in each attribute is reduced by attribute-oriented induction, the problem of favoring attributes with a large number of distinct values in the original ID3 method is overcome. The second is a pattern-matching-based method which integrates statistical analysis with attribute-oriented induction to predict data values or value distributions of the attribute of interest based on similar groups of data in the database. The attributes which strongly influence the values of the attribute of interest are identified first by the analysis of data relevance or correlation using a statistical relevance analysis method. Moreover, by allowing users to specify their requests at different concept levels, the system can perform prediction at user-desired concept levels to make the result more interesting and better suited to the user's needs. This approach is domain-independent and capable of handling large volumes of data and multiple concept levels. Both proposed methods are implemented and tested. The performance study and experiments show that they work efficiently on large databases. Our study concludes that predictive modeling can be conducted efficiently at multiple levels of abstraction in large databases and is practical for solving some large-scale application problems.
[Zaiane, 1999]

Author(s): Osmar R Zaiane.

Title: . Resource and Knowledge Discovery from the Internet and Multimedia Repositories.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, March 1999.

Abstract: There is a massive increase of information available on electronic networks. This profusion of resources on the World-Wide Web gave rise to considerable interest in the research community. Traditional information retrieval techniques have been applied to the document collection on the Internet, and a panoply of search engines and tools have been proposed and implemented. However, the effectiveness of these tools is not satisfactory. None of them is capable of discovering knowledge from the Internet. The Web is still evolving at an alarming rate. In a recent report on the future of database research known as the Asilomar Report, it has been predicted that ten years from now, the majority of human information will be available on the World-Wide Web, and it has been observed that the database research community has contributed little to the Web thus far. In this work we propose a structure, called a Virtual Web View, on top of the existing Web. Through this virtual view, the Web appears more structured, and common database technology is applied. The construction and maintenance of this structure is scalable and does not necessitate the large bandwidth that current search engine technologies require. A declarative query language for information retrieval and networked tool programming is proposed that takes advantage of this structure to discover resources as well as implicit knowledge buried in the World-Wide Web. Large collections of multimedia objects are being gathered for a myriad of applications. The use of on-line images and video streams is becoming commonplace. The World-Wide Web, for instance, is a colossal aggregate of multimedia artifacts. However, finding pertinent multimedia objects in a large collection is a difficult task. Images and videos often convey even more information than the text documents in which they are contained. Data mining from such a multimedia corpus can lead to interesting discoveries. We propose the extraction of visual descriptors from images and video sequences for content-based visual media retrieval, the construction of multimedia data cubes which facilitate multiple dimensional analysis of multimedia data, and the mining of multiple kinds of knowledge, including summarization, classification, and association, in image and video databases.



1998

[Cheng, 1998]

Author(s): Shan Cheng.

Title: . Statistical Approaches to Predictive Modeling in Large Databases.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, February 1998.

Abstract: Prediction, i.e., predicting the potential values or value distributions of certain attributes for objects in a database or data warehouse, is an attractive goal in data mining. Predicting future events not shown in databases with high quality can help users to make smart business decisions. With concern for both scalability and high prediction quality, we propose a predictive modeling algorithm for interactive prediction in large databases and data warehouses. The algorithm consists of three steps: (1) data generalization, which converts data in relational databases or data warehouses into a multi-dimensional database to which efficient analysis techniques can be applied; (2) relevance analysis, which identifies the attributes that are highly relevant to the prediction, to reduce the number of attributes used in prediction, with the benefit of improving both the efficiency and the reliability of prediction; and (3) a statistical regression model, called the generalized linear model, which is constructed for high quality prediction. We explore two types of models suited to different problems. Moreover, with this method, a user can interact with a data mining system by presenting probes with constants at different levels of abstraction and attempt to predict values of a predicted attribute at different levels of abstraction. Also, a user may drill down or roll up along any attribute dimensions and then do prediction analysis. Our analysis and experimental results show that the method provides high prediction quality with modest or intermediate data generalization and that it leads to efficient, interactive prediction in large databases.
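
A toy sketch of the three-step flavour of the method, with heavy simplifications: relevance analysis is reduced to a correlation cutoff, and the generalized linear model is reduced to an ordinary least-squares fit. The data, cutoff and column indices are invented for illustration; none of this is the thesis's implementation.

    import numpy as np

    # Keep attributes whose absolute correlation with the target exceeds a
    # cutoff, then fit a simple linear model on the retained attributes.

    def relevant_columns(X, y, cutoff=0.3):
        keep = []
        for j in range(X.shape[1]):
            corr = np.corrcoef(X[:, j], y)[0, 1]
            if abs(corr) >= cutoff:
                keep.append(j)
        return keep

    def fit_and_predict(X, y, X_new):
        cols = relevant_columns(X, y)
        Xr = np.column_stack([np.ones(len(X)), X[:, cols]])
        coef, *_ = np.linalg.lstsq(Xr, y, rcond=None)
        Xn = np.column_stack([np.ones(len(X_new)), X_new[:, cols]])
        return Xn @ coef, cols

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 4))
        y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=200)
        preds, cols = fit_and_predict(X, y, X[:5])
        print("relevant attribute indices:", cols)   # expect [0, 2]
        print("predictions:", np.round(preds, 2))
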
[Hayden, 1998]

Author(s): Sandra C. Hayden.

Title: . WebMAS: A Multi-Agent System for Web-Based Classification.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1998.

Abstract: Agents are beginning to emerge on the Web. Agent systems, especially multi-agent systems, offer a flexible, distributed solution well-suited to the chaos of the milieu. Mobile agents, like spiders, may crawl the Web in search of relevant information. However, agents are more powerful than spiders. Agents may be endowed with additional capabilities such as intelligence, in the form of machine learning, and autonomy, supported by self-reliant planning and reasoning capabilities. This project endeavors to deploy a prototype multi-agent system which manifests a subset of agent capabilities, mobility and collaboration. Communications between the agents conform to the KQML standard, and a broker architecture facilitates interaction and matchmaking between the agents. The mobile agents employ neural nets to perform classification of newswire stories from the SGML Reuters-21578 Text Categorization corpus. The goal of the project is to design and verify the viability of this agent system architecture. The satisfactory outcome of the work indicates that this approach should be further pursued and promises that in the long term the adoption of such technologies should lead to more flexible, intelligent and autonomous information retrieval on the Web.
[Johanna E. van der Heyden, 1998]

Author(s): Johanna E. van der Heyden.

Title: . Magnetic resonance image viewing and the ``Screen Real Estate'' problem.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1998.

Abstract: Medical image analysis is shifting from current film-oriented light screen environments to computer environments that involve viewing and analyzing large sets of images on a computer screen. In particular, Magnetic Resonance Imaging (MRI) studies involve multiple images and require a very large display area. This thesis examines the presentation of MRI studies on a relatively small computer screen in a manner that best suits the task of MRI viewing and the needs of the radiologist. By working interactively with MRI radiologists and observing their actions during MRI analysis in the traditional light screen environment, the following key issues were determined: user control over image size, position and grouping, navigation of images and image groups, and trading off detail in context. These ``screen real estate'' issues are extensively explored in the literature but not consistently applied to medical image presentation. We apply existing techniques as well as variations of existing techniques to obtain an initial design proposal. An appropriate variable scaling layout algorithm was chosen to support the detail-in-context requirements, along with several algorithmic variations which better suit the MRI-specific requirements proposed. A user feedback study was then conducted to determine preference and the degree of user enthusiasm for these proposals. The results were encouraging, and the response to the scaling layout pointed to improvements. A new variant of the algorithm was created to address these results. This work shows that existing guidelines with respect to presentation of information and other user interface issues can be applied to the MRI viewing situation. We believe that by applying these principles along with existing layout adjustment and magnification techniques it is possible to improve upon the current form of medical image presentation found in existing commercial systems. The response to our user study both confirms this and indicates direction towards continued iterations of the work.
[Li, 1998]

Author(s): Sheng Li.

Title: . ActiveCBR: Integrating Case-Based Reasoning and Active Databases.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1998.

Abstract: Case-Based Reasoning (CBR) is an Artificial Intelligence (AI) technique for problem solving that uses previous similar examples to solve the current problem. Compared with rule-based expert systems, CBR has two major advantages: firstly, it uses fuzzy matching to describe similarities between the new problem and previous knowledge so as to avoid the complicated rule matching in rule-based expert systems; secondly, CBR systems are flexible, which means revising the knowledge base is relatively easy. However, most current CBR systems are passive: they require human users to activate them manually and to provide information about the incoming problem explicitly. Our solution to this problem is to integrate a CBR system with an active database system. Active databases, with the support of active rules, can perform event detection, condition monitoring, and event handling (action execution) in an automatic manner. In this thesis, we propose an ActiveCBR architecture which builds a case-based reasoning subsystem on top of an active database, and realizes problem solving based on the data and events in a relational database. The ActiveCBR system consists of two layers. In the lower layer, the active database is rule-driven; in the higher layer, the result of action execution of active rules is transformed into the feature-value pairs required by the CBR subsystem. The layered architecture separates case-based reasoning from complicated rule-based reasoning, and improves the traditional passive CBR system with the active property. Advantages of the combined ActiveCBR system include flexibility in knowledge management, which an active database system lacks, and having the CBR system autonomously respond to external events, which a passive CBR system lacks. This work, which is a first attempt to combine the two systems, contributes to both database research and CBR research in that it allows the combined knowledge base to be highly autonomous, scalable, and easily changeable. We demonstrate the system's efficiency and effectiveness through empirical tests.
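
A small sketch of the fuzzy-matching retrieval step of a CBR system: the stored case with the highest weighted per-feature similarity to the new problem is retrieved. The features, weights and similarity measures are invented for illustration and are not the ActiveCBR case base.

    # Weighted nearest-neighbour case retrieval over feature-value pairs.

    def similarity(case_features, problem, weights, ranges):
        total, weight_sum = 0.0, 0.0
        for feat, weight in weights.items():
            a, b = case_features.get(feat), problem.get(feat)
            if a is None or b is None:
                continue
            if feat in ranges:                       # numeric: scaled distance
                sim = 1.0 - abs(a - b) / ranges[feat]
            else:                                    # symbolic: exact match
                sim = 1.0 if a == b else 0.0
            total += weight * sim
            weight_sum += weight
        return total / weight_sum if weight_sum else 0.0

    def retrieve(case_base, problem, weights, ranges):
        return max(case_base,
                   key=lambda c: similarity(c["features"], problem, weights, ranges))

    if __name__ == "__main__":
        cases = [
            {"features": {"temperature": 80, "pressure": "high"}, "solution": "open valve"},
            {"features": {"temperature": 20, "pressure": "low"},  "solution": "no action"},
        ]
        weights = {"temperature": 2.0, "pressure": 1.0}
        ranges = {"temperature": 100.0}
        best = retrieve(cases, {"temperature": 75, "pressure": "high"}, weights, ranges)
        print(best["solution"])                      # "open valve"
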
[Mussbacher, 1998]

Author(s): Gunter Mussbacher.

Title: . Combining Case Based Reasoning and Commonality Analysis for Software Requirements Reuse.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1998.

Abstract: Requirements reuse may significantly reduce life cycle costs by improving the quality of requirements specifications, mitigating risks, and facilitating design, code, and test reuse. A novel process is presented describing how the combination of case based reasoning (CBR) techniques and the commonality analysis (CA) may be used to provide tool based support for efficient requirements reuse for a domain. Case based reasoning is a problem solving and knowledge reuse technique. The commonality analysis is a software engineering technique that highlights the variabilities and commonalities of systems within a domain. A domain, a family of systems, is a set of existing and future applications with overlapping capabilities and data. The CA/CBR system structures system artifacts based on commonalities and variabilities. The CA/CBR system uses similarity based retrieval of requirements traceability matrices to effectively create requirements documents of a new system based on a selection of high level requirements. Furthermore, the CA/CBR system takes into consideration effort estimates for the new system derived from historical data and requirements traceability. The process is evaluated through a series of experiments involving the development of a sample CA document and sample systems within a domain. Empirical results suggest that the CA/CBR system may improve the efficiency of requirements elicitation and provide a reasonable development effort estimate at the requirements phase.
[Pierzynska, 1998]

Author(s): Alicja Pierzynska.

Title: . Accounting for Electrical Phenomena in Delay Fault Testing.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, November 1998.

Abstract: Testing and design for testability are critical steps in the process of building reliable electronic circuits. This dissertation contributes to improving the quality of delay fault testing: testing for manufacturing faults that result in delays beyond circuit specifications. It highlights the importance of considering electrical phenomena when developing techniques for identifying delay faults. Previous research in delay fault testing has not considered such phenomena, and we show that when they are disregarded, faulty circuits can be declared fault-free. Our results have many implications for this field and re-open numerous problems. In this dissertation, we lay a new foundation for delay fault testing: * We demonstrate that a basic, often implicit assumption underlying research in delay fault testing is invalid due to circuit electrical phenomena. We identify three delay effects that cause this invalidation and we develop gate-level guidelines to account for these effects. * We show that our findings have a profound impact on concepts and techniques used in delay fault testing. In particular, they necessitate a revision of the delay test definition and of the criteria for test quality evaluation, both of which are fundamental in delay test procedures and techniques. * We present constructive solutions to some of the problems that result from our findings. These include a method for estimating ranges of path delays, and an algorithm for eliminating through circuit transformations one of the delay effects we identified. Our findings regarding delay modeling, as well as the method for estimating path delays, are applicable not only to delay fault testing, but also to timing analysis.
[Storey, 1998]

Author(s): Margaret-Anne D. Storey.

Title: . A Cognitive Framework for Describing and Evaluating Software Exploration Tools.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 1998.

Abstract: Software programs, especially legacy programs, are often large, complex and poorly documented. To maintain these programs, software engineers require a variety of efficient analytical tools. Some software maintenance tools use visualizations (i.e. graphical views) to communicate information about software systems. Although many software visualization tools exist, the majority of them are not very effective in practice. Part of the problem is that they are designed in an ad hoc manner, with little empirical evaluation. They are often criticized because they try to force programmers to use a specific approach to understanding software rather than supporting their own approaches. The result is that current software visualization tools do not play as big a role in industry as was anticipated by some researchers. The tools that are used are very basic, consisting mainly of text editors and searching features. With increasingly fast computing platforms, there is great potential for the use of visualization tools to significantly improve the efficiency of software maintenance. This thesis presents an iterative approach for building software exploration tools. Software exploration tools provide graphical representations of software structures linked to textual views of the program source code and documentation. The methodology consists of several iterative phases of design, development and evaluation. The cycle starts with a framework of cognitive design elements to highlight those activities which require tool support. A prototype tool called SHriMP (Simple Hierarchical Multi-Perspective) views has been designed using this framework as a guide. SHriMP combines several visualization methods and static analysis techniques to enable a programmer to understand and document legacy software systems. It has been evaluated and compared to other options in two user studies. Observations from these studies were used to improve the cognitive framework of design elements, which in turn were used to improve the design of this and other software exploration tools. Many of the lessons learned through this adaptive approach to design are relevant for other categories of software engineering tools.
[Tam, 1998]

Author(s): Yin Jenny Tam.

Title: . Datacube: Its Implementation and Application in OLAP Mining.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 1998.

Abstract: With huge amounts of data collected in various kinds of applications, the data warehouse is becoming a mainstream information repository for decision support and data analysis, mainly because a data warehouse facilitates on-line analytical processing (OLAP). It is important to study methods for supporting data warehouses, and in particular their OLAP operations, efficiently. In this thesis, we investigate efficient methods for computing datacubes and for using datacubes to support OLAP and data mining. Currently, there are two popular datacube technologies: Relational OLAP (ROLAP) and Multidimensional OLAP (MOLAP). Many efficient algorithms have been designed for ROLAP systems, but not so many for MOLAP ones. MOLAP systems, though they may suffer from sparsity of data, are generally more efficient than ROLAP systems when sparse datacube techniques are explored or when the data sets are small to medium sized. We have developed a MOLAP system which combines nice features of both MOLAP and ROLAP. Our performance study shows that such an integrated system is faster than plain MOLAP or ROLAP systems in many cases. We also discuss the promising direction of OLAP mining (data mining integrated with OLAP). On top of this MOLAP system, we have developed several OLAP mining modules to assist users in discovering different kinds of knowledge from the data. Decision makers often do not know exactly what they want when they do OLAP or mining, so OLAP mining helps them explore the data flexibly from different angles and at multiple abstraction levels.
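
A tiny sketch of full datacube computation: every subset of the dimensions (every cuboid) is aggregated with SUM over a measure. The relation and dimension names are invented; real MOLAP engines use multidimensional arrays and sparse-cube techniques rather than this naive group-by loop.

    from itertools import combinations
    from collections import defaultdict

    # Compute all 2^d cuboids of a tiny relation by grouping on every subset
    # of the dimensions and summing the measure.

    def compute_cube(rows, dimensions, measure):
        cube = {}
        for k in range(len(dimensions) + 1):
            for dims in combinations(dimensions, k):
                groups = defaultdict(float)
                for row in rows:
                    key = tuple(row[d] for d in dims)
                    groups[key] += row[measure]
                cube[dims] = dict(groups)
        return cube

    if __name__ == "__main__":
        sales = [
            {"city": "Vancouver", "product": "tv",    "year": 1998, "amount": 100},
            {"city": "Vancouver", "product": "radio", "year": 1998, "amount": 40},
            {"city": "Burnaby",   "product": "tv",    "year": 1998, "amount": 70},
        ]
        cube = compute_cube(sales, ("city", "product", "year"), "amount")
        print(cube[()])                       # grand total
        print(cube[("city",)])                # totals per city
        print(cube[("city", "product")])      # totals per (city, product)
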
[Wang, 1998]

Author(s): Jian Wang.

Title: . Motion-based Object Segmentation from Digital Video.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1998.

Abstract: This research is related to the MPEG-4 video coding standard, which aims to provide an object-based coding standard. In MPEG-4, background and foreground objects are coded in separate layers. However, the automatic separation of foreground and background objects remains an open problem in the MPEG-4 standard. This thesis proposes a framework to solve this problem and presents some promising results. In this thesis, the major cue for object segmentation is the motion information, which is initially extracted from the MPEG motion vectors. The motion vectors in MPEG video can provide valuable motion information for macroblocks of 16 × 16 pixels, which is helpful for object segmentation. However, the problem is that these motion vectors are extracted for simple video compression without any consideration of visual objects. Sometimes they may not correspond to the true motion of the object that occupies the macroblock. A new concept, the kernel, is introduced: a kernel is a region of neighboring macroblocks with similar motion vectors. We propose a Kernel-based Multiple Cue (KMC) algorithm, which has a kernel detection and merge procedure and a multiple cue refinement procedure. The kernel detection and merge procedure deals with the inconsistency of the MPEG motion vectors. The multiple cue refinement procedure, which uses both motion and color cues, is proposed to detect the part of a moving object that cannot be detected using the motion cue only. A spatial segmentation algorithm is applied to extract the object shape. Moreover, an object tracking algorithm is proposed to detect the segmented objects and their motion trajectories over multiple frames. The framework presented in this thesis successfully handles the inconsistency of the MPEG motion vectors and uses multiple cues for object segmentation. It makes no a priori assumption about the content of the video and can deal with various camera motions as well.
[Xiao, 1998]

Author(s): Feng Xiao.

Title: . Availability and Probe Complexity of Non-Dominated Coteries.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 1998.

Abstract: The concept of a coterie is a useful abstraction with practical applications in distributed systems. A coterie under a finite set U consists of pairwise intersecting subsets (called ``quorums'') of U. Let the elements of U model the nodes of a distributed system, and assume that each node is operational with probability p. The availability of a coterie is defined as the probability that all nodes in at least one quorum are operational. Non-dominated (ND) coteries are special coteries with certain desirable properties. To analyze their availability, we classify all ND coteries under U into several classes. Each class has a unique member, called a ``regular'' ND coterie. Using the ρ-transformation introduced by Bioch and Ibaraki, we show that there is a very simple relationship among regular coteries, and that all members of a class can be generated by repeated application of the ρ-transformation to its regular member. We find the maximum and minimum availabilities for each class of ND coteries. When n=|U| is even, we show that all ND coteries in the largest subclass have the same availability, which is the highest possible. The probe complexity is the number of ``probes'' necessary to find a ``live'' quorum or to determine that no such quorum exists. In order to obtain average-case optimal probe strategies, we design and implement six probing heuristics that find the optimal probe strategies for most ND coteries in polynomial time. Finally, we extend the study to message complexity.
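The availability definition above translates directly into a brute-force computation for small node sets. The sketch below (illustrative only; the majority-coterie example and the uniform node reliability p are assumptions, and the enumeration is exponential in |U|) sums the probability of every up/down pattern in which at least one quorum is fully operational.

    from itertools import product

    def availability(universe, quorums, p):
        """Probability that at least one quorum is fully operational,
        assuming each node in `universe` is up independently with probability p."""
        nodes = sorted(universe)
        total = 0.0
        for states in product([True, False], repeat=len(nodes)):
            up = {n for n, s in zip(nodes, states) if s}
            prob = 1.0
            for s in states:
                prob *= p if s else (1.0 - p)
            if any(q <= up for q in quorums):   # some quorum entirely operational
                total += prob
        return total

    # Majority coterie on 3 nodes: the quorums are all 2-element subsets.
    quorums = [{1, 2}, {1, 3}, {2, 3}]
    print(availability({1, 2, 3}, quorums, 0.9))   # 3*0.9^2*0.1 + 0.9^3 = 0.972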
[Zhu, 1998]

Author(s): Hua Zhu.

Title: . On-Line Analytical Mining of Association Rules.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1998.

Abstract: With wide applications of computers and automated data collection tools, massive amounts of data have been continuously collected and stored in databases, which creates an imminent need and great opportunities for mining interesting knowledge from data. Association rule mining is a data mining technique that discovers strong association or correlation relationships among data. The discovered rules may help market basket or cross-sales analysis, decision making, and business management. In this thesis, we propose and develop an association rule mining approach, called on-line analytical mining of association rules, which integrates the recently developed OLAP (on-line analytical processing) technology with efficient association mining methods. It leads to flexible, multi-dimensional, multi-level association rule mining with high performance. Several algorithms are developed based on this approach for mining various kinds of associations in multi-dimensional databases, including intra-dimensional association, inter-dimensional association, hybrid association, and constraint-based association. These algorithms have been implemented in the DBMiner system. Our study shows that this approach presents great advantages over many existing algorithms in terms of both flexibility and efficiency.



1997

[Adhikary, 1997]

Author(s): Junas Adhikary.

Title: . Forest Harvest Scheduling Problem: Studying Mathematical and Constraint Programming Solution Strategies.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 1997.

Abstract: The Forest Harvest Scheduling Problem (FHSP) is an important part of forest resource management. It is a complex multi-criteria optimization problem. Our review of the optimization techniques applicable to forest harvesting problems in general suggests that long-term scheduling is difficult because of the prohibitive size and complexity inherent to the problem. In this thesis, we study mathematical and constraint programming as both modelling and solving techniques for such problems. We discuss how these solution strategies may accommodate the forest growth and treatment simulation model. The simulation model is an important part of the scheduling system since it makes the solution more realistic and implementable. We give a mixed integer programming (MIP) formulation for the restricted FHSP and test it using real-life data from the Norwegian ECOPLAN project. Since larger instances could not be solved in a reasonable time, the MIP model alone is not sufficient for practical FHSP instances. We advocate the combined use of the Constraint Satisfaction Problem (CSP) model and the iterative improvement technique as a solution strategy. The simulation model can also be more easily integrated in this strategy than in the MIP model. Iterative improvement techniques in general benefit from high-quality initial solutions. We show how a CSP formulation can be used to find ``good'' initial solutions to the problem, based on simulation results from a stand growth and treatment simulator. The initial solution generator can be used as a module in an integrated forest treatment scheduling system which will advance the state of the art in forest harvesting practices.
[Arnold, 1997]

Author(s): Dirk Arnold.

Title: . Evolution of Legged Locomotion.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1997. (PostScript)

Abstract: The realistic animation of human and animal figures has long been a prime goal in computer graphics. A recent, physically-based approach to the problem suggests modeling creatures as actuated articulated bodies equipped with a ``virtual brain'' which generates the control signals required by the actuators to produce the desired motion. The animation of such a creature is simply a forward simulation of the resulting motion under the laws of physics over time. While this approach ensures physically realistic motion, there is no obvious solution to the problem of devising a control system that leads to the desired motion. Even though good results have been achieved by carefully handcrafting control systems on the basis of biomechanical knowledge and physical intuition, it is desirable to produce control systems automatically. Evolutionary algorithms which iteratively improve randomly generated initial control systems have proven to be a promising approach to this problem. This thesis introduces spectral synthesis as a tool for generating control systems to be optimized in an evolutionary process and demonstrates the viability of the approach by evolving creatures for the task of legged locomotion. Unlike representations of control systems that have previously been used for evolving useful behavior, spectral synthesis guarantees evolvability, improving the chances that the evolutionary search will succeed. Virtual creatures exhibiting a great variety of modes of locomotion, including hopping, crawling, jumping, and walking, have been evolved as part of this work. The incorporation of more goal-directed components remains a future goal. A second accomplishment of this thesis is the derivation of the equations describing the effect of applying a contact force to an articulated body on its acceleration, making it possible to generalize the common algorithms for handling contacts in systems of rigid bodies to articulated bodies. The physical simulation algorithm described in this thesis allows for real-time simulation of articulated creatures with up to about twenty degrees of freedom. Efficient simulation algorithms are especially important because evolutionary optimization requires the evaluation, and therefore the simulation, of the behavior of a great number of creatures.
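In the spirit of spectral synthesis, a periodic control signal can be encoded as a handful of Fourier coefficients and varied by an evolutionary operator. The Python sketch below is only an illustration of that general idea, not the thesis's exact representation; the sine-series form, the per-harmonic (amplitude, phase) genome and the Gaussian mutation are my assumptions.

    import math, random

    def control_signal(coeffs, t, period=1.0):
        """Evaluate a periodic actuator signal from (amplitude, phase) pairs:
        u(t) = sum_k a_k * sin(2*pi*k*t/period + phi_k)."""
        return sum(a * math.sin(2.0 * math.pi * (k + 1) * t / period + phi)
                   for k, (a, phi) in enumerate(coeffs))

    def mutate(coeffs, sigma=0.1):
        """One evolutionary variation step: perturb every coefficient slightly."""
        return [(a + random.gauss(0, sigma), phi + random.gauss(0, sigma))
                for a, phi in coeffs]

    genome = [(1.0, 0.0), (0.3, 1.2), (0.1, -0.5)]              # three harmonics for one actuator
    samples = [control_signal(genome, t / 100.0) for t in range(100)]
    offspring = mutate(genome)

Because every genome decodes to a smooth periodic signal, small genetic changes yield small behavioural changes, which is the evolvability property the abstract emphasizes.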
[Bacik, 1997]

Author(s): Roman Bacik.

Title: . Structure of Graph Homomorphisms.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 1997. (PostScript)

Abstract: In this thesis we study finite graphs and graph homomorphisms from both a theoretical and a practical viewpoint. A homomorphism between two graphs G and H is a function from the vertex set of G to the vertex set of H which maps adjacent vertices to adjacent vertices. In the first part of this thesis we study homomorphisms which are equitable. Suppose we have a fixed graph H. An equitable H-coloring of a graph G is a homomorphism from G to H such that the preimages of the vertices of H have almost the same size (they differ by at most one). We consider the complexity of the following problem: INSTANCE: A graph G. QUESTION: Does G admit an equitable H-coloring? We give a complete characterization of the complexity of the equitable H-coloring problem. In particular, we show that the problem is polynomial if H is a disjoint union of complete bipartite graphs, and NP-complete otherwise. To get better insight into a combinatorial problem, one often studies relaxations of the problem. The second part of this thesis deals with relaxations of graph homomorphisms. In particular, we define the fractional homomorphism and the pseudo-homomorphism as natural relaxations of graph homomorphism. We show that our pseudo-homomorphism is equivalent to a semidefinite relaxation defined by Feige and Lovasz. We also show that there is a simple forbidden subgraph characterization for our fractional homomorphism (the forbidden subgraphs are cliques). As a byproduct, we obtain a simpler proof of the NP-hardness of the fractional chromatic number, a result which was first proved by Grotschel, Lovasz and Schrijver using the ellipsoid method. We also briefly discuss how to apply these results to the directed case. In the last part of this thesis we consider equivalence classes of graphs under the following equivalence: two graphs G and H are equivalent if there exist homomorphisms from G to H and from H to G. We study the multiplicative structure of these equivalence classes, and give a necessary and sufficient condition for the existence of a finite factorization of a class into irreducible elements. We also relate this problem to some graph-theoretic conjectures concerning graph products.
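The two definitions above are easy to check for a given vertex map. The Python sketch below (an illustration of the definitions only; the edge-set graph encoding and the 4-cycle example are my choices) verifies that a map is a homomorphism and that it is equitable, i.e. that preimage sizes differ by at most one.

    from collections import Counter

    def is_homomorphism(G_edges, f, H_edges):
        """True iff f maps every edge of G onto an edge of H (graphs given as edge lists)."""
        H = {frozenset(e) for e in H_edges}
        return all(frozenset((f[u], f[v])) in H for u, v in G_edges)

    def is_equitable(f, H_vertices):
        """True iff the preimage sizes of the vertices of H differ by at most one."""
        sizes = Counter(f.values())
        counts = [sizes.get(h, 0) for h in H_vertices]
        return max(counts) - min(counts) <= 1

    # G = 4-cycle, H = a single edge (K2): the natural 2-colouring is an equitable H-coloring.
    G_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    H_edges = [("a", "b")]
    f = {0: "a", 1: "b", 2: "a", 3: "b"}
    print(is_homomorphism(G_edges, f, H_edges), is_equitable(f, ["a", "b"]))   # True True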
[Chang, 1997]

Author(s): Chia-We Chang.

Title: . Continuous Shortest Path Problems with Time Window Constraints.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1997. (PostScript)

Abstract: The shortest path problem with time window constraints and costs (SPW-Cost) consists of finding a least-cost route between a source and a sink in a network G=(N,A) such that a vehicle visits each node within its specified time window [a_i, b_i]. Each arc (i,j) in A has a positive duration d_ij and an unrestrictive cost c_ij. This problem appears as a sub-problem of many vehicle routing and scheduling problems, most of which are known to be NP-hard. In this thesis, we study a variant of SPW-Cost called the Continuous Shortest Path Problem with Time Window Constraints (Continuous-SPW). Unlike SPW-Cost, where a vehicle is allowed to wait at a node for a time window to open, in Continuous-SPW the vehicle must move continuously through the network, passing only through nodes whose time windows are open. We determine the complexity of this and other versions of Continuous-SPW for restricted classes of graphs that are of practical interest. Our goal is to construct sequential algorithms and determine their running time complexities. We also provide a parallel algorithm for the general Continuous-SPW and show how these results can be extended to handle SPW-Cost problems.
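For contrast with the continuous variant, the classical SPW-Cost setting in which waiting is allowed can be sketched with a small labeling scheme. The Python sketch below is an illustration only, not an algorithm from the thesis: it assumes non-negative arc costs (the problem statement allows unrestricted costs), assumes the vehicle may depart when the source window opens, and uses a dictionary-based network encoding of my own.

    import heapq

    def spw_cost(adj, windows, source, sink):
        """Least-cost source-to-sink route when waiting at a node is allowed.

        adj[i]     = list of (j, duration, cost) arcs leaving node i
        windows[i] = (a_i, b_i) time window of node i
        Assumes non-negative arc costs; labels are (cost, arrival_time, node).
        """
        best = {}                                   # node -> non-dominated (cost, time) labels
        heap = [(0.0, windows[source][0], source)]  # depart when the source window opens
        while heap:
            cost, t, i = heapq.heappop(heap)
            if i == sink:
                return cost
            if any(c <= cost and tt <= t for c, tt in best.get(i, [])):
                continue                            # dominated: a cheaper-and-earlier label exists
            best.setdefault(i, []).append((cost, t))
            for j, dur, c in adj.get(i, []):
                a, b = windows[j]
                arrive = max(t + dur, a)            # wait at j until its window opens
                if arrive <= b:
                    heapq.heappush(heap, (cost + c, arrive, j))
        return None                                 # no feasible route

    adj = {0: [(1, 2.0, 1.0), (2, 1.0, 4.0)], 1: [(2, 2.0, 1.0)], 2: []}
    windows = {0: (0.0, 10.0), 1: (0.0, 10.0), 2: (5.0, 10.0)}
    print(spw_cost(adj, windows, 0, 2))   # 2.0: go via node 1, then wait at node 2

In the continuous variant studied in the thesis the `max(t + dur, a)` waiting step would be disallowed, which is precisely what makes the problem harder to reason about.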
[Chiu, 1997]

Author(s): Wai Man Raymond Chiu.

Title: . Interactive Data-Driven Web Applications.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 1997.

Abstract: In the past, great efforts have been made to develop mechanisms for delivering sophisticated applications over the Web. Numerous technologies have recently been developed which not only make the Web an effective means for hypermedia information retrieval, but also give it the capability of executing interactive and high-impact Internet applications in a powerful and efficient manner. This is particularly true for Web database access technology. Traditional approaches basically drop the database connection once an operation has finished, so operations are independent of each other. The newer on-line approaches either keep the database connection open throughout the whole session or effectively store the states of current users, and possibly other information, in the client cache, thereby yielding better performance, higher capability, and a lower level of programmatic complexity. Three basic issues are associated with Web database access technologies: (i) the efficiency of remote database access from a Web browser, (ii) the effectiveness of the graphical user interface (e.g., the level of user-friendliness and interactivity), and (iii) the effectiveness and flexibility of application development tools. This thesis investigates these three issues by comparing various architectures in order to evaluate the feasibility of using the newer technologies for developing sophisticated data-driven Web applications. To compare the newer techniques with traditional approaches, a series of quantitative and qualitative analyses is presented, by means of experiments and sample applications.
[Gong, 1997]

Author(s): Wan Gong.

Title: . Periodic Pattern Search in Time-Related Data Sets.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 1997.

Abstract: For many applications such as accounting, banking, business transaction processing systems, geographical information systems, medical record keeping, etc., the changes made to their databases over time are a valuable source of information which can direct the future operation of the enterprise. In this thesis, we focus on relational databases with historical data or, in other words, time-related data, and try to extract from them some useful knowledge about their periodic behavior. The discovered knowledge could provide users with guidance for the future, to which end techniques in knowledge discovery and data warehousing become important. Knowledge discovery and data warehousing have become increasingly important in handling and analyzing large databases efficiently and effectively. We can take advantage of existing on-line analytical processing techniques widely used in knowledge discovery and data warehousing, and apply them to time-related data to solve periodic pattern search problems. The problems discussed in this thesis are of two types: one is to find periodic patterns of a time series with a given period, while the other is to find a pattern whose period has arbitrary length. The algorithms are presented, along with their experimental results.
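For the first problem type, where the period is given, the search can be sketched as follows. This is an illustration, not the thesis's algorithm; the per-offset "most frequent symbol above a confidence threshold" rule and the min_conf parameter are my assumptions.

    from collections import Counter

    def periodic_pattern(series, period, min_conf=0.9):
        """For each offset 0..period-1, report a value that occurs in at least
        `min_conf` of the cycles, or '*' if no value is that regular."""
        n_cycles = len(series) // period
        pattern = []
        for offset in range(period):
            counts = Counter(series[c * period + offset] for c in range(n_cycles))
            value, freq = counts.most_common(1)[0]
            pattern.append(value if freq / n_cycles >= min_conf else "*")
        return pattern

    # 'a' at every third position and 'c' two positions later; the middle position is irregular.
    series = list("abcaecadcabcaxc")
    print(periodic_pattern(series, 3, min_conf=0.8))   # ['a', '*', 'c']

Searching for a pattern with an unknown period, the second problem type, would wrap a loop over candidate periods (or a more clever pruning strategy) around a test like this one.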
[Luo, 1997]

Author(s): Yongping Luo.

Title: . Handling Motion Processing Constraints for Articulated Figure Animation.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1997.

Abstract: The animation of articulated models such as human figures remains a major challenge because of the many degrees of freedom involved and the complex structure of the models. A particular challenge is to find ways to generate new movement from existing movement sequences produced by animation or motion capture. Techniques from the image and signal processing domain can be applied to motion sequences. We have successfully applied motion multiresolution filtering, multitarget motion interpolation and waveshaping simultaneously to many or all degrees of freedom of an articulated figure. Thus we can edit, modify and blend existing motions in ways that would be difficult or impossible to accomplish otherwise. These motion transforms result in interesting new movement sequences but may introduce constraint violations. The constraint violation problem is inherent to the motion transform techniques and cannot be avoided. In this thesis, we first show how to generally reduce the chance and degree of constraint violations by offsetting motion data and applying consistent dynamic timewarping before motion transforms. A mechanism is provided for the animator to specify constraints; the system then checks the constraints and adjusts the violations in a smooth manner before the new motion is presented to the animator. Most geometric constraint violations can be handled successfully at interactive speeds with our system. We also provide tools for the animator to modify the new motion at a relatively low level, giving the animator more control over constraint violation adjustment; these tools can therefore be used to handle constraints which are hard to specify in the above system.
[Niguma, 1997]

Author(s): Gordon K. Niguma.

Title: . Concept Mapping In A Multimedia, World Wide Web, Environment.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 1997. (PostScript)

Abstract: Traditional educational pedagogies focus primarily on the behaviorist's view of knowledge transfer from the instructor to the learner through the instructor's interpretation of meaning. However, research in education indicates that this method of teaching is not always the most effective one. Students often fail to understand the deeper meaning of concepts and relations using the behaviorist model. Conversely, the constructivist view focuses on a student's own construction and interpretation of concepts and associated relationships. The constructivist approach requires individuals to organize and structure knowledge in their own manner, which leads to ``a more complete and coherent understanding'' (Scardamalia and Bereiter). This thesis focuses on one specific tool that supports the constructivist method of learning, known as concept mapping. The purpose of the concept map is to identify key concepts and the relationships between these concepts in an instructional setting under various levels of abstraction. The learner is encouraged to think reflectively about what they have studied using the concept map. Concept maps are graphical representations created by learners which consist of polygons (representing students' concepts and ideas) and labeled lines connecting the polygons, which represent relationships between concepts and ideas. We refer to the polygons as ``nodes'' and the lines as ``links''. Traditionally, paper and pencil have been used to create concept maps, but this method limits students to a flat representation of knowledge that is also difficult to edit. A computer-based tool allows easy modification of concept maps as well as providing high-resolution graphics, Internet resources and multimedia to the user. This thesis describes the design and implementation of a computer-based concept mapping tool. A small study was conducted to evaluate the ease of user interaction and the effectiveness of the tool in assisting understanding.
[Pantel, 1997]

Author(s): Christian Pantel.

Title: . A Framework for Comparing Web-Based Learning Environments.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, October 1997.

Abstract: As notions such as the information revolution and knowledge-based economies become increasingly realistic, there is considerable pressure on academic, corporate and governmental decision makers to improve both the accessibility and the quality of learning opportunities that they provide to those they serve. Many are turning to World Wide Web-based technologies as part of the solution. A Web-based learning environment is a networked computer application that enables people to learn from a distance. Learners can be physically separated from teachers and from each other, and they can participate in the learning environment at their convenience. Choosing which Web-based learning environment to adopt is an important decision as it often involves a substantial investment for organizations and significantly impacts how people will learn. In order to better understand how people learn and the possibilities for Web-based learning environments to support learning, we review the relevant educational theory. We also review the human-computer interaction literature to better understand how people can effectively use software. Intuitive and ad-hoc comparisons between competing products are not likely to lead to adopting the optimal Web-based learning environment for a particular organization. Therefore, we propose a comparison framework for Web-based learning environments which is based primarily on educational theory and human-computer interaction research. The comparison framework consists of a large number of comparison dimensions covering a broad range of issues relevant to the adoption of a Web-based learning environment. The comparison framework serves two main audiences. The primary audience consists of decision makers of organizations considering the adoption of a Web-based learning environment. We propose a methodology that these decision makers can follow when using the comparison framework to guide them in selecting the most appropriate Web-based learning environment for their organization. The secondary audience consists of developers of Web-based learning environments. By examining the comparison dimensions and by reading the literature reviews, developers may identify strengths and weaknesses of their product which can be useful in marketing and in planning future product releases.
[Peterson, 1997]

Author(s): Philip Ray Peterson.

Title: . A Genetic Engineering Approach to Texture Synthesis.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1997.

Abstract: Much of the richness and detail found in computer graphics imagery is the result of the application of texture maps to three-dimensional geometric models. In many cases, the best approach is to generate a texture procedurally. However, the definition of a texture synthesis procedure can be a complex task. In addition to defining a functional algorithm, often a large parameter space must be explored. Techniques based on natural systems are increasingly applied as search procedures in a variety of problems. One such technique, genetic programming, has been used to create procedural texture programs. Under human guidance, both program and parameter space are explored simultaneously. This method, however, has several shortcomings when used alone. In this thesis, a genetic engineering approach is developed. Based on interactive evolution, the genetic programming paradigm is used to evolve procedural texture programs of varying size and shape, a hybrid genetic algorithm is used to explore the parameter space of target programs, and a data-flow representation is used for the direct manipulation of program structure and variables. A prototype texture design tool implementing this approach is described. The effectiveness of this prototype is discussed, and a number of examples are presented showing how it can be used successfully to design a variety of texture synthesis programs.
[Racine, 1997]

Author(s): Kirsti Racine.

Title: . Design and Evaluation of a Self Cleaning Agent for Maintaining Very Large Case Bases (VLCB).

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 1997. (PostScript)

Abstract: The objective of this thesis is to establish a theoretical and empirical framework for the design of an agent which can maintain very large unstructured case bases, ensuring that they remain current and useful. With the dramatic proliferation of case-based reasoning systems in many commercial applications, many case bases are now becoming legacy data sources. While they represent a significant portion of an organization's assets, they are large and difficult to maintain. There are many sources for the complexity of the maintenance task: case bases are created over a long period of time and are updated by different people; high industry turnover suggests that the authors of a case base may no longer be with the organization; cases may be obtained from different, even geographically diverse sources; and finally, industry markets change, implying that case bases are highly time and market dependent. These factors contribute to the difficulty of ensuring a case base is current and useful. My solution to the maintenance problem is to develop a self-cleaning agent which works with users to maintain a legacy case base in a seamless fashion. This self-cleaning module features a set of user-entered guidelines for detecting redundant and outdated cases. The guidelines are written in a language that is easily manipulated by any non-expert user. As the ability to contain the knowledge acquisition problem is of paramount importance, this system allows one to express domain expertise naturally and effortlessly. Empirical evaluations demonstrate the effectiveness of the agent in several large industrial domains.
[Stefanovic, 1997]

Author(s): Nebojsa Stefanovic.

Title: . Design and Implementation of On-Line Analytical Processing(OLAP) of Spatial Data.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 1997. (PostScript)

Abstract: On-line analytical processing (OLAP) has gained popularity in the database industry. With a huge amount of data stored in spatial databases and the introduction of spatial components to many relational or object-relational databases, it is important to study methods for spatial data warehousing and on-line analytical processing of spatial data. This thesis investigates methods for spatial OLAP by integrating nonspatial on-line analytical processing (OLAP) methods with spatial database implementation techniques. A spatial data warehouse model, which consists of both spatial and nonspatial dimensions and measures, is proposed. Methods for the computation of spatial data cubes and analytical processing on such spatial data cubes are studied, with several strategies proposed, including approximation and partial materialization of the spatial objects resulting from spatial OLAP operations. Techniques for selective materialization of the spatial computation results are worked out, and a performance study has demonstrated their effectiveness. Spatial OLAP has been partially implemented as part of GeoMiner, a system prototype for spatial data mining. Keywords: data warehouse, data mining, on-line analytical processing (OLAP), spatial databases, spatial data analysis, spatial OLAP.
[Vikas, 1997]

Author(s): Narayan Vikas.

Title: . Computational Complexity of Graph Compaction.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 1997.

Abstract: In this thesis, we study the computational complexity of a special colouring problem for graphs, called the compaction problem. The colouring problem is a classic problem in graph theory, and well known to have various applications. The colouring problem has been generalised in the literature to the graph homomorphism problem, also known as the H-colouring problem. The compaction problem is the graph homomorphism problem with additional constraints. The study of graph compaction is motivated by both theoretical and practical considerations. From a theoretical point of view, the compaction problem includes some long-standing open problems, some of which we solve in this thesis. Many of these problems are concerned with the question of whether the compaction problem is polynomially equivalent to a special graph homomorphism problem called the retraction problem. From a practical point of view, we mention an application of compaction in a multiprocessor system for parallel computation, and we also establish a very close relationship between the compaction problem and the constraint satisfaction problem. The constraint satisfaction problem is well known to have an important role in artificial intelligence. We say that a graph is reflexive if each of its vertices has a loop, and irreflexive if none of its vertices has a loop. Any graph, in general, is said to be partially reflexive. In the following, let G and H be graphs. A homomorphism f : G -> H, of G to H, is a mapping f of the vertices of G to the vertices of H, such that f(g) and f(g') are adjacent vertices of H whenever g and g' are adjacent vertices of G. A compaction c : G -> H, of G to H, is a homomorphism of G to H, such that for every vertex x of H, there exists a vertex v of G with c(v) = x, and for every edge hh' of H, h ≠ h', there exists an edge gg' of G with c(g) = h and c(g') = h'. If there exists a compaction of G to H then G is said to compact to H. Let H be a fixed graph. The problem of deciding the existence of a compaction to H, called the compaction problem for H, and denoted as COMP-H, is the following: Instance: A graph G. Question: Does G compact to H? We show that COMP-H is NP-complete when H is a reflexive k-cycle, for all k >= 4. In particular, for k = 4, this solves a widely publicised open problem posed by Winkler in 1988. When H is a reflexive chordal graph (which includes the case of a reflexive 3-cycle), we observe that COMP-H is polynomial time solvable. We also show that COMP-H is NP-complete when H is an irreflexive even k-cycle, for all even k >= 6. Determining the complexity of COMP-H when H is an irreflexive even k-cycle, for any particular even k >= 6, has also been a long-standing open question. When H is a non-bipartite irreflexive graph (which includes the case of irreflexive odd cycles), we notice that COMP-H is NP-complete. When H is a chordal bipartite graph (which includes the case of an irreflexive 4-cycle), we point out that COMP-H is polynomial time solvable. We further show that COMP-H is NP-complete when H is a path of length k, with loops on its first and last vertices only, for all k >= 2. Using this result, we prove NP-completeness of COMP-H for some other partially reflexive graphs H also. When H is a forest with trees H_1, H_2, ..., H_s, such that the set of vertices with loops in H_i induces a connected subgraph of H_i, for all i = 1, 2, ..., s, we note that COMP-H is polynomial time solvable.
We give a complete complexity classification of COMP-H when H is any (partially reflexive) graph with four or fewer vertices, showing that COMP-H is either polynomial time solvable or NP-complete. We establish a very close relationship between the compaction problem and the retraction problem, and offer evidence that it is likely to be difficult to determine whether for every reflexive or bipartite graph H, the problem COMP-H is polynomial time solvable or NP-complete.
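The difference between a homomorphism and a compaction (vertex-surjectivity plus coverage of every non-loop edge of H) is easy to check for a given map. The Python sketch below illustrates the definition only; the edge-list graph encoding and the 6-cycle-onto-4-cycle example are my own choices, not examples from the thesis.

    def is_compaction(G_edges, c, H_vertices, H_edges):
        """True iff c is a homomorphism of G to H that is onto the vertices of H
        and covers every non-loop edge of H by the image of some edge of G."""
        H = {frozenset(e) for e in H_edges}
        # homomorphism: every edge of G maps onto an edge (possibly a loop) of H
        if not all(frozenset((c[u], c[v])) in H for u, v in G_edges):
            return False
        # vertex-surjective
        if set(c.values()) != set(H_vertices):
            return False
        # every non-loop edge hh' of H is the image of some edge of G
        covered = {frozenset((c[u], c[v])) for u, v in G_edges}
        return all(e in covered for e in H if len(e) == 2)

    # The irreflexive 6-cycle compacts to the irreflexive 4-cycle:
    G_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
    H_vertices = ["a", "b", "c", "d"]
    H_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
    c = {0: "a", 1: "b", 2: "c", 3: "d", 4: "a", 5: "b"}
    print(is_compaction(G_edges, c, H_vertices, H_edges))   # True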
[Wood-Gaines, 1997]

Author(s): Adam Wood-Gaines.

Title: . Modelling Expressive Movement of Musicians.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, February 1997. (PostScript)

Abstract: This thesis addresses the problem of modelling expressive movement in human figure animation. The domain of music is ideal for the study of expressive movement since there is often a relationship between the expression of music and the dynamics of a musician's body. The goal of this thesis is to provide control of kinematic expression in the animation of musical scores. Drumming movements in particular are appropriate to model because there is a wide range of movements to convey expression, and their physical quality makes them more readily observable than movements used with instruments such as the clarinet. Such a visualization is directly applicable to music education and score design, while the techniques to affect the expression could be used for other kinds of animated movement. The proposed system is SMED, a system for modelling expressive drumming. SMED reads any MIDI-encoded drum score and in real time renders a 3D animation of a drummer's performance of that score. It allows the user to modify the frequency and amplitude of joint rotations in order to affect the perceived expression of movement. The quality of the generated movement was tested by having subjects interpret the kinematics of a number of performances. The effectiveness of SMED as a high-level tool for specifying expressive movement was tested by having subjects manipulate the user interface to create drumming animations for several contrasting musical scores. The study found that, while a number of refinements were suggested, subjects were able to recognize and interpret expressive aspects of performances and could manipulate the interface to create expressive performances with ease. SMED represents an initial example of interactive specification of expressive movement in musical performance, and provides a solid foundation for future work.
[Xia, 1997]

Author(s): Betty Bin Xia.

Title: . Similarity Search in Time Series Data Sets.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1997.

Abstract: Similarity search over time-series data sets is of growing importance in data mining. With the increasing amount of time-series data in many applications, from finance to science, it is important to study methods for retrieving similar patterns efficiently and in a user-friendly manner to support business decision making. The thesis proposes methods for efficient retrieval of all objects in a time-series database whose shape is similar to a search template. The search template can be either a shape or a sequence of data. Two search modules, subsequence search and whole-sequence search, are designed and implemented. We study a set of linear transformations that can be used as the basis for similarity queries on time-series data, and design an innovative representation technique which abstracts the notion of shape so that the user can interactively query multi-level similarity patterns. The wavelet analysis technique and the OLAP technique used in knowledge discovery and data warehousing are applied in our system. The retrieval technique we propose is efficient and robust in the presence of noise, and can handle several different notions of similarity, including changes in scale and shift.
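Handling scale and shift invariance in a subsequence search can be sketched with z-normalized Euclidean distance over a sliding window. This is only an illustration of the idea, not the wavelet- and OLAP-based techniques of the thesis; the znorm/sliding-window scheme, the synthetic query and the top_k parameter are my assumptions.

    import numpy as np

    def znorm(x):
        """Remove shift and scale so that only the shape of the window is compared."""
        x = np.asarray(x, dtype=float)
        sd = x.std()
        return (x - x.mean()) / sd if sd > 0 else x - x.mean()

    def subsequence_search(series, template, top_k=3):
        """Return the top_k offsets whose window best matches the template's shape."""
        m = len(template)
        q = znorm(template)
        dists = [(float(np.linalg.norm(znorm(series[i:i + m]) - q)), i)
                 for i in range(len(series) - m + 1)]
        return sorted(dists)[:top_k]

    series = np.sin(np.linspace(0, 8 * np.pi, 400)) + 0.05 * np.random.randn(400)
    template = np.sin(np.linspace(0, 2 * np.pi, 100))    # one full cycle as the query shape
    print(subsequence_search(series, template))

A production system would avoid the O(n·m) rescanning here, for example by indexing wavelet coefficients of the windows, which is closer to the approach the abstract describes.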
[Zhang, 1997]

Author(s): Zhong Zhang.

Title: . Static and Dynamic Feature Weighting in Case-Based Reasoning (CBR).

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1997.

Abstract: Case-based reasoning (CBR) is a recent approach to problem solving in which domain knowledge is represented as cases. The case retrieval process, which retrieves the cases most similar to the new problem, depends on the feature-value pairs attached to cases. Different feature-value pairs may have different importance in this process, which is usually measured by what we call the feature weight. Three serious problems arise in practical applications of CBR regarding feature weights. First, the feature weights are assigned manually by humans, which not only makes them highly informal and inaccurate, but also involves intensive labor. Second, a CBR system with a static set of feature weights cannot cater to a specific user; it would be desirable to enable the system to acquire user preferences automatically. Finally, a CBR system often functions in a changing environment, either due to the nature of the problems it is trying to solve, or due to the shifting needs of its user. We wish to have a CBR system that always adapts to the user's changing preferences in time. These three problems comprise one of the core tasks of the case-base maintenance problem. Our approach to these problems is to maintain feature weighting in both static and dynamic contexts. The static feature weighting method exploits the irregular distribution of feature-value pairs within a case base; our intuition is that the more cases a feature-value pair is associated with, the less information it conveys. The dynamic feature weighting method examines the feature weights in a changing environment. We integrate a neural network into CBR, in which, while the reasoning part is still case based, the learning part is shouldered by a neural network. We hope that this integrated framework will be a living system, which learns a user's preferences over time and simulates these preferences with its own behavior. We propose and implement the underlying algorithms for these two feature-weighting methods. Our empirical tests produce the results we desire and confirm our hypotheses and claims. Our work contributes to research on the maintenance of knowledge bases.
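The static intuition, that a feature-value pair appearing in many cases carries little discriminating information, resembles inverse document frequency. The sketch below is only one way such a weight could be computed; the log(N/count) formula, the toy help-desk cases and the function name are my assumptions, not the thesis's actual weighting scheme.

    import math
    from collections import Counter

    def static_feature_weights(case_base):
        """Weight each feature-value pair by how rarely it occurs in the case base:
        w(f, v) = log(N / count(f, v)), so ubiquitous pairs get weight near zero."""
        n = len(case_base)
        counts = Counter((f, v) for case in case_base for f, v in case.items())
        return {pair: math.log(n / c) for pair, c in counts.items()}

    cases = [
        {"os": "win", "symptom": "no_boot"},
        {"os": "win", "symptom": "slow"},
        {"os": "mac", "symptom": "no_boot"},
        {"os": "win", "symptom": "no_sound"},
    ]
    weights = static_feature_weights(cases)
    print(weights[("os", "win")])            # common pair  -> low weight (~0.29)
    print(weights[("symptom", "no_sound")])  # rare pair    -> higher weight (~1.39)

The dynamic counterpart described in the abstract would instead adjust such weights on-line, with a learning component reacting to the user's feedback.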



1996

[Fall, 1996]

Author(s): Andrew Fall.

Title: . Reasoning with Taxonomies.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 1996. (PostScript)

Abstract: We journey to learn, yet in travelling grow each day further and further from where we began - Wade Davis. Taxonomies are prevalent in a multitude of fields, including ecology, linguistics, programming languages, databases, and artificial intelligence. In this thesis, we focus on several aspects of reasoning with taxonomies, including the management of taxonomies in computers, extensions of partial orders to enhance the taxonomic information that can be represented, and novel uses of taxonomies in several applications. The first part of the thesis deals with theoretical and implementational aspects of representing, or encoding, taxonomies. Our contributions include (i) a formal abstraction of encoding that encompasses all current techniques; (ii) a generalization of the technique of modulation that enhances the efficiency of this strategy for encoding and reduces its brittleness for dynamic taxonomies; (iii) the development of sparse logical terms as a universal implementation for encoding that is supported by a theoretical and empirical analysis demonstrating their efficiency and flexibility. The second part explores our contributions to the application and extension of taxonomic reasoning in knowledge representation, logic programming, conceptual structures and ecological modeling. We formalize extensions to partial orders that increase the ability of systems to express taxonomic knowledge. We develop a generalization of equality constraints among logic variables that induces a partial order among equivalence classes of variables. For graphic knowledge representation formalisms, we develop techniques for organizing the derived hierarchy among graphs in the knowledge base. Finally, we organize abstract models of landscapes in a taxonomy that provides a framework for systematically cataloging and analyzing landscape patterns.
[Fu, 1996]

Author(s): Yongjian Fu.

Title: . Discovery of multiple-level rules from large databases.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, July 1996.

Abstract: With the widespread computerization in business, government, and science, the efficient and effective discovery of interesting information from large databases becomes essential. Data mining or Knowledge Discovery in Database (KDD) emerges as a solution to the data analysis problems faced by many organizations. Previous studies on data mining have been focused on the discovery of knowledge at a single conceptual level, either at the primitive level or at a rather high conceptual level. However, it is often desirable to discover knowledge at multiple conceptual levels, which will provide a spectrum of understanding, from general to specific, for the underlying data. In this thesis, we first introduce the conceptual hierarchy, a hierarchical organization of the data in the databases. Two algorithms for dynamic adjustment of conceptual hierarchies are developed, as well as another algorithm for automatic generation of conceptual hierarchies for numerical attributes. In addition, a set of algorithms is developed for mining multiple-level characteristic, discriminant and association rules. All algorithms developed were implemented and tested in our data mining prototype system, DBMiner. The attribute-oriented induction method is extended to discover multiple-level characteristic and discriminant rules. A progressive deepening method is proposed for mining multiple-level association rules. Several variants of the method with different optimization techniques are implemented and tested. The results show the method is efficient and effective. Furthermore, a new approach to association rule mining, meta-rule guided mining, is proposed. The experiments show that meta-rule guided mining is powerful and efficient. Finally, an application of data mining techniques, cooperative query answering using multiple layered databases, is presented. Our study concludes that mining knowledge at multiple levels is both practical and desirable, and thus is an interesting research direction. Some future research problems are also discussed.
[Gupta, 1996]

Author(s): Sanjay Gupta.

Title: . A system for interfacing LIFE with database and persistent storage.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 1996.

Abstract: LIFE is a functional logic programming language extended with object-oriented concepts (sub-typing and inheritance). The objects in LIFE are extensible, complex, and partially ordered. LIFE can be viewed as a combination of the functional, logical and imperative programming paradigms. The combination of these three different programming paradigms in LIFE provides powerful high-level expressions and facilitates the specification of complex constraints on data objects. It is therefore ideally suited for applications in natural language processing, document preparation, expert systems, and so on. These applications rely on large amounts of data and require database technology for efficient storage and retrieval. With this in mind, we extend LIFE with database interfaces for object-oriented and relational data. These interfaces are used to store LIFE facts and persistent terms. The reverse problem, the conversion of relational data into LIFE ψ-terms, has also been addressed in this thesis. We give an algorithm to generate LIFE facts from relational data. The efficacy of these approaches has been studied using real-world problems arising in Geographic Information Systems and Information Retrieval Systems.
[Kabanets, 1996]

Author(s): V. Kabanets.

Title: . Recognizability equals definability for partial k-paths.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 1996.

Abstract: It is well-known that a language is recognizable iff it is definable in a monadic second-order logic. The same holds for sets of finite ranked trees (or finite unranked trees, in which case one must use a counting monadic second-order logic). Courcelle initiated research into the problem of definability vs. recognizability for finite graphs. Unlike the case of words and trees, recognizability does not equal definability for arbitrary families of graphs. Courcelle and others have shown that definability implies recognizability for partial k-trees (graphs of bounded tree-width), and conjectured that the converse also holds. The converse implication was proved for the cases of k=0, 1, 2, 3. It was also established for families of k-connected partial k-trees. In this thesis, we show that a recognizable family of partial k-paths (graphs of bounded path-width) is definable in a counting monadic second-order logic (CMS), thereby proving the equality of definability and recognizability for families of partial k-paths. This result is of both theoretical and practical significance. From the theoretical viewpoint, it establishes the equivalence of the algebraic and logical approaches to characterizing yet another recursively defined class of objects, that of partial k-paths. This also adds validity to Courcelle's conjecture on partial k-trees. From the practical viewpoint, since a partial k-path is recognizable in linear time, it establishes that a problem on partial k-paths is solvable in linear time using a finite automaton iff this problem is definable in CMS.
[Mosny, 1996]

Author(s): Milan Mosny.

Title: . Semantic Information Preprocessing for Natural Language Interfaces to Databases.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 1996.

Abstract: A natural language interface to a database (NLID) needs both syntactic information about the structure of the language and semantic information about what words and phrases mean with respect to the database. The semantic part of the NLID can implicitly or explicitly provide constraints on the input language. A parser can use these constraints to resolve ambiguities and to decrease overall response time. Our approach is to extract these constraints from the semantic description of the database domain and incorporate them semi-automatically or automatically into information directly accessible to the parser. The advantage of this approach is a greater degree of system modularity, which usually reduces complexity, reduces the number of possible errors in the system, and makes it possible for different people to develop different parts of the system concurrently, thus reducing development time. Also, domain-independent syntactic information can be reused from domain to domain and then customized according to the semantic information of a specific domain. To implement the idea, Abductive Equivalential Translation (AET) was chosen to describe the database-related semantics. AET provides a formalism which describes how a ``literal'' logical form of an input sentence, consisting of lexical predicates, can be translated into a logical form consisting of predicates meaningful to the database engine. The information used in the translation process is a Linguistic Domain Theory (LDT) based on logic. We constrain the expressive power of the LDT to suit tractability and efficiency requirements and introduce the Restricted Linguistic Domain Theory (RLDT). The main step in incorporating semantic constraints into the syntax formalism is then extensive preprocessing of the semantic information described by an RLDT into a Normalized Linguistic Domain Theory (NLDT). The system uses the NLDT to produce selectional restrictions that follow from the semantic description of a domain by the RLDT. Selectional restrictions in general state which words can be immediately combined with which other words. Once the NLDT is constructed, it can be used as the main source of information for semantic processing of a sentence. Thanks to the soundness and completeness of the normalization process, the designer of the interface has the possibility of expressing the semantic knowledge in more declarative terms.
[Pimplapure, 1996]

Author(s): Ashish Pimplapure.

Title: . Virtual Groups: A Web Based Conferencing System for Online Education.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1996.

Abstract: Collaborative learning and knowledge acquisition based on the needs of the learner have proven to be more effective than traditional learning, which focuses on content and follows a rigid format. In the context of online education, collaborative learning requires a mechanism for enabling communication among the learners. Such a mechanism is usually established with the help of a conferencing system, which at a minimum supports textual messages and is preferably extensible to handle multimedia content. The conferencing system allows for the creation of ``virtual spaces'' which model traditional classrooms, seminar rooms and even cafes for social interaction. Previous attempts at providing support for group communication have either focused on using conventional tools (such as mailing lists or newsgroups) or on developing conferencing systems which need special client software or protocols for access (e.g., FirstClass). These tools, however, are not suitable for use on the Internet, whose recent growth has made it a promising medium for delivering online education. Using the Internet for online education is already common and will become pervasive as Internet accessibility continues to increase. This thesis is an attempt to evaluate the feasibility of the World Wide Web as a medium for group communication in the context of online education. It presents the design and implementation of a Web-based conferencing system which uses the store-and-forward method for asynchronous communication. An important characteristic of online education systems is that they provide a mechanism for evaluating participation. Besides enabling group communication, the system also provides tools for quantitative analysis of communication (which allow the instructor to evaluate the performance of the learner) and supports interoperability with other online education tools which are being developed concurrently as part of the Virtual-U project at Simon Fraser University.



1995

[Anderson, 1995]

Author(s): Mark Charles Anderson.

Title: . Task-Oriented Lossy Compression of Magnetic Resonance Images.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1995.

Abstract: Magnetic resonance tomography produces large quantities of three-dimensional medical image data. Data compression techniques can be used to improve the efficiency with which these images can be stored and transmitted, but in order to achieve significant compression gains, lossy compression techniques (which introduce distortion into the images) must be used. Conventional metrics of distortion do not measure the effect of this ``loss'' on tasks applied to the images. This thesis uses a new task-oriented image quality metric which measures the similarity between a radiologist's manual segmentation of brain lesions in raw (not compressed) magnetic resonance images and automated segmentations performed on raw and compressed images. To compress the images, a general wavelet-based lossy image compression technique, embedded zerotree coding, is used. A new compression system is designed and implemented which enhances the performance of the zerotree coder by using information about the location of important anatomical regions in the images, which are coded at different rates. Application of the new system to magnetic resonance images is shown to produce compression results superior to the conventional methods, with respect to the segmentation similarity metric.
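The task-oriented metric compares a manual segmentation of the raw image with automated segmentations of raw and compressed images. The thesis's exact similarity measure is not reproduced here; as a stand-in, the sketch below computes a standard Dice overlap between two binary lesion masks (the 8 × 8 toy masks and the function name are my assumptions).

    import numpy as np

    def dice_similarity(seg_a, seg_b):
        """Overlap between two binary lesion masks: 2|A ∩ B| / (|A| + |B|)."""
        a = np.asarray(seg_a, dtype=bool)
        b = np.asarray(seg_b, dtype=bool)
        denom = a.sum() + b.sum()
        return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

    # Compare a reference segmentation of the raw image with one from a compressed image:
    reference = np.zeros((8, 8), dtype=bool);  reference[2:6, 2:6] = True
    compressed = np.zeros((8, 8), dtype=bool); compressed[3:6, 2:6] = True
    print(dice_similarity(reference, compressed))   # ~0.857

Plotting such a similarity score against bit rate is what lets a task-oriented evaluation say how much lossy compression a segmentation task can tolerate.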
[Barnard, 1995]

Author(s): Kobus Barnard.

Title: . Computational Color Constancy: Taking Theory into Practice.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1995.

Abstract: The light recorded by a camera is a function of the scene illumination, the reflective characteristics of the objects in the scene, and the camera sensors. The goal of color constancy is to separate the effect of the illumination from that of the reflectances. In this work, this takes the form of mapping images taken under an unknown light into images which are estimates of how the scene would appear under a fixed, known light. The research into color constancy has yielded a number of disparate theoretical results, but testing on image data is rare. The thrust of this work is to move towards a comprehensive algorithm which is applicable to image data. Necessary preparatory steps include measuring the illumination and reflectances expected in real scenes, and determining the camera response function. Next, a number of color constancy algorithms are implemented, with emphasis on the gamut mapping approach introduced by D. Forsyth and recently extended by G. Finlayson. These algorithms all assume that the color of the illumination does not vary across the scene. The results of these algorithms running on a variety of images, as well as on generated data, are presented. In addition, the possibility of using sensor sharpening to improve algorithm performance is investigated. The final part of this work deals with images of scenes where the illumination is not necessarily constant. A recent promising result from Finlayson, Funt, and Barnard demonstrates that if the illumination variation can be identified, it can be used as a powerful constraint. However, in its current form this algorithm requires human input and is limited to using a single such constraint. In this thesis the algorithm is first modified so that it provides conjunctive constraints with the other gamut mapping constraints and utilizes all available constraints due to illumination variation. Then a method to determine the variation in illumination from a properly segmented image is introduced. Finally the comprehensive algorithm is tested on simple images segmented with region growing. The results are very encouraging.
[Bawa, 1995]

Author(s): Sumeet Bawa.

Title: . Interactive Creation of Animations: An Optimization Approach to Real-time Inverse Kinematics.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1995.

Abstract: An articulated figure, such as a human skeleton, can be described to a first approximation as a hierarchical collection of rigid segments connected at revolute joints. A wide spectrum of techniques has been tried for animating such figures. At one end of the spectrum is the interpolation of keyframes which are created by explicitly setting the joint angles. This technique gives the animator complete control over the animation, but the task of creating keyframes becomes very difficult and tedious as the number of degrees of freedom of the figure being animated increases. At the other end of the spectrum are procedural animation of specific movement patterns and simulation of models of dynamic behavior of the figures. The former is limited in its application while the latter is intractable, and provides very limited control over the animation. If keyframe creation can be automated or expedited, it can very effectively aid the task of making animations while still leaving full control with the animator. This can be achieved by automating the inverse kinematic calculation of the joint angle values required to make the figure attain a specific posture. This thesis presents the development and implementation of a system which uses an iterative nonlinear constrained optimization algorithm for solving the problem of inverse kinematics in the presence of constraints on the figure. A function of the difference in the current and desired posture of the figure, called the objective function, is minimized to allow direct manipulation of the articulated figure. The algorithm displays super-linear convergence and allows interactive manipulation of complex figures. Physical integrity of the figure is maintained and all the constraints imposed on the figure are satisfied at all times. The algorithm requires first and second order partial derivatives of the objective function and constraints, which for a general articulated figure, requires symbolic computation. Algorithms for efficient procedural evaluation of these quantities for any tree structured planar figure are developed. Each iteration of the optimization algorithm requires inversion and recalculation of matrices. Techniques developed in the field of numerical optimization, which have not been previously used for articulated figure animation, are employed to factorize various matrices. Availability of factors allows efficient inversion and recalculation of the matrices involved. Previous attempts at direct manipulation of articulated figures have been limited to manipulating one part of the figure at a time to produce the keyframes, which are analogous to snapshots of the complete movement. Our approach extends the concept to allow manipulation of multiple parts of the figure simultaneously by associating a trajectory with each part. The use of the trajectories permits development of complete movement sequences.
[Belleville, 1995]

Author(s): Patrice Belleville.

Title: . A study of convex covers in two or more dimensions.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, October 1995.

Abstract: The problem of covering polytopes using simple shapes is central to computational geometry. In particular, a lot of attention has been given to the problem of covering a simple polytope by convex pieces. We call a polytope Uk if there is a collection of k convex sets whose union is that polytope, and Bk if there is a collection of k convex subsets of the polytope whose union contains its boundary. This thesis studies several aspects of the recognition problem for Bk and Uk simple polytopes. We first give linear time algorithms to recognize B3 and U3 simple polygons. These algorithms are developed in three stages. In the first stage, we consider the convex subpolygons of a simple polygon P whose intersection with the boundary of P is a subset of a given set of m non-overlapping intervals. We show how to recognize the polygons whose boundary can be covered by k such subpolygons in O(k^3 m^{2k-2} + TM(k m^{k-1})) time, where TM(n) is the time required to multiply two n × n matrices (currently known to be in o(n^{2.376})). In the second stage, we characterize B3 polygons by proving that the boundary of every B3 polygon P can always be covered using a restricted class of convex subpolygons of P; this reduces the problem of recognizing B3 polygons to that solved in the first stage. Finally, we show how to prune almost all of the covers that this algorithm considers to recognize B3 and U3 polygons in linear time. We then study U2 polytopes in three-dimensional space (they are the same as B2 polytopes). We prove that they can be recognized in O(n log n) time using O(n) space. We also show how to extend this algorithm to recognize U2 polytopes in d-dimensional space in polynomial time, for every fixed value of d. Finally we present a negative result: we prove that the recognition problem for Bk or Uk polytopes in d-dimensional space is NP-hard for each fixed d ≥ 3 and k ≥ 3.
[Bruderlin, 1995]

Author(s): Armin Bruderlin.

Title: . Procedural Motion Control Techniques for Interactive Animation of Human Figures.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, March 1995.

Abstract: The animation of articulated models such as human figures poses a major challenge because of the many degrees of freedom involved and the range of possible human movement. Traditional techniques such as keyframing only provide support at a low level — the animator tediously specifies and controls motion in terms of joint angles or coordinates for each degree of freedom. We propose the use of higher level techniques called procedural control, where knowledge about particular motions or motion processing aspects are directly incorporated into the control algorithm. Compared to existing higher level techniques which often produce bland, expressionless motion and suffer from lack of interactivity, our procedural tools generate convincing animations and allow the animator to interactively fine-tune the motion through high-level parameters. In this way, the creative control over the motion stays with the animator. Two types of procedural control are introduced: motion generation and motion modification tools. To demonstrate the power of procedural motion generation, we have developed a system based on knowledge of human walking and running to animate a variety of human locomotion styles in real-time while the animator interactively controls the motion via high level parameters such as step length, velocity and stride width. This immediate feedback lends itself well to customizing locomotions of particular style and personality as well as to interactively navigating human-like figures in virtual applications and games. In order to show the usefulness of procedural motion modification, several techniques from the image and signal processing domain have been ``proceduralized'' and applied to motion parameters. We have successfully implemented motion multiresolution filtering, multitarget motion interpolation with dynamic timewarping, waveshaping and motion displacement mapping. By applying motion signal processing to many or all degrees of freedom of an articulated figure at the same time, a higher level of control is achieved in which existing motion can be modified in ways that would be difficult to accomplish with conventional methods. Motion signal processing is therefore complementary to keyframing and motion capture.
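As a rough illustration of one of the ``proceduralized'' modification tools mentioned above, the sketch below applies motion displacement mapping to a sampled joint-angle signal: sparse angle offsets are interpolated into a smooth displacement curve and added to the original motion, so its fine detail is preserved. The cosine interpolation and data layout are assumptions, not the thesis implementation.

# Motion displacement mapping sketch: add a smooth offset curve, defined by
# a few (frame, offset) keys, to an existing joint-angle signal.
import math

def displacement_curve(keys, n):
    """Interpolate sparse (frame, offset) keys over n frames with a cosine ease."""
    keys = sorted(keys)
    curve = [0.0] * n
    for (f0, d0), (f1, d1) in zip(keys, keys[1:]):
        for f in range(f0, min(f1, n - 1) + 1):
            u = (f - f0) / float(f1 - f0)
            s = 0.5 - 0.5 * math.cos(math.pi * u)      # smooth step
            curve[f] = d0 + s * (d1 - d0)
    for f in range(0, keys[0][0]):                     # hold before first key
        curve[f] = keys[0][1]
    for f in range(keys[-1][0] + 1, n):                # hold after last key
        curve[f] = keys[-1][1]
    return curve

def displace(signal, keys):
    d = displacement_curve(keys, len(signal))
    return [a + b for a, b in zip(signal, d)]

walk_knee = [10 * math.sin(0.2 * f) for f in range(100)]   # toy joint-angle signal
edited = displace(walk_knee, [(0, 0.0), (40, 15.0), (99, 0.0)])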
[Finlayson, 1995]

Author(s): Graham David Finlayson.

Title: . Coefficient Color Constancy.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, April 1995.

Abstract: The goal of color constancy is to take the color responses (for example camera rgb triplets) of surfaces viewed under an unknown illuminant and map them to illuminant independent descriptors. In existing theories this mapping is either a general linear 3x3 matrix or a simple diagonal matrix of scaling coefficients. The general theories have the advantage that the illuminant can be accurately discounted but have the disadvantage that nine parameters must be recovered. Conversely, while the coefficient theories have only three unknowns, a diagonal matrix may only partially discount the illuminant. My starting point in this thesis is to generalize the coefficient approach; the goal is to retain its inherent simplicity while at the same time increasing its expressive power. Under the generalized coefficient scheme, I propose that a visual system transforms responses to a new sensor basis before applying the scaling coefficients. I present methods for choosing the best coefficient basis for a variety of statistical models of color responses. These models are rich enough that the generalized coefficient approach suffices for almost all possible sensor sets. To achieve color constancy the correct coefficients must be recovered. Existing algorithms can do so only when strong constraints are satisfied. For example it is often assumed that there is a white reflectance in every scene. In the second part of my thesis, I develop a new coefficient algorithm, which I call color in perspective, based on very weak (and very reasonable) assumptions about the world. I assume only that the range of color responses induced by different reflectances varies with a change in illumination and that illumination itself can vary only within certain bounds. I tested the algorithm on real images taken with a color video camera; extremely good constancy is delivered. Indeed the degree of constancy compares favorably with the best which is theoretically possible. The methods developed in this thesis can be applied to a variety of other areas including color graphics, color reproduction and color appearance models.
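The diagonal, coefficient-based correction that this work generalizes can be sketched in a few lines; the grey-world estimate of the coefficients and the choice of sensor basis below are illustrative assumptions, not the methods developed in the thesis.

# Generalized coefficient correction sketch: map responses to a new sensor
# basis, apply a diagonal matrix of scaling coefficients, and map back.
import numpy as np

def coefficient_correct(rgb, basis=None):
    """rgb: N x 3 camera responses recorded under the unknown illuminant."""
    if basis is None:
        basis = np.eye(3)                   # plain diagonal (von Kries) case
    r = rgb @ basis.T                       # responses in the new basis
    means = r.mean(axis=0)                  # grey-world illuminant estimate
    coeffs = means.mean() / means           # diagonal scaling coefficients
    corrected = r * coeffs
    return corrected @ np.linalg.inv(basis).T   # back to the original basis

img = np.random.rand(1000, 3) * np.array([1.3, 1.0, 0.7])   # reddish cast
balanced = coefficient_correct(img)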
[Gaur, 1995]

Author(s): Daya Ram Gaur.

Title: . Algorithmic Complexity of some Constraint Satisfaction Problems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1995.

Abstract: Constraint networks are a simple knowledge representation model, useful for describing a large class of problems in planning, scheduling and temporal reasoning. A constraint network is called decomposable if any partial solution can be extended to a global solution. A constraint network is called minimal if every allowed 2-tuple of assignments can be extended to a global solution. Much of the existing research in the field has been aimed at identifying restrictions on constraint networks such that the resulting network is minimal or decomposable or both. In this thesis we examine complexity issues related to minimal networks. We show that determining whether a given constraint network is minimal is NP-Complete. We also show that, given an allowed 2-tuple, finding a solution of a minimal network which contains it is NP-Complete. We show that there exists a greedy algorithm for finding a solution for a subclass of minimal networks. The recognition problem for this class is of the same complexity as the recognition problem for decomposable networks. We use a result of Feige and Lovasz to show that there exists another class of constraint satisfaction problems for which determining satisfiability is polynomial. The recognition problem for this class is also NP-Complete. Next we address the weighted constraint satisfaction problem, in which a cost is associated with each assignment and the goal is to find a consistent assignment with minimum cost. We show that for minimal graphs and 0/1 weights the weighted constraint satisfaction problem is NP-Complete.
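The minimality property at the centre of these results can be checked directly, if exponentially, on toy networks; the network encoding below is an assumption made only for illustration, and the exhaustive search is consistent with the NP-completeness result rather than a way around it.

# Brute-force check that a binary constraint network is minimal: every
# allowed 2-tuple of assignments must extend to a global solution.
from itertools import product

def consistent(assignment, constraints):
    """constraints: dict mapping (i, j) with i < j to a set of allowed pairs."""
    return all((assignment[i], assignment[j]) in allowed
               for (i, j), allowed in constraints.items())

def is_minimal(domains, constraints):
    solutions = [a for a in product(*domains) if consistent(a, constraints)]
    for (i, j), allowed in constraints.items():
        for pair in allowed:
            if not any(s[i] == pair[0] and s[j] == pair[1] for s in solutions):
                return False            # an allowed 2-tuple with no extension
    return True

# Two variables with domains {0, 1} and one constraint allowing (0,1) and (1,0).
print(is_minimal([(0, 1), (0, 1)], {(0, 1): {(0, 1), (1, 0)}}))   # True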
[Gemmell, 1995]

Author(s): D. James Gemmell.

Title: . Support For Continuous Media in File Servers.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, April 1995.

Abstract: Continuous media (CM), such as audio and video, fundamentally differ from traditional text and numeric data in that they have large data transfer rate and storage space requirements, and real time deadlines must be met during their storage and retrieval. This thesis covers the issues involved in designing file servers that support continuous media. A rigorous model of the real time requirements is presented, and lower bounds are demonstrated for parameters such as buffer space. The sorting set disk scheduling algorithm is proposed, which balances disk latency reduction against successive read latencies. The sorting set approach has the advantage of being a generalization of previous approaches, as well as yielding improved performance in certain cases. Conventional approaches to file system implementation are analysed for suitability in CM file servers, and new, or hybrid, solutions are proposed.
[Gourley, 1995]

Author(s): René Gourley.

Title: . Tensor Representations and Harmony Theory: A Critical Analysis.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1995.

Abstract: Harmony theory and tensor representations have been proposed as a means by which connectionist models can accept formal languages. Their proponents aim to provide a neural explanation of the productivity and systematicity of cognitive processes, without directly implementing symbolic algorithms. Via tensor representation, this theory interprets the activation vector of a connectionist system as a parse tree for a string in a particular context free language. Harmony theory apparently describes how to construct a network whose stable equilibria represent valid parse trees. This thesis presents a detailed analysis of tensor representations and harmony theory. Over the course of this exposition, errors in the original formulation are identified and improvements are proposed. The thesis then goes on to examine some major issues confronting harmony theory. The first issue is that of input and output which have not been satisfactorily defined by harmony theorists. Secondly, we examine the very large size of the networks. Finally, this thesis inspects harmony theory relative to its own goals and shows that the constructed networks admit stable equilibria that do not represent valid parse trees. Thus, harmony theory is unable to support its advocates' bold claims.
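The tensor-product binding of fillers to roles that underlies these representations can be illustrated directly; the particular role and filler vectors below are arbitrary choices for demonstration, not those analysed in the thesis.

# Tensor (role/filler) binding sketch: bind with an outer product, superpose
# bindings by addition, and unbind by contracting with an orthonormal role.
import numpy as np

roles = {"left": np.array([1.0, 0.0]), "right": np.array([0.0, 1.0])}
fillers = {"a": np.array([1.0, 0.0, 0.0]), "b": np.array([0.0, 1.0, 0.0])}

# Bind filler "a" to the left-child role and "b" to the right-child role.
tensor = (np.outer(fillers["a"], roles["left"]) +
          np.outer(fillers["b"], roles["right"]))

recovered_left = tensor @ roles["left"]        # equals fillers["a"]
print(recovered_left)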
[Gupta, 1995]

Author(s): Ranabir Gupta.

Title: . Modeling Changes in Information Systems.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, November 1995.

Abstract: Databases are increasingly viewed as critical components in design and planning applications, where changes are integral to the domain. Traditional data models, however, enable the description of snapshots of a domain at particular points in time. Recent research has produced data models aimed at storing a series of snapshots, but reasoning with changes and with the interrelationships of objects as they change typically requires means external to the database. This approach is not suitable for real-world applications where the relationships among objects in different dynamic states are of primary importance, as opposed to the static relationships at a particular point in time. Such applications demand data models that capture the integrity constraints associated with change histories. This thesis investigates the use of techniques from formal languages to build models of dynamic information systems. Dynamic information systems model the structure and behavior of objects as their existence, observable properties, and states evolve over time. We propose a user interface language to model histories as first-class entities, and a computational formalism to reason with these models. Data structures are then proposed to aid reasoning with sets of histories. Extending dynamic information systems, we consider the case when the needs and world-view of an observer change over time, requiring changes to the representation of the domain (its schema) from her perspective. We thus propose the concept of adaptive information systems to reason with representations as an observer's needs evolve over a period of time.
[Larman, 1995]

Author(s): Craig Larman.

Title: . Learning from knowledge systems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, ?? 1995.
[Mackiewich, 1995]

Author(s): Blair Mackiewich.

Title: . Intracranial Boundary Detection and Radio Frequency Correction in Magnetic Resonance Images.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1995.

Abstract: Magnetic resonance imaging (MRI) is a noninvasive method for producing three-dimensional tomographic images of the human body. MRI is most often used for the detection of tumors, lesions, and other abnormalities in soft tissues, such as the brain. Several techniques for automatically segmenting brain tissues in MRI scans of the head have recently been developed. One goal of segmentation is to automatically or semi-automatically detect lesions in the brains of multiple sclerosis patients. The number and size of lesions indicate the progression of the disease in the patient. Therefore, automatic lesion detection may significantly aid in the analysis of treatments. Segmentation is problematic due to radio frequency inhomogeneity (image intensity variation) caused by inaccuracies in the magnetic resonance scanner and by nonuniform loading of the scanner coils by the patient. The segmentation algorithms also have difficulty dealing with tissues outside the brain, such as skin, fat, and bone. Consequently, intensity correction and the removal of non-brain tissues are mandatory for successful automatic segmentation. A new method for automatic intracranial boundary detection and radio frequency correction in MRI is described in this thesis. The intracranial boundary detection method isolates the brain using nonlinear anisotropic diffusion. It then uses active contour models to find the brain's edge. The radio frequency correction technique employs a fast homomorphic filter to reduce low-frequency intensity variation in voxels within the brain. The new intracranial boundary detection method proved effective on five MRI data sets from two different MRI scanners. The radio frequency correction technique reduced intensity variation due to radio frequency inhomogeneity in the three MRI data sets on which it was tested.
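A minimal sketch of homomorphic correction of slowly varying intensity inhomogeneity, in the spirit of the technique described above: take logarithms, estimate the low-frequency component with a wide Gaussian, and remove it. The filter width, the toy volume and the use of scipy.ndimage.gaussian_filter are assumptions, not the fast filter implementation developed in the thesis.

# Homomorphic intensity correction sketch for voxels inside a brain mask.
import numpy as np
from scipy.ndimage import gaussian_filter

def homomorphic_correct(volume, mask, sigma=20.0):
    """volume: 3-D MRI intensities; mask: boolean intracranial mask."""
    log_img = np.log(np.where(mask, volume, 1.0) + 1e-6)
    low = gaussian_filter(log_img, sigma)          # slow intensity variation
    corrected = np.exp(log_img - low + low[mask].mean())
    return np.where(mask, corrected, volume)

vol = np.random.rand(32, 32, 32) + 1.0
brain = np.ones_like(vol, dtype=bool)
flat = homomorphic_correct(vol, brain)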
[Peters, 1995]

Author(s): David B. Peters.

Title: . Bounds for Communication Problems in Interconnection Networks under a Linear Cost Model.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 1995.

Abstract: In the analysis of communications under the assumption of linear cost and in the context of a distributed memory interconnection network, many authors have noted that a good method for sending a message of length n along a path of length m is to divide the message into some collection of equal or nearly equal packets, and pipeline the communication. We formalize and prove that this notion is optimal in the context of an interconnection network using store and forward communications under the linear cost model. Armed with this proof technique and bounds, we attack problems of broadcasting and gossiping in the contexts of a ring of processors and a complete interconnection of processors. We acquire a variety of new upper bounds, and several lower bounds to match.
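The packetization trade-off behind this result can be illustrated numerically. Assuming each hop of a packet of length L costs beta + tau * L (the exact cost model and constants in the thesis may differ), pipelining k equal packets of a length-n message over a path of m links costs roughly (m + k - 1)(beta + tau * n / k), which is minimized near k = sqrt((m - 1) n tau / beta).

# Pipelined packetization under a linear cost model: search for the packet
# count that minimizes completion time and compare with the closed form.
import math

def pipeline_time(n, m, k, beta, tau):
    return (m + k - 1) * (beta + tau * n / k)

def best_packet_count(n, m, beta, tau):
    return min(range(1, n + 1), key=lambda k: pipeline_time(n, m, k, beta, tau))

n, m, beta, tau = 10000, 16, 50.0, 1.0
k = best_packet_count(n, m, beta, tau)
print(k, pipeline_time(n, m, k, beta, tau),
      math.sqrt((m - 1) * n * tau / beta))   # closed-form estimate of k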
[Tong, 1995]

Author(s): Frank C. H. Tong.

Title: . Reciprocal-Wedge Transform: A Space-variant Image Representation.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 1995.

Abstract: The problems in computer vision have traditionally been approached as recovery problems. In active vision, perception is viewed as an active process of exploratory, probing and searching activities rather than a passive reconstruction of the physical world. To facilitate effective interaction with the environment, a foveate sensor coupled with a fast and precise gaze control mechanism becomes essential for active data acquisition. In this thesis, the Reciprocal-Wedge Transform (RWT) is proposed as a space-variant image model. The RWT has its merits in comparison with alternative foveate sensing models such as the log-polar transform. Its concise matrix representation leads to simplified computation procedures. Similar to the log-polar transform, the RWT facilitates space-variant sensing, which enables effective use of variable-resolution data and a reduction in the total amount of sensory data. Most interestingly, its anisotropic mapping yields variable resolution primarily in one dimension. Consequently, the RWT preserves linear features and performs especially well on translations in the images. A projective model is developed for the transform, lending it to potential hardware implementation of RWT projection cameras. The CCD camera for the log-polar transform requires sensing elements of exponentially varying sizes. In contrast, the RWT camera achieves variable resolution with oblique image plane projection, thus alleviating the need for non-rectangular tessellation and sensitivity scaling on the sensing elements. A camera model making use of available lens design techniques is investigated. The RWT is applied to motion analysis and active stereo to illustrate the effectiveness of the image model. In motion analysis, two types of motion stereo are investigated, namely, longitudinal and lateral motion stereo. RWT motion stereo algorithms are developed for linear and circular ego motions in road navigation, and for depth recovery from moving parts on an assembly belt. The algorithms benefit from the perspective correction, linear feature preservation and efficient data reduction of the RWT. The RWT imaging model is also shown to be suitable for fixation control in active stereo. Vergence and versional eye movements and scanpath behaviors are studied. A computational interpretation of stereo fusion in relation to the disparity limit in space-variant imagery leads to the development of a computational model for binocular fixation. The unique oculomotor movements for binocular fixation observed in the human system appear natural for space-variant sensing. The vergence-version movement sequence is implemented for an effective fixation mechanism in RWT imaging. An interactive fixation system is simulated to show the various modules of camera control, vergence and version. Compared to the traditional reconstructionist approach, active behavior is shown to be plausible.
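As a rough sketch only: assuming the transform has the anisotropic reciprocal form (u, v) = (1/x, y/x) for |x| >= 1 (the precise RWT formulation, including its treatment of the central foveal region, is given in the thesis), the one-dimensional variable resolution and the preservation of straight lines can be seen directly.

# Assumed reciprocal-wedge style mapping and its inverse; the foveal region
# is deliberately left out of this sketch.
def rwt(x, y):
    if abs(x) < 1.0:
        raise ValueError("foveal region handled separately")
    return 1.0 / x, y / x

def rwt_inverse(u, v):
    return 1.0 / u, v / u

# Points on the straight line y = 2x + 3 map to points on the straight line
# v = 2 + 3u, illustrating linear-feature preservation under the assumed form.
for x in (2.0, 4.0, 8.0, 16.0):
    print(rwt(x, 2 * x + 3))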
[Vodarek, 1995]

Author(s): George Vodarek.

Title: . VSAM: a Simulator-based Debugger and Performance Analysis Tool for SAM.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1995.

Abstract: This thesis describes a virtual simulator-based software debugging and performance analysis system (VSAM) for the Structured Architecture Machine (SAM). SAM is a distributed-function multiprocessor computer designed to execute APL efficiently. The purpose of VSAM is to help researchers investigate the behavior of the SAM architecture and to support the exploration of alternative designs. Object-oriented techniques are used to represent the hierarchical structure of the hardware thereby facilitating instrumentation and modification of the architecture. VSAM is implemented in C++ under OS/2 and utilizes multi-tasking extensively. The core of VSAM is a behavioral simulator of SAM. The simulator is a faithful functional model of SAM down to the register/bus component level. A full-featured debugger interface is provided for each processor. The debugger includes novel features for dealing with multiple processors, functional units, and data presentation. VSAM also provides a general instrumentation facility which uses OS/2 pipes to connect sensors embedded in the simulator to display windows. The simulator design is discussed in detail and presented in the context of alternative simulation techniques and other microprocessor simulators. The use of VSAM is demonstrated on SAM benchmarks and the results are discussed.
[Xie, 1995]

Author(s): Zhaohui Xie.

Title: . Query Evaluation in Deductive and Object-Oriented Databases.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, January 1995.

Abstract: The development of deductive and object-oriented database (DOOD) systems by integration of the object-oriented paradigm with the deductive paradigm represents a promising direction in the construction of the next generation of database systems. This thesis addresses the issues of query evaluation in DOOD systems and presents some promising approaches to the problems in DOOD query evaluation. First, a DOOD data model and its query language are presented to demonstrate the salient features supported by DOOD and to serve as research vehicles for the investigation of DOOD query evaluation. After a comparative survey of query evaluation methods for relational databases, deductive databases and object-oriented databases, the impact of DOOD models and languages on query evaluation is discussed. A list of open and poorly solved problems is identified as a research guide towards DOOD query evaluation, which in turn motivates the research efforts presented in this thesis. An efficient navigation structure, called the join index hierarchy, is proposed to handle the problem of ``pointer-chasing'' (or ``gotos on disks'') in exploring logical relationships among complex objects. Effective optimization strategies are introduced to employ the constraint conditions expressed in the form of complex selection and join conditions for efficient set-oriented navigations, and to exploit the common navigations among a query and encapsulated methods for efficient query evaluation. The query-independent compilation and chain-based evaluation, developed for deductive recursive query evaluation, are extended to process a class of DOOD recursions.



1994

[Cukierman, 1994]

Author(s): Diana Cukierman.

Title: . Formalizing the Temporal Domain with Hierarchical Structures of Time Units.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1994.

Abstract: We investigate a formal representation of calendars and time units as restricted temporal entities for reasoning about activities. Calendars can be considered as repetitive, cyclic temporal objects. We examine the characteristics of time units as particular classes of time intervals, and provide a categorization of relations among them. The motivation for this work is to ultimately be able to reason about schedulable, repeated activities, such as going to a specific class every Tuesday and Thursday during a semester. Calendar Structures are defined as an abstract hierarchical structure of time units. We investigate the structural and mathematical properties of this framework and the specific relations among the units that compose it. We propose this formal apparatus as a system of measures which subsumes calendars and any other system that can be based on discrete units and a repetitive containment relation. One of the abstractions introduced in this thesis is that of relations among classes of intervals. This is an expansion of the interval algebra framework defined by J. F. Allen and used throughout temporal reasoning, particularly in scheduling applications.
[Ding, 1994]

Author(s): Jianbin Ding.

Title: . A Channel Allocation Scheme for Multimedia On-demand Systems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, February 1994.

Abstract: Rapid advances in high speed networks have fueled the development of multimedia on-demand services. However, because of the high bandwidth and low delay requirements of such services, conventional networks cannot provide a guaranteed quality-of-service (QOS). The thesis proposes a reservation-based channel allocation scheme and an associated control mechanism that provide the requested QOS. In general, there are two important QOS requirements for multimedia on-demand applications: deadlines for frame delivery and initial wait time. In our model, we consider observing deadlines as a necessity and initial wait time as of only secondary importance. Given this requirement and the available buffer space at the destination, we study the buffer underflow and overflow problems for both fixed-size and variable-size media, which enables us to calculate the minimum and maximum transfer rates. Based on this range, our channel allocation scheme reserves bandwidth in the form of uniformly distributed time slots. The resulting system satisfies both bandwidth and real-time requirements of multimedia on-demand applications, and also simplifies the channel establishing procedure. The associated node controller, transmission scheduler, and other related problems are also discussed.
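The underflow/overflow reasoning used to bound the transfer rate can be sketched as a feasibility test for a candidate constant rate; the frame sizes, startup delay and per-period delivery model below are illustrative assumptions rather than the scheme's actual calculation.

# Check whether a constant delivery rate avoids both buffer underflow
# (a frame not fully delivered by its deadline) and buffer overflow.
def rate_is_feasible(frame_sizes, rate, buffer_size, startup_periods=2):
    """rate: bytes delivered per frame period; playback starts after the
    startup delay and then consumes one frame per period."""
    total = sum(frame_sizes)
    delivered = consumed = 0.0
    for t in range(startup_periods + len(frame_sizes)):
        delivered = min(delivered + rate, total)
        play_index = t - startup_periods
        if play_index >= 0:
            consumed += frame_sizes[play_index]
            if delivered < consumed:
                return False        # underflow: deadline missed
        if delivered - consumed > buffer_size:
            return False            # overflow: backlog exceeds the buffer
    return True

frames = [3000, 9000, 4000, 12000, 5000, 5000, 8000]   # variable-size media
for r in (4000, 6000, 8000):
    print(r, rate_is_feasible(frames, r, buffer_size=20000))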
[Dorner, 1994]

Author(s): Brigitte Dorner.

Title: . Chasing the Colour Glove: Visual Hand Tracking.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 1994.

Abstract: I present a visual hand tracking system that can recover 3D hand shape and motion from a stream of 2D input images. The hand tracker was originally intended as part of a computer interface for (American) sign language signers, but the system may also serve as a general purpose hand tracking tool. In contrast to some previous 2D-to-sign approaches, I am taking the 3-dimensional nature of the signing process into account. My main objective was to create a versatile hand model and to design an algorithm that uses this model in an effective way to recover the 3D motion of the hand and fingers from 2D clues. The 2D clues are provided by colour-coded markers on the finger joints. The system then finds the 3D shape and motion of the hand by fitting a simple skeleton-like model to the joint locations found in the image. This fitting is done using a nonlinear, continuous optimization approach that gradually adjusts the pose of the model until correspondence with the image is reached. My present implementation of the tracker does not work in real time. However, it should be possible to achieve at least slow real-time tracking with appropriate hardware (a board for real-time image-capturing and colour-marker detection) and some code optimization. Such an `upgraded' version of the tracker might serve as a prototype for a `colour glove' package providing a cheap and comfortable—though maybe less powerful—alternative to the data glove.
[Dykstra, 1994]

Author(s): Christine J. Dykstra.

Title: . 3D Contiguous Volume Analysis for Functional Imaging.

Ph.D. Thesis, Simon Fraser University, April 1994.

Abstract: To maximize the information gained from 3D medical imaging, some form of image analysis is required. In Positron Emission Tomography and Single Photon Emission Computed Tomography, analysis methods tend either to focus on a few regions of interest or to rely on registration to anatomical data. The first approach may miss information which falls outside the predefined regions and the second could introduce registration error. In the new analysis method presented here, attention is focused on the characteristics of functional data: the solid volumes of high activity, and their positions relative to each other and to the rest of the image. For this work, the entire image is used and nothing else. This method of functional image analysis finds the contiguous volumes, in each hemisphere, of all voxels with values above a threshold, and tracks them as the threshold is decreased and the volumes merge together. The technique has been found to be fast, reliable and unbiased. Application of the method to 53 F18 fluorodeoxyglucose scans of normals has found a consistent spatial pattern of regions of high activity. Output from it provides an easy way to analyze disease images and compare them to normals. Large data sets can be quickly analyzed and the results viewed easily. When applied to several diseases, the results showed significant differences from normals. We have found that the method provides a quick and reliable way of extracting important features from functional data and comparing them to other images.
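The threshold-descent analysis can be sketched with a union-find structure that activates voxels in decreasing order of value and merges 6-connected neighbours, so the number of contiguous volumes can be tracked as the threshold drops. The toy volume and thresholds are illustrative, and the hemisphere handling and statistics of the actual method are not reproduced here.

# Track contiguous supra-threshold volumes while the threshold decreases.
import numpy as np

def contiguous_volumes(volume, thresholds):
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]      # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    active = set()
    order = sorted(np.ndindex(volume.shape), key=lambda v: -volume[v])
    i = 0
    for thr in sorted(thresholds, reverse=True):
        while i < len(order) and volume[order[i]] >= thr:
            v = order[i]
            parent[v] = v
            active.add(v)
            for axis in range(3):              # merge with active 6-neighbours
                for d in (-1, 1):
                    n = list(v); n[axis] += d; n = tuple(n)
                    if n in active:
                        union(v, n)
            i += 1
        roots = {find(v) for v in active}
        print("threshold", thr, "->", len(roots), "contiguous volume(s)")

contiguous_volumes(np.random.rand(8, 8, 8), thresholds=[0.9, 0.8, 0.7, 0.6])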
[Huang, 1994]

Author(s): Charlie Chengyu Huang.

Title: . Query Optimizations for a Spatiotemporal Query Processing System.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1994.

Abstract: Database front-end systems and application-specific query languages are commonly used in various applications to best fit users' special needs and performance requirements. A front-end system, the VPD system, is presented in this thesis. It was created for the Vancouver Police Department to query and display the crime distribution in the city of Vancouver. Determined by its usage, the VPD system mainly features spatiotemporal queries. The query language proposed for the VPD system is called ESQL (Extended SQL). This thesis concentrates on query optimizations for the VPD system. The VPD system is implemented on both the relational DBMS Sybase and the object-oriented DBMS Objectstore. Query performance on these two platforms is compared, and the factors affecting performance are identified through a series of systematic experiments and analysis. Certain techniques are applied to improve the VPD system's performance for processing general queries on the Objectstore DBMS. In order to further enhance the speed of processing spatiotemporal queries, a multi-key index tree, the KDB+-tree, is proposed. Based on the KDB+-tree indexing mechanism, the Spatiotemporal Query Processor (STQP) is introduced to the VPD system. The STQP can process any ESQL query yet favors spatiotemporal queries. It makes use of all the query optimization techniques proposed in the thesis and provides efficient query processing for the VPD system.
[Kolotyluk, 1994]

Author(s): Eric Kolotyluk.

Title: . Using X.500 to Facilitate the Creation of Information Systems Federations.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 1994.

Abstract: X.500 is the international standard for a world-wide automated directory system, the Directory, which enables people and automated systems to search for information such as people, places, systems, services, etc. However, much of the information that is expected to be found in the directory already exists in corporate and institutional databases as well as other information sources. In the last ten years or so, computing experts have begun creating federations of information systems in order to give us easier access to a variety of information sources. These range from simple but powerful network grazers like Archie and Veronica to more sophisticated heterogeneous database projects like Interbase, Myriad, ORECOM and others. The issues of interfacing the X.500 Directory with existing information sources are explored; in effect, the X.500 Directory is seen as a federation manager of information systems. In particular, the integration of X.500 methods with relational database technology is studied in the context of what is possible and what makes sense. To test these ideas an existing Directory Service Agent has been modified to access a commercial relational database management system, allowing a directory administrator to map the tuples of a relation into the entries of the directory. Results from this effort have revealed challenges not only in the conceptual design of such a system, such as information schema translation, but also in the pragmatics of system design, such as the registration of external information sources within the Directory itself. Finally, this thesis speculates on the ultimate scope of X.500 as an information systems federation manager, where the global information network could be headed and what we might see or invent along the way.
[Ng, 1994]

Author(s): Vincent Ng.

Title: . Concurrent Access to Spatial Data.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 1994.

Abstract: Spatial data consist of points, lines, rectangles, polygons, volumes, etc. The structure by means of which such data are organized and accessed is important to the performance of database systems which support applications in computer vision, computer-aided design, solid modeling, geographic information systems and computational geometry. In the past, little work has been done in developing concurrency control algorithms to access spatial data. The thesis investigates concurrent operations on spatial index structures. We build our work on existing spatial data structures and concurrency control algorithms. Some of the concurrency control algorithms we designed have been implemented to demonstrate their effectiveness. We first study known index structures for point data: the B+-tree, R-tree, and K-D-B tree. To overcome some drawbacks of the existing index structures, we propose a new index structure, called the quad-B tree, which combines the advantages of the B-tree and the quadtree. We control concurrent access to these index structures using the lock-coupling or link technique. We then study two index structures for rectangular data, the R-tree and quad-R tree. We discuss different approaches to supporting concurrent operations on an R-tree, namely, the simple, lock-modify, lock-coupling, give-up and link approaches. We compare the search performance of the first three approaches based on their implementations. The link technique is adopted for the R-tree to support recovery after system failures. Finally, we use the quad-R tree to solve the problem of ordering amongst spatial objects.
[O'Neill, 1994]

Author(s): Melissa Elizabeth O'Neill.

Title: . A Data Structure for More Efficient Runtime Support of Truly Functional Arrays.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1994.

Abstract: Functional languages often neglect the array construct because it is hard to implement nicely in a functional language. When it is considered, it is usually from the context of converting imperative algorithms into functional programs, rather than from a truly functional perspective. In an imperative language, changes to an array are done by modifying array elements, destroying their original values. Unless special measures are taken to support destructive update, an array update in a functional language must produce a new array, without destroying the old one. Since most other functional data structures do not support destructive update, it would be desirable not to have to make a special case of arrays, especially since many kinds of algorithms (backtracking algorithms being a simple example) may find having multiple versions of a data structure useful. Our goal is to be able to provide a functional array interface where array operations are reasonably cheap. We will look at existing techniques that have been used in the past to address this problem area, and then present a new runtime technique that offers very good all-round performance and can be used where other array mechanisms fail.
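One standard way to obtain reasonably cheap functional arrays, shown here only as a generic illustration and not as the runtime technique the thesis develops, is path copying in a complete binary tree: an update allocates O(log n) new nodes and every earlier version remains usable.

# Persistent (truly functional) array sketch via path copying.
class Node:
    __slots__ = ("left", "right", "value")
    def __init__(self, left=None, right=None, value=None):
        self.left, self.right, self.value = left, right, value

def build(values, lo=0, hi=None):
    if hi is None:
        hi = len(values)
    if hi - lo == 1:
        return Node(value=values[lo])
    mid = (lo + hi) // 2
    return Node(build(values, lo, mid), build(values, mid, hi))

def lookup(node, i, lo, hi):
    if hi - lo == 1:
        return node.value
    mid = (lo + hi) // 2
    return lookup(node.left, i, lo, mid) if i < mid else lookup(node.right, i, mid, hi)

def update(node, i, x, lo, hi):
    """Return a new version; only the root-to-leaf path is copied."""
    if hi - lo == 1:
        return Node(value=x)
    mid = (lo + hi) // 2
    if i < mid:
        return Node(update(node.left, i, x, lo, mid), node.right)
    return Node(node.left, update(node.right, i, x, mid, hi))

n = 8
v0 = build(list(range(n)))
v1 = update(v0, 3, 99, 0, n)
print(lookup(v0, 3, 0, n), lookup(v1, 3, 0, n))   # 3 99: both versions live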
[Spencer, 1994]

Author(s): Curtis C. Spencer.

Title: . Circuit-Switched Structured Communications on Toroidal Meshes.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, February 1994.

Abstract: Standard communication patterns may be grouped into two broad classifications, information disseminations or `to-all' operations, and information permutations or `one-to-one' operations. Information collections or `from-all' operations are grouped with disseminations since they are simply the inverse operations. In this thesis, we develop algorithms for each of the data movement patterns within both classifications on cycles, and 2- and 3-dimensional toroidal meshes. Our algorithms take advantage of a multiple port model with circuit-switched routing and virtual channels. A linear cost model is employed in our analysis of these algorithms which takes into consideration start-up, switching and propagation costs. The operations in the information dissemination classification that we develop algorithms for are: broadcasting, scattering (gathering), gossiping and multi-scattering. Those which we develop algorithms for in the information permutation classification include: all the global 1-, 2- and 3-dimensional permutations (e.g. reflections, rotations, translations, and transpositions), as well as several translation-based permutations. In addition, lower bounds to these problems are set forth and a comparison is made between the multiple port algorithms and their single port counterparts. The techniques used to perform the dissemination operations are based upon those which were developed for use with the one-port model. The technique used for the permutation operations is based on breaking each transformation down into simpler transformations. We show that 1-dimensional transpositions and translations can be efficiently combined or used to perform all global permutations as well as each of the special case permutations.
[Steiner, 1994]

Author(s): Thomas W. Steiner.

Title: . Leaf Locking Database Concurrency Control.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1994.

Abstract: A new database concurrency control mechanism based on locking the leaves of a binary tree is proposed. This is a modification of tree locking with all the data items in leaf nodes and the interior nodes of the tree used only for concurrency control. It is shown that this technique has greater possible concurrency than ordinary tree locking. Furthermore, concurrency is greater than that exhibited by the two-phase locking technique. Simulation results on the least favourable workload indicate that leaf locking results in a 30% increase in transaction throughput as compared to two-phase locking.
[Strickland, 1994]

Author(s): James Adrian Strickland.

Title: . An efficient randomized algorithm for truck scheduling.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 1994.

Abstract: Many truck scheduling problems are computationally intensive. Furthermore, the behaviour of sites to be serviced by the truck is often unpredictable, making it impossible to define an ``exact'' solution to the problem. This thesis examines one such practical scheduling problem. The problem is shown to be difficult to solve. Possible approaches to solving the problem are discussed, followed by the description of a randomized heuristic algorithm which was developed to efficiently solve the problem. The implementation of the algorithm in a simulation program is described, followed by the simulation results which support the claims made about the algorithm. A powerful ``Algorithm Performance Visualization Tool'' is then described. This tool was constructed to allow the algorithm development to proceed more quickly. With minor modifications it could be used to aid in the development of algorithms for other vehicle scheduling problems.
[Teo, 1994]

Author(s): Chor Guan Teo.

Title: . A Knowledge-Based Procedural Approach to the Animation of Human Hand Grasping.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 1994.

Abstract: Although computer animation of articulated figures has been the focus of extensive research in computer graphics, the study of the animation of hand grasping has been rather limited. However, hand grasping has been extensively studied in the fields of kinesiology and robotics. Traditionally, researchers in the field of robotics have used analytical methods to solve the problem of hand grasping. In recent years, a knowledge-based approach, which uses information obtained from motor control studies, has increasingly gained acceptance. Studies in kinesiology have shown that humans tend to use a pre-determined set of hand configurations. This makes the search for a suitable grasp posture tractable. Thus, it is possible to use the information obtained from these two fields in the development of an approach to the animation of human hand grasping. In this thesis, the main objective is to investigate and develop tools to support the modelling and animation of human hand grasping. A hybrid procedural knowledge-based approach is used to construct an animation system, which will serve as the platform for determining the effectiveness of these tools.
[Ueberla, 1994]

Author(s): Joerg Peter Ueberla.

Title: . Analyzing and Improving Statistical Language Models for Speech Recognition.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, May 1994.

Abstract: A speech recognizer is a device that translates speech into text. Many current speech recognizers contain two components, an acoustic model and a statistical language model. The acoustic model indicates how likely it is that a certain word corresponds to a part of the acoustic signal (e.g. the speech). The statistical language model indicates how likely it is that a certain word will be spoken next, given the words recognized so far. Even though the acoustic model might for example not be able to decide between the acoustically similar words ``peach'' and ``teach'', the statistical language model can indicate that the word ``peach'' is more likely if the previously recognized words are ``He ate the''. Current speech recognizers perform well on constrained tasks, but the goal of continuous, speaker independent speech recognition in potentially noisy environments with a very large vocabulary has not been reached so far. How can statistical language models be improved so that more complex tasks can be tackled? This is the question addressed in this thesis. Since the knowledge of the weaknesses of any theory often makes improving the theory easier, the central idea of this thesis is to analyze the weaknesses of existing statistical language models in order to subsequently improve them. To that end, we formally define a weakness of a statistical language model in terms of the logarithm of the total probability, LTP, a term closely related to the standard perplexity measure used to evaluate statistical language models. This definition is applicable to many probabilistic models, including almost all of the currently used statistical language models. We apply our definition of a weakness to a frequently used statistical language model, called a bi-pos model. This results, for example, in a new modeling of unknown words which improves the performance of the model by 14 to 21 percent. Moreover, one of the identified weaknesses has prompted the development of our generalized N-pos language model, which is also outlined in this thesis. It can incorporate linguistic knowledge even if it extends over many words and this is not feasible in a traditional N-pos model. This leads to a discussion of what knowledge should be added to statistical language models in general and we give criteria for selecting potentially useful knowledge. These results show the usefulness of both our definition of a weakness and of performing an analysis of weaknesses of statistical language models in general.
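The quantity this analysis is built around, the logarithm of the total probability (LTP) that a model assigns to a corpus, together with the closely related perplexity, can be computed as follows; the add-one-smoothed word bigram model is only a stand-in for the bi-pos model analysed in the thesis.

# LTP and perplexity of a toy bigram language model.
import math
from collections import Counter

def train_bigram(sentences):
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        vocab.update(words)
        unigrams.update(words[:-1])            # contexts
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams, len(vocab)

def log_total_probability(sentences, model):
    unigrams, bigrams, v = model
    ltp, n_words = 0.0, 0
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        for w1, w2 in zip(words, words[1:]):
            p = (bigrams[(w1, w2)] + 1) / (unigrams[w1] + v)   # add-one smoothing
            ltp += math.log(p)
            n_words += 1
    return ltp, math.exp(-ltp / n_words)       # LTP and perplexity

model = train_bigram(["he ate the peach", "she ate the apple"])
print(log_total_probability(["he ate the apple"], model))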
[Zhan, 1994]

Author(s): Jibin Zhan.

Title: . Object-oriented Query Language Specification and Query Processor Generation.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 1994.

Abstract: In this thesis, we propose a systematic approach to query language customization which can lead to Customized Query Languages (CQL) and different query front ends on new generation database systems in a heterogeneous database environment. Our approach is based on a core object-oriented data model that is derived from the data models of the Object Management Group (OMG, 1990), the Object Database Management Group (ODMG-93) and some other object-oriented and extended relational database systems, such as O2, Orion, Postgres, and SQL3. Object-oriented methodology is used in designing and presenting the data model and query model, i.e. all the basic components in the data model and query model are abstracted into meta-objects and meta-types, for which some basic characteristics and operations are defined. Special default constructors for all the meta-types are provided to represent both the default semantics and the syntactic appearance of the corresponding components in our default Object Definition Language (ODL) and Object Query Language (OQL). Query language specifiers can provide their own ``constructors'' for these meta-types to override the default ones in order to tailor the syntactic appearance and/or semantics. New components and operations corresponding to new components in the specified query languages can also be defined in a similar way to new types and functions defined in C++. Following this approach, a non-procedural specification language is proposed which leads to the automatic generation of query-language-specific processors in LEX and YACC.



1993

[Ammann, 1993]

Author(s): Manuel M. Ammann.

Title: . Towards Program Manipulation in-the-Large.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, October 1993.

Abstract: Program manipulation is concerned with modifying source code in a systematic, language-based, and conceivably highly automated fashion. Program manipulation concepts have a considerable potential of becoming the basis of new, high-level development and maintenance programming techniques. Program manipulation in-the-large extends the capabilities of program manipulation to a collection of program modules, thus allowing for high-level modifications with a scope larger than one module. With the increasing number of large, structured software projects, in-the-large capabilities of manipulation tools are becoming crucial for most applications. Our work in program manipulation focuses on practical software tools that perform language-based source code manipulations at a very high level and in-the-large. We demonstrate the concept of program manipulation in-the-large with the examples of automated inter-module renaming, project reorganizing, and abstraction modification. Prototype tools have been developed for the Modula-3 environment to automate the renaming of identifiers and to move program entities between modules. In order to find clues on how to reorganize a project, we propose four static software metrics that indicate whether a project consisting of a set of program modules is well-structured. The metrics are based on the principle of vocabulary hiding and have been implemented for target projects written in Modula-3. Tests suggest the proposed metrics are useful to discover poorly structured project modules.
[Barney, 1993]

Author(s): Jeffrey Scott Barney.

Title: . Scatter Correction for Positron Volume Imaging.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1993.
[Bremner, 1993]

Author(s): David Bremner.

Title: . Point Visibility Graphs and Restricted-Orientation Polygon Covering.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1993.

Abstract: A visibility relation can be viewed as a graph: the uncountable graph of a visibility relationship between points in a polygon P is called the point visibility graph (PVG) of P. In this work we explore the use of perfect graphs to characterize tractable subproblems of visibility problems. Our main result is a characterization of which polygons are guaranteed to have weakly triangulated PVGs, under a generalized notion of visibility called Ø-visibility. Let Ø denote a set of line orientations. Rawlins and Wood call a set P of points Ø-convex if the intersection of P with any line whose orientation belongs to Ø is either empty or connected; they call a set of points Ø-concave if it is not Ø-convex. Two points are said to be Ø-visible if there is an Ø-convex path between them. A polygon is Ø-starshaped if there is a point from which the entire polygon is Ø-visible. Let Ø' be the set of orientations of minimal Ø-concave portions of the boundary of P. Our characterization of which polygons have weakly-triangulated PVGs is based on restricting the cardinality and span of Ø'. This characterization allows us to exhibit a class of polygons admitting an O(n^8) algorithm for Ø-convex cover. We also show that for any finite cardinality Ø, Ø-convex cover and Ø-star cover are in NP, and have polynomial time algorithms for any fixed covering number. Our results imply previous results for the special case Ø = {0°, 90°} of Culberson and Reckhow, and Motwani, Raghunathan, and Saran. Two points are said to be link-2 visible if there is a third point that they both see. We consider the relationship between link-2 Ø-convexity and Ø-starshapedness, and exhibit a class of polygon/orientation set pairs for which link-2 Ø-convexity implies Ø-starshapedness.
[Chin, 1993a]

Author(s): Hong Wai Chin.

Title: . A Spatial-Temporal Analysis of Matching in Feature-Based Motion Stereo.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1993.

Abstract: Motion stereo is a stereo image acquisition method which takes successive images of a scene from a moving camera. The important assumption is that the motion is a simple translation and its parameters are known. Matching in motion stereo is fundamentally similar to conventional binocular stereo matching, where the major issue is the correspondence problem. With few constraints, stereo matching results are inherently noisy, multiple and ambiguous. Most previous motion stereo approaches have explored various constraints in an ad hoc manner. Moreover, only two image frames are processed at any given time. After initial matching between the first pair, the third and subsequent images are merely used as a confirmation or refinement of the previously matched result. This thesis presents a new matching approach which integrates multiple evidence and obtains matching correspondences from the entire motion stereo sequence. Our approach is based on the observation that on the Epipolar Plane Image (EPI) corresponding feature points lie along the same linear path, known as the EPI path. A voting scheme is developed to collect evidence. Each possible corresponding pair of feature points on the EPI image forms a hypothetical line, which suggests a potential match with an associated disparity value d. Accordingly, a vote is cast into a 3-D (x, y, d) voting space. Peaks can be detected in the voting space if many votes are correctly registered to their EPI path. A cooperative algorithm based on the disparity gradient is presented for filtering out false peaks in the voting space. Weighted support and suppression factors are used in the neighborhood of a peak in the (x, y, d) space. Since the inclusion of the entire sequence of images in motion stereo exacerbates surface occlusion, a cooperative algorithm based on the analysis of several occlusion models, taking into account vote-counts and occlusion patterns, is also proposed.
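The voting step can be sketched as follows, assuming a unit horizontal camera translation per frame and integer disparities; the data layout and quantization are illustrative, and the cooperative filtering stages are not reproduced.

# Cast votes from pairs of feature points into a 3-D (x, y, d) accumulator.
import numpy as np

def vote(features, width, height, max_disparity):
    """features: per-frame lists of (x, y) feature points from a camera that
    translates horizontally by one unit per frame."""
    acc = np.zeros((width, height, max_disparity + 1), dtype=int)
    for t1, pts1 in enumerate(features):
        for t2 in range(t1 + 1, len(features)):
            for (x1, y1) in pts1:
                for (x2, y2) in features[t2]:
                    if y1 != y2:                   # EPI: same scanline only
                        continue
                    shift = x2 - x1
                    if shift % (t2 - t1):
                        continue                   # not on a common linear EPI path
                    d = shift // (t2 - t1)
                    if 0 <= d <= max_disparity:
                        acc[x1, y1, d] += 1
    return acc

frames = [[(10, 5), (20, 7)], [(12, 5), (23, 7)], [(14, 5), (26, 7)]]
acc = vote(frames, width=64, height=16, max_disparity=8)
print(np.unravel_index(acc.argmax(), acc.shape))   # strongest (x, y, d) peak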
[Chin, 1993b]

Author(s): Kenward Chin.

Title: . A Connectionist Approach to Acquiring Semantic Knowledge Using Competitive Learning.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 1993.

Abstract: Recent work in the field of cognitive science has involved the use of connectionist networks for learning semantics from simple English utterances. While significant results have been obtained, many such networks embody architectures which have obvious deficiencies. One deficiency is the use of the back propagation learning algorithm. This algorithm requires that continual feedback be provided during training. Though back propagation is an effective technique, it has the drawback of not being a plausible explanation of human language acquisition, since humans do not typically receive continual corrective feedback while learning language. Another deficiency is the failure of some systems to provide a link between the semantics discovered from input sentences and the real-world objects referred to in the input sentences. Also, many systems require that the knowledge acquired should be represented according to a pre-determined representational scheme. The work presented here is an attempt to provide a connectionist basis for correcting these deficiencies. Firstly, the use of the competitive learning strategy frees the system from requiring continual feedback and from requiring a pre-determined representational scheme. Secondly, the system's task is specifically to learn the associations between the words in input sentences and the real-world concepts to which they refer.
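A minimal winner-take-all update of the kind competitive learning relies on, with sizes, data and learning rate chosen only for illustration: each input moves only the weights of the unit closest to it, so no corrective feedback signal is needed.

# Competitive (winner-take-all) learning sketch on three noisy clusters.
import numpy as np

rng = np.random.default_rng(0)

def competitive_learning(inputs, n_units=3, rate=0.1, epochs=50):
    w = rng.random((n_units, inputs.shape[1]))
    for _ in range(epochs):
        for x in inputs:
            winner = np.argmin(np.linalg.norm(w - x, axis=1))
            w[winner] += rate * (x - w[winner])       # only the winner learns
    return w

centres = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
data = np.vstack([c + 0.05 * rng.standard_normal((30, 3)) for c in centres])
print(np.round(competitive_learning(data), 2))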
[Hagen, 1993]

Author(s): Eli Hagen.

Title: . A flexible American Sign Language interface to deductive databases.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1993.

Abstract: This thesis investigates interfaces to deductive databases in order to allow deaf people easy consultation using their native language (in particular, American Sign Language), and develops one specific approach to the problem. Our approach consists of developing a multiple-valued logic system which serves both as the internal representation of American Sign Language (ASL) and as the database consultation language. We exemplify our ideas with a concrete system that assumes preprocessing of visual images translating them into a written form. Our system then translates this written form into a rigorously defined logical system with multiple truth values which allow richness of expression. For instance, this system can differentiate between different kinds of plural and detect failed presuppositions. The translation in terms of our logical system can then be used directly to consult deductive databases. While our exemplifying focus is database consultation in ASL, it is clear that the logical system developed around this application has other possible uses in natural language processing in general, and database interfacing in particular.
[Hu, 1993]

Author(s): Xiaohua Hu.

Title: . Conceptual Clustering and Concept Hierarchies in Knowledge Discovery.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 1993.

Abstract: Knowledge discovery is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Knowledge discovery from a database is a form of machine learning where the discovered knowledge is represented in a high-level language. The growth in the size and number of existing databases far exceeds human abilities to analyse the data, which creates both a need and an opportunity for extracting knowledge from databases. In this thesis, I propose two algorithms for knowledge discovery in database systems. One algorithm finds knowledge rules associated with concepts at the different levels of a conceptual hierarchy; the algorithm is developed based on earlier attribute-oriented conceptual ascension techniques. The other algorithm combines a conceptual clustering technique and machine learning. It can find three kinds of rules, namely characteristic rules, inheritance rules, and domain knowledge, even in the absence of a conceptual hierarchy. The two algorithms are implemented as components of the database learning system (DBLEARN) using C under a Sybase/Unix environment. Testing DBLEARN on NSERC's grant information system shows that our method can discover many meaningful knowledge rules very quickly. The applications of knowledge discovery in databases are very wide. I discuss how to apply DBLEARN to many data-intensive areas, such as hospital patient information systems, the customer databases of telephone companies, airline companies and banks, and department store inventory systems, to find interesting rules hidden in the data, and how the people in these organizations can use the learned rules to help them.
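The attribute-oriented generalization step behind the first algorithm can be sketched as climbing a concept hierarchy and merging identical generalized tuples with counts; the toy hierarchy and relation below are illustrative, not the NSERC data used in the thesis.

# Attribute-oriented generalization sketch: climb the hierarchy, merge tuples.
from collections import Counter

hierarchy = {
    "databases": "computing", "AI": "computing", "graphics": "computing",
    "algebra": "mathematics", "statistics": "mathematics",
}

def generalize(tuples, levels=1):
    gen = []
    for area, amount in tuples:
        for _ in range(levels):
            area = hierarchy.get(area, area)      # climb one hierarchy level
        gen.append((area, "high" if amount > 20000 else "low"))
    return Counter(gen)                           # identical tuples merged

grants = [("databases", 25000), ("AI", 12000), ("graphics", 30000),
          ("algebra", 8000), ("statistics", 22000)]
for rule, count in generalize(grants).items():
    print(rule, "count =", count)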
[Huang, 1993]

Author(s): Yue Huang.

Title: . Intelligent query answering by knowledge discovery techniques.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1993.

Abstract: Knowledge discovery in databases facilitates querying database knowledge, cooperative query answering and semantic query optimization in database systems. In this thesis, we investigate the application of discovered knowledge, concept hierarchies, and knowledge discovery tools for intelligent query answering in database systems. A knowledge-rich data model is constructed to incorporate discovered knowledge and knowledge discovery tools. Queries are classified into data queries and knowledge queries. Both types of queries can be answered directly by simple retrieval or intelligently by analyzing the intent of the query and providing generalized, neighborhood or associated information using stored or discovered knowledge. Techniques have been developed for intelligent query answering using discovered knowledge and/or knowledge discovery tools, which include generalization, data summarization, concept clustering, rule discovery, query rewriting, lazy evaluation, semantic query optimization, etc. Our study shows that knowledge discovery substantially broadens the spectrum of intelligent query answering and may have deep implications for query processing in data- and knowledge-base systems. A prototype experimental database learning system, DBLEARN, has been constructed. Our experimental results on direct answering of data and knowledge queries are successful, with satisfactory performance.
[Jackson, 1993]

Author(s): Ken Jackson.

Title: . Functional programming applied to parallel combinatorial search.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, October 1993.

Abstract: Functional programs are often more concise, more amenable to formal reasoning, and better suited to parallel execution than imperative programs. This work investigates the application of functional programming to parallel combinatorial search programs such as branch-and-bound or alpha-beta. We develop an abstract data type called improving intervals that can be used to write functional search programs. Programs that use improving intervals are simple because they do not explicitly refer to pruning; all pruning occurs within the data type. The programs are also easily annotated so that different portions of the search space are searched in parallel. The search programs are verified using approximate reasoning: a method of program transformation that uses both equational and approximation properties of functional programs. Approximate reasoning is also used to verify an implementation of improving intervals. Parallel functional programs have deterministic results. In some cases, permitting some non-determinism in the functional search programs can result in more pruning. We define a restricted form of non-determinism called partial determinism that permits a program to return a set of possible results but requires that the set of results be consistent. Partial determinism can improve the performance of the search programs while guaranteeing consistent results. We also show how approximate reasoning can be used to reason about partially deterministic programs.
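The thesis hides pruning inside the improving-intervals data type; as a point of reference only, the sketch below is a plain imperative alpha-beta search over a hypothetical game tree given as nested lists. It is the explicit-pruning baseline that such an abstract data type is designed to encapsulate, not the thesis's functional formulation.
```python
# Plain alpha-beta over a toy game tree: leaves are ints, internal nodes are lists.
# This is the explicit-pruning baseline; an improving-intervals ADT would hide the
# alpha/beta bookkeeping inside the data type instead of exposing it like this.

def alpha_beta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if isinstance(node, int):          # leaf: static evaluation
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in node:
        val = alpha_beta(child, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, val)
            alpha = max(alpha, best)
        else:
            best = min(best, val)
            beta = min(beta, best)
        if beta <= alpha:              # prune the remaining siblings
            break
    return best

if __name__ == "__main__":
    tree = [[3, 5], [6, [9, 2]], [1, 4]]
    print(alpha_beta(tree))            # value of the root for the maximizing player
```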
[Kane, 1993]

Author(s): Jave O. Kane.

Title: . Line Broadcasting.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1993.

Abstract: Broadcasting is the process of transmitting information from an originating node (processor) in a network to all other nodes in the network. A local broadcasting scheme only allows a node to send information along single communication links to adjacent nodes, while a line broadcasting scheme allows nodes to use paths of several communication links to call distant nodes. Local broadcasting is not in general sufficient to allow broadcasting to be completed in ceiling(log n) phases, the minimum time possible for broadcasting in a network of n nodes when no node is involved in more than one communication at any given time; line broadcasting is always sufficient. An optimal line broadcasting scheme is a minimum time scheme that uses the smallest possible total number of communication links. In this thesis, we investigate line broadcasting in cycles and toruses. We give a complete characterization of optimal line broadcasting schemes in cycles, determine the exact cost of line broadcasting in cycles, and develop efficient methods for constructing optimal line broadcasting schemes in cycles and toruses. We conjecture that our torus schemes are optimal. If minimum-time broadcasting using n-1 local calls is possible from any originator in a network, then the network is a minimum broadcast graph. We define minimum line broadcast graphs, a generalization of minimum broadcast graphs in which we can complete minimum-time broadcasting using some fixed total extra length. We find minimum line broadcast graphs for small n, and find several important families of minimum line broadcast graphs.
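To see why local calls alone can be slow, the toy simulation below (my own illustration, not the thesis's construction) runs local broadcasting on a cycle, where each informed node may call only one neighbour per phase; the phase count grows roughly like n/2 rather than meeting the ceiling(log n) bound that line broadcasting can always achieve.
```python
import math

# Toy simulation of local broadcasting on a cycle of n nodes: in each phase every
# informed node may call at most one adjacent uninformed node. The phase count is
# about n/2, far above the ceil(log2 n) phases achievable with line broadcasting.

def local_broadcast_phases(n, origin=0):
    informed = {origin}
    phases = 0
    while len(informed) < n:
        calls = set()
        for u in list(informed):
            for v in ((u - 1) % n, (u + 1) % n):
                if v not in informed and v not in calls:
                    calls.add(v)       # u spends its single call of this phase on v
                    break
        informed |= calls
        phases += 1
    return phases

if __name__ == "__main__":
    for n in (8, 16, 64):
        print(n, local_broadcast_phases(n), math.ceil(math.log2(n)))
```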
[Kodric, 1993]

Author(s): Sandi Kodric.

Title: . On the Constituent Structure of Slovene.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1993.

Abstract: Many natural languages exhibit a much higher degree of freedom than English in the ordering of constituents within a clause. In order to use these so-called `free word order' languages in natural language processing applications, we need grammar models that are adequate from both a linguistic and a computational point of view. I examine Slovene, one such language, and propose that it is best treated by flattening the traditional hierarchical syntactic structure. I argue that there is little empirical evidence for the finite verb phrase constituent in the clause, and I show that several problems disappear if this assumption is rejected. Instead, I present a model whereby the verb combines with its subject and other complements in a single step. In both finite and nonfinite clauses, there is only one verbal projection in the syntactic structure. The clause structure is described within the constraint-based formalism of Head-driven Phrase Structure Grammar (HPSG) which is not confined to binary branching and which includes separate specifications of immediate dominance and linear precedence. This analysis of the Slovene clause avoids discontinuous constituency and it accounts for the invariable second position of the clitic cluster by local linear precedence constraints. I discuss some computational consequences and suggest that the weakened structural constraints do not necessarily result in less efficient parsing. I show how an existing chart parser for HPSG can be adapted to process grammars of this kind.
[Lu, 1993]

Author(s): Wei Lu.

Title: . A Deductive and Object-Oriented Approach for Spatial Databases.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, June 1993.

Abstract: With the rapid development of deductive and object-oriented database technology, it is promising to explore the application of deductive and object-oriented techniques in the development of spatial databases. This thesis investigates the design and implementation of deductive and object-oriented spatial databases (DOOSDB). Several important issues on such spatial databases are studied, including modeling complex spatial objects, spatial data manipulation functionality, a spatial deductive query language, and extensibility of the system. This thesis contributes to the studies on spatial query optimization and processing in DOOSDB in the following aspects: (1) a method for compiling deductive spatial rules and expressions, with simplification of compiled queries using relational and geo-relational algebra; (2) an algorithm for spatial query plan generation and selection using dynamic connection graph analysis; (3) techniques for set-oriented optimization and processing of computationally-intensive spatial operators and methods; and (4) a spatial join indexing technique using information associated with frequently used spatial join operations. This thesis presents an integrated view of a deductive and object-oriented spatial database system and provides an effective mechanism for spatial data handling and efficient algorithms for spatial query processing.
[MacDonald, 1993]

Author(s): Glenn MacDonald.

Title: . Isomorphism and Layout of Spiral Polygons.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1993.

Abstract: Let P be a polygon having n vertices, exactly r of which are reflex. Then P is called spiral if its vertices may be labelled so that, counterclockwise around the boundary, u_1, u_2, ..., u_r are all reflex and occur consecutively and v_1, v_2, ..., v_{n-r} are all convex and also occur consecutively. If we consider P to be the union of the actual polygon boundary and the interior region, then in P, a point b is said to be visible to a point a if the line segment ab does not intersect the exterior of P. Two polygons are considered to be isomorphic if there is a one-to-one mapping between their points that preserves visibility. This thesis establishes necessary and sufficient conditions for two spiral polygons to be isomorphic and gives an O(n^2) detection algorithm. Also, it is shown that there exists an O(r log r) bit canonical representation for the visibility structure of a spiral polygon. This thesis also investigates the layout of spiral polygons. If P is a spiral having its vertices numbered as described above, then let l_0 and l_{r+1} be the lines orthogonal to the line through u_0 and u_{r+1} that pass through u_0 and u_{r+1}, respectively. Then P is called a banana spiral if none of the three lines (the line through u_0 and u_{r+1}, l_0 and l_{r+1}) intersect the interior of P. This thesis constructively shows that every canonical representation may be laid out as a banana spiral. An orthogonal polygon is one whose edges are all either horizontal or vertical. This thesis establishes necessary and sufficient conditions for a canonical representation to be realized as an orthogonal spiral polygon and gives an inductive construction algorithm.
[Mah, 1993]

Author(s): Sang Yem Mah.

Title: . A constraint-based reasoning approach for behavioural motion control in computer animation.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 1993.
[Mitchell, 1993]

Author(s): David Geoffrey Mitchell.

Title: . An Empirical Study of Random SAT.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1993.
[Neilson, 1993]

Author(s): Carl Neilson.

Title: . Achieving Strong Consistency in a Replicated File System.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1993.
[Nickolov, 1993]

Author(s): Radoslav Nickolov.

Title: . Broadcasting and Scattering in Cube-Connected Cycles and Butterfly Networks.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1993.

Abstract: The process of sending a message from one node of a communication network to all other nodes is called broadcasting when the message is the same for all nodes and scattering when each of the nodes receives a different message. In this thesis we prove several upper bounds on the time to broadcast and scatter in the Cube-Connected Cycles (CCCd) and Butterfly (BFd) interconnection networks. We use the linear cost model of communication, in which the time to send a single message from one node to its neighbour is the sum of the time to establish the connection and the time to send the data, which is proportional to the length of the message. Our algorithms use pipelining in parallel along several disjoint (spanning) trees. We show how to construct 2 arc-disjoint spanning trees of depths 2d+floor(d/2)+2 and 3 arc-disjoint spanning trees of depths 3d+3 in CCCd, and 2 and 4 arc-disjoint spanning trees in BFd, of depths d+floor(d/2)+1 and 2d+1 respectively, and compare the broadcasting times for different lengths of the broadcast message. Our scattering algorithms consist of two phases. During the first phase we scatter along perfectly balanced binary subtrees of CCCd and BFd. In the second phase we scatter in parallel in all cycles of CCCd and BFd using several originators. The times to scatter are close to the existing lower bounds for both graphs. Algorithms are presented for full-duplex and half-duplex links and processor-bound and link-bound communication.
[Petreman, 1993]

Author(s): Cheryl Michele Petreman.

Title: . Slicing Up Bonsai: Adding Cutting Planes to Mixed 0-1 Programs.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1993.

Abstract: We investigate a method of reducing the effort required to solve resource-constrained scheduling problems using mixed-0/1 linear programming. In particular we examine a formulation of the decision as to whether two activities of variable duration will be serialized or not. From this subset of the constraint system, we derive cutting plane constraints which are expressed in terms of variable upper and lower bounds. We describe the integration of these cuts into bonsai — a system which implements branch and bound search with partial arc consistency. Each time the arc consistency routines are able to restrict variable domains, the cutting plane is tightened to reflect the tighter bound. The computational results show a decrease in the average number of subproblems required to solve large examples. However, there is also an increase in the total number of pivots.
[Sidebottom, 1993]

Author(s): Gregory Allan Sidebottom.

Title: . A Language for Optimizing Constraint Propagation.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, November 1993.

Abstract: This thesis describes projection constraints (PCs), a language for compiling and optimizing constraint propagation in the numeric and Boolean domains. An optimizing compiler based on PCs has been implemented in Nicolog, a constraint logic programming (CLP) language. In Nicolog, as in other CLP languages such as CHIP, Echidna, CLP(BNR) (BNR Prolog), cc(FD), and clp(FD), domains for variables are explicitly represented and constraint processing is implemented with consistency algorithms. Nicolog compiles each constraint into a set of arc revision procedures, which are expressed as PCs. Instead of using full arc revision based on enumeration, Nicolog uses regions where functions are monotonic to express arc revision procedures in terms of interval computations and branching constructs. Nicolog compiles complex constraints directly, not needing to approximate them with a restricted set of basic constraints or to introduce extra variables for subexpressions. The Nicolog compiler can handle a very general class of constraints, allowing an arbitrary mixture of integer, real, and Boolean operations with a variety of domain representations. The only requirement is that for each domain, it must be possible to compute a set of intervals whose union contains that domain. Nicolog also lets the user program using PCs directly, making it possible to implement sophisticated arc revision procedures. This thesis shows that PCs are a simple, efficient, and flexible way to implement consistency algorithms for complex mixed numeric and Boolean constraints. Empirical results with a prototype Nicolog implementation show it can solve hard problems with speed comparable to the fastest CLP systems.
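To make the flavour of interval-based arc revision concrete, here is a minimal sketch (my own illustration, not Nicolog's PC syntax) that narrows the interval domains of x, y and z under the single constraint x + y = z, using the monotonicity of addition in the way the abstract describes.
```python
# Minimal interval arc revision for the constraint x + y = z.
# Each projection narrows one variable's interval from the other two; iterating
# to a fixed point is a tiny instance of a consistency algorithm over intervals.

def narrow_sum(x, y, z):
    xl, xu = x
    yl, yu = y
    zl, zu = z
    # project the constraint onto each variable using interval arithmetic
    zl, zu = max(zl, xl + yl), min(zu, xu + yu)
    xl, xu = max(xl, zl - yu), min(xu, zu - yl)
    yl, yu = max(yl, zl - xu), min(yu, zu - xl)
    if xl > xu or yl > yu or zl > zu:
        raise ValueError("inconsistent domains")
    return (xl, xu), (yl, yu), (zl, zu)

def fixed_point(x, y, z):
    while True:
        nx, ny, nz = narrow_sum(x, y, z)
        if (nx, ny, nz) == (x, y, z):
            return x, y, z
        x, y, z = nx, ny, nz

if __name__ == "__main__":
    print(fixed_point((0, 10), (0, 10), (12, 14)))
    # narrows x and y to (2, 10) each while z stays (12, 14)
```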
[Welman, 1993]

Author(s): Chris Welman.

Title: . Inverse Kinematics and Geometric Constraints for Articulated Figure Manipulation.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 1993.

Abstract: Computer animation of articulated figures can be a tedious process, due to the amount of data which must be specified at each frame. Animation techniques range from simple interpolation between keyframed figure poses to higher-level algorithmic models of specific movement patterns. The former provides the animator with complete control over the movement, whereas the latter may provide only limited control via some high-level parameters incorporated into the model. Inverse kinematic techniques adopted from the robotics literature have the potential to relieve the animator of detailed specification of every motion parameter within a figure, while retaining complete control over the movement, if desired. This work investigates the use of inverse kinematics and simple geometric constraints as tools for the animator. Previous applications of inverse kinematic algorithms to computer animation are reviewed. A pair of alternative algorithms suitable for a direct manipulation interface are presented and qualitatively compared. Application of these algorithms to enforce simple geometric constraints on a figure during interactive manipulation is discussed. An implementation of one of these algorithms within an existing figure animation editor is described, which provides constrained inverse kinematic figure manipulation for the creation of keyframes.
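For readers unfamiliar with iterative inverse kinematics, the sketch below shows cyclic coordinate descent (CCD) for a planar chain. CCD is one widely used scheme of this kind, offered here as a generic illustration rather than as either of the specific algorithms presented and compared in the thesis.
```python
import math

# Cyclic coordinate descent (CCD) for a planar articulated chain: sweep the joints
# from the tip to the base, rotating each one so the end effector moves toward the
# goal. A generic illustration of iterative inverse kinematics, not a thesis algorithm.

def forward(angles, lengths):
    """Return joint positions, with the base at the origin."""
    pts, x, y, a = [(0.0, 0.0)], 0.0, 0.0, 0.0
    for theta, l in zip(angles, lengths):
        a += theta
        x, y = x + l * math.cos(a), y + l * math.sin(a)
        pts.append((x, y))
    return pts

def ccd(angles, lengths, target, iters=100, tol=1e-4):
    angles = list(angles)
    for _ in range(iters):
        for i in reversed(range(len(angles))):
            pts = forward(angles, lengths)
            jx, jy = pts[i]
            ex, ey = pts[-1]
            # rotate joint i so the effector swings toward the target
            cur = math.atan2(ey - jy, ex - jx)
            want = math.atan2(target[1] - jy, target[0] - jx)
            angles[i] += want - cur
        ex, ey = forward(angles, lengths)[-1]
        if math.hypot(ex - target[0], ey - target[1]) < tol:
            break
    return angles

if __name__ == "__main__":
    sol = ccd([0.3, 0.3, 0.3], [1.0, 1.0, 1.0], target=(1.5, 1.5))
    print(forward(sol, [1.0, 1.0, 1.0])[-1])   # close to (1.5, 1.5)
```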
[Xia, 1993]

Author(s): Jinshi Xia.

Title: . Attribute-Oriented Induction in Object Oriented Databases.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 1993.

Abstract: Knowledge discovery in databases is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data, such that the extracted knowledge may facilitate deductive reasoning and query processing in database systems. This branch of study has been ranked among the most promising topics for database research for the 1990s. Due to the dominating influence of relational databases in many application fields, knowledge discovery from databases has been largely focused on relational databases. The gradual adoption of object-oriented database systems has created a need for the study of knowledge discovery from object-oriented databases as well. Object-oriented databases (OODBs) are concerned with complex data structures and diverge greatly from relational database systems. In order to effectively conduct knowledge discovery in an OODB, existing relational algorithms need to be modified accordingly to take full advantage of the object-oriented data model. The attribute-oriented induction method has been successful for knowledge discovery in relational databases, and we choose this method to study the new demands OODBs impose on a learning algorithm. In this thesis, we study the characteristics of the object-oriented data model and their effects on the attribute-oriented induction algorithm. We extend the attribute-oriented induction method to object-oriented paradigms, focusing on handling complex attributes, and present an algorithm for learning characteristic rules in an OODB. We follow the least commitment principle and break down complex objects into primitive ones and then apply attribute-oriented generalization techniques. Learning in databases with a cyclic class composition hierarchy is specifically addressed.
[Zhou, 1993]

Author(s): Wei Zhou.

Title: . How spatial data models and DBMS platforms affect the performance of spatial join.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1993.

Abstract: This thesis studies how the performance of a common spatial database operation, called spatial join, can be affected by different data models and DBMS platforms. We consider three spatial data models: Relational, BLOB (Binary Large Object Block) and Parent-Child Pointer model, which have different degrees of pointer involvement at the database schema level. ObjectStore and Sybase are chosen as representatives of object-oriented and relational DBMS. Our R-tree based spatial join algorithm, which is optimized in the step of polygon overlap checking, is presented. We measure the performance of this spatial join algorithm against five combinations of data models and DBMS platforms. Among other findings, the experimental results show that the Relational model does poorly on either DBMS platform, the running time being 3 to 4 orders of magnitude worse than the others. We introduce a technique called application caching that bridges the gap between the storage data structure (the data model) and the application data structure. By applying this technique to the algorithm running against the Relational and the BLOB data models, we show how the effect of data model/DBMS platform on performance of spatial join can be neutralized.
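The core filter-and-refine idea behind an R-tree based spatial join can be stated in a few lines. The sketch below is my own simplification: a brute-force bounding-box filter stands in for the actual R-tree traversal, and a placeholder stands in for the optimized exact polygon-overlap check described in the thesis.
```python
# Filter-and-refine spatial join sketch: a minimum bounding rectangle (MBR) test
# stands in for the R-tree traversal, and the refinement step is a placeholder
# for a real polygon-intersection test.

def mbr(poly):
    xs, ys = zip(*poly)
    return min(xs), min(ys), max(xs), max(ys)

def mbrs_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def exact_overlap(p, q):
    # Placeholder refinement: a real system would run a polygon intersection test
    # here; MBR overlap is reused so the sketch stays short and self-contained.
    return mbrs_overlap(mbr(p), mbr(q))

def spatial_join(layer_a, layer_b):
    boxes_b = [(j, mbr(q)) for j, q in enumerate(layer_b)]
    result = []
    for i, p in enumerate(layer_a):
        box_p = mbr(p)
        for j, box_q in boxes_b:                  # filter step
            if mbrs_overlap(box_p, box_q) and exact_overlap(p, layer_b[j]):
                result.append((i, j))             # refinement step passed
    return result

if __name__ == "__main__":
    a = [[(0, 0), (2, 0), (2, 2), (0, 2)]]
    b = [[(1, 1), (3, 1), (3, 3), (1, 3)], [(5, 5), (6, 5), (6, 6), (5, 6)]]
    print(spatial_join(a, b))   # [(0, 0)]
```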



1992

[Cai, 1992]

Author(s): Biaodong Cai.

Title: . Towards adaptive concurrency control in database systems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 1992.
[Chen, 1992]

Author(s): Xiaobing Chen.

Title: . Extracting functional dependencies and synonyms from relational databases.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 1992.
[Dudra, 1992]

Author(s): Timothy J. Dudra.

Title: . A Hierarchical Approach to Local File System Design for a Transputer-Based Multiprocessor.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1992.

Abstract: The rate at which processor power is increasing is not being matched by similar increases in mass storage performance, nor is this disparity expected to be rectified. This problem is especially acute in multiprocessor architectures and has come to be known as the I/O bottleneck crisis. Vast quantities of data are required for full utilization of a multiprocessor and this data must be provided to the multiprocessor, either directly or indirectly, via high speed networks, from some mass storage medium. Disk caching and prefetching have been proposed as good (possibly short-term) techniques for reducing the impact of the I/O bottleneck on multiprocessor throughput. An impressive amount of research has been invested in analyzing the effects of these techniques on the performance of tightly coupled shared memory multiprocessors; application of these methods to loosely coupled multiprocessors has not been pursued with the same fervour. This thesis presents the design and performance of a disk cache system for use within networks of Inmos transputers. The intent of this system is to provide a local file system for the transputer network. This system is intended to be a building block in the development of a distributed file system for the transputer multiprocessor. The Transputer Auxiliary Storage System supports advanced cache management features such as user directed prefetching, opportunistic write back, disk interleaving, multi-threaded server support and RAM disks. The design and implementation issues encountered during the creation of this storage system are discussed. Also, the performance of the system is analyzed for various workloads. The results confirm the positive effect of the system upon sustained I/O performance into and out of the transputer network.
[Fan, 1992]

Author(s): Hong Fan.

Title: . Spatial Join: A Study of Complex Spatial Operation and its Underlying Spatial Indexing Methods.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1992.

Abstract: In recent literature, there has been extensive research on simple spatial operations such as point location and range query, as well as comparative studies on spatial indexing methods (SIM) for simple objects based on simple spatial operations. This thesis tackles the problem of polygon spatial join, which is one of the most complex spatial operations on complex objects, in this case simple polygons. Polygon spatial join can be defined as finding all pairs of polygonal objects that overlap each other over their boundaries from two given polygonal data sets. Spatial join is used extensively in geographical information systems, where geographical data is organized by layer, and a join of the layers creates synthesized information of the same geographical area. It can also be directly extended to realize polygon overlay, which is also a very important complex operation in GIS. We solve the problem by extensively utilizing two popular SIMs, the PM Quadtree and the R-tree, so that complex objects and object relations can be handled efficiently as well. This is based on the observation that spatial join relies on the object spatial occupancy, and these SIMs decompose the space from which the spatial data is drawn in a way that spatial properties of spatial objects can be developed and stored. We design algorithms for spatial join based on the PM Quadtree and the R-tree as well as algorithms with no spatial index involved for comparisons. We also present the Grid Coordinate System (GCS): a SIM for simple spatial objects which is a kind of Grid File based on the object spatial occupancy instead of on the transformed multidimensional point space. Both GCS and the R-tree are shown to be empirically superior to the PM Quadtree with respect to the spatial join operation. Comparative studies of the three SIMs under spatial join are also presented. We make use of ObjectStore, an object-oriented database system, as the storage manager for the spatial data. Empirical results are obtained through extensive experiments on random polygonal nets. We generate the polygonal net for the studies in such a way that it can be adjusted through parameters regarding size, shape and distribution of the composing polygonal data. The polygonal net is represented by a vector data model designed as a multi-file, storage-saving structure enhanced with indexing capability.
[Finlayson, 1992]

Author(s): Graham David Finlayson.

Title: . Colour Object Recognition.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1992.

Abstract: Since colour characterises local surface properties and is largely viewpoint insensitive it is a useful cue for object recognition. Indeed, Swain and Ballard have developed a simple scheme, called colour-indexing, which identifies objects by matching colour-space histograms. Their approach is remarkably robust in that variations such as a shift in viewing position, a change in the scene background or even object deformation degrade recognition only slightly. Colour-indexing fails, however, if the intensity or spectral characteristics of the incident illuminant varies. This thesis examines two different strategies for rectifying this failure. Firstly we consider applying a colour constancy transform to each image prior to colour-indexing (colours are mapped to their appearance under canonical lighting conditions). To solve for the colour constancy transform assumptions must be made about the world. These assumptions dictate the types of objects which can be recognised by colour-indexing + colour constancy preprocessing. We review several colour constancy algorithms and in almost all cases conclude that their assumptions are too limiting. The exception, a discrete implementation of Forsyth's CRULE, successfully solves the colour constancy problem for sets of simple objects viewed under constant illumination. To circumvent the need for colour constancy preprocessing and to recognise more complex object sets we consider indexing on illuminant invariants. Three illuminant invariants—volumetric, opponent and ratio—are examined. Each characterises local surface properties, is largely viewpoint insensitive and is independent of both the intensity and spectral characteristics of the incident illuminant. We develop an algorithm, called Colour constant colour-indexing, which identifies objects by matching colour ratio-space histograms. In general our algorithm performs comparably with colour-indexing under fixed illumination, but substantially better than colour-indexing under varying illumination.
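A rough sketch of the two matching schemes discussed here, in deliberately simplified form: colour-indexing matches colour histograms (commonly by histogram intersection), while the colour-constant variant histograms ratios of neighbouring pixel colours, which do not change when every channel is scaled by an illuminant-dependent factor. The binning and neighbourhood choices below are my own illustrative assumptions, not those of the thesis.
```python
# Simplified colour-indexing: build coarse RGB histograms and match them by
# histogram intersection. The colour-constant variant histograms the ratios of
# horizontally adjacent pixel colours, which survive a per-channel scaling of the
# illuminant. Binning choices here are illustrative only.

def rgb_histogram(pixels, bins=4):
    hist = {}
    for r, g, b in pixels:
        key = (int(r * bins / 256), int(g * bins / 256), int(b * bins / 256))
        hist[key] = hist.get(key, 0) + 1
    return hist

def ratio_histogram(rows, bins=4):
    hist = {}
    for row in rows:
        for (r1, g1, b1), (r2, g2, b2) in zip(row, row[1:]):
            ratios = (r1 / max(r2, 1), g1 / max(g2, 1), b1 / max(b2, 1))
            key = tuple(min(int(x * bins / 4), bins - 1) for x in ratios)
            hist[key] = hist.get(key, 0) + 1
    return hist

def intersection(h1, h2):
    common = sum(min(h1.get(k, 0), h2[k]) for k in h2)
    return common / max(sum(h2.values()), 1)

if __name__ == "__main__":
    scene = [[(200, 40, 40), (40, 200, 40), (40, 40, 200)]]
    dimmed = [[(r // 2, g // 2, b // 2) for r, g, b in row] for row in scene]
    flat = [p for row in scene for p in row]
    flat_dim = [p for row in dimmed for p in row]
    print(intersection(rgb_histogram(flat), rgb_histogram(flat_dim)))     # degraded
    print(intersection(ratio_histogram(scene), ratio_histogram(dimmed)))  # preserved
```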
[Hamilton, 1992]

Author(s): Howard John Hamilton.

Title: . Specification of Inductive-Inference Problems for Machine Discovery.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, July 1992.

Abstract: We describe a framework for inductive-inference problem specification (called the IIPS framework) and use it to specify several Machine-Discovery problems. Inductive inference is the process of creating a general rule based on specific examples. Machine discovery is the field which uses computational techniques to simulate the discovery of known scientific results or to discover new results. We contend that Machine-Discovery programs should be analyzed in terms of the inductive-inference problems that they attempt. The process of specifying inductive-inference problems is described intuitively using three ``players'': a problem maker, a presenter, and an inducer. A new framework is necessary because Machine-Discovery problems require types of interactions between presenter and inducer which cannot be specified within previous frameworks for inductive inference. The use of the IIPS framework for specifying Machine-Discovery problems is illustrated using BACON.1, the earliest program that performs ``scientific rediscovery,'' BACON.3, and KEPLER, a program recently described by Wu. A new algorithm, called DATAX-2, is described for the problem attempted by KEPLER and shown to be more efficient. The illustrative problems highlight the importance of the Assumption of Sufficiency, which can be regarded as the extra information that enough examples have been examined. The effect of knowing this information on the difficulty of an inductive-inference problem is analyzed with reference to sequence-extrapolation problems.
[Harder, 1992]

Author(s): Ian A. Harder.

Title: . Spectral Analysis of Interreflection.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1992.

Abstract: Interreflection (or mutual illumination) occurs when two or more object surfaces are illuminated both by a light source and the light reflected from other surfaces. As the distance or angle between two interreflecting surfaces decreases, the intensity of interreflected light increases, with a corresponding shift in colour known as colour bleeding. For computer vision algorithms that assume spatially invariant surface reflectances, this plays a confounding role. As an example, in the presence of interreflection, ``shape-from-shading'' methods will incorrectly reconstruct surfaces such that the orientation of their surface normals will appear to be closer to the direction of the illuminant than they actually are. Rather than treating interreflection as noise, surface colours can be analysed to provide additional information such as the illuminant spectra and surface shading. In this thesis, a finite dimensional model is employed to recover the surface spectral reflectances of two interreflecting Lambertian surfaces under a known illuminant. The resulting reflectances are used to construct colour basis vectors for linear decomposition of colour channel intensities for each surface, from which the coefficients of the no-bounce colour components (shading fields) are extracted. The robustness of this simple, straightforward algorithm is tested on both synthetic and real interreflecting planar surfaces, and the improvement the recovered shading field provides over image intensity is demonstrated using a simplified shape-from-shading scheme.
[Harms, 1992]

Author(s): Daryl Dean Harms.

Title: . A Symbolic Algebra Environment for Research in Network Reliability.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, September 1992.

Abstract: The development of reliable systems is becoming of increasing importance. When a system can be conceptualized at either the physical or logical level as consisting of networks of components which may fail, we often use a basic connectivity-based model of reliability to obtain some insights into its expected behaviour. We focus our attention on the graph theoretic model in which edges have statistically independent operation probabilities and vertices do not fail. The measure we look at is the probability that a path exists between a pair of distinguished vertices in this environment. The exact calculation of this measure is #P-complete in general. This has motivated work on developing techniques for obtaining approximations of the exact value. We concentrate on methods which deliver actual bounds on the exact value. These are of varying degrees of sophistication but it appears that no one method is better than all others in all situations. For this thesis we implemented each of these techniques and integrated them into a system where their performance can be compared, and in which the information obtained from them can be easily interchanged and combined. One novelty of this system is that it is developed in Maple (a general-purpose computer algebra environment). We indicate the impact this environment had on our research. Through the use of this system we have already developed a number of new techniques and refinements to existing techniques. These include renormalization, a new a priori and post-optimization upper-bounding technique; enhancements to the Chari-Provan bounds and extension of them and the Kruskal-Katona bound to postprocessing techniques; a heuristic for obtaining orderings for the non-crossing cuts bound; and a general methodology which, among other things, allows for the productive utilization of bounds for this connectivity measure to help improve bounds on more sophisticated performability measures.
[Kaller, 1992]

Author(s): Damon M. Kaller.

Title: . Output Sensitive Algorithms to Compute Higher-Order Voronoi Diagrams in Euclidean D-Space.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1992.

Abstract: The order-k Voronoi diagram (denoted V_S^k) of a set S of n points in Euclidean d-space R^d is a cell complex which partitions R^d. Each cell is a convex polytope which is associated with a k-subset T ⊆ S, and corresponds to the region of space for which every element of T is at least as close as any element of S-T. We present algorithms which compute V_S^k in a non-incremental manner: that is, V_S^(k-1) is not needed as a preliminary step in the computation of V_S^k. The first algorithm enumerates all v vertices of V_S^k for a nondegenerate point set—along with the information on which polytopes each vertex lies. From this, the entire facial graph of the diagram may be derived. The approach is to move from vertex to vertex along edges, until all of the vertices have been visited. The algorithm has running time θ(d^2 n + d^3 log n) per vertex. The second algorithm enumerates all polytopes along with their facets, and does not require that the input point set be nondegenerate. This is motivated by the problem of reference set thinning in pattern recognition. It can be shown that only the facet information of the order-k Voronoi diagram of the reference set is necessary for thinning under the k-nearest neighbor decision rule. An order-k Voronoi polytope may be expressed as the intersection of k(n-k) constraints—the nonredundant ones determine the facets of the polytope. A two stage approach is used in the second algorithm to find all of the nonredundant constraints. In stage 1, a subset of ``relevant'' points of S is found: each such point lies on some hypersphere which separates T from S-T. This spherical separability problem in R^d is equivalent to a linear separability problem in R^(d+1), and also equivalent to an extreme point problem in R^(d+1). In stage 2, the constraints generated by the relevant points are tested for nonredundancy. This, too, is equivalent to an extreme point problem. Linear programming techniques are used to solve the extreme point problems. The running time of the algorithm can be bounded by O(3^(d^2) n + dk log n) per facet. The high dimension-dependent constant in the latter algorithm makes it unappealing from a practical point of view. The constant derives from Megiddo's (modified) linear-time linear programming technique. A more practical algorithm is obtained by techniques based on Dantzig's simplex method, which is well-known empirically to run in linear expected time, despite its exponential worst-case performance. This ``practical'' facet enumeration algorithm has been implemented, and some experimental results are presented.
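Stage 1 reduces spherical separability in R^d to linear separability in R^(d+1); the sketch below shows that reduction on a single inside-sphere test, using the standard paraboloid lifting p -> (p, |p|^2). The particular circle and test points are my own example, not data from the thesis.
```python
# The inside-sphere test |p - c|^2 <= rho^2 becomes a halfspace test on the lifted
# point (p, |p|^2), which is why spherical separability in R^d reduces to linear
# separability in R^(d+1).

def lift(p):
    return tuple(p) + (sum(x * x for x in p),)

def halfspace_from_sphere(center, radius):
    """Return (w, b) so that w . lift(p) + b <= 0 exactly when p is inside the sphere."""
    w = tuple(-2 * c for c in center) + (1.0,)
    b = sum(c * c for c in center) - radius * radius
    return w, b

def inside_via_halfspace(p, center, radius):
    w, b = halfspace_from_sphere(center, radius)
    q = lift(p)
    return sum(wi * qi for wi, qi in zip(w, q)) + b <= 0

if __name__ == "__main__":
    center, radius = (1.0, 1.0), 1.5
    for p in [(1.2, 1.4), (3.0, 3.0)]:
        direct = sum((x - c) ** 2 for x, c in zip(p, center)) <= radius ** 2
        print(p, direct, inside_via_halfspace(p, center, radius))  # the two tests agree
```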
[Macdonald, 1992]

Author(s): Peter Douglas Macdonald.

Title: . A Logical Framework for Model-Based Diagnosis with Probabilistic Search.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 1992.
[Mahajan, 1992]

Author(s): Sanjeev Mahajan.

Title: . Infinite Games, Online Problems and Amplification.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 1992.

Abstract: This thesis consists of two parts. The first consists of joint work published with Xiaotie Deng and the second is joint work published with Arvind Gupta. In an online problem, requests come online, and they need to be answered without knowing the future requests. The competitive ratio is a performance measure for an online algorithm which measures how well the algorithm performs against an optimal algorithm that knows all the requests in advance. It is the worst-case ratio of the cost incurred by the online algorithm versus the cost of the optimal offline algorithm. Ben-David, Borodin, Karp, Tardos and Wigderson (1990) initiated a systematic study of randomization in online problems. They formalized online problems as request-answer games, and also clarified several issues regarding randomization in online problems. They argued that several papers on randomized algorithms for online problems had used different notions of adversary. The different adversaries were then identified and formalized: the oblivious adversary, the adaptive online adversary, and the adaptive offline adversary. Among these, the oblivious adversary is the weakest and the adaptive offline adversary is the strongest. Among their several seminal theorems, they showed the following beautiful and simple one: if there exists a randomized online strategy for a problem that is α-competitive against an adaptive offline adversary, then there exists an α-competitive deterministic strategy. A natural question that arises in this context is whether this theorem can be made constructive. We show that it cannot. In fact, we show that there exists an online problem such that there is a very simple computable randomized strategy that is 1-competitive, but no deterministic computable strategy that is α-competitive for any finite α. We also show an interesting game-theoretic result which asserts that the BBKTW theorem is the tightest possible. In the second part of the thesis, I consider the following issue: consider a random boolean formula that approximately realizes a boolean function. Amplification (first proposed by Valiant) is a technique wherein several independent copies of the formula are combined in some manner to prove the existence of a formula that exactly computes the function. Valiant used amplification to produce polynomial size formulas for the majority function over the basis {∧, ∨}. Boppana then showed that Valiant achieved the best possible amplification. We use amplification to show the existence of small formulas for majority when the basis consists of small (fixed) majority gates. The obtained formula sizes are optimal modulo the amplification method.
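The amplification effect can be seen numerically: combining four independent copies as (f1 AND f2) OR (f3 AND f4) maps a formula that outputs 1 with probability p to one that does so with probability 1 - (1 - p^2)^2, and iterating this map drives probabilities away from its nontrivial fixed point (about 0.618) toward 0 or 1. The short sketch below only iterates that map; it illustrates the amplification idea, not the constructions in the thesis.
```python
# Valiant-style amplification gadget: combine four independent copies as
# (f1 AND f2) OR (f3 AND f4). If each copy outputs 1 with probability p, the
# combination does so with probability A(p) = 1 - (1 - p^2)^2. Iterating A pushes
# p away from its fixed point (~0.618) toward 0 or 1.

def amplify(p):
    return 1.0 - (1.0 - p * p) ** 2

def iterate(p, rounds):
    trace = [p]
    for _ in range(rounds):
        p = amplify(p)
        trace.append(p)
    return trace

if __name__ == "__main__":
    print([round(x, 4) for x in iterate(0.65, 6)])  # drifts up toward 1
    print([round(x, 4) for x in iterate(0.58, 6)])  # drifts down toward 0
```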
[Merks, 1992]

Author(s): Eduardus Antonius Theodorus Merks.

Title: . Acer: Manipulation Principles Applied to Language Design.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, April 1992.

Abstract: Programming language design is explored from the viewpoint that support for program manipulation is a fundamental guiding concern. Three general areas of language design are identified as being of particular significance in terms of support for manipulation, namely the mapping between concrete syntax and abstract syntax, the mapping between static semantics (context-dependent syntax) and abstract syntax, and the mapping between equivalent language constructs. Abstract syntax-tree nodes and their context-dependent relations are the unifying concept in this realm. A particular programming language, Acer, based on the typeful programming language Quest, is designed and implemented to illustrate how support for manipulation is enhanced. Acer is a general-purpose, imperative language with a full accompaniment of modern language features, as well as a number of novel features (e.g., persistent storage). Its concrete syntax is designed to meet strict requirements, e.g., every node gives rise to a token so that it is visible for selection or annotation. Its abstract syntax is similarly strict and provides node representations for all semantic objects. Hence, semantic relations are simply relations on nodes and semantics-preserving transformations, such as folding and unfolding, are supported as simple transformations of node structure. Acer's support for manipulation demonstrates the benefits of designing abstract syntax first and treating concrete syntax as a particular way of viewing abstract syntax. It also demonstrates that a concrete syntax can be designed which is both natural in appearance and yet highly constrained. And, perhaps most importantly, it demonstrates that imperative languages can support the same kinds of powerful transformations supported by functional languages, e.g., all expressions can be folded.
[Mezofenyi, 1992]

Author(s): Mark Maurice Joseph Mezofenyi.

Title: . External Sorting Using Transputer Networks.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1992.

Abstract: Sorting is an extremely common application for computers. A sort is external when the amount of data being sorted is too large to fit entirely within main memory. This thesis begins with a discussion of a well-known external sorting algorithm and enhancements of it. For a random input file stored on a single disk the maximum theoretical sort rate of the enhanced version of this algorithm is half the i/o rate of the disk. However, other factors affect the actual sort rate, such as the speed of the cpu, the amount of main memory, the speed of the i/o subsystem, the number of auxiliary disks available, the size of the records being sorted, and the size of the input file. We describe an implementation of this external sort algorithm on a single transputer with a fixed amount of main memory. Using a single disk for both the input and output files we explore the effects of varying the software-dependent factors and the number of auxiliary disks on the sort rate. We find that a single transputer is, in most cases, not able to achieve the theoretical limit imposed by the i/o rate of a single disk. In an attempt to attain this limit we implemented a parallel extension of the above external sort. We found that additional transputers can be used to bring the performance of the sort up to the theoretical single-disk limit for the largest files storable in our environment. We also show that, by using a storage node which contains more than a single disk, the limiting factor for the parallel version of the single-input single-output sort on the largest files in our environment becomes the speed of a transputer link. The thesis concludes with a discussion of methods to parallelize further the external sort by using multiple storage nodes for both the input and output files, thus surmounting the bottleneck imposed by the inter-transputer link speed.
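The classical algorithm the thesis starts from is an external merge sort: generate sorted runs that fit in memory, then merge them. The sketch below is a minimal rendition of that two-phase idea, with in-memory lists standing in for disk files and buffers; it is not the transputer implementation itself.
```python
import heapq

# Minimal external merge sort sketch: phase 1 cuts the input into memory-sized
# sorted runs, phase 2 performs a k-way merge of the runs. Lists stand in for the
# disk files and fixed-size buffers of a real external sort.

def make_runs(records, memory_size):
    runs = []
    for start in range(0, len(records), memory_size):
        runs.append(sorted(records[start:start + memory_size]))
    return runs

def merge_runs(runs):
    return list(heapq.merge(*runs))   # k-way merge, reading each run sequentially

def external_sort(records, memory_size=4):
    return merge_runs(make_runs(records, memory_size))

if __name__ == "__main__":
    data = [9, 3, 7, 1, 8, 2, 6, 5, 4, 0]
    print(external_sort(data, memory_size=4))
```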
[Pattabhiraman, 1992]

Author(s): Thiyagarajasarma (Pat) Pattabhiraman.

Title: . Aspects of Salience in Natural Language Generation.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 1992.

Abstract: This dissertation examines the role of salience in natural language generation (NLG). The salience of an entity, in intuitive terms, refers to its prominence, and is interpreted as a measure of how well an entity stands out from other entities and biases the preference of the generator in selecting words and complex constructs. Salience has been used as a content selection heuristic in an NLG system by Conklin & McDonald. However, the pervasive nature of salience-induced effects in NLG has not been accorded due attention. Through an analysis of previous work in diverse disciplines, we show the variety of salience effects in NLG in such processes as content selection, syntactic linearization and reference generation. Next, we classify several important determinants of salience, corresponding to different factors contributing to the salience score of an entity. We then delineate two theoretically-significant categories: canonical salience and instantial salience. Psycholinguistic and cognitive evidence is drawn to characterize canonical salience as a built-in preference in the general conceptual- and linguistic knowledge of the speaker. Instantial salience refers to the salience of specific objects in the context of NLG. Psycholinguistic results of Osgood & Bock are highlighted to suggest the multiplicative interaction between canonical and instantial salience. This interaction is captured in a decision-theoretic formalization that models canonical salience as probabilities, instantial salience as utilities, and the selection criterion as maximization of expected utility. This formalization is used to illustrate the relative rarity of passive voice sentences over active voice sentences in English. Next we consider the phenomena of basic level and entry level preference in naming objects. We argue that basic level preference in taxonomic concept knowledge is a form of canonical salience. By establishing the stark similarity of the conclusions of the psychological experiments of Jolicoeur, Gluck & Kosslyn to those of Osgood & Bock, the dynamics of preferring names for objects is conceptualized as an interaction between canonical and instantial salience. We demonstrate the suitability of decision theory to model this interaction. Our model of salience interactions is further reinforced in a study of simile generation. We model property salience as information-theoretic redundancy and equivalently, as expected utility. To avert miscommunication and anomaly, we propose two different cost measures (antipodal to utility) using probabilistic knowledge and intrinsicness, and develop net expected utility as the decision criterion for selecting objects of comparison in similes. We conclude in this thesis that the multi-aspect notion of salience provides subtle, quantitative control in language generation decisions, and captures interesting and significant effects such as graded judgments and relative frequencies of linguistic constructs.
[Peters, 1992]

Author(s): Kathleen A. Peters.

Title: . The Design of a Change Notification Server for Clients of a Passive Object-Oriented Database Management System.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1992.
[Wiebe, 1992]

Author(s): Bruce Wiebe.

Title: . Modelling Autosegmental Phonology with Multi-Tape Finite State Transducers.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1992.

Abstract: Phonology may be briefly defined as the study of sound patterns in spoken language. One of the most well-known computational models of phonology, Koskenniemi's 2-level phonology, is based on an underlying linguistic theory that has been superseded by autosegmental phonology, which began with the work of Goldsmith. There is a need for computational models that are faithful to this more recent theory. Such a model can form the basis of a computational tool that can quickly and accurately check the validity of a phonological analysis on a large amount of phonetic data, freeing the linguist from the tedious and error-prone task of doing this by hand. This thesis presents a new computational model of phonology that is faithful to standard autosegmental theory, that has clearly adequate expressive power, and that is suitable as the basis for a tool for phonological analysis. It follows on very recent efforts by Kornai and Bird & Ellison to model autosegmental phonology. The model is based on a view of phonology that sees phonological representations as data and phonological rules as procedures that manipulate them. It models rules using multi-tape state-labelled finite transducers (MSFTs), a natural extension of finite state transducers obtained by adding multiple input & output tapes. MSFTs are shown to be powerful enough to express a wide range of autosegmental rules. We also investigate the class of formal languages accepted by multi-tape state-labelled finite automata (MSFAs) when their input tapes are considered to encode a single word in parallel. This class is quite large, including some languages that are not context free. Given that our model is faithful to autosegmental theory, this gives an upper bound on the computational power required to model autosegmental phonological rules.
[Wu, 1992]

Author(s): Ju Wu.

Title: . Design and Implementation of a Dynamic Spatial Query Language.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 1992.
[Zhu, 1992]

Author(s): Lifang Zhu.

Title: . Enforcement of Integrity Constraints in Deductive Databases.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 1992.

Abstract: Integrity constraint (ic) enforcement forms an essential component in deductive database processing. Some interesting methods which enforce integrity constraints have been proposed by Topor, Lloyd, Decker, Kowalski, Sadri, Soper, Martens, Bruynooghe, Yum and Henschen. In this thesis we further analyze and develop efficient simplification algorithms and methods for the enforcement of integrity constraints in recursive deductive databases. We combine theorem-proving methods with compilation techniques in our approach. Theorem-proving methods are used to prune the size of the integrity constraint checking space and compilation techniques are also used to derive necessary implicit modifications and evaluate the simplified integrity constraint set against the actual database. Synchronous and asynchronous chain recursions are discussed. By exploiting the hierarchical structure of a deductive database, we can precompile or partially precompile integrity constraints and ic-relevant rules to simplify integrity constraint checking and validate some modifications by static qualitative analysis. By analyzing predicate connection and variable binding, and compiling recursive rules independently, we can simplify ic-relevant queries and generate efficient checking plans. Some asynchronous and synchronous chain recursive integrity checking relevant queries can be simplified to non-recursive or simpler queries. Efficient processing algorithms are developed for integrity checking and derivation of implicit modification. To perform integrity checking against the actual database we utilize the `affected graph' of a modification. We achieve this by focusing our attention only on the part of the database which is affected by the update and relevant to the integrity constraints.



1991

[Bawtree, 1991]

Author(s): Hugh Alexander Bawtree.

Title: . Restructuring the Run Time Support of a Distributed Language.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, October 1991.
[Gurski, 1991]

Author(s): Pamela Gurski.

Title: . Fractal-Based Texture Segmentation of Digital X-Ray Mammograms.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 1991.
[Kurn, 1991]

Author(s): Andrew Marcus Lear Kurn.

Title: . A Tool in the Lab: A Computer Programming Language that Operates on Context-Free Sentences.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 1991.

Abstract: Context-free grammar is the most commonly used method of specifying the structure of formal languages, including mathematical expressions and computer programming languages. Although these languages may be more complex than what is implied by their context-free grammar, the method is so elegant and powerful that it has gained universal acceptance. This work describes a new programming language, called G language, and the philosophy motivating its design. The language is designed especially for operating on sentences from context-free grammars, and thus finds application in computer language processing and symbolic algebra. G is imperative and uses a store of cells, which is to say that its statements are usually executed serially and that variables may be changed so as to have different values at different times. The cell used here has a rather more complex structure than in conventional languages, which is the most fundamental difference between them and G language. Several other unusual features are also present. For one, G unifies the notion of data type with grammar, with the following consequences: First, types become first-class data, which can be constructed and then used for the creation of cells of the new types. Second, any datum may be fixed during the execution of a program. The language also includes a facility for writing literal sentences and for attaching annotations to cells.
[Manaf, 1991]

Author(s): Afwarman Manaf.

Title: . Design and implementation of automotive model-based diagnosis using the Echidna constraint reasoning system.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1991.
[Sidebottom, 1991]

Author(s): Gregory Allan Sidebottom.

Title: . Hierarchical Arc Consistency Applied to Numeric Processing in Constraint Logic Programming.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 1991.
[Wang, 1991]

Author(s): Qiang Wang.

Title: . Efficient Evaluation of Functional Recursive Query Programs in Deductive Databases.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 1991.
[Wu, 1991]

Author(s): Eric Qian Wu.

Title: . Transformation and Benchmark Evaluation for SQL Queries.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 1991.
[Zhang, 1991]

Author(s): Jie Zhang.

Title: . Display Performance of Graphic Interface with Geographic Information Systems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1991.



1990

[Choi, 1990]

Author(s): Amelia Yin Ling Choi.

Title: . A bi-level object-oriented data model for geographic information system applications.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1990.
[Dyck, 1990]

Author(s): John Michael Dyck.

Title: . Syntactic manipulation systems for context-dependent languages.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1990.
[Fu, 1990]

Author(s): Ada Wai-Chee Fu.

Title: . Enhancing concurrency and availability for database systems.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, March 1990.
[Levinson, 1990]

Author(s): Catherine Mary Levinson.

Title: . A Technique for solving geometric optimisation problems in 2 and 3 dimensions.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1990.
[Murray, 1990]

Author(s): Donald C. Murray.

Title: . A real-time transputer-based data acquisition system.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 1990.
[Ovans, 1990]

Author(s): Russell David Ovans.

Title: . An Object Oriented Constraint Satisfaction System Applied to Music Composition.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1990.

Abstract: Constraint satisfaction problems (CSPs) are important and ubiquitous in artificial intelligence and generally require some form of backtracking search to solve. This thesis provides a methodology for solving CSPs using object-oriented programming and describes how music composition can be formulated as a CSP. Firstly, we have built, using an object-oriented programming language, a constraint satisfaction system that is applicable to any CSP. By defining a methodology for the conversion of an abstract model of a CSP, namely the constraint graph, into a network of co-operating objects, we have isolated some useful abstractions of constraint programming. This system can solve CSPs involving constraints of any arity, and frees the programmer from the details of tree-search and constraint propagation. The second contribution of this thesis is an observation that music composition can be formulated strictly as a CSP. Recent attempts to develop expert systems for music composition have centred mainly around the rule-based approach, which we argue is frequently inefficient due to its reliance on chronological backtracking as the control method. Conversely, when music composition is viewed as a CSP, the complexity of the problem, the usefulness of the musical constraints, and the relationships among notes all become manifest in the form of a constraint graph. Most importantly, consistency techniques can be exploited in an effort to reduce backtracking and thus provide a more efficient procedure for the generation of compositions. The synthesis is the generation of contrapuntal music by modeling first species counterpoint as a CSP and implementing its solution in our constraint satisfaction system. Using this approach, we have undertaken an analysis of the rules of first species counterpoint by measuring their individual effect on constraining the number of compositions belonging to the genre. [NB: an updated version of a portion of this thesis is published by SFU as a technical report: CSS-IS TR 92-02. This report is available via ftp as a compressed (.Z) tar file (.tar.Z) (which contains multiple postscript files) on host fas.sfu.ca under /pub/css/techreports/1992. This report subsumes (and corrects) many of the results presented in the thesis.]
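The control regime being advocated, backtracking search interleaved with constraint propagation, can be shown in miniature. The sketch below solves a generic binary CSP by backtracking with forward checking; the variables, domains and constraint are placeholders of my own, not the first species counterpoint rules or the object-oriented machinery of the thesis.
```python
# Backtracking with forward checking for a small CSP. Variables, domains and the
# constraint are placeholders; in the thesis the same style of search runs over a
# constraint graph encoding the rules of first species counterpoint.

def solve(domains, constraints, assignment=None):
    assignment = assignment or {}
    if len(assignment) == len(domains):
        return assignment
    var = next(v for v in domains if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        pruned = forward_check(var, value, domains, constraints, assignment)
        if pruned is not None:
            result = solve(pruned, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]
    return None

def forward_check(var, value, domains, constraints, assignment):
    new_domains = {}
    for other, dom in domains.items():
        if other in assignment:
            new_domains[other] = [assignment[other]]
            continue
        allowed = [w for w in dom
                   if all(ok(var, value, other, w) for ok in constraints)]
        if not allowed:
            return None                     # domain wipe-out: prune this branch early
        new_domains[other] = allowed
    return new_domains

if __name__ == "__main__":
    domains = {"x": [1, 2, 3], "y": [1, 2, 3], "z": [1, 2, 3]}
    not_equal = lambda a, va, b, vb: a == b or va != vb
    print(solve(domains, [not_equal]))      # e.g. {'x': 1, 'y': 2, 'z': 3}
```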
[Tate, 1990]

Author(s): Kevin James Tate.

Title: . A Multiresolution Approach to Range-Guided Stereo Matching.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1990.
[Till, 1990]

Author(s): Berndt Christian Till.

Title: . The Nervous System of Caenorhabditis elegans: A Stochastic Approach.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1990.
[Vogel, 1990]

Author(s): Carl M. Vogel.

Title: . Inheritance Reasoning and Head-Driven Phrase Structure Grammar.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1990.

Abstract: Inheritance networks are a type of semantic network that represents both strict (classical implication) and defeasible (non-classical) relationships among entities. We present an established approach to defeasible reasoning which defines inference in terms of the construction of paths through a network. Much of the literature on inheritance is concerned with specifying the most "intuitive" system of path construction. However, when considering a fundamental feature of these approaches, the status accorded to redundant links, we find that topological considerations espoused in the literature are insufficient for determining the valid inferences of a network. This implies that the "intuitiveness" of a particular method depends upon the domain being represented. Though Touretzky has demonstrated that it is unsound in some cases, the path-preference algorithm known as SHORTEST PATH REASONING is actually the most intuitive algorithm to use when reasoning about the inheritance network which represents most of the conceptual structure of Head-Driven Phrase Structure Grammar (HPSG). In this thesis we describe the HPSG formalism and detail the inheritance hierarchy which we abstract from it. The network itself is interesting because it is cyclic and because it contains SUPERNODES. We specify the content of nodes (information structures encoded as attribute-value matrices) and the interpretation of links (the relative pseudocomplement relation) in the resulting inheritance hierarchy. The process of reasoning over the hierarchy is demonstrated, and the implications of this work for researchers in both unification grammars and inheritance reasoners are discussed. In particular, when it is applied to the inheritance network for HPSG, an inheritance reasoner functions as a parser for the grammar formalism. To the inheritance reasoning researcher this provides a semantically nontrivial application for representation using inheritance networks against which arguments about the intuitiveness of more complex path construction algorithms may be tested.
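
The path-preference policy at issue can be illustrated with a small sketch of shortest-path reasoning over a defeasible network. The network below is the standard penguin example, not the cyclic, supernode-bearing HPSG hierarchy analysed in the thesis, and the encoding of links and paths is an assumption made for brevity.

# A toy inheritance network with defeasible links, using shortest-path
# reasoning as the path-preference policy. The penguin/bird network is a
# standard textbook example and is not drawn from the HPSG hierarchy.
from collections import deque

# Directed defeasible links: node -> list of (target, sign); sign +1 means
# "is-a", sign -1 means "is-not-a". An admissible path is a chain of positive
# links that may end in at most one negative link.
LINKS = {
    "tweety":  [("penguin", +1)],
    "penguin": [("bird", +1), ("flier", -1)],
    "bird":    [("flier", +1)],
}

def shortest_path_conclusion(start, predicate):
    """Return (sign, length) for the shortest admissible path from start to
    predicate, or None if no path exists. Shorter paths pre-empt longer ones."""
    best = None
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        node, depth = queue.popleft()
        for target, sign in LINKS.get(node, []):
            length = depth + 1
            if target == predicate and (best is None or length < best[1]):
                best = (sign, length)
            # only positive links may be extended; a negative link ends a path
            if sign == +1 and target not in visited:
                visited.add(target)
                queue.append((target, length))
    return best

if __name__ == "__main__":
    sign, length = shortest_path_conclusion("tweety", "flier")
    print("tweety", "flies" if sign > 0 else "does not fly",
          f"(shortest path has length {length})")
    # tweety -> penguin -/-> flier (length 2) pre-empts
    # tweety -> penguin -> bird -> flier (length 3)

The sensitivity to redundant links noted above is visible even in this toy network: adding a redundant positive link from tweety directly to bird creates a positive path to flier of length 2, tying with the negative path, so shortest-path preference (and the arbitrary tie-breaking in this sketch) no longer tracks the intuitively correct conclusion.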
[Wang, 1990]

Author(s): Yuemin Wang.

Title: . Finding executable paths in protocol conformance testing.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1990.
[Xu, 1990]

Author(s): Xiaomei Xu.

Title: . Extending Relational Database Management Systems for Spatiotemporal Information.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1990.
[Yao, 1990]

Author(s): Benguang Yao.

Title: . On Extending the Generalised Hough Transform.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1990.


1989

[Cai, 1989]

Author(s): Yandong Cai.

Title: . Attribute oriented induction in relational databases.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1989.
[Chau, 1989]

Author(s): Siu cheung Chau.

Title: . Fault-tolerance in multi-computer networks.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, July 1989.
[Dykstra, 1989]

Author(s): Christine J. Dykstra.

Title: . The use of Radon transforms in fully 3-dimensional positron volume imaging – a feasibility study.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1989.
[Gunson, 1989]

Author(s): James Russell Gunson.

Title: . Multicasting in a high-level language.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1989.
[Hutchings, 1989]

Author(s): Edward Hutchings.

Title: . Polynomial convergence in interior methods for linear programming problems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1989.
[Kafhesh, 1989]

Author(s): Masoud Rostam Kafhesh.

Title: . VLSI-based hypermesh interconnection networks for array processing.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 1989.
[Kastelic, 1989]

Author(s): William Paul Kastelic.

Title: . The pair tree: a parallel architecture for image representation based on symmetric recursive indexing.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 1989.
[Khanna, 1989]

Author(s): Bakul G. Khanna.

Title: . A dynamic priority protocol for real-time applications using a token ring.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, February 1989.
[Laughlin, 1989]

Author(s): Robert G. Laughlin.

Title: . Approximating harmonic amplitude envelopes of musical instrument sounds with principal component analysis.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, October 1989.
[McDonald, 1989]

Author(s): Kenneth M. McDonald.

Title: . Smallest paths in polygons.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 1989.
[Morawetz, 1989]

Author(s): Claudia Morawetz.

Title: . A high-level approach to the animation of human secondary movement.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1989.
[Yap, 1989]

Author(s): Steven Foo Loy Yap.

Title: . Tableau interpretation for a MILP problem using an interactive approach.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 1989.


1988

[Almstrom, 1988]

Author(s): Christopher P. Almstrom.

Title: . Distributed computation of transitive closure.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1988.
[Bailey, 1988]

Author(s): Douglas Bruce Bailey.

Title: . The use of semantic information in a distributed data structure.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1988.
[Bruderlin, 1988]

Author(s): Armin W. Bruderlin.

Title: . Goal-directed, dynamic animation of bipedal locomotion.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 1988.
[Cheung, 1988]

Author(s): David Wai-Lok Cheung.

Title: . A study of the availability and serializability in a distributed database system.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, January 1988.
[Chew, 1988]

Author(s): Clifford Francis Chew.

Title: . Design of SQUIREL.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 1988.
[Coady, 1988]

Author(s): Yvonne Coady.

Title: . An investigation of type-specific optimistic, pessimistic, and hybrid concurrency control.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1988.
[Cumming, 1988]

Author(s): David Cumming.

Title: . Cellsman: a simulation program based on cellular automata.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 1988.
[Dykes, 1988]

Author(s): Leland R. Dykes.

Title: . Enhancing the manipulative capabilities of a syntax-based editor.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 1988.
[Ezeife, 1988]

Author(s): Christiana Ezeife.

Title: . Design of a suitable target language for natural language interfaces to relational databases.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 1988.
[Gamble, 1988]

Author(s): Donald James Gamble.

Title: . EPLE: an effective procedural layout environment.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1988.
[Haftevani, 1988]

Author(s): Garnik Bobloian Haftevani.

Title: . Efficient kernel-level reliable multicast communication in distributed systems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1988.
[Ho, 1988]

Author(s): Jian Ho.

Title: . Chromatic aberration: a new tool for colour constancy.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1988.
[Johnston, 1988]

Author(s): William Brent Johnston.

Title: . An empirical examination of exact algorithms for the cardinality Steiner problem.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1988.
[Joseph, 1988]

Author(s): Stefan W. Joseph.

Title: . Elimination of wasteful operations in natural language accesses to relational databases using a knowledge-based subsystem.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1988.
[Klimo, 1988]

Author(s): Helena Klimo.

Title: . Computing convex hulls in higher dimensions.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1988.
[Massicotte, 1988]

Author(s): Pierre Massicotte.

Title: . Generation of conceptual graphs from functional structures.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1988.
[Mauro, 1988]

Author(s): David Joseph Mauro.

Title: . A real-time graphics conferencing system.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1988.
[Wu, 1988]

Author(s): Paul L. C. Wu.

Title: . The design and performance study of binary transitive closure algorithms.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1988.


1987

[Adolph, 1987]

Author(s): William Stephen Adolph.

Title: . Design of digital circuits using prototypical descriptions.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 1987.
[Ashraf, 1987]

Author(s): Mahboob Ashraf.

Title: . Stability of load sharing in a distributed computer system.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 1987.
[Brown, 1987]

Author(s): Charles Grant Brown.

Title: . Generating Spanish clitics with constrained discontinuous grammars.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, August 1987.
[Chung, 1987]

Author(s): Tony Chin Wah Chung.

Title: . An approach to human surface modelling using cardinal splines.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, February 1987.
[Datar, 1987]

Author(s): Rajendra R. Datar.

Title: . Test Sequence Generation for Network Protocols.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 1987.
[Groeneboer, 1987]

Author(s): R. Christine Groeneboer.

Title: . Tableau-based theorem proving for a conditional logic.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1987.
[Hamilton, 1987]

Author(s): Sharon Joyce Hamilton.

Title: . Using modal structures to represent extensions to epistemic logics.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1987.
[Hoskin, 1987]

Author(s): James D. Hoskin.

Title: . An APL subset interpreter for a new chip set.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1987.
[Merks, 1987]

Author(s): Eduardus A. T. Merks.

Title: . Compilation using multiple source-to-source stages.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1987.
[Mok, 1987]

Author(s): Simon Hon Ming Mok.

Title: . Disk I/O performance of linear recursive query processing.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1987.
[Neumann, 1987]

Author(s): Dean W. Neumann.

Title: . An investigation of holographic and stereographic techniques for the three-dimensional display of medical imagery.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 1987.
[Prakash, 1987]

Author(s): Shiv Prakash.

Title: . Guiding design decisions in RT-level logic synthesis.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1987.
[Ridsdale, 1987]

Author(s): Garfield John Ridsdale.

Title: . The Director's Apprentice: animating figures in a constrained environment.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, July 1987.
[Salehmohamed, 1987]

Author(s): Mohamed Salehmohamed.

Title: . Experimental analysis of LAN sorting algorithms.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1987.
[Swinkels, 1987]

Author(s): Godfried M. Swinkels.

Title: . Schematic Generation.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1987.
[Terry, 1987]

Author(s): Brian William Terry.

Title: . Grammar-Based File Structure.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, November 1987.
[Tong, 1987]

Author(s): Frank C. H. Tong.

Title: . Specularity Removal for Shape-from-Shading.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, October 1987.
[Wang, 1987]

Author(s): Xiao Wang.

Title: . Query processing for distributed main memory database systems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1987.
[Wu, 1987a]

Author(s): Joseph Chiu Leung Wu.

Title: . Program Debugging with Toolkits.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, September 1987.
[Wu, 1987b]

Author(s): Wuyi Wu.

Title: . Heuristic bounds for automated logic synthesis.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 1987.


1986

[Cumming, 1986]

Author(s): Steven G. Cumming.

Title: . Axiomatising the logic of distributed computation.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 1986.
[Xie, 1986]

Author(s): Shun en Xie.

Title: . Incremental construction of 3-D models of a scene from sequentially planned views.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, December 1986.
[Franklin, 1986]

Author(s): Paul Franklin.

Title: . Optimal rectangle covers for convex rectilinear polygons.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 1986.
[Hall, 1986]

Author(s): Gary W. Hall.

Title: . Querying cyclic databases in natural language.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1986.
[Kao, 1986]

Author(s): Mimi A. Kao.

Title: . Turning null responses into quality responses.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1986.
[Lee, 1986]

Author(s): Anson Yiu Cho Lee.

Title: . Optimizing hardware utilization of parallel-processing bit-map graphics displays.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1986.
[Li, 1986]

Author(s): Brenda Yuk-Yee Li.

Title: . Graph theoretic controlled rounding.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 1986.
[Ling, 1986]

Author(s): Franky Siu Ming Ling.

Title: . Byzantine agreement and network failures.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1986.
[Mahajan, 1986]

Author(s): Sanjeev Mahajan.

Title: . Sequential and parallel optimization algorithms for recursive graphs.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1986.
[Pun, 1986]

Author(s): Philip Sau-Tak Pun.

Title: . Distributed algorithms for cycle finding and matching.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1986.
[Speakman, 1986]

Author(s): Tony Speakman.

Title: . Load sharing strategies for a distributed computer system.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1986.
[Strzalkowski, 1986]

Author(s): Tomasz Strzalkowski.

Title: . A theory of stratified meaning representation for natural language.

Ph.D. Thesis, School of Computing Science, Simon Fraser University, July 1986.
[Sy, 1986]

Author(s): Carrie Sy.

Title: . Efficient schedulers in multiversion database systems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1986.
[Zorbas, 1986]

Author(s): John Zorbas.

Title: . Applications of set addition to computational geometry and robot motion planning.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 1986.


1985

[Chilka, 1985]

Author(s): Pradeep Chilka.

Title: . PERT: A pipelined engine for ray tracing graphics.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 1985.
[Fu, 1985]

Author(s): Ada Wai-Chee Fu.

Title: . Parallel matroid algorithms.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 1985.
[Gaudet, 1985]

Author(s): Severin Gaudet.

Title: . A parallel architecture for ray tracing.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 1985.
[Gudaitis, 1985]

Author(s): John Jonas Gudaitis.

Title: . Evaluation of some distributed function architectures for array processing data manipulation.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1985.
[Hwang, 1985]

Author(s): Enoch Oi-Kee Hwang.

Title: . Experimental analysis of broadcasting algorithms to perform set operations.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1985.
[Kloster, 1985]

Author(s): Stephen C. Kloster.

Title: . ELFS: English Language From SQL.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1985.
[Pellicano, 1985]

Author(s): Paul N. E. Pellicano.

Title: . Detection of specularities in colour images using local operators.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1985.
[Popowich, 1985]

Author(s): Fred P. Popowich.

Title: . Unrestricted gapping grammars: theory, implementations, and applications.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1985.
[Samson, 1985]

Author(s): Louise Samson.

Title: . Graphic design with colour using a knowledge base.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, March 1985.
[Slawson, 1985]

Author(s): Frederick John Slawson.

Title: . Polygon partitioning for electron beam lithography of integrated circuits.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, May 1985.
[Styan, 1985]

Author(s): Roy Hamilton Styan.

Title: . Preservation of code quality in source-to-source translation: Bliss -> C.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1985.
[Ling, 1985]

Author(s): Daniel Kwai wah Ling.

Title: . Polling and receiving in graphs.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1985.


1984

[Bleackley, 1984]

Author(s): Philip Scott Bleackley.

Title: . Some design considerations for commercial exploitation of (interactive) microcomputer based decision support systems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1984.
[Cheung, 1984]

Author(s): David Wai-Lok Cheung.

Title: . Site-optimal termination protocols for network partitioning in a distributed database.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1984.
[Chau, 1984]

Author(s): Siu cheung Chau.

Title: . New methods to generate minimal broadcast networks and fault-tolerant minimal broadcast networks.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1984.
[Lo, 1984]

Author(s): Arthur Pui Wing Lo.

Title: . A diagrammatic reasoning approach to automatic wire routing.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, April 1984.
[Martin, 1984]

Author(s): Frederick J. Martin.

Title: . Towards the automated evaluation of interactive systems: a quality-assessment approach.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1984.
[Yong, 1984]

Author(s): Pau Yen Yong.

Title: . Minimization of Page Fetches in Query Processing in Relational Databases.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 1984.


1983

[Bryant, 1983]

Author(s): Edwin C. Bryant.

Title: . Computer interpretation of computed tomography scans of sawlogs.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 1983.
[Hadley, 1983]

Author(s): Robert Francis Hadley.

Title: . A natural language query system for a Prolog database.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, December 1983.
[Leung, 1983]

Author(s): Kam Nang Hareton Leung.

Title: . Spline collocation for solving time discretized parabolic problems in one space variable.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1983.


1982

[Black, 1982]

Author(s): Patrick Allen Black.

Title: . Query processing on distributed database systems.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, January 1982.
[Farrag, 1982]

Author(s): Abdel Aziz Farrag.

Title: . Improved multi-version scheme for controlling concurrent accesses to a database system.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, August 1982.
[Krause, 1982]

Author(s): Max Martin Krause.

Title: . Perfect hash function search with applications to computer lexicon design.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, February 1982.
[Snyder, 1982]

Author(s): Warren Stuart Snyder.

Title: . MAPLE: a Multiprocessor APL machine.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, February 1982.


1981

[Strothotte, 1981]

Author(s): Thomas Strothotte.

Title: . Fast raster graphics using parallel processing.

M.Sc. Thesis, School of Computing Science, Simon Fraser University, June 1981.