Research

Team members: M. H. Alalfi, J. R. Cordy, and T. R. Dean

In 2016, I completed a four-year contract position as a senior research scientist with the NECSIS Automotive Partnership, the Network of Engineering Complex Software Intensive Systems, an academic-industrial collaboration involving General Motors, IBM Canada, Malina Software Corp., and several Canadian universities. Together with Professor James Cordy (PI) and the team from Queen’s University, we developed a framework for model pattern engineering. The framework comprises three phases corresponding to three broad tasks: discovery, formalization, and application. The discovery phase identifies common submodel patterns in a large set of example models obtained from our industrial partners at General Motors. The formalization phase organizes and generalizes the identified concrete patterns into a formal taxonomy of generic sub-patterns that covers the models of our example set. The application phase validates the taxonomy against a large set of other models in the original and related domains, and applies the formalization to the tasks of automating model generation and deployment.

Our framework and tools attracted wide interest from our academic and industrial partners and were released for production use by GM. The framework and its tool, SIMONE, collect thousands of Simulink/Stateflow models, parse them, and clean them through several filtering, sorting, and normalization steps; our analysis, which mainly uses an algorithm adapted from textual data mining, then extracts and identifies model patterns. The patterns are then visualized using SimNav, another tool we developed to visualize the analysis results. To date, the project has generated twelve research papers in premier conferences and journals. Tools developed in support of the project include SimNav, SIMONE, and SimGraph.
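To illustrate why sorting and normalization matter before text-based pattern mining, here is a minimal sketch; it is not SIMONE's actual implementation, which is not described here. It assumes a hypothetical in-memory representation in which each block is a dict of Simulink-style parameters, and treats `Position`, `ZOrder`, and `Name` as the parameters to ignore; the function name `normalize_subsystem` is likewise hypothetical.

```python
# Parameters assumed to carry only layout or naming information; the set
# actually used by SIMONE may differ.
IGNORED_PARAMS = {"Position", "ZOrder", "Name"}


def normalize_subsystem(blocks):
    """Return a canonical text form of a subsystem.

    Layout and naming parameters are dropped, each block's remaining
    parameters are written in alphabetical order, and the block lines
    themselves are sorted, so two subsystems that differ only in layout,
    block names, or block order yield identical text.
    """
    lines = []
    for block in blocks:
        params = sorted((k, v) for k, v in block.items() if k not in IGNORED_PARAMS)
        lines.append(" ".join(f"{k}={v}" for k, v in params))
    return "\n".join(sorted(lines))


# Two subsystems that differ only in layout, block names, and block order.
ctrl_a = [
    {"BlockType": "Gain", "Name": "G1", "Gain": "2", "Position": "[10, 10, 40, 40]"},
    {"BlockType": "Sum", "Name": "Add", "Inputs": "++", "Position": "[80, 10, 110, 40]"},
]
ctrl_b = [
    {"BlockType": "Sum", "Name": "Sum1", "Inputs": "++", "Position": "[0, 0, 30, 30]"},
    {"BlockType": "Gain", "Name": "K", "Gain": "2", "Position": "[50, 50, 80, 80]"},
]

print(normalize_subsystem(ctrl_a) == normalize_subsystem(ctrl_b))  # True
```

Once subsystems are in such a canonical form, a text-based miner can group identical or near-identical text. Sorting blocks is a simplification: it ignores the connections between blocks, which a real normalization would also need to handle.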

The framework was first developed and tested on Simulink models; then, to assess the generalizability of the approach, it was extended to handle other types of models, namely Stateflow and UML behavioural models. For these extensions, I trained two MSc students, one on model clone detection for Stateflow models and the other on model clone detection for UML behavioural models. I co-supervised both students with Professors James Cordy and Thomas Dean.

Clone Detection in Matlab Stateflow Models

Stateflow allows users to embed Statecharts as components in a Simulink model. These state machines contain nested states, an action language that describes events, guards, conditions, and actions, and complex transitions. As Stateflow has become increasingly important in Simulink models for the automotive sector, we extended the above-mentioned work on clone detection in Simulink models to Stateflow components. Although Stateflow models are stored in the same file as the Simulink models that host them, their representations differ. Our approach therefore incorporates a pre-transformation that converts Stateflow models into a form that allows us to use the SIMONE model clone detector to identify clone candidates and cluster them into classes. In addition, we push the results of Stateflow clone detection back into the host Simulink models, improving the accuracy of the clones found there. We validated our approach on the MATLAB Simulink/Stateflow demo set. It showed promising results in identifying Stateflow clones both in isolation and as integrated components of their host Simulink models.

Detecting Patterns of Access Control Security Risks in Interactive Systems

For UML behavioural models, little work has been done on similarity in the dynamic behaviour of interactive systems. This project targets the identification of near-miss interaction clones in reverse-engineered UML sequence diagrams. Our goal is to identify patterns of interaction (“conversations”) that can be used to characterize and abstract the run-time behaviour of web applications and other interactive systems. To leverage existing robust near-miss code clone technology, our approach is text-based, working at the level of XMI, the standard interchange serialization for UML. Clone detection in UML behavioural models such as sequence diagrams presents several challenges. First, it is not clear how to break a continuous stream of interaction between lifelines (representing the objects or actors in the system) into meaningful conversational units. Second, unlike programming languages, the XMI text representation of UML is highly non-local: it uses attributes to reference related elements located elsewhere in the model file. In this work, we use a set of contextualizing source transformations on the XMI text to localize related elements, exposing the hidden hierarchical structure of the model and allowing us to granularize behavioural interactions into conversational units. We then adapt NiCad, a robust near-miss code clone detection tool, to identify conversational clones in reverse-engineered behavioural models. These conversational clones are then analysed to find worrisome interactions that may indicate security access violations.
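The localization step can be pictured with a small sketch. This is not the authors' TXL transformation; it is a minimal Python illustration over a toy XMI-like fragment, where the attribute names `id`, `sender`, and `receiver` and the function `localize` are hypothetical. Each reference attribute is replaced by an inlined copy of the element it points to, so a message carries its lifelines with it and a line-based clone detector can compare messages locally.

```python
import copy
import xml.etree.ElementTree as ET

# A toy, XMI-like fragment. Real XMI 2.1 uses namespaced xmi:id attributes
# and several kinds of reference attributes; "id", "sender", and "receiver"
# here are simplifying assumptions.
SAMPLE = """<model>
  <lifeline id="L1" name="Browser"/>
  <lifeline id="L2" name="LoginServlet"/>
  <message id="M1" name="submitLogin" sender="L1" receiver="L2"/>
</model>"""

REF_ATTRS = ("sender", "receiver")


def localize(root, ref_attrs=REF_ATTRS):
    """Return a copy of root in which every reference attribute is replaced
    by a child element holding a copy of the referenced element, so related
    information sits next to the element that uses it."""
    root = copy.deepcopy(root)
    by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    # Snapshot the element list so newly inlined copies are not revisited.
    for elem in list(root.iter()):
        for attr in ref_attrs:
            target_id = elem.get(attr)
            if target_id in by_id:
                holder = ET.SubElement(elem, attr)
                holder.append(copy.deepcopy(by_id[target_id]))
                del elem.attrib[attr]
    return root


localized = localize(ET.fromstring(SAMPLE))
print(ET.tostring(localized.find("message"), encoding="unicode"))
```

After this step, each message element is self-contained, which is what makes splitting the interaction stream into comparable units feasible for a text-based detector.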

Using Mutation Analysis for a Model-Clone Detector Comparison Framework

To validate the results of the model pattern engineering project, and to compare our approach with other state-of-the-art techniques that target the same research questions, we needed a dedicated comparison framework. We developed a mutation-analysis-based model-clone detection framework that aims to automate and standardize the comparison of multiple Simulink model-clone detection tools, or of variations of the same tool. Such a framework can facilitate new research directions in model-clone detection, since it can be used to validate new techniques as they arise. We identified challenges unique to model-clone tool comparison, including recall calculation, the nature of the clones, and the representation of clone reports, and proposed a framework that we believe addresses them. We also defined mutation operators to be injected into Simulink models, introducing variations of all the different model clone types that each model-clone detector can then be asked to find. This project trained one PhD student, for whom I served as second supervisor. I proposed the research idea and worked with the student throughout the development of the framework. I co-authored three papers with the student on this work.
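Recall is the challenge that mutation analysis addresses most directly: because each mutant is a known clone of its original, the set of true clones is known by construction. A minimal sketch of computing recall per clone type under that assumption follows. The `InjectedClone` record, the pair-based report format, and the clone-type labels are hypothetical; real clone reports (for example, class-based reports) would first need converting to pairs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InjectedClone:
    """A clone pair created by applying a mutation operator (hypothetical record)."""
    original: str    # identifier of the original subsystem
    mutant: str      # identifier of the mutated copy
    clone_type: str  # e.g. "Type-1", "Type-2", "Type-3"


def recall_by_type(injected, reported_pairs):
    """Recall per clone type: the fraction of injected clone pairs that a
    detector reported, ignoring the order of the two fragments in a pair."""
    reported = {frozenset(pair) for pair in reported_pairs}
    found = defaultdict(int)
    total = defaultdict(int)
    for clone in injected:
        total[clone.clone_type] += 1
        if frozenset((clone.original, clone.mutant)) in reported:
            found[clone.clone_type] += 1
    return {t: found[t] / total[t] for t in total}


injected = [
    InjectedClone("S1", "S1_m1", "Type-1"),
    InjectedClone("S1", "S1_m2", "Type-2"),
    InjectedClone("S2", "S2_m1", "Type-2"),
]
reported = [("S1_m1", "S1"), ("S2", "S2_m1")]
print(recall_by_type(injected, reported))  # {'Type-1': 1.0, 'Type-2': 0.5}
```

Reporting recall per clone type, rather than as one number, shows which kinds of variation a given detector misses.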

Variability Identification and Representation for Automotive Simulink Models

Model pattern engineering leads to the need to identify and represent variability in models. To that end, we developed a semi-automated framework for identifying and representing different kinds of variability in Simulink models. Based on the variants observed in similar subsystem patterns inferred using SIMONE, a text-based model clone detection tool, we propose a set of variability operators for Simulink models. By applying these operators to six example systems, we were able to represent the variability in their similar subsystem patterns as a single subsystem template directly in the Simulink environment. The product of the framework is a single consolidated subsystem model capable of expressing the observed variability across all instances of each inferred pattern. Pattern inference and variability analysis are largely automated and can easily be applied to other collections of Simulink models. The framework aims to help engineers identify, understand, and visualize patterns of subsystems in a large model set. This understanding may help reduce maintenance effort and identify bugs at an early stage of software development.
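As a rough illustration of the analysis that precedes building a consolidated template, here is a minimal sketch; it is not the framework's variability operators. It assumes the blocks of each pattern instance have already been matched by a role name (a hypothetical simplification) and separates mandatory blocks, optional blocks, and varying parameters, which a template could then express, for example, as optional blocks and parameterized values.

```python
def analyze_variability(instances):
    """Classify the blocks of one inferred pattern across its instances.

    Each instance maps a block role (assumed to be already matched across
    instances) to that block's parameters. Returns the mandatory blocks
    (present in every instance), the optional blocks (present in only some),
    and, per block, the parameters whose values differ between instances.
    """
    all_blocks = set().union(*(inst.keys() for inst in instances))
    mandatory = {b for b in all_blocks if all(b in inst for inst in instances)}
    optional = all_blocks - mandatory
    varying = {}
    for block in sorted(all_blocks):
        present = [inst[block] for inst in instances if block in inst]
        keys = set().union(*(p.keys() for p in present))
        diff = {k for k in keys if len({p.get(k) for p in present}) > 1}
        if diff:
            varying[block] = diff
    return mandatory, optional, varying


instances = [
    {"In1": {"BlockType": "Inport"}, "K": {"BlockType": "Gain", "Gain": "2"},
     "Out1": {"BlockType": "Outport"}},
    {"In1": {"BlockType": "Inport"}, "K": {"BlockType": "Gain", "Gain": "5"},
     "Limit": {"BlockType": "Saturate"}, "Out1": {"BlockType": "Outport"}},
]
mandatory, optional, varying = analyze_variability(instances)
print(sorted(mandatory), sorted(optional), varying)
# ['In1', 'K', 'Out1'] ['Limit'] {'K': {'Gain'}}
```

In this toy example, a consolidated template would keep `In1`, `K`, and `Out1`, mark `Limit` as optional, and expose the gain value as a parameter.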

We developed a security analysis framework for dynamic web applications. A reverse engineering process is applied to an existing dynamic web application to extract a role-based access control (RBAC) security model, and a formal analysis is then applied to the recovered model to check access control security properties. The framework can be used to verify that a dynamic web application conforms to the access control policies specified by a security engineer. It comprises the following sub-projects:

Automated Testing of Role-based Access Control Security Models in Dynamic Web Applications: Two approaches and two tools

We designed and implemented an approach to automatically construct a role-based access control security model from the recovered structural and behavioural models. We use TXL to implement the automatic model-to-model transformation and composition. The generated model is also represented in the UML 2.1 exchange format, XMI 2.1. Finally, using a model-to-model transformation approach, we developed a tool that transforms the semi-formal UML 2.1 security model into a formal model, easing the verification of the system against security properties.
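The kind of property checked on the recovered model can be illustrated without a formal verification tool. The sketch below is a deliberate simplification, not the project's formal model or checker: it assumes a hypothetical navigation model keyed by (role, page) and a policy mapping each role to its permitted pages, and uses a plain breadth-first reachability search to report pages a role can reach but should not access.

```python
from collections import deque


def unauthorized_reachable(navigation, role, start, permitted):
    """Return, in sorted order, the pages that `role` can reach from `start`
    by following navigation edges but that the policy does not permit.

    navigation maps (role, page) to the pages that role can navigate to
    from page; permitted maps each role to the set of pages it may access.
    """
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over the role's navigation graph
        page = queue.popleft()
        for nxt in navigation.get((role, page), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen - permitted.get(role, set()))


navigation = {
    ("guest", "index.php"): ["login.php", "view.php"],
    ("guest", "view.php"): ["admin.php"],  # a link leaking to an admin page
}
permitted = {"guest": {"index.php", "login.php", "view.php"}}
print(unauthorized_reachable(navigation, "guest", "index.php", permitted))  # ['admin.php']
```

A formal model checker can express richer properties (sequences of actions, session state), but the underlying question, whether a role can reach a resource outside its policy, has this shape.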

Fine-grained Dynamic Analysis of Web Applications for Cyber-Security: Three approaches and three tools

We designed and implemented an approach to automatically instrument dynamic web applications using source transformation technology, and to recover sequence diagrams from the execution traces generated by the resulting instrumentation. Generated execution traces are stored in an SQL database, and our approach automatically filters them to remove redundant information that may complicate program understanding. The dynamic analysis is supported by an automated instrumentation coverage approach we developed to decrease the percentage of false positives; in support of this approach, we proposed a set of new coverage metrics specialized for dynamic web applications. In addition, we performed extensive analysis of the embedded database interactions in the host application. This includes automatically distilling the embedded SQL, analyzing it, and modeling it as part of the whole system.
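The source does not specify which filters are applied, so the following is only a minimal sketch of one plausible filter, under stated assumptions: trace events are stored in a hypothetical `trace` table with columns `run_id`, `seq`, `caller`, and `callee`, and consecutive repetitions of the same call, as produced by loops, are collapsed into a single event before a sequence diagram is built.

```python
import sqlite3


def store_events(conn, events):
    """Store (run_id, seq, caller, callee) trace events in a trace table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trace "
        "(run_id INTEGER, seq INTEGER, caller TEXT, callee TEXT)"
    )
    conn.executemany("INSERT INTO trace VALUES (?, ?, ?, ?)", events)


def filtered_trace(conn, run_id):
    """Return the calls of one run in order, collapsing consecutive repeats
    of the same call (for example, the iterations of a loop) into one."""
    rows = conn.execute(
        "SELECT caller, callee FROM trace WHERE run_id = ? ORDER BY seq",
        (run_id,),
    ).fetchall()
    result = []
    for row in rows:
        if not result or result[-1] != row:
            result.append(row)
    return result


conn = sqlite3.connect(":memory:")
store_events(conn, [
    (1, 1, "index.php", "db_connect"),
    (1, 2, "index.php", "fetch_row"),
    (1, 3, "index.php", "fetch_row"),
    (1, 4, "index.php", "fetch_row"),
    (1, 5, "index.php", "render"),
])
print(filtered_trace(conn, 1))
# [('index.php', 'db_connect'), ('index.php', 'fetch_row'), ('index.php', 'render')]
```

Keeping raw events in the database and filtering at query time means the unfiltered trace remains available if a collapsed repetition later turns out to matter.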

Reverse Engineering of UML-ER Diagrams from Relational Database Schemas with Application to Security Analysis: Approach and a tool

We designed and implemented an automated transformation from an SQL (DDL) schema to an adapted UML class model in the open XMI 2.1 format. The adapted model is a tailored UML class model that represents the basic ER diagram components, including entities, attributes, relations, and primary keys.

In another project, we investigated the use of a clone detector to identify known Android malware. We collected a set of Android applications known to contain malware and a set of benign applications. We extracted the Java source code from the applications' binary code and used NiCad, a near-miss clone detector, to find clone classes in a small subset of the malicious applications. We then used these clone classes as signatures to find similar source files in the rest of the malicious applications, with the benign collection serving as a control group. In our evaluation, we successfully decompiled more than 1000 malicious apps from 19 malware families. Our results show that using a small portion of the malicious applications as a training set can detect 95% of previously known malware with very few false positives and an accuracy of 96.88%. Our method can effectively and reliably pinpoint malicious applications that belong to specific malware families.
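The signature-matching idea can be sketched as follows. This is a simplified, line-based stand-in for NiCad, not NiCad itself: NiCad pretty-prints and normalizes code with TXL and compares fragments using a longest-common-subsequence-based dissimilarity threshold, which is approximated here by comparing whitespace-normalized lines. The 0.3 threshold, the function names, and the Java fragments are illustrative assumptions.

```python
def normalize(fragment):
    """Simplified normalization: collapse whitespace and drop blank lines."""
    return [" ".join(line.split()) for line in fragment.splitlines() if line.strip()]


def lcs_length(a, b):
    """Length of the longest common subsequence of two line lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def is_near_miss_clone(f1, f2, threshold=0.3):
    """True if, for each fragment, the fraction of its lines not shared with
    the other fragment is at most threshold."""
    a, b = normalize(f1), normalize(f2)
    if not a or not b:
        return False
    common = lcs_length(a, b)
    dissimilarity = max((len(a) - common) / len(a), (len(b) - common) / len(b))
    return dissimilarity <= threshold


def matches_signature(source, signatures, threshold=0.3):
    """Flag a source fragment if it is a near-miss clone of any signature."""
    return any(is_near_miss_clone(source, sig, threshold) for sig in signatures)


signature = """
String id = tm.getDeviceId();
URL u = new URL(SERVER + id);
HttpURLConnection c = (HttpURLConnection) u.openConnection();
c.setRequestMethod("POST");
c.getOutputStream().write(payload);
"""
candidate = signature.replace('"POST"', '"PUT"')  # one line of five changed
benign = """
int total = 0;
for (int x : values) total += x;
return total;
"""
print(matches_signature(candidate, [signature]))  # True
print(matches_signature(benign, [signature]))     # False
```

A decompiled source file would be flagged if any of its fragments matches a signature; running the same check over the benign collection gives the false-positive count used to judge the signatures.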

Creative Research in Security and Software Engineering Technology

Model-Driven Software Engineering for Automotive Systems (Model Pattern Engineering: Discover, Catalog and Formalize Sub-model Patterns): Data analytics project