Midnight Researcher Notes: February 2009

Thursday, February 26, 2009

Introduction to Mining Software Engineering Data

Tao Xie | North Carolina State University | xie@csc.ncsu.edu

Ahmed E. Hassan | University of Victoria | ahmed@uvic.ca

Part I

Mining Software Engineering Data goals:

Transform static record-keeping SE data to active data.
Make SE data actionable by uncovering hidden patterns and trends.

Uses of mining SE data:

Gain empirically-based understanding of software development.
Predict, plan, and understand various aspects of a project.
Support future development and project management activities.

Types of SE Data:

Historical data

Used primarily for record-keeping activities (checking the status of a bug, retrieving old code)

Version or source control:
- cvs, subversion, perforce.
- Store changes to the data

Bug systems:
- bugzilla, GNATS, JIRA.
- Follow the resolution of defects.

Mailing lists:
- Mbox
- Record rationale for decisions throughout the life of a project.

Multi-run and Multi-site data
- Execution traces
- Deployment logs

Software Maintenance Activities

Perfective: add new functionality
Corrective: fix faults
Adaptive: new file formats, refactoring

Source Control Repositories

A source control system tracks changes to ChangeUnits.

Examples of ChangeUnits:

File
Function
Dependency (e.g., Call)

For each ChangeUnit, it tracks the developer, time, change message, co-changing Units.

Change Propagation

Measuring Change Propagation

We want:

High precision to avoid wasting time
High recall to avoid bugs
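As a rough illustration of the two measures, the sketch below scores one suggestion against what the developer actually changed. The function name and the file names are made up for the example; they are not from the tutorial.

```python
def precision_recall(suggested, actual):
    """Return (precision, recall) for one change-propagation suggestion.

    suggested: entities a tool proposed to change (hypothetical input)
    actual:    entities the developer really had to change
    """
    suggested, actual = set(suggested), set(actual)
    correct = suggested & actual
    # Precision: how much of the suggestion was worth looking at.
    precision = len(correct) / len(suggested) if suggested else 0.0
    # Recall: how much of the real change the suggestion covered.
    recall = len(correct) / len(actual) if actual else 0.0
    return precision, recall


# The tool suggests three files; the developer had to touch four.
p, r = precision_recall({"a.c", "b.c", "c.c"}, {"a.c", "b.c", "d.c", "e.c"})
print(f"precision={p:.2f} recall={r:.2f}")
```

Low precision means the developer inspects entities that did not need to change; low recall means some needed changes were never suggested.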

Guiding Change Propagation

Mine association rules from change history.

Use rules to help propagate changes:

Recall as high as 44%
Precision around 30%

High precision and recall are reached in under one month of history.

Prediction accuracy improves prior to a release (i.e., during maintenance phase)
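A minimal sketch of mining co-change rules, assuming each commit is available as a set of changed files. It only mines pairwise rules A -> B with simple support and confidence thresholds; the thresholds, function name, and file names are illustrative assumptions, not the technique used in the studies summarized above.

```python
from collections import Counter
from itertools import permutations


def mine_cochange_rules(transactions, min_support=2, min_confidence=0.5):
    """Mine pairwise rules A -> B from change sets.

    support(A -> B)    = number of change sets containing both A and B
    confidence(A -> B) = support(A -> B) / number of change sets containing A
    """
    single = Counter()
    pair = Counter()
    for changed in transactions:
        files = set(changed)
        single.update(files)
        pair.update(permutations(sorted(files), 2))  # ordered pairs (A, B)

    rules = {}
    for (a, b), support in pair.items():
        confidence = support / single[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


# Hypothetical change history: one set of files per commit.
history = [
    {"parser.c", "parser.h"},
    {"parser.c", "parser.h", "lexer.c"},
    {"parser.c", "main.c"},
    {"lexer.c"},
]
rules = mine_cochange_rules(history)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support}, confidence={confidence:.2f}")
```

When a developer changes A, every rule A -> B becomes a suggestion to look at B as well.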

Code Sticky Notes

Traditional dependency graphs and program understanding models usually do not use historical information.

Static dependencies capture only a static view of a system - not enough detail!

Development history can help understand the current structure (architecture) of a software system.

Studying Conway's Law

"The structure of a software system is a direct reflection of the structure of the development team"

Predicting Bugs

Studies have shown that most complexity metrics correlate well with LOC (lines of code)!

Noteworthy findings:

Previous bugs are a good predictor of future bugs.
The more a file changes, the more likely it is to have bugs in it.
Recent changes affect the bug potential of a file more than older changes do (time-damped weighting models).
Number of developers is of little help in predicting bugs.
Hard to generalize bug predictors across projects unless in similar domains.
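One way to picture a time-damped model is to let every past change contribute a weight that shrinks with age. The sketch below assumes exponential decay with a 30-day half-life; both the decay shape and the half-life are illustrative assumptions, since the notes do not say which weighting the studies used.

```python
import math


def damped_change_score(change_ages_days, half_life_days=30.0):
    """Sum of per-change weights, where a change loses half its weight
    every `half_life_days` (an assumed, illustrative decay schedule)."""
    decay = math.log(2) / half_life_days
    return sum(math.exp(-decay * age) for age in change_ages_days)


# Two files with three changes each: one changed recently, one long ago.
recent = damped_change_score([1, 5, 10])
old = damped_change_score([200, 300, 400])
print(f"recent={recent:.3f} old={old:.3f}")
```

Both files have the same raw change count, but the recently changed file gets the higher score and would be ranked as more bug-prone.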

Example 1 : using imports in Eclipse to predict bugs

71% of files that import compiler packages had to be fixed later on.
14% of files that import ui packages had to be fixed later on.

Example 2 : don't program on Fridays

In Eclipse, the percentage of bug-introducing changes is highest on Fridays.

Classifying changes as Buggy or Clean

Given a change, can we warn the developer that there is a bug in it?

Recall/Precision in 50-60% range.

Project Communication - Mailing lists

Most open source projects communicate through mailing lists or IRC channels.

Rich source of information about the inner workings of large projects.

Discussions cover topics such as future plans, design decisions, project policies, and code or patch reviews.

Social network analysis could be performed on discussion threads.

Social Network Analysis

Mail list activity

Strongly correlates with code change activity.
Moderately correlates with document change activity.

Social network measures (in-degree, out-degree, betweenness) indicate that committers play much more significant roles in the mailing list community than non-committers.
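To make the degree measures concrete, the sketch below builds a reply graph from (sender, recipient) pairs and counts distinct neighbours in each direction. The input format and the names are assumptions for illustration; betweenness is left out because it needs shortest-path computation, which a graph library would normally provide.

```python
from collections import defaultdict


def degree_measures(replies):
    """replies: iterable of (sender, recipient) pairs, one per reply.

    Returns {person: (in_degree, out_degree)}, counting distinct people.
    """
    repliers = defaultdict(set)    # person -> people who replied to them
    replied_to = defaultdict(set)  # person -> people they replied to
    for sender, recipient in replies:
        if sender == recipient:
            continue  # ignore self-replies
        replied_to[sender].add(recipient)
        repliers[recipient].add(sender)
    people = set(repliers) | set(replied_to)
    return {p: (len(repliers[p]), len(replied_to[p])) for p in people}


# Hypothetical thread: ann and cat reply to bob, bob replies to ann.
measures = degree_measures([("ann", "bob"), ("cat", "bob"), ("bob", "ann")])
for person, (in_deg, out_deg) in sorted(measures.items()):
    print(f"{person}: in={in_deg} out={out_deg}")
```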

Immigration rate of developers

When will a developer be invited to join a project?

Expertise vs. interest

The patch review process

Two review styles

RTC : Review-then-Commit
CTR : Commit-then-Review

80% of patches are reviewed within 3.5 days, and 50% are reviewed in under 19 hours.

Measure a team's morale around release time

Study the content of messages before and after release.

Use dimensions from a psychometric text analysis tool.

Program Source Code

Code Entities

Mining API Usage Patterns

How should an API be used correctly?

An API may serve multiple functionalities --> Different styles of API usage

"I know what type of object I need, but I don't know how to write the code to get the object"

Can we synthesize jungloid code fragments automatically?
Given a simple query describing the desired code in terms of input and output types, return a code segment.
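The query "from this input type to that output type" can be pictured as a path search over a graph whose nodes are types and whose edges are calls. The sketch below is a simplified breadth-first search over single-argument signatures; the signature table is invented for illustration and this is not claimed to be how any particular jungloid tool works.

```python
from collections import deque


def synthesize_chain(signatures, source_type, target_type):
    """Breadth-first search for the shortest chain of calls that turns
    a value of source_type into a value of target_type.

    signatures: list of (call_name, input_type, output_type) triples.
    Returns the list of call names, or None if no chain exists.
    """
    queue = deque([(source_type, [])])
    seen = {source_type}
    while queue:
        current, path = queue.popleft()
        if current == target_type:
            return path
        for name, in_type, out_type in signatures:
            if in_type == current and out_type not in seen:
                seen.add(out_type)
                queue.append((out_type, path + [name]))
    return None


# Hypothetical single-argument signatures, invented for illustration.
SIGNATURES = [
    ("openFile", "Path", "File"),
    ("newReader", "File", "Reader"),
    ("parse", "Reader", "Ast"),
    ("toString", "Ast", "String"),
]
print(synthesize_chain(SIGNATURES, "Path", "Ast"))
```

In a real setting the signature table would be mined from the API and from existing client code rather than written by hand.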

"I know what method call I need, but I don't know how to write code before and after this method call"

Relationships between Code Entities

Mine framework reuse patterns
- Membership relationships
  - A class contains member functions
- Reuse relationships
  - Class inheritance / instantiation
  - Function invocations / overriding

Mine software plagiarism
- Program dependence graphs

Program Execution Traces

Method-Entry/Exit States

Goal: Mine specifications (pre/post conditions) or object behavior (object transition diagrams)

State of an object: values of transitively reachable fields.

Method-entry state: Receiver-object state, method argument values.

Method-exit state: Receiver-object state, updated method argument values, method return value.

Other Profiled program states

Goal: detect or locate bugs.

Values of variables at certain code locations

Object/static field read/write

Method-call arguments

Method returns

Sampled predicates on values of variables

Executed Structural Entities

Goal: Locate bugs.

Executed branches/paths, def-use pairs.

Executed function/method calls.

Group methods invoked on the same object

Profiling options

Execution hit vs. count

Execution order (sequences)

Part II

How can you mine Software Engineering data?

Overview of data mining techniques

Association rules and frequent patterns

Classification

Clustering

Misc.

Association Rules

Example:

Finding highly correlated method call pairs.

Check the revisions (fixes to bugs) and find the pairs of method calls whose confidence has improved dramatically because fixes frequently added the missing call.

Those are the matching method call pairs that programmers may often violate.

Conflicting Patterns

999 out of 1000 times spin_lock is followed by spin_unlock

The single time that spin_unlock does not follow may likely be an error.

We can detect an error without knowing the correctness rules.
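A minimal sketch of this idea, assuming each function is available as an ordered list of the calls it makes. The rule is never written down by hand: it is believed only because it holds almost everywhere, and the rare exceptions are reported. The trace format, the 0.9 threshold, and the function names are assumptions for illustration.

```python
def follow_rule_violations(traces, first, second, min_confidence=0.9):
    """Check the candidate rule "`first` is later followed by `second`".

    traces: dict mapping a function name to its ordered list of calls.
    Returns (confidence, violators); violators is empty unless the rule
    holds often enough to be believed.
    """
    holds, violators = 0, []
    for name, calls in traces.items():
        if first not in calls:
            continue
        after_first = calls[calls.index(first) + 1:]
        if second in after_first:
            holds += 1
        else:
            violators.append(name)
    total = holds + len(violators)
    confidence = holds / total if total else 0.0
    return confidence, (violators if confidence >= min_confidence else [])


# Synthetic data mirroring the 999-out-of-1000 example above.
traces = {f"f{i}": ["spin_lock", "work", "spin_unlock"] for i in range(999)}
traces["suspect"] = ["spin_lock", "work", "return"]
confidence, violators = follow_rule_violations(traces, "spin_lock", "spin_unlock")
print(confidence, violators)
```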

Detect Copy-Paste Code

Apply closed sequential pattern mining techniques.

Customizing the techniques:

A copy-paste segment typically does not have big gaps.
- Use a maximum gap threshold to control.

Output the instances of patterns ( i.e., the copy-pasted code segments) instead of the patterns.
Use small copy-pasted segments to form larger ones.
Prune false positives: tiny segments, unmappable segments, overlapping segments, and segments with large gaps.

Find Bugs in Copy-Pasted Segments

For two copy-pasted segments, are the modifications consistent?

Identifier a in segment S1 is changed to b in segment S2 three times but remains unchanged once - likely a bug.
The heuristic may not be correct all the time.

The lower the unchanged rate of an identifier, the more likely there is a bug.
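The unchanged rate can be sketched as the share of an identifier's occurrences that were left as-is in the pasted copy. The code below assumes the two segments are already aligned token by token; the 0.4 cutoff is an arbitrary illustrative threshold. A rate of exactly 0 means the identifier was renamed everywhere, which is consistent rather than suspicious.

```python
def unchanged_ratio(original_tokens, pasted_tokens, identifier):
    """Fraction of occurrences of `identifier` in the original segment
    that were left unrenamed at the same position in the pasted segment.

    Assumes the two segments are already aligned token by token.
    """
    positions = [i for i, tok in enumerate(original_tokens) if tok == identifier]
    if not positions:
        return 0.0
    unchanged = sum(1 for i in positions if pasted_tokens[i] == identifier)
    return unchanged / len(positions)


# "a" is renamed to "b" three times but forgotten once.
original = ["a", "=", "a", "+", "a", "*", "a"]
pasted = ["b", "=", "b", "+", "a", "*", "b"]
ratio = unchanged_ratio(original, pasted, "a")

THRESHOLD = 0.4  # illustrative cutoff, not taken from the notes
suspicious = 0 < ratio <= THRESHOLD
print(ratio, suspicious)
```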

Mining Rules in Traces

Mining association rules or sequential patterns S --> F, where S is a statement and F is the status of program failure.

The higher the confidence, the more likely S is faulty or related to a fault.

Using only one statement on the left side of the rule can be misleading, since a fault may be caused by a combination of statements.

Frequent patterns (sets of statements) can be used on the left side to improve on this.
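For the single-statement case, the confidence of S --> F is simply the fraction of runs executing S that failed. The sketch below assumes each run is recorded as a set of executed statement labels plus a pass/fail flag; the labels and data are made up.

```python
def failure_confidence(runs):
    """runs: list of (executed_statements, failed) pairs.

    Returns {statement: confidence of the rule statement -> failure},
    i.e. failing runs that executed it / all runs that executed it.
    """
    executed = {}
    failed_with = {}
    for statements, failed in runs:
        for s in set(statements):
            executed[s] = executed.get(s, 0) + 1
            if failed:
                failed_with[s] = failed_with.get(s, 0) + 1
    return {s: failed_with.get(s, 0) / n for s, n in executed.items()}


# Hypothetical coverage data: (statements executed, did the run fail?)
runs = [
    ({"s1", "s2"}, False),
    ({"s1", "s3"}, True),
    ({"s1", "s2", "s3"}, True),
    ({"s1"}, False),
]
conf = failure_confidence(runs)
ranked = sorted(conf, key=conf.get, reverse=True)
print(ranked[0], conf[ranked[0]])
```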

Mining Emerging Patterns in Traces

A method executed only in failing runs is likely to point to the defect.

Comparing the coverage of passing and failing program runs helps.

Mining patterns frequent in failing program runs but infrequent in passing program runs.

Sequential patterns may be used.
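A small sketch of the coverage comparison, restricted to single methods rather than sequences. It keeps methods whose support is high among failing runs and low among passing runs; the two thresholds and the method names are illustrative assumptions.

```python
def emerging_methods(failing_runs, passing_runs, min_fail=0.5, max_pass=0.1):
    """Methods covered in many failing runs but few passing runs.

    Each run is a set of executed method names.
    """
    def support(method, runs):
        return sum(method in run for run in runs) / len(runs) if runs else 0.0

    candidates = set().union(*failing_runs) if failing_runs else set()
    return sorted(
        m for m in candidates
        if support(m, failing_runs) >= min_fail
        and support(m, passing_runs) <= max_pass
    )


# Hypothetical method coverage per run.
failing = [{"init", "parse", "recover"}, {"init", "recover"}]
passing = [{"init", "parse"}, {"init"}]
print(emerging_methods(failing, passing))
```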

Classification

Classification: A 2-step Process

Model construction: describe a set of predetermined classes

Training dataset: tuples for model construction
- Each tuple/sample belongs to a predefined class

Classification rules, decision trees, or math formulae

Model application: classify unseen objects

Estimate accuracy of the model using an independent test set.
Acceptable accuracy --> apply the model to classify tuples with unknown class labels.

Supervised learning (Classification)

Supervision: objects in the training data set have labels
New data is classified based on the training set

Unsupervised learning (Clustering)

The class labels of training data are unknown
Given a set of measurements, observations, etc. with the aim of establishing the existence of classes or clusters in the data.

GUI-Application Stabilizer

Given a program state S and an event e, predict whether e likely results in a bug

Positive samples: past bugs
Negative samples: reports that turned out not to be bugs

A K-NN based approach

Consider the k closest cases reported before
Compare the sum of 1/d for bug cases and not-bug cases, where d is the distance between the current state and a reported state (so closer cases weigh more).
If the current state is more similar to bugs, predict a bug.
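The voting step can be sketched as follows, treating d as a distance so that closer cases contribute larger 1/d weights. Representing a program state as a numeric vector and using Euclidean distance are assumptions made only for this example.

```python
import math


def predict_bug(current, reported, k=3, eps=1e-9):
    """Distance-weighted k-NN vote.

    reported: list of (state_vector, is_bug) pairs seen before.
    Returns True if the k nearest cases lean towards "bug".
    """
    nearest = sorted(
        (math.dist(current, state), is_bug) for state, is_bug in reported
    )[:k]
    # eps avoids division by zero when a reported state matches exactly.
    bug_score = sum(1 / (d + eps) for d, is_bug in nearest if is_bug)
    ok_score = sum(1 / (d + eps) for d, is_bug in nearest if not is_bug)
    return bug_score > ok_score


# Hypothetical 2-D state vectors with their labels.
reported = [
    ((0, 0), True),
    ((1, 0), True),
    ((5, 5), False),
    ((6, 5), False),
]
print(predict_bug((0.5, 0.2), reported))
print(predict_bug((5.5, 5.0), reported))
```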

Clustering

What is clustering? Grouping data objects into clusters so that the objects are:

Similar to one another within the same cluster.

Dissimilar to the objects in other clusters.

Unsupervised learning: no predefined classes.

Clustering and Categorization

Software categorization

Partitioning software systems into categories

Categories predefined - a classification problem

Categories discovered automatically - a clustering problem

Software Categorization - MUDABlue

Understanding source code

Use Latent Semantic Analysis (LSA) to find similarity between software systems.
Use identifiers (e.g., variable names, function names) as features
- "gtk_window" represents some window
- The source code near "gtk_window" contains some GUI operation on the window.

Extracting categories using frequent identifiers
- "gtk_window", "gtk_main", and "gpointer" --> GTK related software system
- Use LSA to find relationships between identifiers

Other Mining Techniques

Automaton/grammar/regular expression learning

Searching/matching

Concept analysis

Template-based analysis

Abstraction-based analysis

Sunday, February 22, 2009

Center of Innovation & Competitiveness (INCOM) - Nile University

Nile University Center for Innovation and Competitiveness (NU/INCOM) is primarily focused on identifying, researching and promoting innovation practices that have improved competitiveness at the company, industry and country levels, with special emphasis on Egypt and the MENA region. Innovation is one of the most important competitive priorities in firms and in nations, and it is a major driving force for change in today's world. It is critical in the formulation of successful manufacturing strategies for nations, at the micro and macro levels, and in enhancing their economic development and global market positioning. Innovation guides any business, small, medium or large, in its ability to successfully compete in the global market, thus impacting the international competitiveness of firms and nations.

Strategic policy planning and implementation in modern governments require technological foresight and database and system modeling capabilities upon which policies and initiatives for national priorities can be determined. In this area INCOM will focus on developing database and modeling tools to aid government agencies in:

Developing technology foresight capability
Policy analysis capability
Strategic policy evaluation capabilities

Strategies for competitiveness in business firms require efficient and effective use of technological and business resources and a keen understanding of the global market and its dynamics. CIC will focus on the firms' basic competitive priorities including:

Providing innovative products/services that compete favorably with competition.
Predictive benchmarking of competitors' products and services.
Producing products/services with high quality performance standards.
Producing and distributing products/services at a competitive price.
Meeting delivery scheduling and reacting quickly to customer schedule changes.
Reacting to changes in market needs and in product requirements.
Offering a broad platform of services to boost customer satisfaction.

Through NU Executive Development Center, INCOM will also provide training and research services to encourage and facilitate entrepreneurship capabilities in small, medium and large business enterprises in a global context.

Thursday, February 5, 2009

Bug Counts vs. Test Coverage

What to Do When Bug Counts Don’t Speak for Themselves?

Bug counts on a project speak volumes about the quality of testing for a particular product and how vigorously the test team is working to "assure quality." Bug counts are invariably a primary area of test metrics that are reported to management. What is the rationale behind drawing so much attention to the number of bugs being found through the course of a project?
I have heard it said that QE’s job is to find bugs. If this is the assumption of management, bug counts will be an important indicator to them that QE is doing its job. They expect to see bug counts rise dramatically in the early stages of testing, and they expect to see the find rate decrease as the project comes to an end. These are management’s statistical expectations when they believe bug counts are a metric to assess quality of testing.
If high bug counts, then, are an indicator that quality is going up, low bug counts can be seen as an indicator that something just isn’t right with the testing process. Management might imagine different problems that are preventing bugs from being found:

Test coverage isn’t complete; maybe major areas of functionality aren’t being tested.
Testing is only scratching the surface of all functionality, not digging in to the real complexities of the code.
Our testers just aren’t that good.

Management might see red flags when bug counts are low, but a number of causes may contribute to low bug counts. On the second or third iteration of a product, the bulk of the defects may have been found on an earlier cycle. Or especially good development practices may have been implemented: strong unit testing, code reviews, good documentation, and not working developers to death. These are supposed to result in lower bug counts.
Ultimately, however, QE will justify low bug counts when it can justify its test coverage. If the product under test is being tested with thorough coverage, the bug count should be treated only as a supporting statistic, not the primary one. After all, we all know that a quality product hasn’t been reached when a certain bug count is reached. Quality is achieved when test coverage is maximized and bug finds decrease to a minimum.
There are several things you can do when bug counts are low and management is questioning the quality of testing:

Take stock. Call a meeting with your test team, go through the areas of test, possibly even some test cases themselves, and get a general feel for how much test coverage you really have. Maybe you’ll discover that an area of test really is being missed. Perhaps there is some misunderstanding of who should be testing what and some functionality fell between the cracks. Brainstorm more testing methods and techniques, and generate ideas of how your team can broaden the testing efforts. Before going to other groups or departments, get a solid understanding of where your team is in the process.
Talk to development. Go over your current test coverage with development, and see if they have any input on areas you might also investigate. Ask them what the trouble spots are, if they can suggest lower-level tests that may ferret out more bugs, and possibly even conduct a test case review with them. On my last project, we sent out the test cases of a certain functionality to the appropriate developer for review. Though many times developers can be reluctant to help testers, demonstrate to them that it is in their best interest that we thoroughly test their code—if it’s solid, they have nothing to worry about.
Communicate with management. When bug counts are low, use test coverage to justify them. This doesn’t mean dismissing the fact that the bug count is low. It means using the bug count as an indicator to do some analysis into the testing practices you are doing, and verifying that high test coverage is being achieved. If it is, explain to management your findings. Demonstrate by solid metrics that you are performing thorough testing, that you can’t force bug counts to go up, and that maybe—just maybe—a low bug count means you’ve got a quality product on your hands!

One thing to bear in mind: while you can use the above methods during testing cycles to understand and cope with a low bug count, the ideas are still applicable before testing even begins, while test cases are being written for a project, and while development is still in full swing. Good test coverage is something to be planned ahead of time, and having gone through the effort of mapping coverage and functional test cases early in the project, you will prevent yourself from spending valuable testing cycles repeating tasks.
While low bug counts can cause people in both development and management to question the effectiveness of the testing, do not be defensive about it. Use it as a trigger to prove what you should already know: your testing efforts are appropriate and effective, and your coverage is maximized. Don’t let your bug counts do the talking; your test coverage should say it all.

7 Habits of Highly Insecure Software

Habit # 1: Poorly Constrained Input

By far, the number one cause of security vulnerabilities in software stems from the failure to properly constrain input. The most infamous security vulnerability resulting from this habit is the buffer overflow. Buffer overflows happen when application developers use languages (like C and C++) that allow them to allocate a fixed amount of memory to hold some user-supplied data. This usually doesn’t present a problem when input is properly constrained or when input strings are of the length that developers expected. When data makes it past these checks, though, it can overwrite space in memory reserved for other data, and in some cases force commands in the input string to be executed. Other unconstrained input can cause problems, too, like escape characters, reserved words, commands, and SQL (Structured Query Language) statements.
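As a small illustration of constraining input on the SQL side, the sketch below checks the length of a user-supplied name and then passes it to the database as a bound parameter instead of pasting it into the query text. It uses Python's standard sqlite3 module; the table, the 32-character limit, and the function name are assumptions made for the example.

```python
import sqlite3

MAX_NAME_LENGTH = 32  # illustrative limit, not from the article


def find_user(conn, name):
    """Look up a user by name with constrained, parameterized input."""
    if not isinstance(name, str) or not (0 < len(name) <= MAX_NAME_LENGTH):
        raise ValueError("name must be 1 to 32 characters long")
    # The ? placeholder sends `name` as data, never as SQL text.
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))
print(find_user(conn, "x' OR '1'='1"))  # treated as a literal name
```

The second call matches no rows because the quote characters are compared as part of the name rather than interpreted as SQL.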

Habit # 2: Temporary Files
Usually we think of the file system as a place to store persistent data: information that will still be there when the power is shut off. Applications, though, also write out temporary files: files that store data only for a short period and then are deleted. Temporary files can create major security holes when sensitive data is exposed. Common (inappropriate) uses of temp files include storing user credentials (passwords) and unencrypted but sensitive information (CD-keys), among others.

Habit # 3: Securing Only the Most Common Access Route
How many ways could you open a text document in Windows? You could double-click on the file in Windows Explorer; or open your favorite text editor, and type the file name in the open dialog; or type the file name into an Internet Explorer window. The truth is, if you put your mind to it, you could think of at least a dozen ways to open that file. Now imagine implementing some security control on that document. You would have to think of every possible access route to the document, and chances are, you’re likely to miss a few. Developers fall into this dilemma too. When requirements change, or when a new application version is being developed, security controls are often “added-on” to an application. Also, when a security bug is reported, developers may patch the application to fix the particular input sequence reported and still leave other, underused access routes unprotected. The result: the reappearance of supposedly fixed bugs or alternate access routes that bypass security mechanisms.

Habit # 4: Insecure Defaults
We are all guilty of the mortal sin of clicking “Next” or “Finish” on an installation wizard without reading the details and just accepting the recommended configuration. But is it a sin? The application’s developers and testers know more about the application than we do, so it seems natural not to worry about awkward installation options and just accept defaults. Most users think this way and I can’t say that I blame them. So what does this mean for security-conscious testers? It means that we need to ensure security out of the box. We have to make sure that default values err on the side of security, and that insecure configurations are appropriately explained to users.

Habit # 5: Trust of the Registry and File System Data
When developers read information from the registry, they trust that the values are accurate and haven’t been tampered with maliciously. This is especially true if their code wrote those values to the registry in the first place. One of the most extreme vulnerabilities is when sensitive data, such as passwords, is stored unprotected in the registry. We have found that passwords, configuration options, CD keys, and other sensitive data are often stored unencrypted in the registry—ripe for the reading.

Habit # 6: Unconstrained Application Logic
It’s pretty clear that we need to examine individual functions to make sure that they are secure. If a feature used in a Web browser is not supposed to allow the reading of any file except a cookie, then there’s a pretty good chance that a test case was run to verify that. Features are not likely to be as well constrained when they are combined or when commands are executed in a loop. Constraining loops can be an exquisitely difficult programming task. Many denial of service attacks are made possible by getting some benign function (such as one that writes out a cookie) to execute over and over again and consume system resources.

Habit # 7: Poor Security Checks with Respect to Time
The ideal situation is that every time sensitive operations are performed, checks are made to ensure they will succeed securely. If too much time elapses between time-of-check and time-of-use, then the possibility for the attacker to get in the middle of such a transaction must be considered. It is the old “bait and switch” con applied to computing: Bait the application with legitimate information, and then switch that information with illegitimate data before the application notices.
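One common way to shrink the gap between check and use is to let the operating system perform both in a single step. The sketch below, using only Python's standard library, creates a file with the exclusive-create flag instead of first testing whether the file exists and then opening it. The file name and helper name are made up for the example.

```python
import os
import tempfile


def create_exclusive(path, data):
    """Create `path` only if it does not exist, in one atomic step.

    Checking os.path.exists(path) and then calling open(path, "w") would
    leave a gap between the check and the use; O_EXCL closes that gap by
    making the operating system do both at once.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "session.dat")
create_exclusive(target, b"first")
try:
    create_exclusive(target, b"second")
    second_write_succeeded = True
except FileExistsError:
    second_write_succeeded = False
print(second_write_succeeded)
```

The second attempt fails with FileExistsError instead of silently overwriting whatever appeared at that path in the meantime.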

Using these seven habits as a checklist of what to avoid in your software project will help ensure a successful outcome. There’s no such thing as 100 percent bug-free software. Our goal, however, is to get as close as possible.