Accepted papers
Full papers
- Edit Constraints on Microaggregation and Additive Noise. Isaac Cano and Vicenc Torra.
Abstract Privacy preserving data mining and statistical disclosure control propose several perturbative methods to protect the
privacy of the respondents. Such perturbation can introduce inconsistencies to the sensitive data. Due to this, data editing techniques are used in order
to ensure the correctness of the collected data before and after the anonymization. In this paper we propose a methodology to protect microdata based on
noise addition that takes data edits into account. When the addition of noise causes a constraint to fail, a process of noise swapping preserves the
edit constraint. We check its suitability against the constrained microaggregation, a method for microaggregation that avoid the introduction of
such inconsistencies.
Link to full paper in PDF format.
- Quadratic Error Minimization in a Distributed Enthronement with Privacy Preserving. Gerald Gavin and Julien Velcin.
Abstract In this paper, we address the issue of privacy preserving data-mining. Specifically, we consider a scenario where each member
j of T parties has its own private database. The party j builds a private classifier h_j for predicting a binary class variable y. The aim of this
paper consists in aggregating these classifiers h_j in order to improve the individual predictions. Precisely, the parties wish to compute an efficient
linear combinations over their classifier in a secure manner.
Link to full paper in PDF format.
- Secure Top-k Subgroup Discovery. Henrik Grosskreutz, Benedikt Lemmen and Stefan Ruping.
Abstract Supervised descriptive rule discovery techniques like subgroup discovery are quite popular in applications like fraud
detection or clinical studies. Compared with other descriptive techniques, like classical support/confidence association rules, subgroup discovery
has the advantage that it comes up with only the top-k patterns, and that it makes use of a quality function that avoids patterns uncorrelated with the
target. If these techniques are to be applied in privacy-sensitive scenarios involving distributed data, precise guarantees are needed regarding the
amount of information leaked during the execution of the data mining. Unfortunately, the adaptation of secure multi-party protocols for classical
support/confidence association rule mining to the task of subgroup discovery is impossible for fundamental reasons. The source is the different
quality function and the restriction to a xed number of patterns -- i.e. exactly the desired features of subgroup discovery. In this paper,
we present a new protocol which allows distributed subgroup discovery while avoiding the disclosure of the individual databases. We analyze the
properties of the protocol, describe a prototypical implementation and present experiments that demonstrate the feasibility of the approach.
Link to full paper in PDF format.
- ASAP: Automatic Semantics-Aware Analysis of Network Payloads. Tammo Krueger, Nicole Kramer and Konrad Rieck.
Abstract Automatic inspection of network payloads is a prerequisite for effective analysis of network communication. Security
research has largely focused on network analysis using protocol specifications, for example for intrusion detection, fuzz testing and forensic analysis.
The specification of a protocol alone, however, is often not sufficient for accurate analysis of communication, as it fails to reflect individual
semantics of network applications. We propose a framework for semantics-aware analysis of network payloads which automatically extracts semantics-aware
components from recorded network traffic. Our method proceedsby mapping network payloads to a vector space and identifying communication templates
corresponding to base directions in the vector space. We demonstrate the efficacy of semantics-aware analysis in different security applications:
automatic discovery of patterns in honeypot data, analysis of malware communication and network intrusion detection.
Link to full paper in PDF format.
- SBAD: Sequence Based Attack Detection via Sequence Comparison. Ching-Hao Mao, Hsing-Kuo Pao, Christos Faloutsos and Hahn-Ming Lee.
Abstract Given a stream of time-stamped events, like alerts in a network monitoring setting, how can we isolate a sequence of alerts
that form a network attack? We propose a Sequence Based Attack Detection (SBAD) method, which makes the following contributions: (a) it automatically
identifies groups of alerts that are frequent; (b) it summarizes them into a suspicious sequence of activity, representing them with graph structures;
and (c) it suggests a novel graph-based dissimilarity measure. As a whole, SBAD is able to group suspicious alerts, visualize them, and spot anomalies at
the sequence level. The evaluations from three datasets -- two benchmark datasets (DARPA 1999, PKDD 2007) and a private dataset (Acer 2007) gathered from a
Security Operation Center in Taiwan -- support our approach. SBAD can achieve around 80% accuracy or higher in the DARPA and Acer datasets and 90%
accuracy in the PKDD dataset, which is superior to the best result of the PKDD 2007 challenge. The method performs well even without the help of the IP
and payload information. No need for privacy information as the input makes the method easy to plug into existing system such as an intrusion detector.
To talk about e±ciency, the proposed method can deal with large-scale problems, such as processing 300K alerts within 20 mins on a regular PC.
Link to full paper in PDF format.
- Temporal Defences for Robust Recommendations. Neal Lathia, Stephen Hailes and Licia Capra.
Abstract Recommender systems are vulnerable to attack: malicious users may deploy a set of sybils (pseudonymous, automated entities)
to inject ratings in order to damage or modify the output of Collaborative Filtering (CF) algorithms. To protect against these attacks, previous
work focuses on designing sybil prole classification algorithms, whose aim is to find and isolate sybils. These methods, however, assume that
the full sybil profiles have already been input to the system. Deployed recommender systems, on the other hand, operate over time, and recommendations may
be damaged while sybils are still injecting their profiles, rather than only after all malicious ratings have been input. Furthermore, system
administrators do not know when their system is under attack, and thus when to run these classication techniques, thus risking to leave their recommender
system vulnerable to attacks. In this work, we address the problem of temporal sybil attacks, and propose and evaluate methods for monitoring global,
user and item behaviour over time, in order to detect rating anomalies that reflect an ongoing attack.
Link to full paper in PDF format.
- Large Margin Multiclass Gaussian Classification with Differential Privacy. Manas Pathak and Bhiksha Raj.
Abstract As increasing amounts of sensitive personal information is aggregated into data repositories, it has become important to
develop mechanisms for processing the data without revealing information about individual data instances. The differential privacy model provides a
framework for the development and theoretical analysis of such mechanisms. In this paper, we propose an algorithm for learning a discriminatively
trained multi-class Gaussian classifier that satisfies differential privacy using a large margin loss function with a perturbed regularization term.
We present a theoretical upper bound on the excess risk of the classifier introduced by the perturbation.
Link to full paper in PDF format.
- Privacy-Preserving Protocols for Eigenvector Computation. Manas Pathak and Bhiksha Raj.
Abstract In this paper, we present a protocol for computing the principal eigenvector of a collection of data matrices belonging
to multiple semi-honest parties with privacy constraints. Our proposed protocol is based on secure multi-party computation with a semi-honest arbitrator
who deals with data encrypted by the other parties using an additive homomorphic cryptosystem. We augment the protocol with randomization
and obfuscation to make it difficult for any party to estimate properties of the data belonging to other parties from the intermediate steps. The
previous approaches towards this problem were based on expensive QR decomposition of correlation matrices, we present an efficient algorithm
using the power iteration method. We analyze the protocol for correctness, security, and efficiency.
Link to full paper in PDF format.
- Content-based Filtering in On-line Social Networks. Marco Vanetti, Elisabetta Binaghi, Barbara Carminati, Moreno Carullo and Elena Ferrari.
Abstract This paper proposes a system enforcing content-based message filtering conceived as a key service for On-line Social Networks
(OSNs). The system allows OSN users to have a direct control on the messages posted on their walls. This is achieved trough a flexible rule-based system,
that allows a user to customize the filtering criteria to be applied to their walls, and a Machine Learning based soft classifier automatically producing
membership labels in support of content-based filtering.
Link to full paper in PDF format.
Short/Position papers and open problems
- Privacy-Preserving Data Mining via Importance Weighting. Charles Elkan.
Abstract This paper presents a fundamentally new approach to allowing learning algorithms to be applied to a dataset while keeping
the records in the dataset confidential. Let D be the set of records to be kept private, and let E be a fixed set of records from a similar domain that
is already public. The idea is to compute and publish a weight w(x) for each record x in E that measures how representative it is of the records
in D. Data mining on E using these importance weights is then approximately equivalent to data mining directly on D. The dataset D is used
privately to compute the weights, but not revealed in any other way.
Link to full paper in PDF format.
- Classifier Evasion: Models and Open Problems. Blaine Nelson, Benjamin Rubinstein, Ling Huang, Anthony Joseph and Doug Tygar.
Abstract As a growing number of software developers apply machine learning to make key decisions in their systems, adversaries are
adapting and launching ever more sophisticated attacks against these systems. The near-optimal evasion problem considers an adversary that searches for a
low-cost negative instance by submitting a minimal number of queries to a classifier, in order to effectively evade the classifier. In this position
paper, we posit several open problems and variants of the near-optimal evasion problem. Solutions to these problems would significantly advance
the state of the art in secure machine learning.
Link to full paper in PDF format.