1. Long Research Papers

An Unsupervised Attribute Clustering Algorithm for Unsupervised Feature Selection
Pei-Yuan Zhou and Keith C.C. Chan

FACTORBASE: Multi-Relational Structure Learning with SQL All The Way
Zhensong Qian and Oliver Schulte

Improved Risk Predictions via Sparse Imputation of Patient Conditions in Electronic Medical Records
Budhaditya Saha, Sunil Kumar Gupta and Svetha Venkatesh

Multi-Objective Clustering Ensemble for High-Dimensional Data based on Strength Pareto Evolutionary Algorithm (SPEA-II)
Abdul Wahid, Xiaoying Gao and Peter Andreae

EXP3 with Drift Detection for the Switching Bandit Problem
Robin Allesiardo and Raphaël Féraud

Learning Better while Sending Less: Communication-Efficient Online Semi-Supervised Learning in Client-Server Settings
Han Xiao, Shou-De Lin, Mi-Yen Yeh, Phillip Gibbons and Claudia Eckert

Behavioral Entropy and Profitability in Retail
Riccardo Guidotti, Michele Coscia, Dino Pedreschi and Diego Pennacchioli

The Layered Structure of Company Share Networks
Andrea Romei, Salvatore Ruggieri and Franco Turini

Evaluating and Predicting Energy Consumption of Data Mining Algorithms on Mobile Devices
Carmela Comito and Domenico Talia

On-line Detection of Continuous Changes in Stochastic Processes
Kohei Miyaguchi and Kenji Yamanishi

Random-shapelet: an algorithm for fast shapelet discovery
Xavier Renard, Maria Rifqi, Walid Erray and Marcin Detyniecki

Multi-class Learning using Data Driven ECOC with Deep Search and Re-Balancing
Nathalie Japkowicz, Vincent Barnabe-Lortie, Shawn Horvatic and Jie Zhou

Nonparametric Discovery of Online Mental Health-Related Communities
Bo Dao, Thin Nguyen, Dinh Phung and Svetha Venkatesh

Improved Algorithms for Exact and Approximate Boolean Matrix Decomposition
Yi Sun, Shiwei Ye, Yuan Sun and Tsunehiko Kameda

Hermes_sem: a Semantic-aware Framework for the Management and Analysis of our LifeSteps
Nikos Pelekis, Stylianos Sideridis and Yannis Theodoridis

P-N-RMiner: A Generic Framework for Mining Interesting Structured Relational Patterns
Jefrey Lijffijt, Eirini Spyropoulou and Tijl De Bie

Relational Active Learning for Link-Based Classification
Luke McDowell

Calibration of One Class SVM for MV-set estimation
Albert Thomas, Vincent Feuillard and Alexandre Gramfort

Beyond Two-sample-tests: Localizing Data Discrepancies in High-dimensional Spaces
Frederic Cazals and Alix Lhéritier

Discovering Characterizing Regions For Consumer Products
Shashwat Mishra, Vincent Leroy and Sihem Amer-Yahia

The NIST Data Science Program
Bonnie J. Dorr, Craig Greenberg, Peter Fontana, Mark Przybocki, Marion Le Bras, Cathryn Ploehn, Oleg Aulov, Martial Michel, E. Jim Golden and Wo Chang

Sentiment and stock market volatility predictive modelling - a hybrid approach
Rapheal Olaniyan, Daniel Stamate, Doina Logofătu and Lahcen Ouarbya

Deep Feature Synthesis: Towards Automating Data Science Endeavors
James Max Kanter and Kalyan Veeramachaneni

2. Short Research Papers

Context Weight Considered For Implicit Feature Extracting
Jie Chen

Mining Actionable Combined Patterns Satisfying Both Utility and Frequency Criteria
Jingyu Shao, Junfu Yin, Wei Liu and Longbing Cao

Temporal Needleman Wunsch
Haider Syed and Amar Das

Document Similarity Analysis via Involving Both Explicit and Implicit Semantic Relatedness
Qianqian Chen, Liang Hu, Jia Xu, Wei Liu and Longbing Cao

Random Walk Based Context-Aware Activity Recommendation for Location Based Social Networks
Hakan Bagci and Pinar Karagoz

A Model-Based Approach for Identifying Spammers in Social Networks
Farnoosh Fathaliani and Mohamed Bouguessa

Scalable Relational Learning for Large Heterogeneous Networks
Ryan Rossi and Rong Zhou

An Effective and Economic Bi-level Approach to Ranking and Rating Spam Detection
Sihong Xie, Qingbo Hu, Jingyuan Zhang and Philip S Yu

Spherical Wards clustering and generalized Voronoi diagrams
Marek Śmieja and Jacek Tabor

Dynamics of Multi-Campaign Propagation in Online Social Networks
Thejaswi M, Sriniketh Vijayaraghavan, Avinash Das and P Santhi Thilagam

A Context-Aware Approach to Detection of Short Irrelevant Texts
Sihong Xie, Jing Wang, Mohammad S Amin, Baoshi Yan, Anmol Bhasin, Philip S Yu and Clement T Yu

Mining High-Utility Itemsets with Various Discount Strategies
Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong and Vincent S. Tseng

Modeling Temporal Dependencies in Data Using a DBN-LSTM
Raunaq Vohra, Kratarth Goel and J. K. Sahoo

Compression Rate Distance Measure for Time Series
Vinh T. Vo and Duong Tuan Anh

Reliable Predictions under Concept Drift: the Droplets Algorithm
Pierre-Xavier Loeffel, Christophe Marsala and Marcin Detyniecki

IOHMM for Location Prediction with Missing Data
Jiawei Hu, Yanfeng Wang and Ya Zhang

Efficient metric learning for the analysis of motion data
Babak Hosseini and Barbara Hammer

Modeling Recurrent Distributions in Streams using Possible Worlds
Michael Geilke, Andreas Karwath and Stefan Kramer

Information Preserving and Locally Isometric&Conformal Embedding via Tangent Manifold Learning
Alexander Bernstein, Alexander Kuleshov and Yury Yanovich

Integrating Spatial Information into Probabilistic Relational Models
Rajani Chulyadyo and Philippe Leray

A New Consensus Function based on Dual-Similarity Measurements for Clustering Ensemble
Tahani Alqurashi and Wenjia Wang

Quantification in Social Networks
Letizia Milli, Anna Monreale, Giulio Rossetti, Dino Pedreschi, Fosca Giannotti and Fabrizio Sebastiani

Learning Hyperparameter Optimization Initializations
Martin Wistuba, Nicolas Schilling and Lars Schmidt-Thieme

A Time-Varying Graph Unifying Model
Klaus Wehmuth, Artur Ziviani and Eric Fleury

Cohesion Based Co-location Pattern Mining
Cheng Zhou, Boris Cule and Bart Goethals

Exploiting Feature Relationships Towards Stable Feature Selection
Iman Kamkar, Sunil Gupta, Dinh Phung and Svetha Venkatesh

Sensor Network Partitioning based on Homogeneity
Yi Xu, Zhongfei Zhang, Yaqing Zhang and Philip S. Yu

The HIM glocal metric and kernel for network comparison and classification
Giuseppe Jurman, Roberto Visintainer, Michele Filosi, Samantha Riccadonna and Cesare Furlanello

Scalable Extraction of Timeline Information from Road Traffic Data using MapReduce
Ardi Imawan, Fadhilah Putri, Seonga An, Han-You Jeong and Joonho Kwon

An Accurate Rating Aggregation Method for Generating Item Reputation
Ahmad Abdel-Hafez and Yue Xu

Exploiting Big Data in Time Series Forecasting: A Cross-Sectional Approach
Claudio Hartmann, Martin Hahmann, Frank Rosenthal and Wolfgang Lehner

LDA Based Semi-supervised Learning from Streaming Short Text
Ji-De Chen and Hung-Yu Kao

Exploring Distributed RDF Resources using Formal Concept Analysis
Mehwish Alam and Amedeo Napoli

Discovering and Tracking Influencer-Influencee Relationships Between Online Communities
Belkacem Chikhaoui, Mauricio Chiazzaro, Patrick Gallinari and Shengrui Wang

Learning From Missing Data Using Selection Bias in Movie Recommendation
Claire Vernade and Olivier Cappé

Hierarchical Label Partitioning for Large Scale Classification
Raphael Puget and Nicolas Baskiotis

Time Series Contextual Anomaly Detection for Detecting Market Manipulation in Stock Market
Koosha Golmohammadi and Osmar Zaïane

Detecting Human Emotion via Speech Recognition by Using Speech Spectrogram
Sathit Prasomphan

An Approach to Cover More Advertisers in Adwords
Amar Budhiraja and P. Krishna Reddy

Selecting representative instances from datasets
Hamid Mirisaee, Ahlame Douzal and Alexandre Termier

3. Long Application Papers

Matrix factorization approach to behavioral mode analysis from acceleration data
Yehezkel S. Resheff

A Mixture Model Clustering Approach for Temporal Passenger Pattern Characterization in Public Transport
Anne-Sarah Briand, Etienne Côme, Mohamed Khalil El Mahrsi and Latifa Oukhellou

TSMH Graph Cube: A Novel Framework for Large Scale Multi-Dimensional Network Analysis
Pengsen Wang, Bin Wu and Bai Wang

From One Star to Three Stars: Upgrading Legacy Open Data Using Crowdsourcing
Satoshi Oyama, Yukino Baba, Ikki Ohmukai, Hiroaki Dokoshi and Hisashi Kashima

Financial Crisis and Global Market Couplings
Wei Cao, Yves Demazeau and Longbing Cao

A Talent Management Tool using Propensity to Leave Analytics
Karthikeyan Natesan Ramamurthy, Moninder Singh, Yichong Yu, Jessica Aspis, Matthew Iames, Michael Peran and Qin Held

A mining based approach for efficient enumeration of algebraic structures
Majid Khan, Nazeeruddin Mohammad, Shahabuddin Muhammad and Asif Ali

4. Short Application Papers

Hiring Decisions Based On Social Network Profiles: Data Mining Judgements To Identify Key Factors
Yoram Bachrach

A Clustered Multitask Learning for Predicting Bleeding and other Perioperative Outcomes
Che Ngufor, Sudhindra Upadhyaya, Dennis Murphree, Daryl Kor and Jyotishman Pathak

Data Science Foundry for MOOCs
Sebastien Boyer, Ben U. Gelman, Benjamin Schreck and Kalyan Veeramachaneni

A Text Clock Context Informations based Multiple Web Contents Extraction
Wonmoon Song and Myungwon Kim

Label Noise
Bryce Nicholson, Victor Sheng and Jing Zhang

Fast and user-friendly non-linear principal manifold learning by method of elastic maps
Andrei Zinovyev and Alexander Gorban

Predictive Reliability Mining for Early Warnings in Populations of Connected Machines
Karamjit Singh, Gautam Shroff and Puneet Agarwal

When Cyberathletes Conceal Their Game: Clustering Confusion Matrices to Identify Avatar Aliases
Olivier Cavadenti, Victor Codocedo, Jean-Francois Boulicaut and Mehdi Kaytoue

Duration Models for Activity Recognition and Prediction in Buildings using Hidden Markov Models
Antonio Ridi, Nikos Zarkadis, Christophe Gisler and Jean Hennebert

Towards Predictable and Risk-Free Enterprise Systems
Mayank Shrivastava, Maitreya Natu and Vaishali Sadaphal

“The harsh rule of the goals”: Big Data analytics and football team success
Paolo Cintia, Luca Pappalardo, Dino Pedreschi, Fosca Giannotti and Marco Malvaldi

5. Special Session - Big Behavioral Data Analytics (BBDA)

Profit Maximizing Logistic Regression Modeling for Customer Churn Prediction
Eugen Stripling, Seppe Vanden Broucke, Katrien Antonio, Bart Baesens and Monique Snoeck

Inferring User Activities from Spatial-Temporal Data in Mobile Phones
Gunarto Sindoro Njoo, Xiao Wen Ruan, Kuo-Wei Hsu and Wen-Chih Peng

Tracking the Evolution of Community Structures in Time-Evolving Social Networks
Etienne Gael Tajeuna, Mohamed Bouguessa and Shengrui Wang

Mining relationships in learning-oriented social networks
María Estrella Sousa Vieira, Jose Carlos López Ardao, Manuel Fernández Veiga, Miguel Rodríguez Pérez and Cándido López García

Predicting Online Video Engagement Using Clickstreams
Everaldo Aguiar, Saurabh Nagrecha and Nitesh Chawla

6. Special Session - Big Data, Distributed Technologies and Intelligent Agents (BDIA)

Data-driven Semantic Concept Analysis for Automatic Actionable Ontology Design
Vladimir Gorodetsky and Olga Tushkanova

MapReduce-based K-Prototypes Clustering Method for Big Data
Mohamed Aymen Ben Hajkacem, Chiheb Eddine Ben N’cir and Nadia Essoussi

Cluster-Based Data Oriented Hashing
Sanaa Chafik, Imane Daoudi, Mounim A. El Yacoubi and Hamid El Ouardi

7. Special Session - Bioinformatics, Health and Medical Analytics (BHMA)

Design of an NGS MicroRNA Predictor Using Multi-layer Hierarchical MapReduce Framework
Ren-Hao Pan, Lin-Yu Tseng, I-En Liao, Chien-Lung Chan, K. Robert Lai and Kai-Biao Lin

An ensemble of machine learning and anti-learning methods for predicting tumour patient survival rates
Chris Roadknight and Uwe Aickelin

Patient Classification based on Expanded Query using 5-gram Collocation and Binary Tree
Jaya Sil and Indrani Bhattacharya

Improved Approach for Protein Function Prediction by Exploiting Prominent Proteins
Satheesh Kumar and P. Krishna Reddy

MIAT: A Novel Attribute Selection Approach to Better Predict Upper Gastrointestinal Cancer
Avi Rosenfeld

Modeling Heterogeneous Clinical Sequence Data in Semantic Space for Adverse Drug Event Detection
Aron Henriksson, Jing Zhao, Henrik Boström and Hercules Dalianis

A Data-driven Analytics Approach in the Study of Pneumonia’s Fatalities
Maribel Yasmina Santos, António Carvalheira and Artur Teles de Araújo

Cascading Adverse Drug Event Detection in Electronic Health Records
Jing Zhao, Aron Henriksson and Henrik Boström

Characterizing chronic disease and polymedication prescription patterns from electronic health records
Martí Zamora, Manel Baradad, Ester Amado, Sílvia Cordomí, Esther Limón, Juliana Ribera, Marta Arias and Ricard Gavaldà

Ensemble of Deep Long Short Term Memory Networks for Labelling Origin of Replication Sequences
Urminder Singh, Sucheta Chauhan, A Krishnamachari and Lovekesh Vig

Anomaly Detection in ECG Time Signals through Long Short-Term Memory based Recurrent Neural Network Architecture
Sucheta Chauhan and Lovekesh Vig

8. Special Session - Data Oriented Constructive Mining and Multi-Agent Simulation (DOCMAS)

Error Detection of Oceanic Observation Data Using Sequential Labeling
Satoshi Ono, Haruki Matsuyama, Ken-Ichi Fukui and Shigeki Hosoda

9. Special Session - Emotion and Sentiment in Intelligent Systems and Big Social Data Analysis (SentISData)

Sentiment Analysis as a Text Categorization Task: A Study on Feature and Algorithm Selection for Italian Language
Berardina Nadja De Carolis, Stefano Ferilli, Domenico Redavid and Floriana Esposito

Debate on Political Reforms in Twitter: A Hashtag-driven Analysis of Political Polarization
Cristina Bosco, Mirko Lai, Viviana Patti and Daniela Virone

Hot Spot Detection - an Interactive Cluster Heat Map for Sentiment Analysis
Patrick Hennig, Philipp Berger, Maximilian Brehm, Bastien Grasnick, Jonathan Herdt and Christoph Meinel

Monitoring the Twitter sentiment during the Bulgarian elections
Jasmina Smailovic, Janez Kranjc, Miha Grcar, Martin Znidarsic and Igor Mozetic

Using Emotions to Predict User Interest Areas in Online Social Networks
Yoad Lewenberg, Yoram Bachrach and Svitlana Volkova

Detecting Irony and Sarcasm in Microblogs: The Role of Expressive Signals and Ensemble Classifiers
Elisabetta Fersini, Federico Alberto Pozzi and Enza Messina

10. Special Session - Exploratory Computing (EC)

IQ4EC: Intensional Answers as a Support to Exploratory Computing
Mirjana Mazuran, Elisa Quintarelli and Letizia Tanca

TESS: Temporal Event Sequence Summarization
Dominique Gay, Romain Guigourès, Marc Boullé and Fabrice Clérot

Succinctly summarizing machine usage via multi-subspace clustering of multi-sensor data
Sarmimala Saikia, Gautam Shroff, Puneet Agarwal and Ashwin Srinivasan

11. Special Session - Environmental and Geo-spatial Data Analytics (EnGeoData)

Constrained Spectral Clustering for Regionalization: Exploring the Trade-off between Spatial Contiguity and Landscape Homogeneity
Shuai Yuan, Pang-Ning Tan, Kendra Spence Cheruvelil, Sarah M. Collins and Patricia A. Soranno

Multifactorial Uncertainty Assessment for Monitoring Population Abundance using Computer Vision
Emma Beauxis-Aussalet and Lynda Hardman

Detection, Tracking, and Visualization of Spatial Event Clusters for Real Time Monitoring
Natalia Andrienko, Gennady Andrienko, Georg Fuchs, Salvatore Rinzivillo and Hans-Dieter Betz

Big Data from Cellular Networks: How to Estimate Energy Demand at real-time
Davide Tosi, Stefano Marzorati, Mario La Rosa, Giovanna Dondossola and Roberta Terruggia

Traffic Risk Mining from Heterogeneous Road Statistics
Koichi Moriya, Shin Matsushima and Kenji Yamanishi

Learning Urban Users' Choices to Improve Trip Recommendations
Boris Chidlovskii

RIMM: A Novel Map Matching Model With Rotational Invariance
Junpeng Bao, Yuepeng Zhang, Qian Cao, Jun Zeng and De Zhang

Assistive Classification for Improving the Efficiency of Avian Species Richness Surveys
Liang Zhang, Michael Towsey, Phil Eichinski, Jinglan Zhang and Paul Roe

12. Special Session - Statistical and Mathematical Tools for Data Science (SMTDS)

Constrained Independence for Detecting Interesting Patterns
Thomas Delacroix, Ahcène Boubekki, Philippe Lenca and Stéphane Lallich

A Ranking-based approach for Hierarchical Classification
Azad Naik and Huzefa Rangwala

A Combination of CUSUM-EWMA for Anomaly Detection in Time Series Data
Vyron Christodoulou and Yaxin Bi

Multi-Target Regression from High-Speed Data Streams with Adaptive Model Rules
João Duarte and João Gama

Scalable Image Annotation Using a Product Compressive Sampling Approach
Anastasios Maronidis, Elisavet Chatzilari, Spiros Nikolopoulos and Ioannis Kompatsiaris

Graph-based Semi-supervised Learning for Time Series Classification
Zhao Xu and Koichi Funaya

13. Industry Session

Sentiment Analysis of Facebook Data using Hadoop based Open Source Technologies
Sudipto Shankar Dasgupta, Swaminathan Natarajan, Kiran Kumar Kaipa, Sujay Bhattacherjee, Arun Viswanathan and Sairam Yeturi