Sancar Adali

Scientist

Raytheon-BBN Technologies

Biography

I am a machine learning researcher at Raytheon BBN Technologies. I have worked on computer vision, bioinformatics and statistical analysis problems. I am interested in different approaches to learning from disparate modalities of data collected from different sensors. I am also interested in applications of data fusion to graphs and, more generally, in learning from graph data.

I have a PhD in Applied Mathematics and Statistics from Johns Hopkins University. My PhD advisor was Carey Priebe. We have worked on projects involving data in dissimilarity representation in addition to graphs. My dissertation research involved using dissimilarity data from disparate sources in order to solve learning problems. These types of problems are examples of multiview learning, and my approach is a dissimilarity-centric method to find a common representation for the disparate data from different views. An extension of my approach allows the solution of the seeded graph matching problem, a variant of the graph matching problem where a portion of the vertex correspondences is known.

I also have an M.Sc. in Engineering from Brown University, where I worked on computer vision problems at the LEMS lab.

Also at Johns Hopkins (2007-2009), I was a research assistant in the Cardiovascular Research Grid project, where I developed code for predicting sudden cardiac death or VT/VF events in patients with implantable cardioverter-defibrillator (ICD) devices.

In addition to becoming familiar with different statistical methods, I also try to gain as many computing skills as I can. Learning R was an arduous journey, and I learned to love R in the end by repeatedly failing and occasionally succeeding. See the software section for the R packages I am working on.

I also work with Matlab and Python for scientific computing. Here’s my GitHub page, by the way.

I like learning new things and solving practical data problems. My two career goals have been to improve my software development skills and learn about new mathematical methods and approaches. You could also say I am a typeface and design enthusiast.

Like the best of statistics/data science practitioners, I am adept at learning new computer languages.

I used C++ for a long time, working with the VXL library. I picked up Python during my PhD, and I would say it is my (current) favorite computer language.

Interests

Statistical Programming
Data analysis
Extensive R and Matlab programming, scientific programming in various computer languages (Python, C++)
Analyzing small-sample, high-dimensional data (bioinformatics and medical data)
Experience handling censored and missing data (survival analysis, missing data problem)
Manifold Learning and Dimensionality Reduction methods

Education

PhD in Applied Math and Statistics, 2014
Johns Hopkins University

M.Sc. in Applied Math and Statistics, 2008
Johns Hopkins University

M.Sc. in Engineering, 2005
Brown University

BSc in Electrical and Electronics Engineering, 2003
Bogazici University

Selected Publications

Seeded Graph Matching

Given two graphs, the graph matching problem is to align the two vertex sets so as to minimize the number of adjacency disagreements between the two graphs. The seeded graph matching problem is the graph matching problem when we are first given a partial alignment that we are tasked with completing. In this paper, we modify the state-of-the-art approximate graph matching algorithm *FAQ* of Vogelstein et al. (2015) to make it a fast approximate seeded graph matching algorithm, adapt its applicability to include graphs with differently sized vertex sets, and extend the algorithm so as to provide, for each individual vertex, a nomination list of likely matches. We demonstrate the effectiveness of our algorithm via simulation and real data experiments; indeed, knowledge of even a few seeds can be extremely effective when our seeded graph matching algorithm is used to recover a naturally existing alignment that is only partially observed.

Donniell Fishkind, Sancar Adali, Heather Patsolic, Lingyao Meng, Vince Lyzinski, Carey E. Priebe

PR


Fidelity-Commensurability Tradeoff in Joint Embedding of Disparate Dissimilarities

In various data settings, it is necessary to compare observations from disparate data sources. We assume the data is in the dissimilarity representation (Pękalska and Duin, 2005) and investigate a joint embedding method (Priebe et al., 2013) that results in a commensurate representation of disparate dissimilarities. We further assume that there are “matched” observations from different conditions which can be considered to be highly similar, for the sake of inference. The joint embedding results in the joint optimization of fidelity (preservation of within-condition dissimilarities) and commensurability (preservation of between-condition dissimilarities between matched observations). We show that the tradeoff between these two criteria can be made explicit using weighted raw stress as the objective function for multidimensional scaling. In our investigations, we use a weight parameter, w, to control the tradeoff, and choose match detection as the inference task. Our results show weights that are optimal (with respect to the inference task) are different than equal weights for commensurability and fidelity and the proposed weighted embedding scheme provides significant improvements in statistical power.

Sancar Adali, Carey Priebe

In JoC


Joint Optimization of Fidelity and Commensurability for Manifold Alignment and Graph Matching

In this thesis, we investigate how to perform inference in settings in which the data consist of different modalities or views. For effective learning utilizing the information available, data fusion that considers all views of these multiview data settings is needed. We also require dimensionality reduction to address the problems associated with high dimensionality, or “the curse of dimensionality.” We are interested in the type of information that is available in the multiview data that is essential for the inference task. We also seek to determine the principles to be used throughout the dimensionality reduction and data fusion steps to provide acceptable task performance. Our research focuses on exploring how these queries and their solutions are relevant to particular data problems of interest.

Sancar Adali


Learning spatiotemporal features for infrared action recognition with 3d convolutional neural networks

Infrared (IR) imaging has the potential to enable more robust action recognition systems compared to visible spectrum cameras due to lower sensitivity to lighting conditions and appearance variability. While the action recognition task on videos collected from visible spectrum imaging has received much attention, action recognition in IR videos is significantly less explored. Our objective is to exploit imaging data in this modality for the action recognition task. In this work, we propose a novel two-stream 3D convolutional neural network (CNN) architecture by introducing the discriminative code layer and the corresponding discriminative code loss function. The proposed network processes IR image and the IR-based optical flow field sequences. We pretrain the 3D CNN model on the visible spectrum Sports-1M action dataset and finetune it on the Infrared Action Recognition (InfAR) dataset. To our best knowledge, this is the first application of the 3D CNN to action recognition in the IR domain. We conduct an elaborate analysis of different fusion schemes (weighted average, single and double-layer neural nets) applied to different 3D CNN outputs. Experimental results demonstrate that our approach can achieve state-of-the-art average precision (AP) performances on the InfAR dataset:(1) the proposed two-stream 3D CNN achieves the best reported 77.5% AP, and (2) our 3D CNN model applied to the optical flow fields achieves the best reported single stream 75.42% AP.

Zhoulin Jiang, Viktor Rozgic, Sancar Adali

Proc. CVPR


Seeded Graph Matching Via Joint Optimization of Fidelity and Commensurability

We present a novel approximate graph matching algorithm that incorporates seeded data into the graph matching paradigm. Our Joint Optimization of Fidelity and Commensurability (JOFC) algorithm embeds two graphs into a common Euclidean space where the matching inference task can be performed. Through real and simulated data examples, we demonstrate the versatility of our algorithm in matching graphs with various characteristics--weightedness, directedness, loopiness, many-to-one and many-to-many matchings, and soft seedings.

Vince Lyzinski, Sancar Adali, Joshua T. Vogelstein, Youngser Park, Carey E. Priebe

In arxiv


Recent Publications

Seeded Graph Matching
Fidelity-Commensurability Tradeoff in Joint Embedding of Disparate Dissimilarities
Joint Optimization of Fidelity and Commensurability for Manifold Alignment and Graph Matching
Learning spatiotemporal features for infrared action recognition with 3d convolutional neural networks
Seeded Graph Matching Via Joint Optimization of Fidelity and Commensurability
Manifold matching: Joint optimization of fidelity and commensurability

Software

RGraphM is a graph matching R package based on graphm (http://cbio.mines-paristech.fr/graphm/). Coming soon to CRAN.

Recent & Upcoming Talks

Joint Optimization of Fidelity and Commensurability for Manifold Alignment and Vertex Assignment
Wed, Aug 1, 2012, 8th World Congress in Probability and Statistics
Joint Optimization of Fidelity and Commensurability for Manifold Alignment and Vertex Assignment
Mon, Aug 1, 2011, Joint Statistical Meetings 2011
Optimal Weighting for Joint Optimization of Fidelity and Commensurability in Tests of Matchedness
Mon, Aug 1, 2011, Joint Statistical Meetings 2011

Links

Curated over the years

R and development

Advanced R development - GitHub

How to improve your R and data analysis skills by doing, asking for help and learning from online sources

R, Statet/Eclipse and Latex primer

R package for managing data projects

Statet- an Eclipse plugin for R

Introduction to R

Robust workflow with StatET + Eclipse (+ SVN + LaTeX + Sweave + BibTeX + Zotero)

R examples: usage of different libraries

Programming

github tutorial

Development Documentation for a lot of languages

C++ code style

Python for stats and data analysis

Primer explanations of math concepts

Interesting science writings

Blog post on Neural Networks, and hypotheses on why they work well

Latex,Fonts,visualization

For choosing palettes for graphs that are distinct or that have desired properties http://colorbrewer2.org/

A Survey of Free Math Fonts for TeX and LaTeX

The 100 Best Fonts (in a Huge Sortable Table)

History of neue haas grotesk (aka Helvetica)

microsoft-fonts

Misc

One of the best collections of reading material (both fiction and non-fiction articles) for your Pocket: Longform

Sites for Learning

Projects

Some of the things I have worked on.

ALADDIN

ALADDIN : Automated Low-Level Analysis and Description of Diverse Intelligence Video

CVRG

Prediction of Heart Arrhythmias

Data Fusion from Disparate Dissimilarities and Joint Optimization of Fidelity and Commensurability

My thesis project: how to use disparate data with very different representations for inference purposes.


Seeded Graph Matching

The other portion of my dissertation project.

XDATA

DARPA program for organizing big data analysis and visualization efforts.

Teaching

I taught an intersession course on R programming at Johns Hopkins. Here are my class notes:

Let’s first start RStudio. This is a good GUI that helps you organize the R expressions (functions called with some arguments) you run, along with the plots and outputs those commands produce.

We have four panes. The upper left is the R script file editor, where you will enter your commands and save them as a text file. The lower left is the R console, where you will see the commands sent to R and the results of those expressions. The results of expressions can be stored in variables to be used in future expressions. The upper right shows the variables and the data stored in those variables. The lower right pane has the Files, Plots, Packages and Help tabs:

- Files lists the files in the current working directory in your computer’s file system.
- Plots shows any plots you have asked R to draw.
- Help is where you can see information about R functions.
- Packages is something we don’t need to worry about for now.

We’ll use R to read, modify and analyze data. Let’s start entering some R expressions and see the results.

How to store data in R

The most basic way one can think of to organize numeric data is a vector. Let’s create a vector whose elements are numbers by concatenating a few of them. We’ll call the “c” function with the numbers as its arguments.

#creating a numeric vector
a.vector <- c(2,4,5)

If we need a sequence of consecutive integers, we can use the colon operator (“:”) between the starting integer and the ending integer

another.vector <- 2:6

We can concatenate different vectors to get new vectors.

a.brand.new.vector <- c(a.vector,another.vector)

For any sequence with regular increments, “seq” is the function to use. Its first three arguments are the first number, the last number and the increment.

a.seq.of.numbers <- seq(0,1,0.01)
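As a quick check on how “seq” behaves, here is a small sketch that builds a sequence in two ways. The variable names are made up for this illustration; “by” and “length.out” are the argument names that seq itself uses.

```r
# Sequence from 0 to 1 in steps of 0.01 (both endpoints are included)
by.step <- seq(from = 0, to = 1, by = 0.01)
length(by.step)

# Same endpoints, but asking for a number of points instead of a step size
by.count <- seq(from = 0, to = 1, length.out = 5)
by.count
```

Naming the arguments, as done here, makes it harder to mix up the step size and the number of points.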

If we need to store data as a two-dimensional array, we use the “array” function (with the number of rows and columns)

an.array <- array(a.seq.of.numbers,dim=c(10,10))

In the “array” function, we make use of the “dim” argument, which is a vector of length two (number of rows and number of columns). Note that the functions we call might have other arguments that we don’t provide. R will assume the values of those arguments are the default values, which you can look up in the help document for that function. To do that, enter

?array

We usually won’t need to look up all these arguments, though, since the default values are sensible.
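To see how “array” lays out its data, it helps to try it on a few values first. The sketch below uses made-up variable names and a deliberately tiny array; it also shows what happens when the data has more values than the array has cells, which is the case for the 10 by 10 array above.

```r
# Illustration only: a small array so the layout is easy to see
small.array <- array(1:6, dim = c(2, 3))
small.array        # values fill the first column, then the second, and so on
dim(small.array)   # number of rows and number of columns

# If the data has more values than the array can hold, only the first
# prod(dim) values are used
trimmed <- array(seq(0, 1, 0.01), dim = c(10, 10))
length(trimmed)
```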

day.number <- 1:7

What is the size of the vector?

length(day.number)

Suppose the data is not numeric, but is composed of strings (any combination of characters: words, sentences, etc.), as is the case for qualitative data. We can store multiple strings in a vector, just like numbers. We call such vectors “character vectors”. We let R know we’re dealing with characters by putting quotes around them.

#Creating a character vector

#We can store qualitative data as vectors, too. (2.1.1)
day.names <- c("Mon","Tue","Wed","Thu","Fri")
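If you are unsure what kind of vector you have, the “class” function will tell you. A short sketch, with variable names invented for this example:

```r
example.numbers <- c(2, 4, 5)
example.words <- c("Mon", "Tue", "Wed")
class(example.numbers)   # "numeric"
class(example.words)     # "character"

# Mixing numbers and strings in one vector turns everything into strings
mixed <- c(1, "Mon")
class(mixed)
```

This is why a single stray string in a column of numbers can make the whole column behave like text.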

During analysis of data, one often needs to access or modify some portion of a vector. We use square brackets after the name of the vector, and inside the brackets we enter a vector of indices. The elements are indexed starting from 1, so if we want to get the first three elements of “day.names”, we write:

# If we want to use only part of a vector, we use indexing to choose
# which elements we want
#Numeric indexing
day.names[1:3]

If we want to get the elements with indices 2, 4 and 5 (the numbers we had stored in a.vector):

day.names[a.vector]
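The same square-bracket notation works on the left side of an assignment, which is how we modify part of a vector. A minimal sketch, using a separate vector (the name “week.days” is made up here) so the vectors above stay unchanged:

```r
week.days <- c("Mon", "Tue", "Wed", "Thu", "Fri")

# Replace the second element
week.days[2] <- "Tuesday"

# Replace several elements at once with a vector of the same length
week.days[c(4, 5)] <- c("Thursday", "Friday")
week.days
```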

When we need to access some portion of the two-dimensional array we created, we give the row indices and the column indices, separated by a comma:

an.array[1:4,1:3]

If we want to edit some elements of vectors and arrays, we can edit them in RStudio. If you’re using Rgui, you can use the “edit” function to open up a spreadsheet interface

edit(an.array)

The array we created is a two-dimensional arrangement of numbers; an array can hold strings instead, but all of its elements must be of the same kind. Data, however, can be a combination of different kinds of variables. The traditional convention of data organization in spreadsheet format is columns corresponding to different variables or features (qualitative or quantitative) being studied and rows corresponding to different observations of those variables. We will use data frames in R to store data organized like this. A data frame in R is a collection of vectors that all have the same length, but can be of any type (numeric vector, character vector). Each vector forms one column of the data frame. Many datasets that you load into R will be stored in data frames.

# data.frames in R

Let’s create a data.frame that is composed of the names of the days and the integer corresponding to each day. The “data.frame” function creates a data frame with each supplied argument as one of the columns. Columns can also be named; the names of the columns are supplied when creating the data frame. In the following example, these are “number” and “names” respectively

#creating a data.frame (many datasets will be in this form)
a.data.frame <- data.frame(number=day.number,names=day.names)

This R expression causes an error because the two vectors don’t have the same length: “day.number” has seven elements and “day.names” has only five. Let’s fix this

day.names <- c(day.names,"Sat","Sun")

a.data.frame <- data.frame(number=day.number,names=day.names)
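If you would like to see the error message without stopping a script, one option is “tryCatch”. This is only a sketch: the vector names are invented for the illustration, and the handler simply returns the text of the error.

```r
# Two vectors of different lengths (names invented for this illustration)
seven.numbers <- 1:7
five.names <- c("Mon", "Tue", "Wed", "Thu", "Fri")

# tryCatch runs the expression and, if it fails, returns the error message
# instead of stopping the script
outcome <- tryCatch(
  data.frame(number = seven.numbers, names = five.names),
  error = function(e) conditionMessage(e)
)
outcome

# Checking the lengths beforehand avoids the error altogether
length(seven.numbers) == length(five.names)
```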

The indexing for data.frames is like arrays:

#Indexing for data frames
a.data.frame[1:3,]
a.data.frame[,2]

One can also access specific columns by using the name of that column, which will return the vector in that column

a.data.frame$names

And we can access some portion of that vector like any vector

a.data.frame$names[1:3]

The variables in data can also be logical variables, that is they take one of the values TRUE or FALSE. Logical vectors that store this kind of data are very useful, especially in indexing.

# A vector can be composed of logical values, too(TRUE or FALSE)
a.logic.vector<-c(TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE)

If we use one of the comparison operators (“<”, “>”, “<=”, “>=”, “==”) with another type of vector, the result will be a logical vector whose elements are TRUE or FALSE. The value in the logical vector will be TRUE wherever the comparison is true.

day.names=="Mon"
day.names=="Tuesday"
day.number<3

If we want to get some specific elements of a vector, say “a”, we can use a logical vector of the same length as “a” as the index vector. The elements of “a” that will be extracted are the ones which have TRUE values in the logical vector used for indexing

#logical indexing (3.3,3.4)
day.names[a.logic.vector]

# Count the number of TRUEs in the logical vector
sum(a.logic.vector)
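Logical vectors also work as row indices for data frames. The sketch below builds its own small data frame (the names “week” and “is.weekend” are made up for this example) and keeps only the rows that satisfy a condition.

```r
week <- data.frame(number = 1:7,
                   names = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

# Logical vector: TRUE for the rows whose day number is greater than 5
is.weekend <- week$number > 5

# Use it as the row index; leaving the column index empty keeps all columns
weekend <- week[is.weekend, ]
weekend

# Several conditions can be combined with & (and) and | (or)
week$names[week$number >= 2 & week$number <= 4]
```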

Logical indexing is very useful when you want to extract the portion of a vector that satisfies a particular condition. Let’s look at an example.

The “data” function loads a dataset that’s built into R. The dataset we load here is a vector named nhtemp, the yearly temperature record from New Haven.

data(nhtemp)

One can use the ls() function to list all the variables that are currently loaded into R

ls()

nhtemp is an R variable that’s slightly different from a vector (it is a time series object), so we first turn it into a regular vector.

nhtemp <- as.vector(nhtemp)

If we want to get temperature records that are higher than 54 degrees, we can use the logical vector

nhtemp>54
nhtemp[nhtemp>54]

If we want to get the indices of the TRUE values in logical vectors, we use the “which” function

which(nhtemp>54)
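Since the record covers the years 1912 to 1971, the same logical vector can be used to index a vector of years. A sketch, where “temps”, “years” and “hot” are names chosen for this illustration:

```r
data(nhtemp)
temps <- as.vector(nhtemp)
years <- 1912:1971            # one entry per temperature record

hot <- temps > 54             # logical vector
years[hot]                    # the years that were warmer than 54 degrees
sum(hot)                      # how many such years there are

which.max(temps)              # index of the warmest year
years[which.max(temps)]       # the warmest year itself
```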

Now we can use plot functions to plot this series of temperatures. If we use only one argument in the plot function, the values are plotted against their indices.

plot(nhtemp)

If we want an x-y plot (temperature against the years), we supply both vectors

plot(1912:1971,nhtemp)

plot(1:60,nhtemp,type='l')
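Plots are easier to read with axis labels and a title. The sketch below uses the standard “xlab”, “ylab” and “main” arguments of plot; the label text and the variable names are my own choices.

```r
data(nhtemp)
temps <- as.vector(nhtemp)
years <- 1912:1971

# Line plot of temperature against year, with labels and a title
plot(years, temps, type = "l",
     xlab = "Year", ylab = "Average temperature (degrees F)",
     main = "Yearly temperatures in New Haven")

# Add the individual observations on top of the line
points(years, temps, pch = 20)
```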

These are the summary statistics functions in R

mean(nhtemp)
median(nhtemp)
max(nhtemp)
min(nhtemp)
var(nhtemp)

sd(nhtemp)
sqrt(var(nhtemp))
summary(nhtemp)
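The two lines “sd(nhtemp)” and “sqrt(var(nhtemp))” compute the same quantity. Here is a small sketch that checks this and also computes the variance by hand; “temps” and “manual.var” are names made up for the example.

```r
data(nhtemp)
temps <- as.vector(nhtemp)

# sd is the square root of the variance
all.equal(sd(temps), sqrt(var(temps)))

# var divides by n - 1 (sample variance), not by n
n <- length(temps)
manual.var <- sum((temps - mean(temps))^2) / (n - 1)
all.equal(manual.var, var(temps))

# quantile gives the quartiles that summary reports
quantile(temps, c(0.25, 0.5, 0.75))
```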

For plotting histograms, we use the “hist” function with different choices for the “breaks” argument. The “boxplot” function draws a box plot of the same data.

hist(nhtemp)
hist(nhtemp,breaks=14)
hist(nhtemp,breaks=seq(floor(min(nhtemp)),ceiling(max(nhtemp)),0.5))
boxplot(nhtemp)
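The “hist” function also returns the binning it used, which is handy for checking what a given “breaks” setting actually did. A sketch with made-up variable names:

```r
data(nhtemp)
temps <- as.vector(nhtemp)

# plot = FALSE returns the binning without drawing anything
h <- hist(temps, plot = FALSE)
h$breaks          # bin boundaries chosen by R
h$counts          # number of observations in each bin
sum(h$counts) == length(temps)

# When supplying breaks yourself, they must cover the whole range of the
# data; rounding the endpoints outward is one simple way to make sure
my.breaks <- seq(floor(min(temps)), ceiling(max(temps)), by = 0.5)
range(my.breaks)
```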

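# Two variables with ten observations each: a yes/no smoking indicator
# and an amount coded as 1, 2 or 3
smokes <- c("Y","N","N","Y","N","Y","Y","Y","N","Y")
amount <- c(1,2,2,3,3,1,2,1,3,2)

# Cross-tabulate the two variables and convert the counts to proportions
a.table <- table(smokes,amount)
prop.table(a.table)

# Chi-squared test of independence between the two variables
chisq.test(a.table)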
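A few follow-up steps are often useful with a table like this. The sketch below repeats the data so it can be run on its own; “counts” and “result” are names chosen for this illustration, and Fisher’s exact test is shown only as one common alternative for small counts, not as part of the original notes.

```r
smokes <- c("Y","N","N","Y","N","Y","Y","Y","N","Y")
amount <- c(1,2,2,3,3,1,2,1,3,2)
counts <- table(smokes, amount)

# Proportions within each row (margin = 1); use margin = 2 for columns
prop.table(counts, margin = 1)

# The test result is a list, so the p-value can be pulled out by name.
# With only ten observations R warns that the chi-squared approximation
# may be inaccurate; the warning is silenced here on purpose.
result <- suppressWarnings(chisq.test(counts))
result$p.value

# Fisher's exact test is a common alternative when counts are small
fisher.test(counts)$p.value
```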

Contact

sadali[at]googlesmail.com
Cambridge, MA 02138, USA

Recent Posts

graph matching and spaces

Migration

Writing content with Markdown, LaTeX, and Shortcodes

Building R package from C codebase

First post