Embedding Methodology and
Statistics for Inference
SancarAdali
July29,2013
XDATA SUMMER CAMP
Sancar Adali
FROMDATATOSTATISTICS
Sancar Adali
Computestatisticsfromtimeseriesofgraphs(TSG)
Out-of-sampleextensionforadjacencyspectralembedding
FasterEmbeddingbytheuseofOOS-embedding
Dissimilaritycomputationformultivariatetimeseries
TensorDecompositionfortimeseriesdata(adj.matrices,
multivariatedata)
Fastcomputationoflocalstatisticsinverylargegraphs
CAPABILITIESANDIMPLEMENTATIONS
Sancar Adali
Rpackages:ScanStats,AdjMatEmbed,
DissTimeSeries
Python:Large-graphinvariants,MySQL-igraphfor
TSG
igraphC/C++library(develbranch)
CAPABILITIESANDIMPLEMENTATIONS
Software
Sancar Adali
BITCOIN
BITCOINDATAANALYSIS
Time
User1User2...
User1
User2
User3
...
TimeSeriesofGraphs
TimeSeriesofScanStatistics
Sender Receiver Transaction
amount
TimeStamp
Sancar Adali
BITCOIN
BITCOINDATAANALYSIS
Variousanomaly
detectionsusingthe
normalizedstatistics.
Theverticeswhicharethe
sourcesofanomalous
activityshouldbe
investigatedfurther.
Two variants of scan statistics (us and them)
for the Bitcoin data
Sancar Adali
Kiva
Jointembeddingofallentities(lender,loan,
partner,borrower)
Relationshipbetweenentitiesofdifferentkinds
->Adjacencymatrixofgraph(entities->vertices)
Lender-lendergraph:edgesmeanlentmoneyfor
thesameloan
Lender-loangraph
Partner-loangraph
EMBEDDING APPROACH
Sancar Adali
Embedthemostactivelendersin-sample
Repeatuntilallentitieshavebeenembedded:
Embedabatchfromtheremainingentitiesvia
out-of-sampleextension
Clustertheembeddedentities
EMBEDDING APPROACH
Fast Embedding via out-of-sample extension
Sancar Adali
Kiva Entity Embedding
KIVA DATA ANALYSIS
Embedding of Kiva lenders (colored by clusters)
Embeddingof720KKiva
lenders
Otherentitytypeswillbe
OOSembedded
Sancar Adali
Kiva Entity Embedding
KIVA DATA ANALYSIS
Embedding of Kiva lenders (color coded by country)
Embeddingof720KKiva
lenders
Otherentitytypeswillbe
OOSembedded
Sancar Adali
Scan Statistics for Anomaly Detection
AKAMAI-TRACEROUTE
Traceroutes
Time
IP1IP2IP3...
IP1
IP2
IP3
...
TimeSeriesofGraphs
TimeSeriesofScanStatistics
Sancar Adali
Scan Statistics for Anomaly Detection
AKAMAI-TRACEROUTE
Sancar Adali
Embedding dissimilarities between CIDR Traffic
AKAMAI-CIDR
CIDRHourlyTrafficforeachcategory
DissimilarityRepresentation
EmbeddingviaMDS
Aggregateby
summingthetrafficfor
eachweek
MultivariateTimeSeries
Sancar Adali
AKAMAI-CIDR
CIDRsbasedonChina
(blue)showaclustering
pattern
Embedding of ~4K CIDRs (outside US) based on the dissimilarity of traffic
patterns (color coded by country)
Sancar Adali
AKAMAI-CIDR
3DPlotofCIDREmbeding
Sancar Adali
Scan 1 Statistic & Spectral Embedding
Igraph extensions
Scan 1 Statistic of Erdos Renyi Graphs
with average Degree of 100 edges
igraph 0.7 introduces:
FastimplementationofScan1
Statisticexactandapproximate
invariant
Fastspectralembeddingof
adjacencymatricesusing
ARPACK
Sancar Adali
COLLABORATORS
PNNL/Stanford/Purdue:RyanHafen(Akamai-Traceroute,
Akamai-CIDR)
BBN/Raytheon:WalterAndrews(Kiva)
Oculus:PeterSchrettlen(Bitcoin)
ThankstoPeterWang(Continuum)andRyanHafen
(PNNL)forprovidingderiveddata
ThankstoeverybodyinDARPAXDATAprogramfor
supportingthiswork.
Contact information:
SancarAdali
sadali1@jhu.edu
JohnsHopkinsUniversity
3400N.CharlesSt.
100WhiteheadHall
Baltimore,MD21218
Thank you.