The RANSAC (Random Sample Consensus) Algorithm [1, 2] is an adaptive method to detect outliers by dividing the sample datapoints into "inliers" and "outliers". Treated in this way, the data can be incorporated into a model that exhibits minimial error. RANSAC is also robust to outliers, which might be useful for dealing with heavy-tailed and other non-normal (e.g. Gaussian) distributions.
RANSAC was developed for and used extensively in computer vision research [3]. A MATLAB toolbox specialized for image processing, but possibly amenable to other applications [4], is available from the MATLAB Exchange [3].
[1] Zuliani, M. RANSAC for Dummies. August 13 (2012).
[2] Second picture is from: Zaman, T. [3D] Ransac (get planes from point clouds). March 23 (2011).
[3] RANSAC Algorithm for Image Processing. Stack Overflow (2011).
[4] Kowdle, A. Parallel implementation of RAndom SAmple Consensus (RANSAC). Docstoc (2010).
[5] Chen, H. RANSAC Toolbox. MATLAB Toolbox (2009).
[2] Posey, B.M. Demystifying the "Blue Screen of Death". Microsoft TechNet AND Use a Windows Blue Screen of Death for Your WordPress 404 Error Page. How-to Geek.
Here are a few recent articles on the process and blind spots of applying algorithms to real-world problems. In [1], Rhett Allain discusses the role of random walk theory in modeling how two lost people find each other in a forest (or perhaps two ships in the night).
In [2], the NP-hard nature of the travelling salesman problem (TSP) is discussed in the context of logistics optimization. And to spice things up (or top them off), the author of [3] finds analogies between the process of algorithmics and the process of baking.
[1] Allain, R. How Should Two Lost People Find Each Other? Wired Science Dot Physics blog, September 10 (2013).
[2] Vanderbilt, T. Unhappy truckers and other algorithmic problems. Nautil.us magazine, Issue 003 (2013).
[3] Hromkovic, J. Algorithmics, or What Have Programming and Baking in Common? In "Algorithmic Adventures". pp. 37-71, Springer (2009).
[1] Venkatesh, B. et.al Elephant shark genome provides unique insights into gnathosome evolution. Nature, doi:10.1038/nature12826 (2014).
[2] Jones, N. The Learning Machines. Nature, 505, 146-148 (2014).
[3] Hinton, G. Recent Developments in Deep Learning. Google TechTalks, March 19 (2010).
[2] Kirby, L. and Paris, J. Accessible Independence Results for Peano Arithmetic. Bulletin of the London Mathematical Society, 14, 285-293 (1982).
[3] Mathematics: what are some of the most counterintuitive mathematical results? Quora, March 27 (2014).
[4] JAVA Application: Hydra game. Mathematics and Computation blog.
In this installment of Popular
Algorithmics, I present a short reading list covering a data
structure called the Bloom
filter. The Bloom filter [1] is a complex probabilistic hash function
(e.g. maps semantic data to quantitative parameters) with a
membership query mechanism [2]. This is good for any kind of
set-based classification. In general, Bloom filters (particularly the
Murmur algorithm) are more efficient than the cryptographic hash used
in applications such as Bitcoin.
The probabilistic aspect of Bloom
filters mean that there is a deterministic false positive rate, which
can be tuned using a known number of input elements and hash
functions. Bloom filters can be applied to many types of problems,
including probabilistic verification [3] and bioinformatics [4].
[1] Broder, A. and Mitzenmacher, M. Network Applications of Bloom Filters: A Survey. Internet Mathematics, 1(4), 485-509 (2003).
[2] Holmes, A. Hadoop in Practice.
Manning Press (2012).
[3] Dillinger, P.C. and Monolios, P.
Bloom
Filters in Probabilistic Verification. Lecture Notes in Computer
Science, 3312, 367-381 (2004).
[4] Stranneheim, H., Kaller, M.,
Allander, T., Andersson, B., Arvestad, L., and Lundeberg, J.
Classification
of DNA sequences using Bloom filters. Bioinformatics, 26(13),
1595-1600 (2010).
Reading List:
Ellis, J. All
you ever wanted to know about Bloom filters. Spyced blog, January
28 (2009).
Ksankar Is
our Neocortex a Giant Semantic Bloom Filter? Of Natural Intelligence,
Machine Learning, and Jeff Hawkins. My Missives blog, April 14
(2013).
Bloom
Filters by Example. Bill Mill blog. Github
fork.
BARCODE:
Fast Lossless Compression via Cascading Bloom Filters. Homolog.us
blog, May 8 (2014).
What
is the Advantage to Using Bloom Filters? StackOverflow.
Davies, J. Bloom
Filters Overview. Jason Davies blog.
This installment of Popular Algorithmics is about Bayesian Poisoning.
This is a heuristic strategy dating from the late 1990s, and involves
the intentional degradation of e-mail spam filters (typically naive Bayesian models)
by spammers [1]. Poisoning strategies consist of Type I (intentionally
increasing the number of false positives) and Type II (embedding common
English words into a Spam message) attacks.
[1] Accettura, R. Bayesian Span Filter Poisoning with RSS. Fun with Wordage blog, Januiary 29 (2007).
[2] Eckleberry, A. What is the effect of Bayesian Poisoning? Security News blog, August 26 (2006).
This edition of Popular Algorithmics is a throwback to the 1990s. 7 August of 2014 was, according to Microsoft's "The Fire Hose" blog, the 20th anniversary of Microsoft's first website. Microsoft 1.0 was originally intended to be a low-res information portal. Interesting foray into what was to come.
Quasi-algorithmic word of the day: sockpuppetry.
Sockpuppetry is the practice of inventing an deceptive identity for
one’s avatar in a virtual world. As with Bayesian
Poisoning, sockpuppetry comes in two flavors: type I (a phony
persona different from the creator is used) and Type II (where
obfuscation of identity is the only requirement). Unlike Bayesian
Poisoning, however, Sockpuppetry can be done by a live person and not
a bot.
Seife, C. The Weird Reasons Why People Make Up False Identities on the Internet. Wired, July 29 (2014) and the book Virtual Unreality.
[1] Accettura, R. Bayesian Span Filter Poisoning with RSS. Fun with Wordage blog, Januiary 29 (2007).
[2] Eckleberry, A. What is the effect of Bayesian Poisoning? Security News blog, August 26 (2006).
This edition of Popular Algorithmics is a throwback to the 1990s. 7 August of 2014 was, according to Microsoft's "The Fire Hose" blog, the 20th anniversary of Microsoft's first website. Microsoft 1.0 was originally intended to be a low-res information portal. Interesting foray into what was to come.
Seife, C. The Weird Reasons Why People Make Up False Identities on the Internet. Wired, July 29 (2014) and the book Virtual Unreality.
This
edition of Popular Algorithmics is an introduction to cactus graphs [1]. Wolfram Mathworld
defines cactus graphs as a connected graph where any two graph cycles have
no edges in common. This is similar to the leaves of a botanical cactus. Cactus
graph topologies are an instance of a block graph, in
which every connected component constitutes a clique or a cycle. The
cactus representation can be used for a variety of applications, from modeling wireless networks to hierarchical
tree building.
[1] Lima, M. The Book of Trees: Visualizing Branches of Knowledge. Princeton Press (2014).
Blockchain is described as the transaction ledger. a large number of computers are connected to the network and have done all the online transactions effectively. learn more on professional blockchain course
ReplyDelete