Research on Search

My study of machine learning, data mining, computational linguistics and information retrieval, towards the grand goal of developing the "perfect search engine" that "understands exactly what you mean and gives you back exactly what you want" (Larry Page).

New-style Python classes

Tue August 21st, 2012

Python 2.2 introduced a new kind of class, the new-style class. New-style classes add some convenient functionality to classic classes. New-style classes are recognized by having...

http://researchonsearch.blogspot.com/2012/08/new-style-python-classes.html

Python caching decorators

Thu August 9th, 2012

Implementing a dynamic programming algorithm is made much easier in Python by using a caching/memoization decorator.

http://researchonsearch.blogspot.com/2012/08/python-caching-decorators.html
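
To make the idea concrete, here is a minimal sketch of a memoization decorator applied to a recursive Fibonacci function. The names `memoize` and `fib` are hypothetical, chosen for illustration, and are not taken from the post; the sketch assumes all arguments are hashable. In Python 3.2 and later, `functools.lru_cache` from the standard library offers similar behaviour.

```python
import functools


def memoize(func):
    """Cache results of func, keyed by its positional arguments (hypothetical helper)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Compute each distinct argument tuple once, then reuse the stored result.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


@memoize
def fib(n):
    # Without caching this recursion takes exponential time;
    # with caching each subproblem is solved only once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

Calling `fib(30)` then evaluates each of the 31 subproblems a single time instead of making over a million recursive calls.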

Python's handling of default parameter values

Fri May 11th, 2012

Python's handling of default parameter values becomes tricky when one uses a mutable object (e.g., list or dictionary) as a default value. "Default parameter values are evaluated when ..."

http://researchonsearch.blogspot.com/2012/05/pythons-handling-of-default-parameter.html
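
The pitfall can be sketched as follows. Python evaluates a default value once, when the `def` statement is executed, so a mutable default is shared between calls. The function names below are hypothetical and serve only as an illustration; the `None` sentinel is the commonly used workaround.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, at function definition time,
    # so every call that omits `bucket` mutates the same list object.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Common idiom: use None as a sentinel and build a new list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)  # same list object as `first`, now holding both items
```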

Python's splat operator

Fri May 11th, 2012

Python has a splat operator (as in Perl or Ruby) that unpacks the arguments from a list, tuple, or dictionary: func(*args) or func(**kwdargs).

http://researchonsearch.blogspot.com/2012/05/pythons-splat-operator.html
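
A short illustration of both forms of unpacking. The function `describe` and its parameters are made up for this example.

```python
def describe(name, size, colour="red"):
    return "%s/%s/%s" % (name, size, colour)


args = ("box", 3)
kwargs = {"colour": "blue"}

positional = describe(*args)            # same as describe("box", 3)
both = describe(*args, **kwargs)        # same as describe("box", 3, colour="blue")
from_dict = describe(**{"name": "bag", "size": 1})  # keys become keyword arguments
```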

The Hungarian algorithm in clustering evaluation

Wed September 7th, 2011

The Hungarian algorithm (aka Kuhn–Munkres algorithm or Munkres assignment algorithm) can solve the assignment problem in polynomial time O(n^3). It can be used to find the optimal mapping from...

http://researchonsearch.blogspot.com/2011/09/hungarian-algorithm-in-clustering.html
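
To show what the assignment problem asks for, the sketch below solves a tiny instance by exhaustive search over all permutations. This is deliberately not the Hungarian algorithm: it runs in factorial time and is only usable for very small n, whereas the Hungarian algorithm reaches the same optimum in O(n^3). The function name and the cost matrix are hypothetical.

```python
from itertools import permutations


def best_assignment(cost):
    """Brute-force search for the row-to-column assignment with minimum total cost."""
    n = len(cost)
    best_cols, best_total = None, float("inf")
    for cols in permutations(range(n)):
        # cols[row] is the column assigned to that row.
        total = sum(cost[row][col] for row, col in enumerate(cols))
        if total < best_total:
            best_cols, best_total = cols, total
    return best_cols, best_total


cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, total = best_assignment(cost)
```

For clustering evaluation, the cost matrix could for instance hold the negated overlap counts between clusters and classes, so that minimising cost maximises agreement; that setup is an assumption here, not something stated in the excerpt.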

Fastest membership test in Python

Sun August 28th, 2011

What is the most efficient method to check whether an item is in a given group or not? In Python, it seems that set (or frozenset) would be slightly faster than dict and much much faster than lis...

http://researchonsearch.blogspot.com/2011/08/fastest-membership-test-in-python.html
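
One way to check this claim on a given machine is a small `timeit` benchmark. The container size, target value, and repeat count below are arbitrary assumptions, and the measured numbers will vary with hardware and Python version.

```python
import timeit

items = list(range(10000))
as_list = items
as_set = set(items)
as_dict = dict.fromkeys(items)
target = 9999  # worst case for the list: the last element


def time_lookup(container, repeats=1000):
    # Total time in seconds for `repeats` membership tests.
    return timeit.timeit(lambda: target in container, number=repeats)


timings = {
    "list": time_lookup(as_list),
    "set": time_lookup(as_set),
    "dict": time_lookup(as_dict),
}
```

The list test scans elements one by one, while the set and dict tests are hash lookups, which is why the gap widens as the container grows.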

Submodular functions

Fri August 26th, 2011

Intuitively, a submodular function over the subsets demonstrates "diminishing returns", which is related to the concept of marginal utility in economics. Its usefulness for machine learning is ...

http://researchonsearch.blogspot.com/2011/08/submodular-functions.html
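
Diminishing returns can be seen in a set-coverage function, a standard example of a submodular function. In this sketch (hypothetical names and data), adding the same subset yields a smaller gain when more has already been chosen.

```python
def coverage(sets, chosen):
    """Number of distinct elements covered by the chosen subsets."""
    covered = set()
    for index in chosen:
        covered |= sets[index]
    return len(covered)


def marginal_gain(sets, chosen, new_index):
    # Extra coverage obtained by adding one more subset to the selection.
    return coverage(sets, chosen | {new_index}) - coverage(sets, chosen)


sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5}}
gain_small = marginal_gain(sets, frozenset(), 1)         # add subset 1 to the empty selection
gain_large = marginal_gain(sets, frozenset({0, 2}), 1)   # add subset 1 to a larger selection
```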

L1 regularisation is efficient for selecting relevant features

Fri August 26th, 2011

Andrew Ng has proven in his ICML-2004 paper that sample complexity grows linearly in the number of irrelevant features when using L2 regularisation (in logistic regression, support vector machi...

http://researchonsearch.blogspot.com/2011/08/l1-regularisation-is-efficient-for.html

New linear-time algorithm for suffix array construction

Sun July 3rd, 2011

Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt: Linear Work Suffix Array Construction, Journal of the ACM (JACM), Volume 53 Issue 6, November 2006. As the authors have said, this algori...

http://researchonsearch.blogspot.com/2011/07/new-linear-time-algorithm-for-suffix.html

Research Impact for REF

Fri July 1st, 2011

The British government's emphasis on the practical impact of research in REF reminds me of the following words of Feynman: "Physics is like sex: sure, it may give some practical resu..."

http://researchonsearch.blogspot.com/2011/07/research-impact-for-ref.html

DiveRS'11

Fri June 17th, 2011

Pablo Castells, Jun Wang, Ruben Lara, and Dell Zhang are organising an ACM RecSys-2011 workshop on Novelty and Diversity in Recommender Systems (DiveRS). A special issue of ACM TIST in the...

http://researchonsearch.blogspot.com/2011/06/divers11.html

A couple of metrics

Fri June 17th, 2011

It is often desirable to measure the dissimilarity or distance between items using a proper metric. The Jaccard coefficient can be converted to a metric by subtracting the Jaccard coefficient f...

http://researchonsearch.blogspot.com/2011/06/couple-of-metrics.html
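
As a small illustration, the Jaccard distance is one minus the Jaccard coefficient, and it is known to satisfy the triangle inequality. The function name below is hypothetical, and returning 0 for two empty sets is a convention assumed here.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention: two empty sets are identical
    return 1.0 - len(a & b) / len(union)


d = jaccard_distance({1, 2, 3}, {2, 3, 4})  # intersection of 2, union of 4
```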

A Poor Man's Parallel Processing

Sun October 3rd, 2010

A very crude, but often good enough, method to achieve parallel processing (e.g., on multi-core computers) is to partition the large input data file into small chunks, run the program to process ...

http://researchonsearch.blogspot.com/2010/10/poor-mans-parallel-processing.html
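
The partitioning step might look like the sketch below, which splits a list of input lines into contiguous chunks of nearly equal size. The function name is hypothetical. How the chunks are then processed (for example, one program instance per chunk file, started from the shell) is left out, since the excerpt does not specify it.

```python
def split_into_chunks(lines, n_chunks):
    """Partition a list of lines into n_chunks contiguous, near-equal parts."""
    size, extra = divmod(len(lines), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional line each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


lines = ["record %d" % i for i in range(10)]
chunks = split_into_chunks(lines, 3)
```

Concatenating the chunks in order reproduces the original input, so the per-chunk outputs can simply be joined afterwards when the processing is independent per line.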

nDCG

Fri September 10th, 2010

The choice of the gain and discount function for the popular IR performance measure normalised Discounted Cumulative Gain (nDCG) has been discussed and empirically justified in a CIKM-2009 paper...

http://researchonsearch.blogspot.com/2010/09/ndcg.html
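
For reference, a minimal nDCG computation is sketched below. It assumes the widely used exponential gain 2^rel - 1 and the logarithmic discount 1/log2(rank + 1); these are one common choice and not necessarily the functions recommended in the CIKM-2009 paper.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevance labels."""
    # enumerate starts at 0, so the 1-based rank r has discount log2(r + 1).
    return sum((2 ** rel - 1) / math.log2(position + 2)
               for position, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


perfect = ndcg([3, 2, 1, 0])    # already in ideal order
reversed_ = ndcg([0, 1, 2, 3])  # worst ordering of the same labels
```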

LNRE

Wed August 11th, 2010

Here is a good tutorial with Matlab examples about Statistical Estimation for Large Numbers of Rare Events (LNRE).

http://researchonsearch.blogspot.com/2010/08/lnre.html

VLFeat - a computer vision toolbox

Fri June 18th, 2010

The VLFeat open source computer vision library that implements popular feature extraction algorithms (such as SIFT, MSER, and quick shift), clustering algorithms (such as integer k-means, ...

http://researchonsearch.blogspot.com/2010/06/vlfeat-computer-vision-toolbox.html

Bloom filters and Locality Sensitive Hashing

Tue June 1st, 2010

An l-bit Locality Sensitive Hashing (LSH) is achieved by carrying out l independent random cuts of the Euclidean space: if two data points are on the same side of all these cuts, they are very ...

http://researchonsearch.blogspot.com/2010/06/bloom-filters-and-locality-sensitive.html
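
The random cuts can be sketched with random hyperplanes. This sketch assumes each cut is a hyperplane through the origin whose normal vector is drawn from a Gaussian distribution; the function names, the seed, and the number of bits are hypothetical.

```python
import random


def make_cuts(n_bits, dim, seed=0):
    """Draw n_bits random hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(point, cuts):
    # One bit per cut: which side of the hyperplane the point falls on.
    return tuple(int(sum(w * x for w, x in zip(cut, point)) >= 0) for cut in cuts)


cuts = make_cuts(n_bits=8, dim=3)
sig_a = lsh_signature([1.0, 2.0, 3.0], cuts)
sig_b = lsh_signature([2.0, 4.0, 6.0], cuts)  # same direction, so same side of every cut
```

Points whose signatures agree on all bits fall into the same hash bucket; points separated by at least one cut do not.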

An application of Bloom filters

Mon May 31st, 2010

It is said that Google's BigTable uses Bloom filters to reduce the disk lookups for non-existent rows or columns.

http://researchonsearch.blogspot.com/2010/05/bloom-filters-and-bigtable.html
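
A minimal Bloom filter sketch shows why this saves lookups: a negative answer is always correct, so the expensive lookup can be skipped, while a positive answer only means "possibly present". This is an illustration, not BigTable's implementation; the class name, sizes, and the SHA-256-based hashing scheme are assumptions.

```python
import hashlib


class BloomFilter:
    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # one byte per bit, for clarity rather than compactness

    def _positions(self, key):
        # Derive n_hashes positions by salting the key with the hash index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(("%d:%s" % (i, key)).encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.n_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))


rows = BloomFilter()
rows.add("row-17")
```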

A suffix tree implementation with Unicode support

Mon May 3rd, 2010

It seems that there is currently no suffix tree implementation with Unicode support publicly available online. So I adapted Thomas Mailund's suffix tree implementation in C with a Python binding...

http://researchonsearch.blogspot.com/2010/05/suffix-tree-implementation-with-unicode.html

Longest Common Substring

Mon May 3rd, 2010

Given two strings, S of length m and T of length n, their longest common substrings can be found in O(m+n) time using a generalised suffix tree, or in O(mn) time through dynamic programming (e...

http://researchonsearch.blogspot.com/2010/05/longest-common-substring.html

Bayesian inference for the Gaussian

Sun March 28th, 2010

Given the prior probability $p(\mu) = \mathcal{N}(x_0,\sigma_0^2)$ and the likelihood $p(x_1|\mu) = \mathcal{N}(\mu,\sigma_1^2)$, the expectation of the posterior probability $p(\mu|x_1)$ has a ver...

http://researchonsearch.blogspot.com/2010/03/bayesian-inference-for-gaussian.html
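
For reference, the standard conjugate result for this setting (a Gaussian prior on the mean with a single Gaussian observation of known variance) is stated below, using the same symbols as the excerpt. It is the textbook formula, given here under the assumption that the post refers to the same result.

$$\mathbb{E}[\mu \mid x_1] = \frac{\sigma_1^2 \, x_0 + \sigma_0^2 \, x_1}{\sigma_0^2 + \sigma_1^2}, \qquad \mathrm{Var}[\mu \mid x_1] = \frac{\sigma_0^2 \, \sigma_1^2}{\sigma_0^2 + \sigma_1^2}$$

The posterior mean is a weighted average of the prior mean $x_0$ and the observation $x_1$, with each weighted by the variance of the other: the less certain source counts for less.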

Comparing Data Analysis Packages

Wed February 3rd, 2010

A succinct comparison of data analysis packages, including R, Matlab, SciPy, Excel, SAS, SPSS and Stata, can be found here. I recently tried Stata, but found its language syntax ugly and awkward.

http://researchonsearch.blogspot.com/2010/02/comparing-data-analysis-packages.html

The myth about the Internet

Wed November 11th, 2009

Walter Willinger et al. recently published a paper in which the scale-free network model of the preferential attachment type for the Internet is said to be a myth, as it is based on fundamentally...

http://researchonsearch.blogspot.com/2009/11/myth-about-internet.html

Large networks are not modular

Wed August 12th, 2009

A pretty striking finding in the WWW'08 paper from Leskovec et al. is that in nearly every network dataset they examined, there are tight but almost trivial communities at very small scales (up to...

http://researchonsearch.blogspot.com/2009/08/large-networks-are-not-modular.html

Spectral Graph Partitioning

Thu July 30th, 2009

There are a number of methods in the family of Spectral Graph Partitioning, including the traditional min-cut and various balanced cut criteria (such as ratio-cut, average-cut, normalized-cut and...

http://researchonsearch.blogspot.com/2009/07/spectral-graph-partitioning.html