A very partial listing, drawn from our other bibliographies, of mostly very recent citations.
F. Menczer:
To appear in JASIST
http://www.informatics.indiana.edu/fil/Papers/JASIST-04.pdf
Recent Web searching and mining tools are combining text and link analysis to
improve ranking and crawling algorithms. The central assumption behind such approaches
is that there is a correlation between the graph structure of the Web and
the text and meaning of pages. Here I formalize and empirically evaluate two general
conjectures drawing connections from link information to lexical and semantic Web
content. The link-content conjecture states that a page is similar to the pages that
link to it, and the link-cluster conjecture that pages about the same topic are clustered
together. These conjectures are often simply assumed to hold, and Web search tools
are built upon such assumptions. The present quantitative confirmation sheds light
on the connection between the success of the latest Web mining techniques and the
small-world topology of the Web, with encouraging implications for the design of better
crawling algorithms.
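The link-content conjecture can be probed on a toy scale by comparing the lexical similarity of linked page pairs with that of unlinked pairs. The following is a minimal sketch that assumes plain term-frequency vectors and cosine similarity, which are not necessarily the measures used in the paper. The function names and toy pages are hypothetical.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def mean_linked_similarity(pages, links):
    """Average similarity between each linking page and the page it links to."""
    sims = [cosine(pages[src], pages[dst]) for src, dst in links]
    return sum(sims) / len(sims) if sims else 0.0


def mean_unlinked_similarity(pages, links):
    """Baseline: average similarity over ordered page pairs with no link either way."""
    linked = set(links)
    names = sorted(pages)
    sims = [cosine(pages[a], pages[b]) for a in names for b in names
            if a != b and (a, b) not in linked and (b, a) not in linked]
    return sum(sims) / len(sims) if sims else 0.0


if __name__ == "__main__":
    # Hypothetical toy pages: two on-topic clusters, each internally linked.
    pages = {"a": "web crawler ranking", "b": "web crawler links",
             "c": "cooking pasta recipe", "d": "pasta sauce recipe"}
    links = [("a", "b"), ("c", "d")]
    print(round(mean_linked_similarity(pages, links), 3))
    print(round(mean_unlinked_similarity(pages, links), 3))
```

If the conjecture holds for a collection, the linked average should exceed the unlinked baseline. On real data one would also control for link distance and use weighted term vectors.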
F. Menczer, G. Pant, P. Srinivasan:
To appear in ACM TOIT
http://www.informatics.indiana.edu/fil/Papers/TOIT.pdf
Topical crawlers are increasingly seen as a way to address the scalability limitations of universal
search engines, by distributing the crawling process across users, queries, or even client computers.
The context available to such crawlers can guide the navigation of links with the goal of
efficiently locating highly relevant target pages. We developed a framework to fairly evaluate topical
crawling algorithms under a number of performance metrics. Such a framework is employed
here to evaluate different algorithms that have proven highly competitive among those proposed
in the literature and in our own previous research. In particular, we focus on the tradeoff between
exploration and exploitation of the cues available to a crawler, and on adaptive crawlers
that use machine learning techniques to guide their search. We find that the best performance is
achieved by a novel combination of exploratory and exploitative bias, and we introduce an evolutionary
crawler that surpasses the performance of the best non-adaptive crawler after sufficiently long
crawls. We also analyze the computational complexity of the various crawlers and discuss how
performance and complexity scale with available resources. Evolutionary crawlers achieve high
efficiency and scalability by distributing the work across concurrent agents, resulting in the best
performance/cost ratio.
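The exploration/exploitation tradeoff in a topical crawler can be illustrated with a best-first frontier that occasionally fetches a random URL instead of the top-scored one. This is a minimal sketch using generic epsilon-greedy selection over an in-memory graph. It is not the paper's specific crawlers, and the keyword-overlap scorer, function names, and parameters are hypothetical.

```python
import heapq
import random


def score(text: str, topic: set[str]) -> float:
    """Hypothetical relevance cue: fraction of topic keywords present in the text."""
    words = set(text.lower().split())
    return len(words & topic) / len(topic)


def crawl(graph, texts, seeds, topic, max_pages=10, epsilon=0.2, seed=0):
    """Best-first crawl with epsilon-greedy exploration.

    graph: url -> list of outlinked urls; texts: url -> page text.
    With probability epsilon a random frontier URL is fetched (exploration);
    otherwise the URL with the highest inherited score is fetched (exploitation).
    """
    rng = random.Random(seed)
    frontier = [(-1.0, url) for url in seeds]  # max-heap via negated scores; seeds first
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        if rng.random() < epsilon:
            # Explore: remove a uniformly random frontier entry, then restore the heap.
            i = rng.randrange(len(frontier))
            frontier[i], frontier[-1] = frontier[-1], frontier[i]
            _, url = frontier.pop()
            heapq.heapify(frontier)
        else:
            # Exploit: take the entry with the highest score.
            _, url = heapq.heappop(frontier)
        visited.append(url)
        s = score(texts.get(url, ""), topic)
        # Each newly discovered outlink inherits the score of the page it was found on.
        for out in graph.get(url, []):
            if out not in seen:
                seen.add(out)
                heapq.heappush(frontier, (-s, out))
    return visited
```

With `epsilon=0` the crawler is purely greedy and follows the on-topic branch first. Raising `epsilon` spends some fetches on lower-scored links in exchange for broader coverage.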
Colin Cooper, Alan Frieze
On a General Model of Web Graphs
http://www.aladdin.cs.cmu.edu/papers/pdfs/y2003/power.pdf
Colin Cooper, Alan Frieze
Crawling on Web Graphs
http://www.aladdin.cs.cmu.edu/papers/pdfs/y2002/spider.pdf
Persona: A Contextualised and Personalized Web Search
http://www.hicss.hawaii.edu/HICSS_35/HICSSpapers/PDFdocuments/DTDMI01.pdf
Recent advances in graph-based search techniques derived from Kleinberg's work [1] have been
impressive. This paper further improves the graph-based search algorithm in two dimensions. First,
variants of Kleinberg's techniques take into account neither the semantics of the query string nor
that of the nodes being searched. As a result, polysemy of query words cannot be resolved. This paper
presents an interactive query scheme that uses the simple Web ontology provided by the Open Directory
Project to resolve the meaning of a user query. Second, we extend a recently proposed personalized
version of the Kleinberg algorithm [3]. Simulation results are presented to illustrate the sensitivity
of our technique. We outline the implementation of our algorithm in the Persona personalized web
search system.
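The graph-based technique that Persona builds on is Kleinberg's hub/authority iteration (HITS), which can be sketched as follows. This is plain HITS with L2 normalization on a toy edge list. It does not include the ontology-based or personalized extensions described above, and the function name and example graph are illustrative.

```python
import math


def hits(links, iterations=50):
    """Kleinberg-style HITS: mutually reinforcing hub and authority scores.

    links: list of (source, target) pairs. Returns (hubs, authorities),
    each an L2-normalised dict keyed by node.
    """
    nodes = sorted({n for edge in links for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages pointing to a node.
        auth = {n: 0.0 for n in nodes}
        for src, dst in links:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum of authority scores of the pages a node points to.
        hub = {n: 0.0 for n in nodes}
        for src, dst in links:
            hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


if __name__ == "__main__":
    hubs, auths = hits([("h1", "a"), ("h2", "a"), ("h2", "b")])
    print(max(auths, key=auths.get), max(hubs, key=hubs.get))
```

Because scores depend only on link structure, "jaguar" the car and "jaguar" the animal can land in the same authority list. The query-semantics step described in the abstract targets exactly this kind of polysemy.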
H. Chang et al.
Creating Customized Authority Lists
http://citeseer.ist.psu.edu/chang99creating.html
concept of lifting good hubs