SE News: Yahoo! Search Webmap (Yahoo! Developer Network blog)

Just stopping by with a quick news update, this time about the Yahoo! Search Webmap.

The Webmap build starts with every Web page crawled by Yahoo! and produces a database of all known Web pages and sites on the internet and a vast array of data about every page and site. This derived data feeds the Machine Learned Ranking algorithms at the heart of Yahoo! Search.

Some Webmap size data:

* Number of links between pages in the index: roughly 1 trillion links
* Size of output: over 300 TB, compressed!
* Number of cores used to run a single Map-Reduce job: over 10,000
* Raw disk used in the production cluster: over 5 Petabytes

Source: Hadoop running in production on the Yahoo! Search Webmap (Yahoo! Developer Network blog)
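
To give a feel for what a Map-Reduce job over a link graph looks like, here is a minimal sketch in plain Python. It is not Yahoo!'s Webmap code and does not use the Hadoop API. The toy crawl data and the function names (`map_links`, `shuffle`, `reduce_counts`, `inbound_link_counts`) are all made up for illustration. It counts inbound links per page, one simple example of the kind of per-page data a link graph can yield.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical toy crawl: page URL -> outgoing links found on that page.
CRAWLED_PAGES: Dict[str, List[str]] = {
    "a.example/": ["b.example/", "c.example/"],
    "b.example/": ["c.example/"],
    "c.example/": ["a.example/"],
}


def map_links(page: str, outlinks: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (target, 1) for every link found on one page."""
    for target in outlinks:
        yield target, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group the emitted values by key (the link target)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(target: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum the values for one target page."""
    return target, sum(counts)


def inbound_link_counts(pages: Dict[str, List[str]]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in a single process."""
    mapped = (
        pair
        for page, outlinks in pages.items()
        for pair in map_links(page, outlinks)
    )
    grouped = shuffle(mapped)
    return dict(reduce_counts(t, c) for t, c in grouped.items())


if __name__ == "__main__":
    print(inbound_link_counts(CRAWLED_PAGES))
```

In a real cluster the map and reduce steps run in parallel on many machines and the shuffle moves data between them over the network; this sketch runs everything in one process just to show the shape of the computation.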


