Created by 이영석 (http://yslee.cs-cnu.org)
“Data is raw, preprocessed, and engineered for analysis to achieve a better product!” by Youngseok Lee
1 Terabyte (TB) = ? GB
1 Petabyte (PB) = ? TB
1 Exabyte (EB) = ? PB
Zetta, Yotta, ...
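The conversions above can be checked with a small sketch. The slide does not say which convention it intends, so this assumes decimal (SI) prefixes, where each step is a factor of 1000, and offers the binary factor of 1024 as an option. The names PREFIXES and convert are illustrative, not from the slides.

```python
# Prefixes in increasing order; each step up multiplies the size by `base`.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def convert(value, from_prefix, to_prefix, base=1000):
    """Convert `value` between byte-unit prefixes.

    base=1000 is the decimal (SI) convention (assumed here);
    base=1024 is the binary convention.
    """
    steps = PREFIXES.index(from_prefix) - PREFIXES.index(to_prefix)
    return value * base ** steps


print(convert(1, "tera", "giga"))   # 1 TB in GB
print(convert(1, "peta", "tera"))   # 1 PB in TB
print(convert(1, "exa", "peta"))    # 1 EB in PB
```

Under the decimal convention each answer is 1000; passing base=1024 gives the binary equivalents.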
Google, Facebook, Amazon, MS, Netflix, Walmart, eBay, ...
Alibaba, Baidu, ...
Analysis algorithms + software platform -> problem solving
Netflix Prize
“Data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.”
function map(String name, String document):
    // name: document name
    // document: document contents
    for each word w in document:
        emit (w, 1)

function reduce(String word, Iterator partialCounts):
    // word: a word
    // partialCounts: a list of aggregated partial counts
    sum = 0
    for each pc in partialCounts:
        sum += pc
    emit (word, sum)
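
The word-count pseudocode above can be tried on a single machine with a minimal sketch. It only imitates the map, shuffle (group by key), and reduce phases in memory; it is not how a real MapReduce framework distributes the work. The names map_fn, reduce_fn, and run_word_count are illustrative, and splitting on whitespace is an assumed tokenization.

```python
from collections import defaultdict


def map_fn(name, document):
    """Emit (word, 1) for every whitespace-separated word in the document."""
    for word in document.split():
        yield (word, 1)


def reduce_fn(word, partial_counts):
    """Sum the partial counts collected for one word."""
    return (word, sum(partial_counts))


def run_word_count(documents):
    """Run map, group values by key, then reduce, all in memory."""
    groups = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_fn(name, text):
            groups[word].append(count)  # shuffle step: group values by key
    return dict(reduce_fn(word, counts) for word, counts in groups.items())


docs = {"a.txt": "big data big", "b.txt": "data"}
print(run_word_count(docs))
```

In a real deployment the grouping step is done by the framework across many machines, which is what lets the same two functions scale to large inputs.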
By Florence Nightingale (1820–1910). - http://www.royal.gov.uk/output/Page3943.asp, Public Domain, https://commons.wikimedia.org/w/index.php?curid=1474443