Monday, April 2, 2007

Conclusion - The Future of Google


(Photo from April 9, 2007 issue of BusinessWeek)

Google has 31% of online advertising revenue and 56% of the market share for online searches. The company has nowhere to go but up. I'm not certain, but I believe I read somewhere that Google spends 20%-25% of its resources pursuing new ventures.

Google and Microsoft are both currently pursuing the acquisition of DoubleClick, the number one online ad agency. Google has also come out with its own productivity software suite, including word processing, spreadsheet, database, presentation, instant messaging, and video conferencing software, to compete with Microsoft in the corporate arena. Microsoft charges about $500 a year per user for its suite, whereas Google will charge only $50.

Now that Google is entering the software market, there soon may be nothing people do without doing it through Google.

Google is also constantly expanding its massive server farms and fiber-optic capacity. Google may eventually become an alternative to the Internet by providing an equivalent network that runs faster.

Google is also currently working on artificial intelligence systems. Examples include its spell-checking system, its language translator, and eventually a search engine that almost knows what you're looking for as soon as you start looking for it.

Google has bought YouTube and is currently working on a way to search for what's in a video without knowing the video's name or the user who submitted it.

They are also trying to create the first online ad marketplace, like an eBay for anyone who wants to advertise, where clients can bid in auctions for advertising spots. Google is also looking into revolutionizing television commercials so that they can be custom-tailored for each household, based on the ages of the people living there as well as their hobbies, interests, likes, dislikes, and shopping habits.

(Is Google Too Powerful? http://www.businessweek.com/the_thread/techbeat/archives/2007/03/is_google_too_p.html?chan=search)

Technology

(Photo from April 9, 2007 issue of BusinessWeek)
One of the most remarkable things about Google is how it works: "Our business relies on our software and hardware infrastructure, which provides substantial computing resources at low cost. We currently use a combination of off-the-shelf and custom software running on clusters of commodity computers. Our considerable investment in developing this infrastructure has produced several key benefits. It simplifies the storage and processing of large amounts of data, eases the deployment and operation of large-scale global products and services, and automates much of the administration of large-scale clusters of computers." (How Google Works: http://www.baselinemag.com/print_article2/0,1217,a=182560,00.asp)

"Google runs on hundreds of thousands of servers—by one estimate, in excess of 450,000—racked up in thousands of clusters in dozens of data centers around the world. It has data centers in Dublin, Ireland; in Virginia; and in California, where it just acquired the million-square-foot headquarters it had been leasing. It recently opened a new center in Atlanta, and is currently building two football-field-sized centers in The Dalles, Oregon." (How Google Works: http://www.baselinemag.com/print_article2/0,1217,a=182560,00.asp)

To be the most efficient search engine and to handle massive amounts of CPU and networking workload, Google uses what is called Grid Computing. It is part of what is called Internet 2.0 or Web 2.0, and it is the wave of the future. Grid Computing is defined as utilizing a parallel infrastructure of many computers to distribute the workload evenly by pooling or sharing the computers' system resources. So instead of relying on one big, all-powerful and, more importantly, expensive supercomputer, you split the work across smaller, more common, and cheaper computers. It is the same technology used in the Human Genome Project to map the human genome (which has now turned into the Human Proteome Project), the same technology SETI@home (http://setiathome.berkeley.edu/) uses, and the same thing medical research companies use in the search for a cure for cancer.
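To make the idea of splitting one big job across many cheap machines more concrete, here is a minimal sketch in Python. It is only an illustration, not how Google actually does it: threads stand in for separate computers, and the function names (split_into_chunks, count_words, distributed_word_count) and the word-counting task are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_workers):
    """Divide items into n_workers slices of nearly equal size."""
    size, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` workers take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def count_words(lines):
    """The piece of work each 'machine' performs on its own slice."""
    return sum(len(line.split()) for line in lines)


def distributed_word_count(lines, n_workers=4):
    """Spread the work evenly, then combine the partial results."""
    chunks = split_into_chunks(lines, n_workers)
    # Threads are a stand-in for separate commodity computers.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return sum(partial_counts)


documents = ["google runs on commodity servers"] * 10
total = distributed_word_count(documents, n_workers=4)
print(total)
```

The key point is that no single worker needs to see the whole data set; each one handles its slice and only the small partial results are combined at the end.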

As Google CEO Eric Schmidt states, "Google does more than simply buy lots of PC-class servers and stuff them in racks, we're really building what we think of internally as supercomputers."

Google has also managed to develop portable data centers: it can pack a data center into a 20- or 40-foot shipping container and load it onto a tractor-trailer rig to deploy anywhere. The trailer contains 5,000 processors and 3.5 petabytes of data storage space and can be delivered overnight.

What Google does with all this storage space is cache a large part of the Internet in its data centers (about 737,000,000 websites). It does this by jumping from site to site through links, creating an index as it goes. Then, when someone types a search query on http://www.google.com/, the search engine compares the query to the index and finds the best matches, based on content and the number of links from other sites. The link analysis system is called PageRank and was developed by Page and Brin; AltaVista displayed the number of links associated with a site but never utilized that information. All of this happens in just milliseconds in order to determine the best matches and display them in ranked order for your convenience. The time it took to locate your search results is always displayed in the upper right-hand corner.
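Here is a rough sketch of the two pieces described above: an index that maps words to pages, and a link-based score used to order the matches. This is the simplified textbook form of PageRank, not Google's production system. The four-page "web" (the a.example through d.example names and their text) is invented for the example, and the damping value of 0.85 is the commonly cited default, used here as an assumption.

```python
from collections import defaultdict

# Hypothetical four-page web: page -> (page text, outgoing links).
PAGES = {
    "a.example": ("cheap flights to dublin", ["b.example", "c.example"]),
    "b.example": ("dublin travel guide", ["c.example"]),
    "c.example": ("dublin data centers", ["a.example"]),
    "d.example": ("dublin weather", ["c.example"]),
}


def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, (text, _links) in pages.items():
        for word in text.split():
            index[word].add(url)
    return index


def pagerank(pages, damping=0.85, iterations=50):
    """Simplified PageRank; assumes every page has at least one link."""
    n = len(pages)
    rank = {url: 1.0 / n for url in pages}
    for _ in range(iterations):
        new_rank = {url: (1.0 - damping) / n for url in pages}
        for url, (_text, links) in pages.items():
            # A page passes its score on in equal shares to its links.
            share = damping * rank[url] / len(links)
            for target in links:
                new_rank[target] += share
        rank = new_rank
    return rank


def search(query, index, rank):
    """Return pages containing every query word, best-ranked first."""
    words = query.split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches, key=lambda url: rank[url], reverse=True)


index = build_index(PAGES)
rank = pagerank(PAGES)
results = search("dublin", index, rank)
print(results)
```

In this toy web, c.example comes out on top because three other pages link to it, while d.example, which nobody links to, comes last even though it matches the query just as well.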

Larry Page and Sergey Brin also came up with BigFiles, which splits large files into small pieces to be stored on many computers that pool their hard drive space. Google runs the Google File System (GFS) on its servers, which makes sure at least 3 copies of each piece are stored on separate computers to ensure data consistency and error prevention, so Google can store its data reliably on low-cost and less reliable computers.
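The split-and-replicate idea can be sketched in a few lines of Python. This is a toy model, not the real GFS: the 4-byte chunk size, the round-robin placement, and the names store_file and read_file are all assumptions made for illustration. Only the "3 copies on separate computers" rule comes from the description above.

```python
CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow
REPLICAS = 3    # copies kept of every chunk, each on a different server


def store_file(data, servers, chunk_size=CHUNK_SIZE, replicas=REPLICAS):
    """Split data into chunks and place each chunk on `replicas` servers."""
    if len(servers) < replicas:
        raise ValueError("need at least as many servers as replicas")
    storage = {server: {} for server in servers}
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for chunk_id, chunk in enumerate(chunks):
        for k in range(replicas):
            # Simple round-robin placement (an assumption for this sketch).
            server = servers[(chunk_id + k) % len(servers)]
            storage[server][chunk_id] = chunk
    return storage, len(chunks)


def read_file(storage, n_chunks, failed=()):
    """Reassemble the file, skipping any servers that have failed."""
    parts = []
    for chunk_id in range(n_chunks):
        for server, held in storage.items():
            if server not in failed and chunk_id in held:
                parts.append(held[chunk_id])
                break
        else:
            raise IOError(f"chunk {chunk_id} lost on all replicas")
    return b"".join(parts)


servers = ["s0", "s1", "s2", "s3", "s4"]
data = b"commodity hardware fails often"
storage, n_chunks = store_file(data, servers)

# Even with two servers down, every chunk still has a surviving copy.
recovered = read_file(storage, n_chunks, failed={"s0", "s1"})
print(recovered == data)
```

With three copies of each piece, any two machines can die without losing data, which is what makes cheap and unreliable hardware usable.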

Google upgraded its software by creating Bigtable, a distributed storage system for structured data. It stores structured data for Google search, Google Maps, Google Earth, and Search History. Unlike a standard relational database such as MySQL, Bigtable breaks tables down into smaller pieces that are stored across multiple computers.
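Breaking a table into smaller pieces can be pictured as cutting a sorted table into contiguous row ranges, so that each range can live on a different computer. The sketch below is a simplified illustration rather than Google's actual design: the "tablet" name for a row range, the row keys, the values, and the two-rows-per-tablet limit are all assumptions made for the example.

```python
import bisect


def split_into_tablets(rows, max_rows_per_tablet):
    """Cut a table (row_key -> value) into sorted, contiguous row ranges.

    Returns a list of (last_row_key, rows_in_range) pairs in key order.
    """
    keys = sorted(rows)
    tablets = []
    for i in range(0, len(keys), max_rows_per_tablet):
        part = keys[i:i + max_rows_per_tablet]
        tablets.append((part[-1], {k: rows[k] for k in part}))
    return tablets


def lookup(tablets, row_key):
    """Find the one tablet whose range covers row_key, then read from it."""
    end_keys = [end for end, _rows in tablets]
    # First tablet whose last key is >= row_key.
    i = bisect.bisect_left(end_keys, row_key)
    if i == len(tablets):
        return None
    return tablets[i][1].get(row_key)


# Hypothetical table with made-up row keys and values.
table = {
    "com.example.a": 1,
    "com.example.b": 2,
    "com.example.c": 3,
    "org.example.a": 4,
    "org.example.b": 5,
}
tablets = split_into_tablets(table, max_rows_per_tablet=2)
value = lookup(tablets, "org.example.a")
print(len(tablets), value)
```

Because the ranges are sorted, a lookup only needs to consult one piece of the table, so each computer serves just the rows it holds.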

Google uses a version of Red Hat Linux with kernel-level modifications by its programmers. Its distributed file system is the Google File System, its distributed scheduling system is called the Global Work Queue, and its database management systems are Bigtable and Berkeley DB, which I believe is Oracle's.

http://www.baselinemag.com/print_article2/0,1217,a=182560,00.asp