UW and Google: Teaching in Parallel
By Sierra Michels-Slettvet, Engineering Intern
Earlier this year, the University of Washington partnered with Google to develop and teach a course on large-scale distributed computing based on MapReduce and the Google File System (GFS). The goal was to give students the methods they need to tackle the problems that arise when hundreds (or thousands) of computers process datasets ranging into terabytes. I was excited to take the first version of the class, and stoked to serve as a TA in the second round.
But you can't program air, so Google provided a cluster computing environment to get us started. And since computers can't program themselves (yet?), UW provided the most essential component: students with sweet ideas for a huge cluster. After learning the ropes with these new tools, students finished the course with an impressive array of final projects, including an n-body simulator, a bot that performs Bayesian analysis on Wikipedia edits to detect spam, and an RSS aggregator that clusters news articles by geographic location and displays them using the Google Maps API. Check out Geozette.
We are looking at ways to encourage other universities to get similar classes going, so we've also published the course material used at the University of Washington on Google Code for Educators. You're more than welcome to check out the Google Summer Intern video lectures on MapReduce, GFS, and parallelizing algorithms for large-scale data processing.
This summer I've been working on making these educational resources and other tools available so that anyone can work on and think about cool distributed computing problems without the overhead of setting up their own cluster. In that vein, we've released a virtual machine containing a pre-configured single-node instance of Hadoop that exposes the same interface as a full cluster but runs on a single machine. Feel free to give it a whirl.
We're happy to be able to expose students and researchers to the tools Googlers use every day to tackle enormous computing challenges, and we hope this work will encourage others to take advantage of the incredible potential of modern, highly parallel computing. Virtually all of this material is Creative Commons licensed, and we encourage educators to remix it, build upon it, and discuss it in the Google Code for Educators Forum.
Lastly, a quick shout-out to the other interns who helped out on our team this summer: Aaron Kimball, Christophe Taton, Kuang Chen, and Kat Townsend. I'll miss you guys!