2010-08-24
"How to write a great research paper"
I stumbled across some helpful slides in a comment on FemaleScienceProfessor's blog: "How to write a great research paper" (by Simon Peyton Jones, a researcher at MSR!)
Looking back at my summer with Microsoft
August 13th was my last day as a Microsoft intern. Ever since then, I’ve been missing working with great people, reading lots of interesting papers, and contributing to a larger effort in the best way I know -- by writing code :)
I spent the summer working on the Azure Research Engagement project within the Cloud Computing Futures (CCF) team at Microsoft Research’s eXtreme Computing Group (XCG). My project was to design and build CloudClustering, a scalable clustering algorithm on the Windows Azure platform. CloudClustering is the first step in an effort by CCF to create an open source toolkit of machine learning algorithms for the cloud. My goal within this context was to lay the foundation for our toolkit and to explore how suitable Azure is for data-intensive research.
Unfortunately, high school ends late and Berkeley starts early, so the internship was compressed into just seven weeks. In the first week, I designed the system from scratch, so I got to control its architecture and scope. I spent the next two weeks building the core clustering algorithm, and three weeks implementing and benchmarking various optimizations, including multicore parallelism, data affinity, efficient blob concatenation, and dynamic scalability.
I presented my work to XCG in the last week, in a talk entitled "CloudClustering: Toward a scalable machine learning toolkit for Windows Azure." Here are the slides in PowerPoint and PDF, and here’s the video of the talk. On my last day, it was very gratifying to receive a request from the Azure product group to give this talk at a training session for enterprise customers :)
- Introduction by Roger Barga, my manager - http://www.youtube.com/watch?v=Sy6MyB_w0fs
- General introduction - http://www.youtube.com/watch?v=djkiyhG0e4A
- Technical introduction - http://www.youtube.com/watch?v=N9BsoXze61Y
- Algorithm and implementation - http://www.youtube.com/watch?v=MpAGwyFQqHw
- Optimizations (Part 1) - http://www.youtube.com/watch?v=bU43KnbCfxs
- Optimizations (Part 2) and Results - http://www.youtube.com/watch?v=vxucDtIpttI
2010-07-02
First week as a Microsoft Research intern
My first week as a Microsoft Research intern has been a lot of fun! Here are a few highlights:
MSR Intern Technology Connections: I attended a fascinating series of talks by the team leaders of Microsoft's various dev tools on Tuesday morning. Some of the best ones:
- A behind-the-scenes look at how LINQ works in C# by Eric Lippert.
- A demo of some of Visual Studio 2010 Ultimate's cool features by Justin Marks. (It costs $11,899 :O)
  - IntelliTrace, a way to step backwards through a program's execution history.
  - Architecture Explorer, a neat visualization of program flow and dependencies.
What I'll be working on: Building a dynamically scalable, fault-tolerant distributed k-means algorithm on Windows Azure.
The environment: I'm the only high school intern in XCG, and they don't generally take college interns, so I'm surrounded by PhD interns. It's a great learning opportunity :)
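For readers who haven't met k-means before, here is a minimal single-machine sketch of the idea behind the project mentioned above. It is not the code I'll be writing for Azure: the function names (`partial_sums`, `merge`, `kmeans`) and the toy data are made up for illustration, and the "partitions" are just Python lists standing in for chunks of data that separate workers might process. The assumption is that each worker only needs to report per-centroid sums and counts for its own partition, which a coordinator can then merge into new centroids.

```python
def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    def dist2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
            for p in points]

def partial_sums(points, centroids):
    """Work for one partition: per-centroid coordinate sums and point counts."""
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    for p, k in zip(points, assign(points, centroids)):
        counts[k] += 1
        for d in range(dims):
            sums[k][d] += p[d]
    return sums, counts

def merge(partials, old_centroids):
    """Combine the partial results from all partitions into new centroids."""
    dims = len(old_centroids[0])
    new = []
    for k, old in enumerate(old_centroids):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            new.append(list(old))  # an empty cluster keeps its old centroid
        else:
            new.append([sum(sums[k][d] for sums, _ in partials) / total
                        for d in range(dims)])
    return new

def kmeans(partitions, centroids, max_iters=100):
    """Iterate until the centroids stop moving or max_iters is reached."""
    for _ in range(max_iters):
        # In a distributed setting, each partition could be handled by its own worker.
        partials = [partial_sums(part, centroids) for part in partitions]
        new = merge(partials, centroids)
        if new == centroids:
            break
        centroids = new
    return centroids

# Toy data: two partitions, two obvious clusters.
partitions = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
print(kmeans(partitions, [[0.0, 0.0], [10.0, 10.0]]))
```

The appeal of this split is that only small sums and counts travel between machines, not the points themselves. Scaling dynamically and tolerating worker failures are the hard parts, and this sketch does not attempt either.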
2010-06-22
Resources for getting started with Windows Azure
My internship at Microsoft Research's Cloud Computing Futures Group is starting next Monday, and I'm trying to get ramped up on high-performance computing with Windows Azure as quickly as possible so I can start developing real code sooner. Here are two of my favorite resources so far:
"Windows Azure for Research," a presentation from the same group that I'll be working with over the summer. This is a concise summary of Azure's features and possibilities -- and a great way to get excited about the platform!
Programming Windows Azure, a new book from O'Reilly -- by a member of the Azure product group. This is a well-organized and up-to-date guide, and the author's enthusiasm for the subject comes through :)
(Unfortunately, some of the code samples are poorly formatted in terms of indentation and variable naming. Still readable enough, though.)
2010-05-16
Interning at Microsoft Research over the summer
A few months ago, I decided to apply for a summer internship at Microsoft. I recently heard back from them, and I'm looking forward to joining Microsoft Research's Cloud Computing Futures Group.
I'll be working on the "Client + Cloud" effort. Currently, researchers need access to their own clusters to do heavy data processing. It would be more efficient to do number crunching in the cloud, where resources can scale along with researchers' needs. But many research algorithms require very low inter-node latencies, and clouds built from commodity hardware can't guarantee that. Over the summer, I'll be adapting these kinds of algorithms to tolerate the cloud's relatively high inter-node latencies, specifically on Windows Azure.
In many ways, this is my ideal internship. It provides a nice start in the research field, with the potential for a paper in a year or two. It's in an area of Microsoft that's on the leading edge -- as Steve Ballmer stated, cloud computing is Microsoft's future. And the Cloud Computing Futures Group has strong ties with UC Berkeley, so I'll be able to collaborate even beyond this summer.