【金子凼】eBay in the 2000s: Debugging on the Live Site

05-14-2021, Friday, Sunny

In 2000, I switched jobs from Lattice Semiconductor to eBay after Thanksgiving.

In 2000, eBay had been a public company for a little more than two years and had quite a few Chinese engineers among its very limited number of software engineers. For example, there was Vicki, who hired me and was eBay’s sixth software engineer, and my boss Jim, as well as Wenwen, eBay’s third software engineer, Daniel, and Bill…

Before 2002, eBay’s software system was developed and maintained in C++ with Microsoft Visual Studio. eBay had a weekly software release that covered all sub-systems, such as the selling system, buying system, billing system, user system, and batch jobs. Releasing such a complex system so frequently often introduced new bugs.

A bug is an error in a computer system that causes the system to produce an unexpected result or to behave in unintended ways. These new bugs surfaced on the live site, eBay’s main website, where users buy and sell online in real time. Some of them could not be reproduced in the QA environment, an internal testing environment that closely replicates the live site but does not serve real customers.

Therefore, the eBay software system was released to the live site in debug mode, so that the source code location of a new bug could be pinpointed quickly on the live site itself.

Debug mode is like a vegetable plucked from the field and sold with its dirt, roots, old stems, and dead leaves still attached. Release mode is like the vegetables in the supermarket, cleaned up with only the edible parts left. A system built in debug mode takes up much more space than the same system built in release mode.
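To make the difference a little more concrete, here is a minimal C++ sketch, not eBay’s actual code. It assumes the common convention that release builds define the standard `NDEBUG` macro and debug builds do not; the function `describe_failure` and the constant `kDebugBuild` are hypothetical names made up for illustration. The debug build carries extra location information (the “dirt and roots”), while the release build keeps only the message. In a real debug build, most of the extra size comes from debug symbols and unoptimized code rather than from code like this.

```cpp
#include <string>

// NDEBUG is the standard macro that release builds conventionally define.
#ifdef NDEBUG
constexpr bool kDebugBuild = false;
#else
constexpr bool kDebugBuild = true;
#endif

// Hypothetical helper: builds a diagnostic message.
// Debug builds append the source file and line; release builds do not.
std::string describe_failure(const std::string& what,
                             const char* file, int line) {
    if (kDebugBuild) {
        return what + " at " + file + ":" + std::to_string(line);
    }
    return what;
}
```

A caller would typically pass `__FILE__` and `__LINE__`, for example `describe_failure("ID verification skipped", __FILE__, __LINE__)`.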

eBay had an unspoken test for a new developer: if you could set up the development and live debugging environments on your personal computer within a month of joining, you had shown that you were a capable developer. I had used Visual Studio with C++ for more than three years in my previous job, so I knew this development tool very well. However, eBay’s online software system served so many different links and sites that I ran into many problems while setting up the environments on my personal computer.

After setting up my environment, I felt that doing it within a month of being hired would have been Mission Impossible without Wenwen’s help. Wenwen was the first Chinese software engineer at eBay. Our cubicles were at the two ends of the same row on the second floor of the Motors building; his faced the window and mine faced the walkway.

After hearing that my live debugging environment was ready, Jim immediately assigned me a P2 bug: “ID verification flow no longer working. The third party reported no traffic coming from eBay recently.” A P2 bug is one that causes the system to malfunction in a non-trivial way and leads to serious performance complaints from customers.

Jim said: “This bug was reported by the product manager. QA could not reproduce it in the QA environment. You need to go to the live site, follow the product manager’s steps to reproduce this bug, and find out what caused it.”

I connected to eBay’s live site from my computer, logged in to eBay as the user given by the product manager, and slowly stepped through the live source code in Visual Studio to collect the source files related to the ID verification flow. It was like looking for a needle in a haystack. Finally, I found the cause: Daniel had restructured part of the system on a large scale and moved a section of code from one file to another. In the new file, a “<<” had become “<”, and that caused the bug. Daniel was a top gun engineer! These big restructuring projects were always proposed by top gun managers and executed by their top gun engineers.
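The original line of code is not reproduced here, so the following is only a hypothetical C++ sketch of why this kind of typo is so hard to spot: both “<<” (left shift) and “<” (less than) compile without error, but they produce very different values. The flag bit `kIdVerifyBit` and all function names are assumptions made up for illustration.

```cpp
#include <cstdint>

// Hypothetical flag: bit 3 marks a user who must go through ID verification.
constexpr unsigned kIdVerifyBit = 3;

// Intended code: a left shift builds the bit mask, 1 << 3 == 8.
constexpr std::uint32_t intended_mask() { return 1u << kIdVerifyBit; }

// Typo: '<' compares 1 with 3; the result 'true' converts to the integer 1.
constexpr std::uint32_t typo_mask() { return 1u < kIdVerifyBit; }

// Returns true when the user's flags contain any bit of the mask.
bool needs_id_verification(std::uint32_t user_flags, std::uint32_t mask) {
    return (user_flags & mask) != 0;
}
```

With the intended mask, a user whose flags equal 8 is sent into the verification flow. With the typo, the mask is 1, the check fails silently, and no error is raised anywhere, which is consistent with a symptom like “no traffic” rather than a crash.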

Restructuring part of a system on a large scale is like changing the layout of the wires, water pipes, sewers, and corridors in a large office building. The people doing the work must not only be familiar with the old layout, but also integrate the new layout into whatever old structure is kept. Restructuring usually involves moving, redoing, and deleting things.

When I showed Jim, who sat next to me, the one line of code that had caused the bug, he was very happy: “Go ask Bill, he is this week’s train conductor, for help in pushing this fix out onto the live site.”

A train conductor is a unique short-term role at eBay. As far as I know, no other company has this role.

The content of a weekly software release, including all sub-systems, is called a train.

The software engineer in charge of a weekly release is the train conductor for that week only. During that week, the train conductor has a lot of privileges, such as pausing the release process, saying no to additional changes, and tracking down new bugs encountered during the release. Even so, nobody wants to be a train conductor.

Later, different sub-systems were handled by different train conductors and every train had several train conductors.
