David A Notes

Conway and Coase on Collaboration in Software

2024-03-17T00:00:00+00:00

Building software is a team sport, but who’s on your team? The answer may surprise you!

Conway & Coase

Conway’s Law states that the structure of a large software project will inevitably come to resemble the organizational structure of the teams who build it. Colloquially, you are destined to “ship your org chart.” A critical insight here is the immense influence of human coordination mechanics on how software is built.

Loosely speaking, Coase’s Theory of the Firm answers the question: if markets are so effective a means of coordinating activity, why do we have sizable firms that internally use non-market information aggregation and decision-making processes? Oversimplifying, the answer is transaction costs: compared with simple command-and-control authority (eg, your boss tells you to do something), market transactions may entail additional friction in the form of price discovery, negotiation, contracts and assurances, and other coordination.

However, as technology and processes evolve, the “frontier” of which needs can be best met internally (via in-house capabilities) or externally (via the market) can shift. For example, standardization of a given physical input (like screws, or integrated circuits) can facilitate the sourcing of these components from third-party suppliers in the marketplace by decreasing the transaction costs.

The Cloud & SaaS

We might consider the evolution of the software industry through this lens. Open-source or commercial libraries or systems such as databases are analogous to physical inputs to a manufacturing process and have long been a part of the “software supply chain.” More recently, cloud computing has exploded in popularity and Software-as-a-Service (SaaS) vendors and use cases have rapidly proliferated.

For many SaaS vendors, the explicit value proposition for their customers is that acquiring the associated functionality (when not deemed essential to competitive differentiation) from the vendor can be a more attractive proposition than expensive and risky custom software development. Why should your business build your own payroll, authentication, or customer support systems? Ideally, this is a classic story of gains from specialization and trade: the vast majority of businesses are much better off outsourcing these functionalities to a constellation of external vendors who are each manically focused on their respective domains.

Taken together, Conway and Coase give us some interesting tools with which to think about modern software. In some sense, using AWS is a specially structured and intermediated collaboration with the Amazon teams who build and operate those services. If you build your application on AWS, certain architectural design patterns fit more naturally with the assortment of services offered by AWS, and in fact this is explicitly captured in their various best-practice architecture guidance docs. It’s Conway’s Law in action, but you’re shipping the AWS service offering catalog (ie, their org chart)! In a further fun twist, many of the aforementioned SaaS vendors are providing some coherent bundle of functionality and business value by building their services on top of AWS.

The past and future of software and coordination

One potential perspective on this dynamic is that recent technical and organizational changes have lowered transaction costs and shifted the “build-vs-buy” boundary between problems best tackled internally versus those where it makes more sense to shop for solutions in the marketplace. Software engineers have excellent reasons for building their apps by cleverly combining various vendor APIs and cloud services, but perhaps don’t often enough have the opportunity to step back and appreciate the deeper nature of the implicit globe-spanning inter-firm collaboration networks in which they are participating. While these ideas can be useful for understanding the recent history of software, it is an interesting open question whether they can help us navigate its near future as well.

Practical men, who believe themselves to be quite exempt from any intellectual influence, are usually the slaves of some defunct economist. -John Maynard Keynes

Engineering as Shadow Price Discovery

2022-10-30T00:00:00+00:00

Engineering: solving problems under constraints

Many engineering problems have the general form of trying to achieve some desired capabilities at minimum cost, or alternatively trying to design the best-performing system at a certain price point. For purposes of a concrete example, say we are designing one of those rolling delivery robots like the ones I used to see rolling around Redwood City, California. To simplify, say that we need to achieve some range target while keeping the size below a certain city-mandated limit, and we’d like to have as much cargo capacity as possible while minimizing costs.

Tackling this problem, we realize that a cheaper battery can get us the desired range, but takes up more room that could otherwise be used for cargo capacity. More cargo capacity means more weight which requires more battery power for the same range. Everything costs money. How to decide what to do?

There are always trade-offs / No Free Lunch

Besides being something nice to say in a meeting if you have nothing else to add to the discussion, what does it mean to say that “there are always trade-offs”? Consider the case where this is not true: ie, you can potentially increase something good (performance, reliability, capacity) or decrease something bad (cost, pollution, delays) in the design without paying any price along other dimensions. This is sometimes known as a “free lunch”, and of course you would take it. Having maxed out on all available free lunches, you are now at some point in the constraint space where you can no longer get something for nothing. Once again, there are trade-offs! This is why there are always trade-offs: if there were free lunches to be had you would have already taken them, so here you are facing trade-offs.

Constrained Optimization

Constrained Optimization gives us a formal way to define problems where we want to maximize (or minimize) some mathematical function $f(x)$ (known as the objective function) in terms of variables $x$ which are subject to some constraints (either equality $g(x)=c$ or inequality $h(x) \leq d$) that define the feasible set of possible solutions. This field and its associated literature are vast and fascinating, and the amount of real-world activity influenced or controlled by the solutions of these optimizations is mind-boggling. In some sense, the tools of optimization can give us a rigorous way to navigate trade-offs.
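Written out, the general problem described above takes the following form (a maximization is shown; a minimization is analogous). The symbols are exactly those used in the text, and nothing beyond that definition is assumed:

$$
\begin{aligned}
\max_{x} \quad & f(x) \\
\text{subject to} \quad & g(x) = c, \\
& h(x) \leq d.
\end{aligned}
$$

The feasible set is the collection of all $x$ that satisfy both kinds of constraint.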

We are just employing these concepts in an intuitive hand-wavy way here and we won’t be digging too deeply into actual equations. Going back to our delivery robot, imagine that we can drop in some specific objective function and parameters to perfectly capture the constraints and optimization goal described above. In this scenario, we can then just drop all this into some solver software and magically get the quantitative parameters that define the optimal or ideal design - hooray!

Shadow Prices

Having designed the perfect cargo robot for your exact circumstances, you are enjoying a coffee when your Product Manager (PM) bursts into the room and informs you that your employer has successfully lobbied to have local regulations changed to increase the maximum size of cargo robots by 3.7%! What does it mean for your design?

Before tackling this question, we must circle back to the original objective function definition. Recall that we want “as much cargo capacity as possible while also minimizing costs.” Unfortunately this is not, strictly speaking, a coherent objective function as we’ve no way to weigh a gain in cargo space against a decrease in costs. Instead, we have to define some “profitability” objective to be maximized that goes up with cargo space and down with manufacturing costs. In this definition, our stakeholders (say, the aforementioned PM) must have devised a valuation model for cargo space that standardizes its unit of measure to be directly comparable to manufacturing costs - eg, $10 per cubic meter of storage.

Shadow Prices extend this standardization to all of the other variables and constraints. At a given optimal solution, the shadow price for a given constraint tells us how much “better” we could do on the objective function (eg, profitability) if this constraint were very slightly (ie, infinitesimally) relaxed. This means that, given the shadow price of the “size” constraint at our solution, we can now calculate the approximate ROI of our company lobbyist who negotiated that 3.7% increase - great job!
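To make the idea concrete, here is a minimal sketch that estimates the shadow price of the size limit by re-solving a toy version of the robot problem with the limit nudged slightly. Everything except the $10 per cubic meter cargo valuation is an assumption invented for illustration: the linear model, the battery cost, the function and parameter names, and the 1.0 cubic meter baseline size limit.

```python
def best_profit(max_size, cargo_value=10.0, battery_cost=4.0,
                base_battery=0.2, battery_per_cargo=0.25):
    """Optimal profit for a toy robot design (all numbers are illustrative).

    Assumed model: volumes in cubic meters, money in dollars.
      maximize   cargo_value * cargo - battery_cost * battery
      subject to cargo + battery <= max_size                          (size limit)
                 battery >= base_battery + battery_per_cargo * cargo  (range target)
    """
    # Extra cargo is profitable at the margin in this toy model
    # (10 > 4 * 0.25), so both constraints bind at the optimum.
    cargo = (max_size - base_battery) / (1 + battery_per_cargo)
    battery = base_battery + battery_per_cargo * cargo
    return cargo_value * cargo - battery_cost * battery


def shadow_price(max_size, eps=1e-6):
    """Finite-difference estimate of d(best profit) / d(size limit)."""
    return (best_profit(max_size + eps) - best_profit(max_size)) / eps


size_limit = 1.0                      # hypothetical current limit, in m^3
price = shadow_price(size_limit)      # dollars of profit per extra m^3 allowed
gain = price * 0.037 * size_limit     # first-order value of a 3.7% larger limit
print(f"shadow price: {price:.2f}, estimated gain: {gain:.4f}")
```

In this made-up model the shadow price comes out to about 7.2 dollars per cubic meter of additional allowed size, which is lower than the 10 dollar cargo valuation because every extra bit of cargo also demands extra battery. Comparing the estimated gain against the cost of the lobbying effort would give the rough ROI mentioned above.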

Shadow Price discovery

In our cargo robot example, things worked out very nicely in that the objective function was a relatively unambiguous target like “money”, and also that the PM was somehow able to devise a closed-form cash valuation model for cubic meters of cargo space. In real-world projects things may not work out so cleanly. In fact, for an internal software engineering project, it is likely that there is not a well-defined objective function in the exact same sense as in our toy robot example. Instead, we may have a “multi-dimensional” objective with unclear weightings among the desired properties or capabilities:

speed of delivery: when can we have it by?
maintainability or extensibility of the code
cost (in AWS bills) to operate
reliability (uptime) of the component
various “nice-to-have” features or capabilities

We are now far afield of the well-defined world of Constrained Optimization, but perhaps we can lean on some of the intuitions we developed? Unfortunately, it may be incredibly difficult or even impossible to come up with sensible weightings between these properties. How much uptime are we willing to pay in order to get the component 1 sprint earlier? What is the cash value of refactoring the code to allow for easier extension in the future? Your stakeholders are not going to grind out an $N \times N$ matrix of conversion weights between the various dimensions. How can we discover the appropriate shadow prices or exchange rates among the different properties of our solutions in order to choose the appropriate trade-offs?

Pricing with assortments

Harrison Metal has some nice short videos explaining various business-related concepts. Several of them explicitly target pricing, and in particular one of them covers the idea of assortments.

The video describes how vertical (good, better, best) and horizontal (different in kind) product assortments (along with optional “add-ons”) can be used to probe the marketplace and capture demand (more details in slides). What I find particularly interesting here is pricing as discovery: the variation supplied by the assortments “explores” the local neighborhood of latent consumer demand and reveals valuable information about who is willing to pay how much for what.

Now we come to an interesting idea: while stakeholders may not be willing or even able to explicitly articulate an $N \times N$ exchange rate matrix among wildly varying (and potentially discontinuous or lumpy) trade-offs in a design or plan, they are more likely to be able to choose or rank alternatives among a finite assortment of candidate proposals that vary along the relevant dimensions, thereby implicitly expressing some weightings among the crucial trade-offs.

Returning to our cargo robot, let’s pretend we do not have all of the nice equations and parameters we had previously assumed. Instead, we can propose three alternatives to the PM:

1. “econo-bot” that goes all-out to minimize cost but has less cargo space
2. “deluxe” design that has maximum cargo space, but at greater cost
3. some medium tradeoff that interpolates between 1. and 2.

Based on the feedback we get among these proposals, we gain some idea where the PM (our internal customer) believes the “sweet spot” to be in terms of trading off cost and cargo space - hooray again!
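Purely to make the intuition concrete (not as something to actually run against your stakeholders), the sketch below shows how a single choice among three proposals narrows down the implied exchange rate between cost and cargo space. The proposal numbers, the linear "value times cargo minus cost" scoring, and all names are assumptions made up for illustration.

```python
# Hypothetical proposals: (manufacturing cost in dollars, cargo space in m^3).
PROPOSALS = {
    "econo-bot": (100.0, 0.5),
    "medium": (150.0, 0.8),
    "deluxe": (250.0, 1.0),
}


def implied_cargo_values(chosen, candidates=range(10, 1000, 20)):
    """Return the candidate dollars-per-m^3 values under which `chosen` wins.

    Assumes the stakeholder acts as if maximizing value * cargo - cost.
    """
    consistent = []
    for value in candidates:
        scores = {name: value * cargo - cost
                  for name, (cost, cargo) in PROPOSALS.items()}
        others = [s for name, s in scores.items() if name != chosen]
        # Keep this candidate only if the chosen proposal strictly beats the rest.
        if all(scores[chosen] > s for s in others):
            consistent.append(value)
    return consistent


values = implied_cargo_values("medium")
print(f"picking 'medium' implies roughly {values[0]}-{values[-1]} dollars per m^3")
```

The PM never had to state a number, yet preferring the medium design over both extremes brackets the valuation of cargo space to a band. Adding more proposals near the chosen one would tighten that band, which is the same "exploring the local neighborhood" idea as in the pricing discussion.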

Back to reality

Let’s try to fit this hypothetical scenario into some coherent framework of problem-solving in engineering. We can define “inner-loop” engineering as finding the best (or good enough) solution given a very nicely defined problem in terms of the goal and constraints. This is often no simple business, for “nature cannot be fooled”¹ as they say. But (certainly in software, probably everywhere) even this challenge is often embedded in an “outer-loop” problem of identifying and carefully defining the appropriate goal(s) and constraints in the first place. Given that this outer-loop defines the boundary between the “pure” technical solution and the messy human world, it is often a complex affair that, like pricing, can be more art than science. This is often where the trade-offs get traded-off, and hopefully these ideas from constrained optimization and pricing can help provide a useful conceptual toolkit for navigating these situations.

Finally, a caveat: in a previous post I described a mental model of product development sequencing based on Prize-Collecting Steiner Trees (PCST), a famous combinatorial optimization problem in theoretical computer science. I thought it was a useful conceptual framework, but I also cautioned against trying to literally encode your problems in that way. Likewise, here I would not recommend actually trying to infer shadow prices or compute the NPV of a given bugfix, but it might be useful to keep these tools and ideas in mind when trying to navigate tricky trade-offs among difficult-to-compare dimensions, and to consider how one might implicitly elicit relative valuation information from stakeholders and/or domain experts via the presentation of carefully constructed alternatives.

Here “they” refers to Richard Feynman in the Presidential Commission report on the Challenger Space Shuttle disaster. ↩

Software problems

2022-09-06T00:00:00+00:00

What do you do with a problem like software?

Everyone in technology loves solving problems; you can just read their cover letters and LinkedIn bios where they say so. But what are “problems” in the context of software, exactly?

What follows below is a crude and idiosyncratic categorization of some different problem flavors one might encounter in software development. If this line of inquiry sounds intriguing, I would highly recommend reading the classic essay No Silver Bullet–Essence and Accident in Software Engineering by Frederick P. Brooks, Jr.

Code Problems

You have an accountId provided as an argument to a method you’re working in, but in order to make a required API call you actually need a customerId. There is another microservice that can perform the necessary translation, but the class you’re working in doesn’t have a handle to it in scope. You will need to ensure a valid handle is available to your class at construction-time. However, this service is not currently used in the particular binary you’re working on, so you need to familiarize yourself with the mechanics of the underlying service discovery and runtime dependency injection to wire it all up. Also you now need some additional mocking or stubbing for automated testing of your method to avoid relying on external services. Finally, after all of this you are able to getAccountDetails(), and it is a big hit at the biweekly sprint review.

The preceding stylized fiction is an example of what we could call Code Problems: difficulties encountered in achieving our goals that are largely artifacts of the organization and structure of the code itself. You’re not up against any fundamental laws of nature here, it’s just that, as-is, the code doesn’t currently do what you want and you’ll need to re-arrange a few things to remedy that.
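The wiring described in the story can be sketched in a few lines. This is only an illustration under assumed names: the translator interface, the handler class, and the stubbed API are all hypothetical, and only accountId, customerId, and getAccountDetails() come from the story itself.

```python
class StubIdTranslator:
    """Test double standing in for the (hypothetical) ID translation service."""

    def __init__(self, mapping):
        self._mapping = mapping

    def to_customer_id(self, account_id):
        return self._mapping[account_id]


class AccountDetailsHandler:
    """Hypothetical class that needs a customerId but is handed an accountId."""

    def __init__(self, id_translator, details_api):
        # Dependencies are supplied at construction time, so production code
        # can pass real service clients while tests pass stubs.
        self._id_translator = id_translator
        self._details_api = details_api

    def get_account_details(self, account_id):
        customer_id = self._id_translator.to_customer_id(account_id)
        return self._details_api(customer_id)


def fake_details_api(customer_id):
    """Stand-in for the downstream API call that requires a customerId."""
    return {"customerId": customer_id, "plan": "basic"}


handler = AccountDetailsHandler(StubIdTranslator({"acct-1": "cust-9"}), fake_details_api)
details = handler.get_account_details("acct-1")
print(details)
```

None of this is conceptually hard, which is rather the point: the effort goes into rearranging the code so the right handle is in scope and testable, not into the underlying problem.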

What to do about Code Problems?

Ask this question to 2 software engineers and get 3 answers. Many “best (?) practices” can be interpreted as attempting to minimize Code Problems, such as Test Driven Development (TDD), static type systems, pair programming, Functional Programming (FP), design/code reviews, and so on. Refactoring can help make code easier to work with, but is also likely subject to diminishing returns.

It isn’t obvious (to me) if there exist any conclusive answers or one-size-fits-all fixes. Both Dan Luu and Hillel Wayne have conducted interesting surveys of empirical software engineering research. See for yourself, but the results are generally mixed.

One other thought is that the sums of money at stake in software development are vast. If there were $100 bills of “free lunch” productivity improvements lying on the ground, wouldn’t someone have already picked them up? Of course there would never be any forward progress on anything if people took this line of thinking too seriously, but it would seem surprising for there to be massive obvious wins hiding in plain sight in such a fast-moving and competitive industry.

Physics Problems

As a business analyst, I want to query unlimited amounts of data instantly and for free, so that I can answer any arbitrary question that pops into my head.

Cloud-related advertising you might see at the airport notwithstanding, both the data and calculations required by the aforementioned use case are, at present, regrettably carried out by physical storage and compute devices, with all of their attendant costs and limitations.

It may be a bit obvious or simplistic to put it this way, but this physical reality means that

data (ie, actual 1’s & 0’s) has to be physically moved to some compute device: this includes networking, memory to cache, and beyond
some compute device (CPU, GPU, TPU, or Quantum) has to perform the desired Beta reductions, run the Turing machine, etc

These (ultimately) physics-imposed constraints (of current technologies) can therefore, in some sense, be described as Physics Problems.

What to do about Physics Problems?

For reasons that are probably fascinating in their own right, Computer Science education seems to pay a great deal of attention to this topic. Let’s define Computer Science Cleverness™ as the study of how to eke out a bit (or a lot) more “bang for the buck” by organizing or implementing our systems differently. This could include things like caching tricks, improved algorithms, well-suited architectural choices, or various micro-optimizations up and down the stack.

Armed with our requirements and Computer Science Cleverness™, we then have (at least) three possible ways to deal with Physics Problems:

1. Balancing the tradeoffs between desired scale and performance and the costs and capabilities of available resources - on some level this may mean either spending more money or compromising on performance
2. Using Computer Science Cleverness™
3. Waiting a bit for Applied Physicists and Electrical Engineers to somehow bail you out with their own brand of cleverness

Given the Net Present Value (NPV) of delivering a software solution today versus waiting for Physicists and EE’s to invent solutions to your problems, teams tend to opt for some mixture of 1 and 2 in practice.

Reality Problems

If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is. -John von Neumann

On this topic, I would highly advise checking out Hillel Wayne’s recommendation of Data and Reality. In the very first chapter of that book, seemingly trivial everyday notions (“what is a thing”) are thoroughly inspected, and quick or easy answers are mercilessly ripped to shreds. What is so difficult about encoding “common sense” in software? At least part of the issue is that the dumb precision of code necessitates explicit reckoning with a combinatorial explosion of subtleties that, in everyday life, can (usually) be successfully navigated well enough by an embodied human intelligence capable of combining contextual cues with prior knowledge. This gap between human day-to-day reasoning capabilities and the effort required to codify “correct” behavior in software is an endless source of challenges and bugs.

For another perspective, talk with anyone who has ever worked on medical or governmental software. The dizzying complexity of the target bureaucratic system cannot help but be reflected in any software designed to interact with it. These concerns constitute Reality Problems, and these challenges are more or less irreducible, inherent to the real-world objectives of your software development endeavors.

What to do about Reality Problems?

Can we find an easier reality? Here it may be beneficial to widen the scope of your solution space to include the context in which your software will be used. If 0.001% of cases require escalation to some human judgment, maybe that’s ok, especially if excluding those cases from the scope of requirements for the software may make things 1000x easier.

Zachary Tellman’s fascinating Elements of Clojure articulates this nicely. Paraphrasing an idea that I found particularly resonant from that book, we could say that what is commonly meant by “over-engineered” is that a system correctly handles a broader range of inputs or operating conditions than is necessary or intended, whereas conversely an “under-engineered” system exhibits degraded or undesirable behaviors in some contexts in which we do wish it to work properly. Distinguishing between the two, especially in collaborative work, therefore requires carefully establishing a common understanding of the target environments and use cases.

The Actual Problem

Setting aside code organization, performance/resource constraints, and the mind-boggling complexity of the real world, there is still the actual original problem you are trying to solve, eg enabling the user to buy a widget from their phone. Assuming the problem itself is indeed the right problem (a major undertaking on its own), how can we achieve the desired goals?

What to do about The Actual Problem?

A particularly memorable Computer Science course I had the opportunity to take was Special Topics Seminar in Approximation Algorithms. Besides the fascinating content and expert instructor, part of what made it interesting was that the other students in the class were deeper specialists in CS Theory than I had typically encountered. I was struck by the ease and rapidity with which, upon seeing essentially any algorithmic problem, my classmates would pause like the Mentat in Denis Villeneuve’s Dune for 2 seconds before proclaiming something like “reduces to Hamiltonian Cycle.”¹

Mapping new situations to a “vocabulary” of known patterns is a powerful technique that seems to recur across domains², and it is not clear why software should be any different. To what extent is your specific task a completely unique snowflake? Accepting that even highly innovative systems contain many “commodity” problems, much of the work becomes deconstructing your situation into its atomic pieces, mapping them to known solutions, modifying or adapting them to your specific circumstances where necessary, and appropriately combining them to yield the desired result.

This is not to say one can simply master this vocabulary and relax. Rapid evolution in hardware, infrastructure, ecosystems, and tooling means that yesterday’s solutions may not map perfectly to today’s problems. A solid understanding of the strengths, weaknesses, and nuances of different techniques is critical to applying them intelligently in new situations.

Meta-problem: problem management

Which problems are the most important? As the classic senior engineer maxim goes: “it depends”. In any given context, it becomes a meta-problem to identify, categorize, and assess the severity of the various problems, as well as to understand the relationships among them. For example, code that is highly optimized for performance (to solve Physics Problems) may be tricky to later refactor or extend, creating Code Problems for future engineers (possibly including your future self). Navigating this dynamic portfolio of interrelated problems is arguably a significant piece of what effective software engineers get paid to do.

Furthermore, this discussion has focused primarily on problems associated with the code itself. Many of the thorniest challenges occur at a remove from the technical details, having to do with concerns at the “human level of the stack” such as coordination, communication, and alignment. These deserve an altogether separate discussion.

Fun example: given a collection of key-value “documents”, what is the minimum set of keys required for each document to be uniquely identified? Why Decluttering Complex Data in Legends is Hard ↩
The Cognitive Cost Of Expertise, WIRED. ↩

Encoder: Elements of Clojure by Zachary Tellman

2022-08-20T00:00:00+00:00

Encoder: experimenting with posting summaries or highlights of my (often quite) rough notes on books or papers I’ve read.

Elements of Clojure by Zachary Tellman

Names are not the only means of creating indirection, but they are the most common. The act of writing software is the act of naming, repeated over and over again.
— Elements of Clojure (@elementsofclj) December 7, 2018

In the earlier days of the “big data” buzzword, it seemed that perhaps data processing and analysis could sensibly be done using the JVM due to its rich libraries, mature tooling, and optimized performance. However, Java seemed a bit clunky and verbose for this purpose, opening the door for non-Java languages that targeted the JVM, probably most notably Scala (which Spark is written in) but also including a Lisp dialect called Clojure. At some point, I did some minor experimentation with using Clojure for data science, and at any rate I found writing a bit of non-trivial code in a Lisp to be an educational experience.

Elements of Clojure by Zachary Tellman is not intended to be an entry point into learning Clojure, but rather to provide a conceptual scaffolding and vocabulary for discussing higher-level tradeoffs and design choices. From the publisher’s description:

And so this book does not offer knowledge, it offers clarity. It is aimed at readers who know Clojure, but struggle to articulate the rationale of their designs to themselves and others. Readers who use other languages, but have a passing familiarity with Clojure, may also find this book useful.

Only 1 of the 4 sections of the book is truly Clojure-specific; the other 3 are densely packed with what I would perhaps loosely call “applied philosophy” of programming. Much of this content seemed to be more crisply distilled articulations of themes I’ve seen in code/design review feedback I had either given or received over the years working in (non-Clojure) software.

As promised by the description above, my experience of reading this book was unique among technical books I’ve read. Rather than telling me new things I didn’t previously know (eg, how does borrow checking work in Rust), it often gave me different ways to describe or understand ideas that had previously been lurking in the background of countless technical discussions.

Given all this, I don’t think I could recommend it to beginning programmers. Without a substrate of hands-on trial-and-error experience, many of the passages I found most fascinating would probably come across as vague fortune-cookie programming aphorisms. But for someone who “has touched the hot stove”¹ a few times, the book contains many sentences that I found remarkable for their insight and concision. Concepts are repeatedly stripped down to their essence, abstracted from the idiosyncrasies of particular technologies or application domains.

Two example themes that really stuck with me were naming things in software (arguably all one can ever do, as mentioned in the excerpt above), and careful consideration of the context in which your software will operate:

Over-engineering is not a property of our software, but of how we intend to use it.

The book also draws upon an unusual breadth and depth of references: Frege, Baudrillard, Hoare, Seeing Like a State by James C. Scott, Complex Adaptive Systems by Miller and Page, at least 2 different short stories by Jorge Luis Borges, and so on - perhaps making this book a great recommendation for the engineer on your team whose code review comments cite Leibniz’s “Identity of Indiscernibles” principle.

If you are looking for a silver bullet salesman, keep looking. But if the below passage resonates with you, then there is probably some cool stuff for you here.

This is not a problem that can be fully solved. We speak ambiguous words, we think ambiguous thoughts, and any project involving multiple people exists in a continuous state of low-level confusion.

Credit Stefan Zier for this euphemism (intended as a compliment) for experienced programmers, which is fundamentally in agreement with this assertion from the book: “The ‘seniority’ of an engineer derives more from their ability to predict adverse environments than from mastery of any particular technology.” ↩

Career links: teams and business

2021-09-09T00:00:00+00:00

Disclaimer: Not advice, consult a professional!

Following on a previously posted collection of links, here is another collection of resources focused on more organization-oriented topics. These are not exhaustive, and indeed some of these resources likely contradict each other outright. As in the previous post, the reader is encouraged to take what they find suitable and leave the rest.

Context

“Some of us will do our jobs well and some will not, but we will all be judged on one thing: the result.” -Vince Lombardi

Most likely, a good deal of your professional work will occur in the context of collaboration with other individuals acting in some coordinated way to achieve higher-level goals. While your individual efforts will make a crucial contribution to the outcome, the effectiveness of how you and others work together will also be hugely important. Given this, it is worth giving some dedicated thought to the structure and dynamics of teams, organizations (especially businesses), and management.

WHAT: product development

Software engineering often supports the development of products or services that solve (either internal or external) customer problems. Questions about what exactly the team should build are important, and it is useful to acquire some familiarity with frameworks and tools people use to grapple with them:

The Lean Product Playbook - Dan Olsen
What Customers Want - Anthony Ulwick

HOW: teams, management, and organizations

How do people, teams, and organizations align and coordinate their efforts most effectively? Tough question, but here are some books about it.

Not necessarily software-specific:

High Output Management - Andrew Grove
High Growth Handbook - Elad Gil
Principles - Ray Dalio

At least kind of software-specific:

The Principles of Product Development Flow - Donald Reinertsen
Manager’s Path - Camille Fournier
An Elegant Puzzle - Will Larson
Accelerate: The Science of Lean Software and DevOps - Nicole Forsgren, PhD, Jez Humble, and Gene Kim

WHY: business structure and strategy

Zooming even further out than “What/Product” questions, we can consider why the business even exists, how it is organized, and why it does what it does:

The Nature of the Firm - Ronald Coase
7 Powers - Hamilton Helmer
Understanding Michael Porter - Joan Magretta

WILD CARD: Harrison Metal

Harrison Metal is an interesting organization, see their website for details. Among other things, they offer a library of free short instructional videos on a (very) wide range of startup and business topics and tactics. A few in particular that I have often forwarded to people are below:

Optional: tangentially relevant historical nonfiction

It can also be illuminating to take a deeper dive on some specific historical (up to and including very recent history) examples of teams, organizations, and projects.

The Idea Factory - Jon Gertner
The Making of the Atomic Bomb - Richard Rhodes
Working in Public: The Making and Maintenance of Open Source Software - Nadia Eghbal

Career links: ladder-climbing vs bet-placing

2021-09-08T00:00:00+00:00

Disclaimer: Not advice, consult a professional!

Software engineers (like anyone) often want a structured framework for thinking about career progression and growth. Many companies provide this in the form of a career ladder, and there are a wide variety of resources for learning more about developing skills and achieving milestones corresponding to different levels of the ladder. The simplicity of this approach can under-emphasize more strategic dimensions of personal development. For example, how can you cultivate some specialization, depth, or versatility that distinguishes you from a replacement-level Google L5? We can consider other non-ladder frameworks that place more emphasis on these aspects by viewing career development through the lens of startups, investing, or product development.

Context

This post is a “Don’t Repeat Yourself” (DRY) attempt to capture links and other resources on these topics that I’ve by now shared with people individually more than three times. My intention here is to assemble and briefly describe a collection of ideas from which the reader can take or leave what they like. Everyone’s personal situations and trade-offs vary considerably, and even a specific individual’s context, objectives, and constraints will be dynamic over time - caveat lector!

Get enough sleep, exercise, eat your vegetables

Certain guidance is well-worn or cliché for good reason (ie, it is basically true): invest in your communication skills, follow through on your commitments, seek out learning opportunities, develop your craft, and try to understand the broader context. There is no shortage of excellent books and articles covering these topics. But once you are already investing in these foundations, what’s next?

Beyond the ladder

As mentioned earlier, many tech companies have some version of a career ladder: a linear (vertical, even!) progression of increasing impact, expertise, and responsibility. More sophisticated versions may strain the metaphor by forking into different tracks of growth such as technical leadership or people management. These ladders are a popular tool for good reasons, as they encapsulate a useful consensus bundle of information, structure, and shared vocabulary.

However, there is room in the toolkit for more than one tool. Alternatively we might think of a career as a sequential process of making investments of effort and attention into uncertain endeavors over time. Instead of a diligent climber making cumulative progress up a ladder, imagine a calculating gambler (or investor, if you like), continuously allocating a scarce portfolio of skills, time, and other capital to solving problems, identifying and evaluating opportunities, and gathering additional information. Variations on this theme include thinking of a career as a startup or product, spending capital to carve out a lucrative niche. Viewed from this perspective, tidy questions about the next rung of the ladder are replaced by more open-ended questions around differentiation and decision-making under limited information or uncertainty.

Differentiation

“…don’t enter the rat race unless you’re the fastest rat!” -Erik Torenberg, “Build Personal Moats”

The writings of business theorist Michael Porter contain strong warnings against companies simply trying to be “the best,” as any gains will inevitably be competed away by equally determined and capable competitors. The opposite of the naive “battle to be the best” is the careful formulation and execution of a well-chosen strategy that charts a course towards a unique and defensible position in the marketplace. One example from the book is that IKEA doesn’t necessarily make the world’s best furniture in some absolute sense, but they have a very attractive value proposition for a particular customer segment, and their entire business commits wholly to going after this market via an interlocking set of difficult-to-replicate trade-offs.

Analogously, a career plan premised on being the smartest, most talented, or hardest working is, by definition, a dicey proposition for all but a few individuals. The same pitfalls also apply to highly competitive tournaments such as exclusive university admissions or selective employers. Indeed, the odds become even more dire if you believe that the internet has created a global marketplace for talent, or that advances in technology are tending to create “winner take all” outcomes. The thoughtful development of some rare and valuable skill (or combination of skills) is one way to, at least somewhat, sidestep the costly and unwinnable “battle to be the best (employee).” Some links to thought-provoking writing on this topic are below, and many of these articles themselves have links to further reading materials:

Build Personal Moats - Erik Torenberg
See your Career as a Product - Erik Torenberg
Career Moats 101 - Cedric Chin
Deep Work - Cal Newport
You and your research - Richard Hamming

Uncertainty

One challenge around trying to build some differentiated personal value proposition is that, by definition, there must be some “moat” preventing others from doing so, or doing it as effectively as you can. Going again to the startup analogy, a trio of fascinating blog posts (links below) from Jerry Neumann propose that startups fundamentally take on the risk of attacking uncertain opportunities, and that this uncertainty acts as a temporary moat keeping competitors from entering. During this window of time, the startup is in a race to exploit that buffer to build some other, more durable advantage(s) before the opportunity is sufficiently de-risked in the eyes of other better-resourced entrants. Jumping back to personal skillsets, an example of this could be becoming an expert in some emerging but as yet unproven technology, process, or domain.

This approach requires you to take on some risk that your bet doesn’t pay off (eg, the tech you chose doesn’t take off). If it were a sure thing, everyone would already be piling into it and it would be difficult to stand out from the crowd. Hopefully you have some insider information or domain expertise to formulate a better assessment of the odds than the general public, but there will still be an element of uncertainty to be evaluated and managed. Joining an early-stage startup can also be interpreted in this way. You are quite definitely placing a bet where the potential payoffs include both direct financial rewards as well as exposure to different kinds of growth and learning opportunities than you might get elsewhere.

Therefore, to effectively execute on your career strategy, it would be helpful to get comfortable incorporating the unknown and the uncertain into your decision processes. Below are some interesting links on this theme, many of which are unsurprisingly concerned with the problem setting of financial investment:

Startups and Uncertainty - Jerry Neumann
A Taxonomy of Moats - Jerry Neumann
Schumpeter on Strategy - Jerry Neumann
Investing in the Unknown and Unknowable - Richard Zeckhauser
Knightian Uncertainty - Peter Dizikes
Time Allocation as Capital Allocation - Cedric Chin (heavily inspired by Fortune’s Formula by William Poundstone)
Against the Gods - Peter Bernstein

Putting it all together: what’s your edge?

If you find any of the above even remotely compelling, it could be a worthwhile exercise to try and explicitly work through an inventory of your worldview, skills, interests, and goals in these terms: how are you (or could you be) uniquely well-positioned to create exceptional value or solve important problems?

What is your existing (or desired) “career moat”?
What has to happen for you to get there?
What hypotheses or predictions about the broader world would have to be true in order for this bet to pay off - ie, where is there uncertainty?
What are the next intermediate checkpoints along the way?

Put another way, if you were positioning yourself in a job interview: what is your “edge” or “secret weapon”? Remember, all the candidates will (at least claim to) be proactive problem-solvers with cutting-edge technical skills, excellent collaboration habits, and a strong track record! At least once in a while, you may want to look “sideways” from the ladder and try to think about things a bit differently.

Acknowledgments

Thanks to my PhD co-advisor Mark Craven for the “secret weapon” framing and introduction to Hamming’s You and your research. Also many thanks to Cedric Chin, Jerry Neumann, and Erik Torenberg for “thinking in public” via blogs, Twitter, newsletters, etc!

Models and iteration speed in coding

2021-01-21T00:00:00+00:00

“The best material model of a cat is another, or preferably the same, cat.” -Arturo Rosenblueth and Norbert Wiener, The Role of Models in Science

What do programmers do when they are programming? One interpretation is that they are high-throughput empiricists, iteratively developing and testing many small hypotheses about what various pieces of code (including their own!) actually do by quickly running many micro-experiments. Thinking about how to make these experiments fast, cheap, and reliable can be a useful lens for understanding the costs and benefits of various software development settings, tools, and practices.

Software development: a day in the life

A few significant challenges around coding are

1. not knowing what we are trying to do
2. not knowing how to do it
3. not knowing if we’ve successfully done it

Challenge 1 is often a bigger picture business context question that we’ll set aside for this discussion, but Challenges 2 & 3 arguably constitute a sizable chunk of day-to-day “hands on keyboard” development activity. For a very simple example, say we have a data structure representing a set of customer orders and we want to find the order with the largest purchase price. This use case is probably easy to accomplish with the standard libraries, but maybe we don’t know the exact invocations off the top of our heads. By all means we should certainly RTFM, but to know for sure, the fastest and easiest thing might be to work it out empirically with a small code snippet. This could be done in the REPL, via a quick script, or in a small unit test. Minutes of trial and error tinkering here could save hours of debugging later when the larger system isn’t behaving as intended.
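As a minimal sketch of what such a micro-experiment might look like in Python: the Order type, its fields, and the largest_purchase name are all hypothetical, invented here purely for illustration. The only question being tested is how the built-in max behaves with a key function and with an empty input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    # Hypothetical order record; a real one would carry many more fields.
    order_id: str
    price: float


def largest_purchase(orders):
    """Return the order with the highest price, or None if there are no orders."""
    return max(orders, key=lambda order: order.price, default=None)


orders = [Order("a-1", 19.99), Order("a-2", 250.00), Order("a-3", 74.50)]
print(largest_purchase(orders))  # the "a-2" order
print(largest_purchase([]))      # None, thanks to default=None
```

Pasting something like this into a REPL answers two small questions in seconds: whether key= is the right invocation, and what happens on empty input.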

Fast feedback, high fidelity

This kind of cheap, iterative experimentation with fast feedback is the closest thing to a “free lunch” that is likely to be found in software development productivity. So what exactly is going on here? In the above example, our simple script, test, or REPL session is acting as a model of the use case in our program, preserving the essential characteristics while being more amenable to rapid experimentation than the full system we’re building. The critical properties of such a model are:

Fast feedback: quick and easy cycles from making changes to observing their results
High fidelity: reasonable likelihood that behaviors will transfer or replicate to the true target context

Fast feedback is important because, as nicely described in “You are solving the wrong problem”, the limiting factor in this mode of developer productivity is the number of experiments we can run, and faster cycle times mean more iterations. Back to our example, imagine how painful it would be if understanding each minor change to our hypothetical largestPurchase() function required re-compiling a 1M LOC application, packaging and deploying it, and then manually verifying the behavior from some UI flow.

High fidelity is critical to the usefulness of the experimental findings. If the experimental context is too far removed from the target environment, we run the risk that our results do not apply in the settings we care about. Simple examples of this are a unit test where the test stub of an external dependency diverges significantly from the real implementation, or the “works on my machine” phenomenon.
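A toy illustration of the diverging-stub case, with every class and function name made up for the example: the stub quietly returns a default for unknown items while the stand-in for the real dependency raises an error, so a check that passes against the stub tells us nothing about production behavior.

```python
class RealPriceService:
    """Stand-in for the real dependency: unknown SKUs raise KeyError."""

    _prices = {"widget": 5.0}

    def price(self, sku):
        return self._prices[sku]


class StubPriceService:
    """Test stub that has drifted: unknown SKUs silently cost 0.0."""

    def price(self, sku):
        return {"widget": 5.0}.get(sku, 0.0)


def order_total(service, skus):
    """Code under test: sum the prices of all SKUs in an order."""
    return sum(service.price(sku) for sku in skus)


# Fast feedback, low fidelity: the experiment "succeeds" against the stub...
stub_total = order_total(StubPriceService(), ["widget", "gadget"])

# ...but the same call fails against the behavior of the real dependency.
try:
    order_total(RealPriceService(), ["widget", "gadget"])
    real_failed = False
except KeyError:
    real_failed = True

print(stub_total, real_failed)
```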

In the ideal case, imagine working in some kind of magical “ultimate debug mode” that instantly surfaced arbitrarily detailed information about the entire set of consequences of even the smallest change exactly as they would play out in the full system.

Success stories

We can apply this approach to understand the appeal and utility of various development technologies and approaches:

Data Science Notebooks
JavaScript in the browser
REPLs
Test-driven development (TDD)
Compilers, type checkers, and linters

All of the above leverage interactivity to quickly convey the impacts of small changes back to the user. In the cases of Data Science Notebooks and JavaScript, the target environment itself is well-suited to direct experimentation (data analysis in the web notebook, or webpage behavior in the browser). REPLs and TDD can be used to quickly test and verify the behavior of isolated pieces of code, after which the user can reason about how those findings will translate into the program context. Compile-time or IDE-assisted type checking and linting can also be thought of in this way: these tools embody some very limited model of the program, and can therefore give you very fast feedback about your changes, such as “you are referencing an undefined variable”, “you are trying to do arithmetic on a String”, and so on.

More challenging domains

Looking for cases where it is difficult to get this kind of fast and reliable feedback can also be an interesting exercise. These cases are often instantiations of the “Norbert Wiener’s cat” quoted above: truly high fidelity experiments require basically doing the thing for real (i.e., the only model is the actual cat), which can incur long feedback cycle times, business risk, or other costs.

Microservices

Coding in a microservices architecture often involves dependencies on other services, which raises the question of how you can quickly experiment and verify your understanding of other services and your own code’s usage of them. Spinning up an entire version of the full service may be complex or impractical (failing fast feedback), and test fixtures or other simulators of other services may not be fully realistic (failing high fidelity). Adapting development tools and practices to microservices is a huge topic, which is a testament to the difficulties here. For a much deeper and more comprehensive discussion of this issue, see “Testing Microservices, the sane way” by Cindy Sridharan.

Infrastructure

“Infrastructure as code” is a popular phrase, but (so far) the IDE is unlikely to be able to tell us what is going to happen when we switch over to a different load balancer, spin up a new service, or change the firewall rules. The lower-level details at play here tend to resist abstraction (hard to get high fidelity), with practices converging towards some variant of “try it and see” (which can suffer from expensive feedback cycles).

X as a Service

Applications increasingly involve stitching together external services provided by cloud providers and other platforms. In some sense, we can consider these as “external microservices”, and they inherit many of the same complications, albeit while being possibly even more opaque.

Evaluating ideas and improving pain points

To the extent that this framework accurately captures characteristics of software development, what can it tell us? Whenever we encounter excessive developer tedium or friction, we can try to see if we are suffering from cases where our models of the target system lack fidelity, have long feedback cycles, or both. Possible approaches to improving things can likewise then be understood as attacking one of these fronts:

developing a higher fidelity proxy or model to the target system
getting faster feedback on the existing target

Many emerging tools or practices around microservices, infrastructure, and cloud applications can be understood as trying to enable faster feedback cycles. Stretching the “Norbert Wiener’s cat” analogy to the breaking point, the “test in prod” school of thought is arguably about building a suitably robust, flexible, and well-instrumented cat. On the other hand, trying to develop higher fidelity simulations or models of these environments (without using the actual cat) seems like a relatively less well-explored direction so far.

Graphs, combinatorial optimization, and product development

2020-07-24T00:00:00+00:00

Introduction

Software product development can be thought of as spending scarce time, effort, and attention in order to (hopefully) deliver customer value. The actual details of how this happens in practice are rarely as simple as “go build XYZ”; instead, messy reality consists of an interconnected web of target users and use cases, deliverables and milestones, noisy estimates, and interlocking dependencies. Teams must navigate questions of prioritization and sequencing in the context of this complexity and uncertainty, and mental models or frameworks can be useful tools to help impose some order onto the chaos. This post describes one possible framing inspired by a well-known combinatorial optimization problem over graphs, Prize Collecting Steiner Trees (PCST).

Minimum viable product

Say your team is building a simple analytics service to help email marketers better understand the performance of their campaigns.

Assuming we understand the target user needs well enough, delivering some bare minimum system for this use case will require implementing a handful of foundational functionalities. For example, perhaps our base system needs:

instrumentation to determine if emails are opened or links are clicked, sending outcomes to …
collection machinery for capturing and ingesting this data into a …
database for persistently storing this information in a form suitable for …
query interfaces for users to run various analyses.

Feature backlog

The magic of software is that additional potential use cases and features beyond this basic core are limited only by the imaginations of the users, product management (PM), and engineering team. Marketing analysts may also wish to do one or more of:

join target emails against other customer information (eg, demographics) to slice and dice success rates
generate pleasant and informative data visualizations
use those visualizations to create real-time business dashboards
train predictive machine learning (ML) models to optimize email personalization
do all of the above from their mobile device.

None of this comes for free. All the additional code has to be designed, written, tested, bugfixed, deployed, and monitored. Weighing these costs against the expected benefits of the resulting features is one of the principal tasks of product development.

Dependency structure

There also exists an underlying dependency structure among these enhancements. Like a strategy game tech tree, it is unlikely that you can successfully ship real-time dashboards before building basic charting, and training ML models will require being able to integrate your click data with other data sources for feature generation. How can we incorporate these constraints into our planning and decision-making?

Put a graph on it

Just as someone with a hammer sees all problems as nails, a Computer Science (CS) background can lead to perceiving all problems as amenable to graph-based approaches, which is exactly what we’ll do here. Let $G=(V,E)$ be a directed acyclic graph (DAG) representing our product roadmap, where vertices (or nodes) $v \in V$ correspond to features or capabilities (such as basic charting) and their incoming edges $e \in E$ correspond to the aforementioned dependency structure. For example, if $v_{b}$ “real-time dashboarding” requires $v_a$ basic charting, we can represent it with a directed edge $(v_a, v_b)$:

We can then define a special root vertex $r$ to represent the current state of our system, and populate out the rest of the capabilities and dependencies as nodes and edges connected to $r$:

What about the costs and benefits? We can model these with a non-negative cost function over edges $c: E \rightarrow \mathbb{R}^+$ and a similar profit function over nodes $p: V \rightarrow \mathbb{R}^+$. Intuitively, we would like to devise our development plans to achieve maximum benefit for minimum cost. This goal can be translated into the problem of identifying a connected subgraph $T \subset G$ containing $r$ (in fact, $T$ will always be a tree) to maximize the objective function

\[\max_T \sum_{v' \in T_v} p(v') - \sum_{e' \in T_e} c(e')\]

where, if $T = (V', E')$, then $T_v$ is the set of vertices $V'$ and $T_e$ is the set of edges $E'$.

Each candidate solution $T \subset G$ corresponds to some tree rooted at $r$. In our problem domain, this corresponds to a development plan spending the effort $\sum_{e' \in T_e} c(e')$ to achieve the milestones $v' \in T_v$. Our domain dependency constraints are guaranteed to be met via the graph encoding, the connectivity requirement on $T$, and the inclusion of root $r$.

It turns out that this is an instance of a well-studied problem in theoretical CS known as the Prize-Collecting Steiner Tree (PCST) (nodes are “prizes”). Example applications of this problem are optimizing network layouts in telecommunication infrastructure or sensors in the utility grid. The PCST problem setting is an undirected and rootless graph, whereas the special case shown here with directed edges and a defined root $r$ is an instance of a rooted Steiner arborescence.
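To make the objective concrete, here is a brute-force sketch on a toy roadmap. Everything in it is an assumption for illustration: the feature names, the cost and profit numbers, and especially the simplification that each feature has exactly one prerequisite, so that a plan is fully determined by its set of nodes. Exhaustive search like this is only viable for tiny graphs.

```python
from itertools import combinations

ROOT = "core"  # the root r: the current state of the system

# Hypothetical roadmap: feature -> its single prerequisite (one edge per feature).
parent = {
    "join": "core",
    "charts": "core",
    "dashboards": "charts",
    "ml": "join",
    "mobile": "dashboards",
}
# Illustrative numbers only: c(e) for the edge into each feature, and p(v) per node.
cost = {"join": 3, "charts": 2, "dashboards": 4, "ml": 8, "mobile": 6}
profit = {"core": 0, "join": 2, "charts": 3, "dashboards": 6, "ml": 12, "mobile": 3}


def is_feasible(nodes):
    """A plan is a tree rooted at ROOT iff every chosen feature's prerequisite is chosen."""
    return all(parent[v] in nodes for v in nodes if v != ROOT)


def objective(nodes):
    """Total profit of chosen nodes minus total cost of the edges leading into them."""
    return sum(profit[v] for v in nodes) - sum(cost[v] for v in nodes if v != ROOT)


def best_plan():
    """Enumerate every subset of features and keep the best feasible one."""
    features = sorted(parent)
    best = frozenset({ROOT})
    for k in range(len(features) + 1):
        for subset in combinations(features, k):
            nodes = frozenset(subset) | {ROOT}
            if is_feasible(nodes) and objective(nodes) > objective(best):
                best = nodes
    return best


plan = best_plan()
print(sorted(plan), objective(plan))
```

With these made-up numbers the best plan pays for the net-negative “join” edge because it unlocks the more valuable “ml” prize, and it skips “mobile” because its cost exceeds its profit.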

Implications for product development

The utility of a simplified model like this can be evaluated in terms of helping us to reason about real-life situations and manage the effective allocation of scarce resources. We can examine some typical failure modes and advice through this lens. In each of these scenarios we show simplified schematic payoff charts of total cost (effort expended) on the x-axis versus profit (value delivered) on the y-axis.

Peanut-buttering / breadth-first

“Peanut-buttering” refers to spreading efforts too thinly across too many goals. Consider a relaxation of the PCST problem where you can partially buy edges, but only gain the profit when an edge is fully purchased. In a peanut-buttering strategy, we allocate our spend across many edges, corresponding to highly parallelizing work streams across as many tasks as possible. The downside is that no value is delivered whatsoever until tasks start crossing the finish line, as shown in the payoff chart:

In practice this strategy can be even worse than it looks here, due to uncertainty effects that will be mentioned below.
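The shape of this payoff curve can be sketched with a toy simulation. The assumptions here are all illustrative: three independent tasks of equal cost and profit, one unit of effort spent per step, and no dependencies between tasks. The sketch compares rotating effort across all tasks with finishing one task at a time.

```python
def payoff_curve(costs, profits, pick_next):
    """Cumulative value delivered after each unit of effort (one unit per step)."""
    remaining = list(costs)
    delivered, curve, step = 0, [], 0
    while any(r > 0 for r in remaining):
        task = pick_next(remaining, step)
        remaining[task] -= 1
        if remaining[task] == 0:
            # Profit is only collected once a task is fully paid for.
            delivered += profits[task]
        curve.append(delivered)
        step += 1
    return curve


def round_robin(remaining, step):
    """Peanut-buttering: rotate effort across all unfinished tasks."""
    unfinished = [i for i, r in enumerate(remaining) if r > 0]
    return unfinished[step % len(unfinished)]


def one_at_a_time(remaining, step):
    """Finish the first unfinished task before starting another."""
    return next(i for i, r in enumerate(remaining) if r > 0)


costs, profits = [3, 3, 3], [5, 5, 5]
spread = payoff_curve(costs, profits, round_robin)
focused = payoff_curve(costs, profits, one_at_a_time)
print(spread)   # [0, 0, 0, 0, 0, 0, 5, 10, 15]
print(focused)  # [0, 0, 5, 5, 5, 10, 10, 10, 15]
```

Both strategies spend the same total effort and end at the same value, but the spread-out plan delivers nothing for the first six steps.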

Over-specialization / depth-first

Going to another extreme, we can imagine pursuing some single longest path as deeply as possible. This could correspond to building out some very specific niche use case to the exclusion of any even basic complementary capabilities.

In this scenario, we can initially make good progress but eventually plateau into diminishing returns by continuing to invest in further incremental improvements along this deep dependency path. Returning to our marketing analytics example, this could be a development plan where we continue to deliver increasingly sophisticated and esoteric advanced ML capabilities without ever investing in even basic visualization, reporting, or data import/export functionalities.

Happy medium

In the idealized case, work is done in such a way to continuously balance the advancement of three purposes:

deliver immediate value
unblock subsequent high-value items
gain new information about the graph

The third item refers to the fact that, in practice, we will not have perfect knowledge of costs or payoffs, so learning new information about these is itself valuable.

If the first two can be well-balanced, the hope is that this approach can navigate between the Scylla of overly-diffuse efforts and the Charybdis of overly-narrow plans to yield a nice and consistent “up and to the right” payoff chart.

Graphs all the way down: PCST for technical architecture

While the motivating examples so far focused on tangible end-user value, we can apply the same ideas to more purely technical tracks of work as well. In this context, the prizes might be de-risking a particularly tricky part of the design, unblocking teammates, or accelerating overall development through improved tooling or infrastructure.

The map is not the territory

Of course this model is not perfect and elides crucial complicating details that make real situations so interesting:

team capacity is not an undifferentiated mass of abstract “points” - there are different skillsets, working styles, and team dynamics
goals are not simple binary outcomes - the quality and particulars of what gets delivered matter tremendously
true costs and benefits are not actually known
even the graph structure is probably not known
everything is continuously changing over time, both in terms of the underlying reality as well as our own imperfect knowledge.

As mentioned earlier, in such an environment the “prizes” can take the form of new information itself, such as learning whether the technical design can meet the desired performance requirements or if the target feature satisfactorily solves the customer use case. This added dimension means that the real-life optimization problem more closely resembles messy “explore vs exploit” trade-offs than well-understood computational bottlenecks.

Your mileage may vary

Would I recommend literally encoding your plans into this format and dumping them into some solver software to decide your optimal course of action? Probably not. Is it helpful to have this mental model simmering in the background of your consciousness? Perhaps. Can visually sketching out approximations of these graphs help you to communicate and coordinate within and across teams? Please try it and let me know how it goes!

References

The idea of encoding dependencies into graphs is ubiquitous in large-scale project management, but most discussions I had found were more in the context of tracking the execution of some predetermined plan in terms of its inter-task dependencies, eg with Gantt charts. Deciding which areas to pursue at all, and in what order, did not seem to be the focus in quite the same way as described here. Product roadmaps and other visualization tools have some of this flavor, but I could not find much around framing the problem in terms of an objective function. The interplay between optimizing value and acquiring information in sequential decision-making discussed in The Principles of Product Development Flow may be closer to this line of thinking. If you know of other similar resources or writing please do let me know.

PCST is known to be NP-hard, and has been the subject of very interesting research, in particular on Linear Programming (LP) techniques for guaranteed-factor approximation. A nice collection of results for Steiner tree variations can be found here, and some landmark results on this specific problem are:

Bienstock, Goemans, Simchi-Levi, Williamson: 3-approximation
Goemans and Williamson: 2-approximation
Archer, Bateni, Hajiaghayi, Karloff: $(2 - \epsilon)$-approximation

Self-explainer: privacy-preserving contact tracing

2020-04-26T00:00:00+00:00

Self-explainer: experimenting with taking my usual rough notes on various topics of interest and posting only very slightly nicer versions of them publicly.

Context and problem

Way back in the first few weeks of the Bay Area pandemic shelter-in-place orders, I happened upon an interesting tweet from Carmela Troncoso, a security professor at EPFL. The thread was about a decentralized project for contact tracing called DP-3T. I read the simplified 3-page brief, found it pretty interesting, and bookmarked the longer white paper to read later. A week later (!), the news was abuzz with stories about Google and Apple’s joint project to bring a similar technology to their mobile devices.

When someone is diagnosed with the target infectious disease (eg, COVID-19), public health officials would like to identify everyone who has recently been in close physical proximity to the diagnosed person in order to take protective actions to minimize further spread. The key problem here is identifying this set of potentially affected people, known as contact-tracing. The decentralized privacy-preserving variant of this problem is, to a first approximation, how to achieve this goal without simply building a massive central database that tracks everyone’s whereabouts at all times, a solution which has its own risks and drawbacks.

The key trick: ephemeral identifiers

So if Alice has been diagnosed with disease X, how can we identify the set of people who may have been inadvertently exposed to infection via their proximity to her over the past several days? Remember that we don’t want a centralized log of GPS coordinates, or a real-identity snapshot of your personal connections.

The key trick is the use of ephemeral identifiers, a private stream of codes generated by each installed copy of the app, similar to 2-factor authentication (TFA) apps like Google Authenticator. An informal, simplified caricature of the mechanism is as follows (since I wrote this, the project itself now has a similar nice cartoon summarizing it):

Everywhere Alice goes, her phone is constantly broadcasting her current ephemeral code, which changes at some regular intervals (eg, hourly). Her app locally (ie, on-device) stores a timestamped list of all of its generated codes, and everyone else’s copy of the app is doing the same.
Every copy of the app is also constantly listening for codes, and locally (again) recording timestamped records of every one that it “sees”. This data acts as a kind of anonymized and device-local contact history.
If Alice is diagnosed with the disease, her doctor publishes (with her consent) the list of her codes from the past several days to some accessible location with some annotation indicating that people who were in contact with these codes are at risk. Everyone else’s copy of the app can periodically poll this source and compare the published list of “at-risk” ephemeral identifiers against their local copy of observed identifiers. If there is a match, the user knows they may have been exposed and can take some action (getting tested, self-quarantine, etc).
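The three steps above can be sketched as a toy simulation. This follows the caricature, not the actual DP-3T protocol: the App class, its method names, the hourly intervals, and the use of random hex strings as codes are all assumptions made for illustration, and radio broadcast is replaced by direct method calls.

```python
import secrets


class App:
    """Toy model of one installed copy of the app (hypothetical design)."""

    def __init__(self):
        self.generated = []  # (interval, code) pairs this device has broadcast
        self.observed = []   # (interval, code) pairs heard from nearby devices

    def new_code(self, interval):
        """Generate and locally record a fresh random code for this interval."""
        code = secrets.token_hex(16)
        self.generated.append((interval, code))
        return code

    def hear(self, interval, code):
        """Locally record a code broadcast by a nearby device."""
        self.observed.append((interval, code))

    def exposures(self, published_codes):
        """Compare the published at-risk codes against the local contact history."""
        published = set(published_codes)
        return [(t, c) for t, c in self.observed if c in published]


alice, bob, carol = App(), App(), App()
for interval in range(3):
    a = alice.new_code(interval)
    b = bob.new_code(interval)
    carol.new_code(interval)
    if interval == 1:
        # Alice and Bob are physically close only during interval 1.
        alice.hear(interval, b)
        bob.hear(interval, a)

# Alice is diagnosed and consents to publishing her recent codes.
published = [code for _, code in alice.generated]
print(bool(bob.exposures(published)), bool(carol.exposures(published)))
```

Bob’s app finds a match and Carol’s does not, and neither the published list nor the local histories contain names or locations.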

This solution seems like it should satisfy the core functional requirement: identifying users who were in close physical proximity to later-diagnosed individuals (assuming everyone is using the app at all times). Proving the desired privacy properties is a more subtle and complex question, and also depends crucially on careful attention to implementation details. However, at a high level it seems plausible that, assuming the identifiers do not “leak” information, the design does a reasonable job of limiting the utility of the system for unintended or malicious use cases like ad targeting or political dissident hunting.

Additional details and complications

The materials posted on the DP-3T Github are quite interesting and get into deeper details of the practical and legal aspects of the system. For example, the caricature above is not strictly accurate - infected users actually upload an ephemeral ID-generating seed (which is itself ephemeral!) instead of their raw ephemeral IDs (see FAQ).
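To see why publishing a seed can stand in for publishing the raw IDs, here is a minimal sketch of deriving a sequence of identifiers from a secret seed. This is not the DP-3T construction (see the project’s white paper for that). The choice of HMAC-SHA256, the counter encoding, the truncation length, and the ephemeral_ids name are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def ephemeral_ids(seed: bytes, count: int) -> list:
    """Derive `count` identifiers from a seed by MACing a counter (illustrative scheme)."""
    return [
        hmac.new(seed, i.to_bytes(4, "big"), hashlib.sha256).hexdigest()[:32]
        for i in range(count)
    ]


# Alice's device holds a secret seed and broadcasts IDs derived from it.
seed = secrets.token_bytes(32)
broadcast = ephemeral_ids(seed, 24)

# Bob's device happened to record one of them.
observed_by_bob = {broadcast[7]}

# After diagnosis only the seed is published; anyone can regenerate the IDs
# locally and compare them against their own contact history.
regenerated = ephemeral_ids(seed, 24)
matches = observed_by_bob & set(regenerated)
print(len(matches))
```

The published payload is a single short seed instead of a day’s worth of identifiers, and without the seed the broadcast IDs look unrelated to one another.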

From the perspective of (especially European) privacy regulations and principles, one key claim is that the ephemeral ID-based design ensures that:

… from the server’s perspective, the data held is effectively not personal data, and cannot be linked back to individuals during normal operation.

That is, in some sense any installed copy of the app is (by design) a “dumb” ephemeral ID publisher and receiver, no more and no less.

Some other interesting questions explored are:

how can user privacy be compromised if a user’s phone is physically confiscated?
how to avoid users spamming/DoS’ing the system with false diagnoses?
how could a user opt-in to sharing additional granular information that epidemiologists would find useful?
what are the back-of-the-envelope scale requirements for the backend that publishes the infected ephemeral IDs (eg, queries per second)?

Will it work?

The effectiveness of this kind of effort would depend on

maximizing participation/adoption/coverage
minimizing “fragmentation” - everyone should be using the same system (at least within a geographical area)
actual actions taken in response to the contact-tracing information

In some jurisdictions the government can more or less impose app adoption by fiat, while in others users will have to be persuaded to voluntarily participate. Public-private partnerships with influential commercial entities (say, Google+Apple) might be one way to achieve this. Likewise, the immediate actions triggered by the risk signals will vary depending on local political and legal conditions.

When rolled out, how would the efficacy of these solutions be measured? In the absence of randomized controlled trials, what kinds of data analysis techniques could researchers use to estimate the influence of these kinds of systems on the all-important $R_t$ parameters?

Techniques of interest

2020-01-31T00:00:00+00:00

Note: cross-posted from my website, where I recently attempted to jot down some brief descriptions of the problem domain I’m currently focused on as well as some tools and techniques I am particularly interested in.

Machine learning

One might expect software system behavior and its associated telemetry to be perfectly well-ordered and predictable. Setting aside the Entscheidungsproblem, the complexity, dynamism, and human-driven nature of these systems mean that, in practice, much of the data is actually noisy or chaotic. This provides a promising environment for machine learning and data mining: we have some domain knowledge about the underlying structure and mechanics of the data-generating process, but randomness and noise in the observed signals. Some example families of relevant machine learning approaches and problem formulations here are

time-series modeling
clustering and dimensionality reduction
partial or implicit supervision
structure extraction/induction
exploitation of graphical structure
anomaly or outlier detection
classifiers and their explanations

Population modeling

Another interesting question is how to pool or combine data across different instances when estimating models. We could consider each entity (eg, host machine running some application) to be totally unique and then estimate models for each in complete isolation. Going the other way, we could naively estimate a single model to cover all instances. The question of how to use metadata or domain knowledge to best interpolate between these extremes is a rich area for exploration, closely related to Bayesian hierarchical modeling or parameter tying in deep neural nets. Ideas from differential privacy may also be relevant in this context.
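
One very simple way to interpolate between the two extremes is to shrink each entity's own estimate toward the global one, trusting the entity more as it accumulates data. The sketch below is only an illustration of that idea: the function name `pooled_estimates`, the `strength` knob, and the example latency numbers are all hypothetical, and the weighting rule `n / (n + strength)` is a crude stand-in for what a proper hierarchical model would infer from the data.

```python
from statistics import fmean


def pooled_estimates(samples_by_entity, strength):
    """Shrink each entity's mean toward the global mean.

    strength = 0 reproduces fully separate per-entity estimates;
    a very large strength approaches a single shared estimate.
    Assumes every entity has at least one sample.
    """
    all_values = [v for values in samples_by_entity.values() for v in values]
    global_mean = fmean(all_values)

    estimates = {}
    for entity, values in samples_by_entity.items():
        n = len(values)
        weight = n / (n + strength)  # more data means more trust in the local mean
        estimates[entity] = weight * fmean(values) + (1 - weight) * global_mean
    return estimates


# Made-up latency samples: one well-observed host and one with a single reading.
latencies = {"host-a": [10.0, 12.0, 11.0, 13.0], "host-b": [30.0]}

print(pooled_estimates(latencies, strength=0))    # no pooling
print(pooled_estimates(latencies, strength=1))    # partial pooling
print(pooled_estimates(latencies, strength=1e9))  # close to complete pooling
```

With `strength=1`, the sparsely observed `host-b` is pulled much further toward the global mean than `host-a`, which is the qualitative behavior one would hope for when pooling across instances.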

Software reliability

The challenges of ensuring that software works as intended can easily exceed the nominal effort and cost of creating that software in the first place, especially as you continue to iterate. Beyond the standard best practices around testing code and instrumenting systems, there are exciting opportunities in this area around functional programming, static typing, and the monitoring and testing of complex data-dependent systems like data pipelines and machine learning models.

Approximation algorithms

Resource limitations are an inescapable reality of practical data analytics systems, but surprisingly often it is possible to dramatically expand the operating envelope by accepting some small probability of non-exact results. These techniques are especially appealing where your use case is insensitive to a small approximation error, or if this error is insignificant in comparison to sources of noise or distortion already present in your data.
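
A Bloom filter is one classic example of this trade-off: it answers set-membership queries in a small, fixed amount of memory, at the cost of occasionally reporting that an item is present when it is not (it never misses an item that was actually added). The sketch below is a minimal version for illustration; the sizes (`num_bits=1024`, `num_hashes=3`) and the salted SHA-256 hashing scheme are arbitrary choices of mine rather than tuned recommendations.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: fixed memory, no false negatives, rare false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions per item by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "probably present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter()
for i in range(50):
    seen.add(f"event-{i}")

print("event-7" in seen)  # True: added items are always found
```

The filter above uses 128 bytes no matter how long the item strings are, which is the kind of expanded operating envelope that becomes available once a small error probability is acceptable.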

Product development

How do teams build the right thing, the right way? In general these are hard problems, and can be even trickier on the frontier of novel technologies, applications, or data resources. The effective allocation of scarce effort and attention under conditions of uncertainty within the context of organizational coordination across teams and timezones is a “grand challenge” problem in its own right.

Prior work

Previously, I worked on partially supervised probabilistic modeling of grouped event count data with latent variables. Specifically, I focused on text mining applications where we model word count representations of documents with latent topic models, a class of techniques that can exploit word co-occurrence patterns to recover human-meaningful “topics”. Often these purely statistical topics are not well-aligned to ultimate end-user modeling goals, motivating my research exploring mechanisms by which user-provided side information or domain knowledge could help guide statistical topic recovery, and how these learned topics could then be used in applications such as biomedical research or national security.