Recommended Resources for Starting A/B Testing

This post was co-written with Emily Robinson, co-author of Build A Career in Data Science.

A/B tests are a powerful tool used by thousands of companies, from small start-ups to tech giants to companies founded in the 1850s. While they sound simple in theory, remember that “getting numbers is easy, getting numbers you can trust is hard!” Mistakes can be costly – we’ve caught and fixed multi-million dollar issues. To make A/B testing useful, you need to design the experiment well, check data trustworthiness, run some statistics, oversee an experimentation platform, and effectively communicate results. With so many moving parts, it’s best to learn from others. After over a decade of combined experience building and running A/B testing systems at various companies and teaching experimentation best practices, we wanted to share our favorite resources for getting started with A/B testing.

“Only a fool learns from their own mistakes. The wise person learns from the mistakes of others.” -Otto von Bismarck

There is a wealth of public material on A/B Testing, from blog posts to academic papers to talks to courses. But that same abundance can be overwhelming and make it hard for someone new to experimentation to know where to start. We’ve put together these two starter kits – one for product teams and one for data teams – to help. We also created an accompanying GitHub page with even more resources, including advanced topics. We hope these will get you up-and-running quickly and avoid many of the common A/B testing pitfalls. If there’s a resource you love that we haven’t included in our GitHub page, please file an issue or submit a pull request to add it!

Product Team Starter Kit

This starter kit is an action-oriented guide of short, to-the-point resources meant for members of a product team–Analysts, Product Managers, Designers, Engineers, etc. If you’re going to be the primary person responsible for running and analyzing experiments, or you’re curious and want to dive deeper, check out the Data Team Starter Kit. 

Experimentation Introduction and Best Practices

Statistics for Experiments

Data Team Starter Kit

If you’re a data team member (Analysts, Data Scientists, Data Engineers and Data Execs), we recommend reading the Product Team Starter Kit first and then coming here. For an in-depth guide to A/B Testing, we highly recommend the book, Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing by Ron Kohavi, Diane Tang, and Ya Xu. The resources below are mostly papers and can be dense, so it’s okay if you don’t understand everything or end up skimming. Their main purpose is to give you an overview so that you can uncover blind spots and know key terms if you decide to go deeper.

Designing Experiments 

Experimentation Platform 

Despite sounding simple, experimentation platforms are complex, large engineering investments. If you didn’t previously work somewhere that built one or if this is your first time reading about them, seriously consider buying instead of building. We consider a full “how to” to be an advanced topic. Here are a few resources to give you a feel though:

  • Anatomy of a Large-Scale Experimentation Platform by Pavel Dmitriev et al. An overview of the architecture behind Microsoft’s Experimentation Platform, including its four core components and the reasons behind their design. 
  • The Modern Experimentation Stack by Che Sharma. Video explainer of the components and challenges in an exp platform. If you don’t have an hour to watch this, you don’t have time to build it. “Experimentation is a big pile of medium-sized problems that all need to work in concert.”
  • Supporting Rapid Product Iteration with an Experimentation Analysis Platform by Arun Balasubramani. DoorDash’s analysis engine for their exp platform, including why they automated and how they implemented it. Links to other useful posts.
  • Experimentation Platform in a Day by Dan Frank. You’re an adult who can decide to build instead of buy. We both worked at companies where building made sense. It could make sense for you too (though, if you have to ask…). This post promises a quick start with code snippets for all pieces. Do this at your own risk or only as long as sensible.  

Conclusion

We hope these starter kits have provided you a good jumping off point in your A/B testing journey. If you’re looking for more, check out our GitHub repo, which includes resources on specific advanced topics like switchback tests and estimating network effects. If you found this post useful, we’re considering making more resources for common A/B testing pain points – if you want to register your interest (and have a say in which topics we tackle), take our survey!

Journey Through Prediction Markets

Will there be a vaccine for COVID? Will Trump be reelected? Would Iraq have weapons of mass destruction? Uncertainty. The invisible yet central character in all real world stories. People deal with uncertainty in different ways. Some “mitigate” uncertainty by just declaring, “Definitely!” Confidence is sometimes self-fulfilling and has a place. However, overconfidence can lead to a misguided war in Iraq.

Another family of answers can be summarized as “Maybe”. This isn’t wrong, but that’s not a virtue. “Not even wrong” is a famous scientific insult hurled at the vague. Is there a way to make forecasts with the precise meaning of “definitely” (100%) or “impossible” (0%), but without the evasive slipperiness of “a good chance” or “a distinct possibility”?

Prediction markets are a possible answer. Prediction markets, like PredictIt,  require people to implicitly put precise probabilities on events. They wager money for personal profit. Collectively this produces very specific forecasts (eg 74%).  These forecasts are likely to be usefully accurate with some caveats (paper). The reason prediction markets get legal clearance is because academics are studying them. 

Wandering

For the last few months, I’ve researched and become fascinated by prediction markets. How and why do they work? When do they fail? What is trading on them like? How do you evaluate whether “74%” is a good forecast?

From scientific papers to blog posts, there’s a lot to read and explore. There is also a lot to experiment with. An overwhelming amount, in fact. I need a way to navigate and organize my desire to learn.

An Instrumental Goal

They say, “it’s the journey, not the destination”. Outside of airlines, that may be true. However, journeys are still organized and defined by their destinations. As a person who enjoys dabbling and wandering, sometimes I still find destination-type goals are a helpful tool for ensuring the journey is what I wanted.

To organize my wandering into a journey, I’m setting an instrumental goal (a destination). My primary objective is learning and my secondary objective is determining if prediction markets could offer a longer term, passive(ish) investment opportunity.

To explore both, my instrumental goal is investing a fixed amount on PredictIt for a year and assessing:

  • Can I clear an annualized return higher than my alternatives? (target: 10%+ net of all fees) [1]
  • Can I find a repeatable strategy with a good return on time? [2]
  • Is this fun / am I learning or is this purely becoming a transaction?

Topics I hope to learn about: statistical methods to assess forecast calibration, market dynamics (order books and arbitrage), complex project management and investment strategy development / backtesting.

My year will end around October 9, 2020 (this post is belated). In the meantime, I may share some lessons learned along the way!

_____________________________________________________

[1] Goal is beating long term average returns of equities (6-8%) since an index fund is the alternative. Before the 5% withdrawal fee, I’m targeting a 15%+ return.  
[2] Attractive strategies on a % return basis may not scale well due to capital limits (max $850 per contract). With automated trading banned, I’ll eventually need a large enough $ return on my time if I continue investing beyond this year.
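
The arithmetic behind footnote [1] can be sketched in a few lines. This is a simplified illustration, not PredictIt’s actual fee logic: it assumes the 5% withdrawal fee is charged on the entire balance withdrawn (stake plus gains) and that no other fees apply. The function names are made up for this example.

```python
WITHDRAWAL_FEE = 0.05  # the 5% withdrawal fee mentioned in footnote [1]


def net_return(gross_return, withdrawal_fee=WITHDRAWAL_FEE):
    """Return on the original stake after withdrawing the whole balance.

    Assumption: the fee is charged on the full amount withdrawn
    (stake plus gains) and no other fees apply.
    """
    return (1 + gross_return) * (1 - withdrawal_fee) - 1


def gross_needed(target_net, withdrawal_fee=WITHDRAWAL_FEE):
    """Gross return required to keep `target_net` after the withdrawal fee."""
    return (1 + target_net) / (1 - withdrawal_fee) - 1


if __name__ == "__main__":
    print(f"15% gross nets {net_return(0.15):.2%}")
    print(f"10% net needs {gross_needed(0.10):.2%} gross")
```

Under these assumptions, a 10% net return needs a gross return a little under 16%, which is in line with the “15%+” target above.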

Technology: Humanity’s Magnifying Glass

People do wonderful things with technology. They build digital communities, teach themselves new skills and raise money for causes. People also do terrible things with technology. They build atomic bombs, control botnets and cyber bully.

Technology isn’t automatically good or bad. It just expands humanity’s capability for good and bad. We can wage war more efficiently or we can heal more efficiently. Also, like most big picture questions, very few things are decidedly good or bad. There’s a lot of grey area. Technology definitely expands the amount of grey area. Should we applaud Facebook for letting us stay in touch with more people than Dunbar’s number suggests was possible? Or should we demonize Facebook for making us feel more lonely? Should we love cars for letting us travel or should we hate them for all of the pollution and being a major cause of youth death?

Technology just lets humans be more human, flaws included. Technophiles (blanket lovers of tech) and luddites (blanket haters of tech) make the same mistake: they have a default response to technology. Technology does not automatically improve or worsen the world. It’s all about what humans do with it. I am optimistic about technology, but that’s only because I am optimistic about humanity.

People sometimes say that art is at its best when it mirrors humanity. If art reflects humanity, technology magnifies us.

Solving For Place

My last 5 text messages went to people in 4 different states. My friends and family are spread out across the country. I grew up in one place and went to college in another. My first job was in a different city and then I moved again for my second job. The people I spend the most time with are those that live near me. As a system, geographic convenience is a very random way to allocate my time. How can I do better? Or, at the very least, how can I maintain friendship regardless of current location?

My friend, Ryan, started sending out a weekly email called a “Week Ahead”. It’s a way to stay in touch, share life and stay top(ish) of mind with close friends. When I spent two years in NOLA, I still felt like Ryan was with me. Even better, when I did see Ryan on a trip to NYC, we could skip the small talk. We already knew the general facts of each other’s lives. Instead, we could focus on the context of those events and the emotional complexity surrounding them. I did this for a while and then stopped. I’m starting again.

Another friend of mine periodically calls me on his walk home from work. I like that. I also just generally like phone calls as a way to talk to people.

Do other people have suggestions for the best way to stay in touch across geographies?

The Over Examined Life

Socrates said “The unexamined life is not worth living”. This was the defense he offered for teaching kids about philosophy (a crime punishable by death). He was still executed. The American psyche has a pervasive belief that we should examine life to find the best possible existence.  The media is constantly offering glimpses into the lives of “the rich and successful”. Implicitly, this arouses jealousy and a desire to become them. Business magazines offer highly generic insights on how they achieved success.

A school and parental system tells kids that they can be anything for the first twenty-ish years of their life. Anything! Students still make a few non-committal choices when they go to college and pick their major (assuming they are lucky enough to do that). Suddenly, they graduate and now they must pick a direction. Despite the fact they can be “anything”, they must pick something. So they do. Then they spend years wondering if they are on the “best” path. They ask friends if they like their job and compare it to their own.

Consider this: you decide to go out for dinner. But where do you go? Sometimes, 15 minutes pass. Thrillist is consulted. Foodie friends are texted. Chrome is pushed to its limit with open tabs. In the end, you pick somewhere and probably doubt your choice.

I won’t address the major life choices (marriage + career), but at least for the small decisions, I like to satisfice. Satisficing is a combination of satisfying and sufficing. Satisficing is about making a choice that is good enough. The goal is to quickly arrive at a decision that is satisfactory. It’s not about making the best choice. The “best” is elusive and subjective. Good enough is much easier. If you live in a major city and a restaurant has been around for a few years, it’s probably good enough. Bad food goes out of business very quickly.

I suggest using a heuristic, such as highest review on foursquare, to make a quick decision. You could also ask a friend and take the first suggestion. Alternately, you can visit a place you already like and spend time getting to know the people who work there. For medium stakes decisions, I’ve found it helpful to create a deadline for a decision. Don’t analyze past this deadline. Funny enough, I often find I go with my gut anyways. Not that my gut is necessarily right, it’s just rare to find a silver bullet of information even with more analysis. Without that silver bullet, gut is usually the decision maker.

Recently, I found myself deciding where to get dinner with a friend. Instead of consulting any lists, we went to the first place we walked past. It was that simple. I ordered the first thing on the menu that looked good enough. It was a great meal because I wasn’t even aware of the awesome options I was missing.

Many people are familiar with the paradox of choice*. If you give consumers many options, they will be less satisfied with their decision than if they had fewer options. Capitalism has given us many choices. So much choice implies that there is a best one and we somehow need to pick it.

If the unexamined life is not worth living, the over-examined life is impossible to enjoy.

*Some research suggests the paradox of choice doesn’t replicate in other experiments and therefore is not a real thing.

Know Thyself

The best return on time I’ve found is reflecting on myself. Key areas of reflection: habits, motivation, productivity, strengths and weaknesses. I’ve written about experimenting with my diet, caffeine intake and how I think.

There is an advice mill, amplified by the internet, of generic advice. As if there is a universal alchemy formula to life. A pet peeve is when people blindly copy habits or mannerisms of people they deem successful. Celebrity culture magnifies this. Its extreme absurdity is best displayed whenever there is a new overnight success. They always get interviewed in minute detail. Nobody cared about this person’s breakfast habits two weeks ago. Now, where they shop is supposed to unlock success? The person getting all of this attention is usually still in shock that anyone cares. They aren’t different from two weeks earlier. Unfortunately, given enough time, they will eventually believe this constructed narrative.

For another example, look at writers. There are no obvious lifestyle patterns that cause success. Some were alcoholics. Others were sober and religious. Some wrote all day. Some just here and there. Some believe in semicolons and others detest them. There is no copy and paste formula for being a successful writer. Yet there is a lot of lifestyle advice about what will make writers successful.

People spend a lot of time superficially learning about others. Instead, I use that time for self reflection. That’s been helpful. The most important thing isn’t to copy routines. It’s to figure out the ones that work for me.

Not knowing one’s strengths is the greatest possible weakness. Conversely, knowing one’s weaknesses is the greatest possible strength.

Learnings From Building A Dad Jokes Bot

I am currently learning about conversational UX. A few months ago, I built a Slackbot named Alan (RIP) in Python to do reporting at Dinner Lab. He was a worse UX than a dashboard so I got rid of him. Earlier this week, I built an SMS bot that tells Dad Jokes. It was a valuable learning experience.

Check out the landing page. The page prompts the user to text “Dad Jokes” to a number. Any mention of “dad jokes” and it will send a meme such as:

Dad Joke Meme

Almost any other message will lead to a text based dad joke.

Twilio served as my back end. Twilio, while great, is not free. I tried to minimize costs, which meant trying to nudge users to send fewer messages. I had never tried to design a non-sticky, limited UX. That taught me a surprising number of lessons.

In no particular order, here they are:

Adoption

Observations: There is a ton of friction for non-tech people to use an SMS bot. Without telling them I built it, I sent around a link to the website. They responded that they were sketched out. SMS bots have a history of being spammy. On the other hand, my tech friends happily texted the number.

It’s also a lot of work to visit a website, find a number, text the number, learn valid commands and save the number. Definitely more upfront work than downloading an app or visiting a webpage. Actually, my favorite SMS bots are alert notifications where I gave a company my number. I only want to hear from them when my flight is delayed.

Takeaways: It’s easy to get fooled by tech people’s willingness to try bots. SMS bots have a lot of friction in trust and discovery. Slack and Messenger are somewhat better for trust.

Conversational UX

Observations: People don’t explore new commands without them being suggested. I had a few easter eggs that nobody discovered. People often retype the same commands again and again. Funny enough, I realize I do that when I use iTerm.

People could find the same jokes on the internet. The slight advantage of the conversational UX is the ability to push into a message queue. The advantage is not actually that it’s a message. People usually preferred GIFs and memes to the pure text jokes.

A number of users requested daily updates. If I wanted to make this sticky, I’d establish a cadence of when to expect jokes. Jokes are something that people share. Ideally, I’d send a joke at a time of day when people are likely to be in a social situation so they would share it.

Takeaways: Find ways to naturally suggest new commands. Always offer a “help” command. Include some suggestions in your onboarding. People default to hi, hello, hey, etc. when they start. I prompted people to start by texting “Dad Jokes”. This saved some messages and just got straight to the jokes.

Content Specific Insights

Observations: As soon as someone got the same joke, they’d stop using the bot. The bot randomly chose a message to send. I calculated the probability of when a person would get a repeat using the logic of the birthday paradox. I chose numbers such that people would only get a few jokes before a repeat. Earlier this week, I started experimenting with Marsbot. I’ve asked Marsbot for 10+ suggestions in a row. There were no repeats.
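
The birthday-paradox calculation mentioned above can be sketched as follows. This is a minimal illustration that assumes each joke is picked uniformly at random with replacement; the function names are made up, and the pool size in the usage lines is a hypothetical number, not the bot’s real one.

```python
def prob_repeat(pool_size, draws):
    """Probability that `draws` uniform random picks (with replacement)
    from `pool_size` jokes contain at least one repeat."""
    if draws > pool_size:
        return 1.0  # pigeonhole: a repeat is guaranteed
    p_all_distinct = 1.0
    for i in range(draws):
        p_all_distinct *= (pool_size - i) / pool_size
    return 1.0 - p_all_distinct


def draws_until_likely_repeat(pool_size, threshold=0.5):
    """Smallest number of draws whose repeat probability reaches `threshold`."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    draws = 1
    while prob_repeat(pool_size, draws) < threshold:
        draws += 1
    return draws


if __name__ == "__main__":
    pool = 20  # hypothetical number of jokes in the pool
    for k in range(2, 9):
        print(f"{k} jokes: {prob_repeat(pool, k):.1%} chance of a repeat")
    print("More likely than not after", draws_until_likely_repeat(pool), "jokes")
```

A smaller pool means a repeat shows up sooner, which is the lever for limiting how many messages a user sends before losing interest.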

Takeaways: Don’t repeat content. I’d even be careful about repeating the same patterns of speech. Marsbot has a few different templates for its recommendations.

Latency

Observations: This is boring, but latency and downtime are killers. It’s similar to the web where page load times matter a lot. Especially for a bot where things are interactive, latency is a terrible UX. Of course, I liked the latency because it got people to ask for fewer jokes.

Takeaways: Minimize latency. It is core to conversational UX. If there will be a delay between responses, consider sending a “thinking” text. Also consider redesigning to get rid of latency.

 

The Value of A Working Theory

Life and business are filled with open ended questions. There is a natural tendency to treat problems as discrete. On close examination, there will be patterns connecting decisions and outcomes. A working theory starts as a way to discover those patterns. A thesis eventually helps exploit them.

In the pattern discovery phase, I rapidly iterate through working theories. They guide thoughtful exploration. With enough feedback, I build conviction. At that point, I call it a thesis. A thesis enables me to be scalably thoughtful. Once I develop a thesis I can apply all of that thought to a new problem at a low marginal cost. It is a framework to quickly make confident decisions.

I split the remainder of this post into two sections. The first is how I iterate through working theories to build conviction. The second is about the benefits of a thesis.

Pattern Discovery (A Working Theory)

By making one off decisions, I wrestle with the same issues again and again. I can miss the patterns if I view each choice in a vacuum. By reflecting on deeper concepts, it is possible to use each decision to build and refine a framework. It is a lot of upfront work.

A novel situation usually calls for novel thought. In the pattern discovery phase, I try to have as many novel experiences as possible. You don’t know what will be valuable or cause an insight beforehand. So it’s best to explore. A few names for this process are opportunistic wandering (Gary Chou), convex tinkering (Nassim Taleb), maximizing entropy (Gary Chou again), lean ideology (Eric Ries, et al) and fertile chaos (me).

Whatever the name, ideas need to collide in unpredictable ways. One way I do that is by reading in parallel. I also seek out new experiences and alternative points of view. Perhaps the most important thing is to learn by doing. Following intense periods of doing new things, I try to spend time thinking.

When a new experience offers cognitive dissonance, I evolve my working theory. The Talmud is 6200 pages of Jewish law. It has an unusual format. It starts by presenting a law. Then there are questions about edge cases and contradictions. These questions lead to discussions. Those discussions usually clarify and evolve the law. Life’s events and others’ ideas do the same to my working theories.

[optional example] In consulting case interviews, people suggest being “hypothesis driven”. The interviewer usually asks an open ended problem like, “A lumber company is experiencing declining profits – what should they do?”. Being hypothesis driven means force picking an almost arbitrary starting point. Such as, “I believe revenue has gone down”. This working theory (hypothesis in consultant-speak) is not meant to be correct. However, it is empirically falsifiable and gives you a clear next step (ask about revenue). Usually, you iterate through different theories quickly. However, you start to converge on a promising one. For example, you learn that revenue is stable, prices are unchanged and volume is the same. So you switch your hypothesis to “costs have gone up”. The data confirms this is correct. Now you hypothesize that it is variable costs, which is again confirmed. Now you zoom in to discover that the price of labor has gone up since the state raised the minimum wage. Having a working theory guided the exploration.

This blog is my place to write down theories and evolve them. They might not be right, but they are helpful. Hence my blog’s name, “A Working Theory”.

Pattern Exploitation (Thesis)

A working theory can become a thesis. The difference is conviction. A future post will discuss how I form habits. The cornerstone of every difficult habit I’ve formed is conviction. The effort of pattern discovery pays off as soon as I have conviction. Of course, developing theses isn’t a linear process. There are feedback loops and revised ideas.

There are many advantages to having theses about life. One is quick decision making in relevant situations. Theses are like habits for complex decisions. Life’s biggest opportunities come rarely and with short time windows. Even if you could thoughtfully arrive at the right decision, it’s worthless if you can’t do it quickly. Theses help with that.

Most complex decisions come with lots of noise. Life often presents choices with familiar signal, but new noise. Theses focus on the key issues. Theses are the result of sifting through a lot of signal and noise.

Theses help spot diamonds in the rough. Sometimes, new choices look superficially bad. However, a thesis focuses on the key issues (see above). If those issues match your thesis, you can confidently take action. You don’t need social approval if you have conviction in your thesis and its application to this decision.

A surprising benefit of a thesis is the management of anxiety and fear of missing out (FOMO). There are many careers, hobbies, people and ideas worthy of time. However, they are mutually exclusive to each individual. The “best” choice is nonexistent. Instead, I want to make choices that are good enough for me. Thoughtful theses give me that conviction. There are enormous benefits of focus. Theses give me conviction that what I am focusing on is good enough.

A thesis with conviction allows its believers to make quick, possibly contrarian decisions by focusing on the key issues without FOMO. That’s powerful. It’s why I’m willing to wrestle with working theories on this blog and in life.

I articulated a couple of my current life theses here.

South Park Text Analysis: They Stopped Killing Kenny and the Rise of Randy

South Park is one of TV’s longest running scripted, primetime shows. It has won Emmys, sparked controversy, earned lots of money and given me some of my best laughs. The fact it has survived so long is remarkable. Fans might suspect that the show has made changes that have let it survive. It is impossible to say what actually caused the show to survive. However, it is possible to analyze what has changed, which is what the rest of this post does.

My only goals were to explore a show I love and to play with some text and visualization libraries. I had a good time so I decided to share what I did. This is meant to be a fun read for any South Park fan. Data nerds can check out the gory details on Github.

A Change in Jokes

We can start exploring the evolution of South Park by analyzing jokes. Over time, some of South Park’s most well known jokes have stopped appearing as often. The number of times someone says “Killed Kenny” has declined dramatically since the show’s early years:

Killed Kenny

Actually, in later episodes, the writers will sometimes dance around these signature jokes. They tease the audience by putting Kenny in dangerous situations, but don’t kill him.
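
A per-season phrase count like the one charted above could be produced along these lines. This is a rough sketch: the (season, character, text) record layout and the sample records are made up for illustration and are not necessarily how the analysis on Github is structured.

```python
from collections import Counter

# Made-up sample records in an assumed (season, character, text) layout.
SAMPLE_LINES = [
    (1, "Stan", "Oh my God, they killed Kenny!"),
    (1, "Kyle", "You bastards!"),
    (2, "Stan", "Oh my God, they killed Kenny!"),
    (2, "Kyle", "I learned something today."),
    (10, "Stan", "Dude, this is pretty messed up."),
]


def phrase_counts_by_season(lines, phrase):
    """Count the lines containing `phrase` (case-insensitive) in each season."""
    needle = phrase.lower()
    counts = Counter()
    for season, _character, text in lines:
        if needle in text.lower():
            counts[season] += 1
    # Seasons with no matches are simply absent from the result.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(phrase_counts_by_season(SAMPLE_LINES, "killed Kenny"))
```

Dividing each count by the number of episodes in that season would give a fairer comparison, since later seasons are shorter.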

Similarly, all characters have also stopped saying “learned something” as frequently. This was a recurring joke, often said by Kyle. However it has faded as the show has progressed:

Learned Something

What has replaced killing Kenny and Kyle learning lessons? One new joke is Butters getting grounded by his parents:

Grounded Butters

Changes in Character Roles

Jokes are not the only part of the show that has changed over time. The characters and their roles have shifted. To explore this, I assigned a value of -1 to lines a character said in the first half of the series (seasons 1-9) and a value of +1 to lines they said in the second half (seasons 10-18). If a character has an equal number of lines in both halves of the show, this should sum to 0*. If it’s negative then they had more lines in the first half than the second and vice versa.
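
The scoring rule just described can be sketched in a few lines. The (season, character, text) record layout is an assumption for this example, and the function name is made up.

```python
from collections import defaultdict


def half_scores(lines, last_first_half_season=9):
    """Score each character: -1 per line in seasons 1-9, +1 per line in 10-18.

    `lines` is an iterable of (season, character, text) tuples
    (an assumed layout for illustration).
    """
    scores = defaultdict(int)
    for season, character, _text in lines:
        scores[character] += -1 if season <= last_first_half_season else 1
    return dict(scores)


if __name__ == "__main__":
    # Made-up records, only to show the shape of the input.
    sample = [
        (1, "Kyle", "..."),
        (3, "Kyle", "..."),
        (12, "Kyle", "..."),
        (12, "Randy", "..."),
        (18, "Randy", "..."),
    ]
    print(half_scores(sample))
```

A negative score means more lines in the first half of the series, a positive score means more lines in the second half.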

Here are scores for some of the most important characters:

Character Scores

This doesn’t explain character progression across seasons. However, it is useful in determining which characters are worth exploring further. Stan, Kyle, Chef, Mr. Garrison and Jimbo all have declined substantially. In the second half of the series, Randy and Butters became more featured characters.

Kyle gave up lines. The decline in lines for Kyle also meant a decline in lines for his parents (Sheila and Gerald). Here is the whole family:**

Broflovski

Butters became a much larger character. Stephen is Butters’ dad and that meant more lines for him. Linda, Butters’ mom, had a brief spike in some of the earlier seasons, but faded as the show progressed. Even though Butters is a more important character, his importance fluctuates season to season. See here:**

Butters

Jimbo is featured much less often. Chef was killed off in the 10th season after the actor objected to the Scientology episode. Here is their progress across seasons:

Chef Jimbo

Mr. Garrison became Mrs. Garrison in the 9th season. After a few seasons, Mrs. Garrison changed back to Mr. Garrison. You can see both of them on this plot:

Garrison

As any fan of the show knows, Randy has become more important. In the first season, Randy accounts for less than 1% of the dialogue. By the 18th season, he is speaking almost as much as his son. While Stan remains an important character, he talks less. Randy’s wife, Sharon, fluctuates throughout the series:**

Marsh Family

Changes in Lead Characters

Who is the lead character for an episode’s main plot and how has that changed over time? We can look at which characters had the most words for an episode. Excluding the narrator, only 12 characters had the most words of an episode more than twice. Here they are:

Lead Characters

As you can see, Cartman is the main character the most often by a wide margin. However, other characters have changed roles over time. This chart shows the changes in episode leadership from the first half of episodes (1 – 128) to the second (129 – 256).

Change in Lead Characters

Originally, the show was carried by Cartman, Stan and Kyle. Mr. / Mrs. Garrison and Chef played large supporting roles and even led some episodes. Randy and Jimmy were minor characters that occasionally got an episode. Butters began the show as a background character. He led his first episode in Two Guys Naked in a Hot Tub (episode #39). He started becoming a major character in the fifth and sixth seasons when he temporarily replaced Kenny. He faded a bit in seasons 9 – 11 and has since come back strong. And while he has become a major character, he isn’t the lead character of the main plot as often as I expected.

In the second part of the show, Cartman takes up even more episodes. South Park is clearly Cartman’s show. Even so, it’s diverse with different characters leading episodes. The positioning of other characters has changed.

Stan and Kyle lead less often though still remain core characters. Chef was killed off. Mr. / Mrs. Garrison is still a major supporting character, but leads fewer episodes now. Jimmy continues to be a supporting character who occasionally leads an episode. Randy has gone from a minor character to one of the show’s most important. In fact, only Cartman has led more episodes than Randy in the show’s second half.

Caveat: the show often has a more complicated structure than just a single lead. There can be an additional subplot or two. However, it’s hard to disentangle the 2nd most important character from the core plot and the lead character of the subplot.
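
The lead-character rule used in this section (the character with the most words in an episode, excluding the narrator) can be sketched like this. The (episode, character, text) record layout, the whitespace-based word count and the function names are assumptions for illustration.

```python
from collections import Counter, defaultdict


def lead_characters(lines, exclude=("Narrator",)):
    """Map each episode to the character who speaks the most words in it.

    `lines` is an iterable of (episode, character, text) tuples (assumed
    layout). Words are counted by splitting on whitespace. Ties go to the
    character who appears first in the episode.
    """
    words = defaultdict(Counter)
    for episode, character, text in lines:
        if character in exclude:
            continue
        words[episode][character] += len(text.split())
    return {ep: counts.most_common(1)[0][0] for ep, counts in words.items()}


def frequent_leads(leads, min_episodes=3):
    """Characters who led at least `min_episodes` episodes ("more than twice")."""
    tally = Counter(leads.values())
    return {name: n for name, n in tally.items() if n >= min_episodes}
```

As the caveat above notes, a single top speaker per episode cannot separate the lead of a subplot from the second most important character in the main plot.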

Interlude — Fun With Words

Let’s take a break for some fun. Look at some of the words and phrases that can best predict when a specific character is speaking:

Predictive Words by Character

If you are a fan of South Park, those probably aren’t very surprising. However, some are quite funny, such as Cartman saying “seriously” or “ey”. You can see a lot of characters’ signature phrases such as Kenny saying “woohoo” or Towelie saying “high”. You can also see specific jokes from memorable episodes such as Kanye saying “fish”. It’s also funny to see how many of Butters’ words are non-committal sounds such as “uh”, “wuh” and “ih”.

An interesting takeaway: a character saying “Eric Cartman” is predictive that Cartman is talking. His alter ego, Coon, also likes saying his own name. For most other characters, saying their name is not nearly as predictive that the character is talking.

Changes in Show Structure

The show’s structure has changed. The seasons now have fewer episodes:

Episodes Per Season

The spread of the dialogue has changed as well. The lead character’s median share of the dialogue each season started at 20% (the blue line). It has since risen to 25%, a 5-point gain and a 25% relative increase. Who is speaking less?

It’s not the number 2 most talkative character of the show (the red line). That has held fairly steady at 15%. However, all of the other characters besides the two most talkative are speaking less (the yellow line):

Changes in Dialogue

Each episode is more concentrated around the lead character than it used to be.
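As a minimal sketch of how these shares could be computed, the snippet below splits one episode’s lines among the most talkative character, the second most talkative, and everyone else, and then works out the relative change behind the 20% to 25% shift described above. It assumes dialogue is measured by number of lines; the function names and the single-letter speakers are hypothetical.

```python
from collections import Counter


def dialogue_shares(speakers):
    """Return (lead, second, rest) shares of lines for one episode.

    `speakers` lists the speaker of every line in the episode.
    Counting by lines rather than words is an assumption.
    """
    counts = Counter(speakers)
    total = sum(counts.values())
    ranked = [n for _, n in counts.most_common()]
    lead = ranked[0] / total
    second = ranked[1] / total if len(ranked) > 1 else 0.0
    return lead, second, 1.0 - lead - second


def relative_change(old, new):
    """Change from old to new as a fraction of old."""
    return (new - old) / old


# Made-up episode: 5 lines for A, 3 for B, 2 for C.
lead, second, rest = dialogue_shares(["A"] * 5 + ["B"] * 3 + ["C"] * 2)

# The lead's share moving from 20% to 25% is 5 points, or 25% in relative terms.
lead_growth = relative_change(20, 25)
```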

South Park rarely had multi-part episodes at the show’s start. In the first 9 seasons, there were only 7 multi-part episodes. In seasons 10 – 18, there were 21 multi-part episodes. Since there were fewer episodes in later seasons, the proportional difference is even larger than those counts suggest. Look at this:

Change in Multi-Part Episodes

Fan Ratings

You might wonder how fans feel about different types of episodes. I scraped IMDB using import.io to get some fan ratings. Here are the average ratings for the show when each character is the lead:

Average Rating by Lead Character

Clearly, fans love Cartman. Butters is almost tied with Cartman and Randy is tied with Stan. This is impressive considering that Butters and Randy started as background characters. They now lead episodes and get comparable ratings to the show’s most important characters. Even so, Cartman is the show’s all-star. He carries the most episodes and has the highest ratings. Kyle is the least popular of the boys. In fact, Matt and Trey almost killed him off in the fifth season. Instead, they just shrank his role. That seems to fit audience preferences.

How do fans feel about multi-part episodes? Here is a matrix of the average ratings. The rows split the show into seasons 1 – 9 and 10 – 18. The columns compare single-part and multi-part average episode ratings.

Average Rating: Single-Part vs. Multi-Part Episodes

As you can see, the multi-part episodes have been substantially better rated than the single-part episodes since season 10 started. This trend towards more multi-part episodes is well received by audiences.

Note: Since the ratings came from a user-submitted internet poll, take them with a grain of salt.

Changes in Speech Patterns

How has the dialogue changed over time? Well, there are actually fewer lines per episode:

Lines Per Episode

To make up for that, there are more words per line:

Mean Words per Line

Some Things Never Change…

As fans might suspect, South Park often gets involved in politics. In the graph below, the blue line shows that the show makes more political references during election years. The red line is the number of times a character says “president”. In between election years, there is a rise in jokes about the incumbent. Making fun of politics and politicians will always be a South Park signature.

South Park and Politics

A Bit More Fun

Here are some statistics for different characters:

Character Statistics

Here are a couple of fun facts:

  • Cartman has the most lines. He is followed by Stan and Kyle.
  • Kanye is tied with Towelie for the lowest average word length. They both use a lot of small words when they talk.
  • Kenny has the lowest average words per line by a wide margin. He doesn’t talk for long when he does say something.
  • A high percentage of self-referential words is claimed*** to be a sign of narcissism. Towelie scores the highest. Kanye West scores almost as high as Satan. Unsurprisingly, Cartman is towards the upper end of this list. Surprisingly, so is his mom, Liane. The Announcer has the lowest percentage of self-referential words. In the world of South Park, Jesus falls in the middle of the pack for self-referential words. He is barely above a would-be vigilante (Coon) who masterminded a threat against a hospital to try to raise his public profile.
  • Cartman has the most monologues and the most final lines of an episode. He is followed by Kyle, Stan and Butters in both categories.
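For readers curious how per-character numbers like these can be derived, here is a minimal sketch that computes line count, words per line, average word length and the percentage of self-referential words from a list of one character’s lines. The SELF_WORDS set, the function name and the sample lines are assumptions for illustration; the post does not say which words it counted as self-referential.

```python
# Assumed list of self-referential words; the original list is not given.
SELF_WORDS = {"i", "me", "my", "mine", "myself"}


def character_stats(lines):
    """Summarize one character's dialogue, given as a list of text lines."""
    words = []
    for line in lines:
        for raw in line.lower().split():
            word = raw.strip(".,!?\"'")  # drop punctuation at the edges only
            if word:
                words.append(word)
    if not lines or not words:
        raise ValueError("need at least one line containing a word")
    return {
        "lines": len(lines),
        "words_per_line": len(words) / len(lines),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "self_ref_pct": 100 * sum(w in SELF_WORDS for w in words) / len(words),
    }


# Made-up lines, only to show the call shape.
stats = character_stats(["I love my towel.", "Me too!"])
```

Contractions such as “I’m” are not matched by this simple word list, so a fuller version would need to handle them.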

Summary 

South Park has changed over time. Kyle and Stan remain core characters, but carry fewer episodes. Kyle episodes got lower ratings on IMDB than Stan and Cartman episodes. Popular characters like Randy and Butters were given more lines. Randy has led many more episodes. The change in jokes mirrors these changes in characters. Signature jokes of Kyle (learned something) have faded to make room for new signature jokes such as Butters getting grounded.

Episodes tend to have fewer, but longer lines. The writers give the lead character a larger percentage of the dialogue. The writers are more confident in concentrating the dialogue around fewer characters and the words around fewer lines. The show has fewer episodes each season. They also have more multi-part episodes. In the 19th season (not included in this analysis) almost every episode is connected.

For all of the changes, one constant is Cartman. The show has given him more lines and more episodes to lead. Fans loved it, as they rate his episodes the highest.

That’s it. I can’t tell you that South Park survived because of these changes. However, I always loved it and still do. Hopefully, you do too. Feel free to share your thoughts on the show’s evolution.

Notes:

This analysis only goes up to Season 18 Episode 9. All code and data can be found on GitHub.

*There are actually more words in the first half of the series. The numbers are weighted to account for that.
**To muffle noise, this uses a 4 episode rolling average.
***There is research that suggests self-referential words are not correlated with narcissism.
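The 4 episode rolling average mentioned in the second note can be sketched as below. This is a plain trailing-window mean, which is an assumption; the original code may center the window or handle the first few episodes differently. The function name and the sample values are hypothetical.

```python
def rolling_mean(values, window=4):
    """Trailing rolling mean; returns one value per full window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for end in range(window - 1, len(values)):
        chunk = values[end - window + 1 : end + 1]
        smoothed.append(sum(chunk) / window)
    return smoothed


# Made-up per-episode values, only to show the smoothing.
smoothed = rolling_mean([1, 2, 3, 4, 5, 6], window=4)
```

With six inputs and a window of four, only three smoothed points come back, because the first three episodes do not yet have a full window behind them.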

Jargon or Shared Vocabulary?

Jargon can be very helpful. At its best, jargon is shorthand for a complex idea. It allows humans to communicate about new topics, niche topics and sophisticated topics. That is jargon at its best though. Jargon is helpful when both people understand the vocabulary. In fact, we might just call that a shared vocabulary instead of jargon.

Once both people understand an idea, it can be efficient to use a concise word to reference it. Close friends do this all the time. They reference a keyword of an inside joke and then laugh without any more explanation needed. However, someone who doesn’t get the joke would sit there awkwardly while those in the know laughed. This is the common scenario of what happens when someone uses jargon. The speaker understands it and the listeners don’t. Or at least, they don’t fully understand it.

Data Science and yoga both offer ample opportunities for jargon. As a yoga beginner, I find it very frustrating when the instructor uses a lot of Sanskrit. If you call it “chair pose”, I’ll remember this much better because you sort of pose like a chair. If you tell me to move into “Utkatasana”, I will waste time trying to remember what that means. More than likely, I’ll wait until some of the experienced people move into the pose and then copy what they did. Even worse, it’s discouraging. It reminds me that I’m still not part of this community.

The best way to use jargon is when it is a shared vocabulary. If you use it with someone who doesn’t understand the word, it should be to teach them. First, teach the concept. After all, that was the benefit of jargon – it’s shorthand for a complicated or niche topic. Then induct people into the knowledge guild by teaching them a word that summarizes the concept.

In statistics, Nassim Taleb found a large audience by explaining complicated topics simply. Instead of spending lots of time deriving the moments of a power law distribution (see how confusing that was?), he used relatable examples. He talks about the distribution of wealth or severe earthquakes. There are no equations, but there are lots of easy-to-understand charts and examples.

Why do people use jargon? There are three common cases that I see. (1) The first is when the person forgets what it’s like to be a beginner in the topic. Highly technical fields often do this to their experts. (2) The second is when the person is trying to communicate that they are in the know. They might be self-conscious about recently learning the concept. So they use jargon a lot, thinking it gives them credibility. Masters of a topic are secure enough to not rely on jargon to convey the depth of their knowledge. (3) The final category is people who just thoughtlessly use the term. They aren’t trying to be impressive. I wouldn’t even say they are being lazy. Once you learn a good summary word, it can be hard to unlearn it.

Brad Feld does a fantastic job of introducing people to the concepts and vocabulary of startup finance through his blog and book. Another great example is the “Tech Tuesday” series that Albert Wenger did on his blog, Continuations.com. Both of them teach concepts to novices. After explaining the concepts, they offer a vocabulary. They are inducting people into the knowledge guild, not showing off that they are already in it.

I try to only use technical terms when they are a shared vocabulary. If the other person doesn’t share the vocabulary, it’s just jargon. If they have the time and interest in learning, then I try to teach the concept, how it impacts them and, only then, the word to summarize it. The most important thing is to never use jargon to build credibility. The best way to build credibility is to respect the current vocabulary of your audience. If you teach them a useful concept, they will respect you more than if you use words they don’t understand.