As CRO (Conversion Rate Optimization) gains more and more attention from marketing professionals and industry leaders alike, there seems to be a trend of communicating results in a way that is misleading, sometimes dishonest, and in some cases does more harm than good.
“Changing the color of our CTAs improved our conversion rate by 150%.”
– John Smith, CRO expert
CRO is a complex process that includes much more than A/B tests, yet some seem to focus only on that aspect, creating false expectations and failing to communicate all the positive impact CRO can have on an organisation.
Whether you work in-house or at an agency, there are many ways to communicate your work in a positive and constructive way.
Today, 17 CRO professionals from 11 countries around the globe have agreed to share their thoughts on this topic, bringing their wisdom and experience.
Georgi Georgiev, Bulgaria
Georgi Georgiev is the managing owner of the digital consultancy agency Web Focus. He has 15 years of experience with online marketing, data analysis & website measurement, statistics and the design of business experiments. He is the creator of Analytics-toolkit.com, used by hundreds of web analytics and CRO experts.
Georgi is also the author of the book “Statistical Methods in Online A/B Testing”, several white papers on statistical analysis of A/B tests, as well as the CXL Institute course “Statistics for A/B testing”.
There are three main issues I see with so-called ‘bragging’ among CRO and product growth professionals.
The first one is that oftentimes a single number is reported – the observed percentage lift during an A/B test. Statistics tells us that we should not mistake the observed for the actual, and so it is equally important to assess and communicate the uncertainty in our data with regard to the actual lift. For example, observing a 30% lift with a confidence interval spanning [+1%, +∞) is much less impressive than observing a 30% lift with an interval spanning [+15%, +∞). Using proper confidence intervals for percentage change instead of naive transformations from intervals for absolute difference gets a big thumbs up here. Note that communicating the interval reveals practically no additional information about the business or the test in question, so if you are allowed to share the observed lift, you should be able to share the CI as well.
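To make the point concrete, here is a minimal sketch of an interval for relative lift, using a standard delta-method approximation on the log ratio of the two rates. This is an illustrative textbook construction, not necessarily the exact interval method Georgi advocates, and the traffic figures are made up:

```python
from math import exp, log, sqrt
from statistics import NormalDist

def relative_lift_ci(conv_a, n_a, conv_b, n_b, conf=0.95):
    """Approximate CI for relative lift (pB/pA - 1) via the delta
    method on log(pB/pA). A rough sketch, not a full analysis."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Approximate variance of log(pB/pA) under independent binomial sampling
    se = sqrt((1 - p_a) / conv_a + (1 - p_b) / conv_b)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    point = p_b / p_a - 1
    lo = exp(log(p_b / p_a) - z * se) - 1
    hi = exp(log(p_b / p_a) + z * se) - 1
    return point, lo, hi

# Hypothetical test: 5.0% vs 6.5% on 10,000 visitors per arm,
# i.e. an observed +30% lift...
point, lo, hi = relative_lift_ci(500, 10_000, 650, 10_000)
print(f"lift {point:+.0%}, 95% CI [{lo:+.1%}, {hi:+.1%}]")
# ...still comes with a wide interval, roughly [+16%, +46%]
```

Reporting the full interval rather than the single +30% headline tells the reader how much of that number could be noise.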
The second issue in many of the cases with really impressive A/B test results is that their sample size is really, really small. At least when it is disclosed. Most of the time we are left to wonder how many people actually participated in the test. This leads into a related, though separate topic – whether the sample was representative enough. With very short-lived tests, there is significant potential for various biases in favour of the tested variant to affect the sample causing atypical results. Atypical in the sense that they stem from the sample being non-representative and so any outcomes do not predict actual performance outside of the test.
Including both the duration and the sample size in the case study would be ideal, though not always possible as they can potentially reveal more about the underlying business and other test parameters. If your case study has a sample size and duration which make its predictive validity questionable for the use case in question, then it is 10 times more questionable if one attempts to generalize for a different business, website, etc. Why should anyone pay attention to it then?
The third issue is known as cherry-picking. Showing off only the most impressive tests is sure to bias potential external clients or internal stakeholders towards highly unrealistic expectations of the whole discipline. This is especially true if everyone in the industry does this. It’s like showing the returns of only the most successful stocks on the NASDAQ while hiding all the stocks which were tanking or staying flat during the same time. One would be likely to get the wrong impression that investing in the NASDAQ will surely, or with high probability, lead to super impressive returns.
The way to deal with the issue caused by cherry-picked results is to start presenting averages and distributions of results from multiple tests over time, and not only your greatest ‘hits’. It could be for one client or across all clients. This should include non-significant outcomes as well. As an example of how this would change the landscape, you can look at this meta-analysis of 115 tests from the GoodUI database. Among other things, it showed the mean and the median lift to be just under 4% overall, and 6-7% for statistically significant tests only.
Presenting a distribution of the results will also show those negative or neutral tests that many don’t like to talk about. However, it is crucial that these are discussed in the industry as a way to set realistic expectations for all people involved in it. It will also help combat the erroneous impression that A/B testing is all about ‘wins’ when its utility is in assessing uncertainty and controlling the risk from the introduction of changes.
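Summarising a whole programme rather than its greatest hits is straightforward in practice. A minimal sketch, with made-up lift figures rather than the GoodUI data:

```python
from statistics import mean, median

# Hypothetical observed lifts (in percent) from one programme's tests,
# including losers and non-significant results -- illustrative only
lifts = [12.0, -3.5, 0.8, 25.0, -8.0, 4.2, 1.1, -1.9, 6.5, 0.3]

print(f"tests run:   {len(lifts)}")
print(f"mean lift:   {mean(lifts):+.1f}%")
print(f"median lift: {median(lifts):+.1f}%")
print(f"winners:     {sum(l > 0 for l in lifts)} of {len(lifts)}")
```

Even this toy distribution makes the honest story visible: the median sits far below the best single result, and the losers are part of the record.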
Elise Maile, United Kingdom
Highly knowledgeable, organised, and passionate, Elise Maile has 15 years’ experience within the digital sector, including personalisation and optimisation testing.
Having worked in a variety of industries including travel, auctions, e-commerce and the charity sector, she enjoys the fast-paced test-and-learn cycle, analysing data and problem solving, but also utilising her well-honed design and development skills. Being user-focused is a key aspect of her work: encouraging businesses to use quantitative and qualitative data to make the best decisions for their own audiences.
For a lot of businesses, numbers are the source of truth and surely 150% uplift is an amazing result, except the data behind those numbers can tell a completely different story. Making a sale relies on snappy sound bites, but that means that the nuances of a test are left out; the number of sessions, the length of time, even the time of year all offer further context that can help dispel myths when it comes to the true impact of testing.
Perhaps there are a few reasons why the details are never shared; limited attention span being one, but also NDA clauses. In which case, perhaps more CRO companies can share the stories behind their findings instead. The benefit to this is that it would prevent the need to share sensitive information such as financials, but it could also bring awareness to the other aspects of CRO that are often ignored in favour of impressive, but fudged, numbers. Qualitative data research can be just as effective, if not more so, in increasing conversions as understanding and improving the user experience tends to lead to higher customer loyalty and more long-term spending.
I wish more CRO companies would share the stories that their results are telling them, as opposed to some of the numbers. An almost endless list of variables can swing experimentation results; by not including those as part of a sales pitch, we’re damaging our own reputations.
I don’t have a degree in math. But it shouldn’t take a degree in math to realise that the majority of published CRO results don’t add up.
Jonas Moe, Norway
Jonas has hands-on experience in leading growth & CRO activities for his clients across industries. His expertise in the field of online optimization has made him a highly trusted advisor to some of the largest brands in Norway.
Throughout his career he has helped scale-ups, SMBs and enterprise-level businesses succeed online.
He is currently Head of Growth in INEVO, a digital marketing agency located in Norway.
This is an issue that seems to be overrepresented in the field of Conversion Rate Optimization.
The most common reason for this is simply a lack of knowledge, experience, and not having a basic understanding of the statistics of running an experiment. Another reason why this misinformation is so widely spread and consumed, is that people love sensationalism.
To catch someone’s attention, whether it’s a reader, a potential client or their own boss, a lot of inexperienced CROs would gladly report a huge uplift that’s poorly documented rather than a small uplift that’s well documented. This is mainly an issue of culture, biases and egos.
A-players don’t have their eyes fixed on a massive uplift as the only viable end result. A-players focus on the road getting there and learning along the way.
The ones chasing those 300% uplifts are bound to fail, as they will resort to hacks and cheap tactics (which hardly ever work in the long run) and forget all about the process. A process is key to improvement over time, as it’s repeatable and improvable.
I would much rather have my teams improve consistently over time, rather than hitting a couple of spikes every now and then out of pure luck. Spikes don’t last, consistency does.
Andra Baragan, Romania
I am an experienced conversion rate optimization specialist who helps mid-sized eCommerce businesses scale and grow.
Through the agency I manage, Ontrack Digital, we deliver highly specialized conversion optimization programs based on qualitative and quantitative research performed on-site and continuous experimentation.
We have helped drive significant ROI gains for numerous eCommerce businesses through our data-driven conversion optimization programs.
The worst part about all the massive uplifts that conversion specialists are posting is the fact that they set unrealistic expectations. Your client or your boss will see one of these case studies and then ask you for the same results by using the same techniques.
I fully understand that clients are looking for previous success numbers when selecting a specialist and we also use case studies and show them off on our site. However, I always try to let our prospects know that, in the end, it all comes down to the relative uplift for their specific business and that 1% for one business can mean more than 500% for another business.
Amrdeep Singh Athwal, United Kingdom
About me – I am an optimisation consultant with a strong focus on digital optimisation through data and user behaviour. It is only by combining data with user behaviour and building a rapport with our users in a truly empathic way that we can hope to drive a business forward by fulfilling the needs of the user.
The most egregious example of misreporting I have seen was where someone mentioned a 1000% uplift. When I dug deeper I found the original conversion rate was 0.2% and they had taken it to 2.5%. So while technically true, without the context how can I judge the efficacy?
In this instance it sounds like they took a crap page and made it OK. I would have been much more impressed if they had taken a page converting at 2% to 5%.
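The arithmetic behind that framing is worth spelling out. A quick illustration using the figures from the anecdote above:

```python
def relative_uplift(before, after):
    """Relative uplift between two conversion rates, as a percentage."""
    return (after - before) / before * 100

# A weak page fixed: 0.2% -> 2.5% sounds enormous in relative terms
print(f"{relative_uplift(0.2, 2.5):.1f}%")  # 1150.0%

# A decent page improved: 2% -> 5% is a far harder win,
# yet its relative uplift is "only" 150%
print(f"{relative_uplift(2.0, 5.0):.1f}%")  # 150.0%
```

A tiny baseline inflates any relative number, which is exactly why the headline figure needs the starting conversion rate next to it.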
The reasons why we cannot always be more forthcoming are mainly around NDAs, not wanting to be known as a blabbermouth, or worries about the competition. The big boys like Amazon don’t worry about others having their data; they happily give it away.
A lot of it does come down to bragging, or should I say bigging up the CRO field, as many think we are just testers of button colours. So saying that we just tested a new feature that will deliver 20 million a year in incremental revenue sounds great. However, all an A/B test can do is say what the uplift was during the test period. It is very rare that you see the exact uplift predicted in a test weeks or months after a test has launched.
Arnout Hellemans, Netherlands
Arnout Hellemans is an online marketing consultant with a focus on growing and transforming businesses; he has been helping large corporations like Achmea, Royal Bank of Scotland (RBS) and LeasePlan, as well as blogs (TNW) and growing companies like the babysitting platform Sitly. His favourite hobby is making the web a better place by fixing broken websites and experiences through testing and technical fixes.
The point I really would like to make is that we should stop talking about percentages of uplift as such, because people don’t remember or understand percentages. Instead, I will first try to figure out what the main objective of the company is (profit, cost savings, market share, etc.), and then we as optimizers should start talking about and sharing wins in a different way.
In most cases the only thing people really understand is $$$. So instead of saying we got a 10% uplift, say we are making an extra $12,000 in profit per week.
The question in the company will then be something along these lines: how did you do that? Now that you have sparked their interest, you can go on and explain all the things you have done, and trust me, they will be interested. This is your chance to tell them how you work and how you are making a difference for the company.
Non-CRO people are not interested in conversion rates or uplifts, not because they don’t care but simply because they don’t know what they mean. Speaking to their imagination is probably my biggest tip.
Chris Gibbins, United Kingdom
A User Experience, Human Centred Design and Experimentation expert with 19 years of experience working in digital, 8 of which were spent leading high-performing cross-functional teams and departments.
Currently Chief Experience Officer at Creative CX, an insight-led and highly advanced Experimentation Consultancy based in central London.
Over the years Chris has been developing and evangelising customer experience approaches, combining the best of UX research, data analysis, product design and experimentation. Helping all kinds of businesses and organisations to identify and solve their customers’ problems, and make significant long-term improvements to their digital products and services.
I’m actually in favour of teams (Product, Marketing, CRO, Experimentation teams) celebrating and sharing their experiment results as long as the experiment itself was run properly and the results presented are accurate and valid. The craft of experimentation needs more good PR.
However, there are definitely a few dangers to watch out for such as:
- Only presenting the headline metric when talking about the results, instead of telling the full story, which involves interpreting the many secondary ‘learning’ type metrics.
- Everyone else treating these results as gospel and assuming the same design change will work just as well on their website or app. This can lead to lazy and solution-first experimentation or CRO practices. Or even just skipping the testing part altogether.
- Becoming too fixated on the end solution and not the work that went into uncovering these opportunities such as user research and data analysis.
- Only presenting single winning experiments instead of talking about the previous losing tests or previous iterations. This can give an overly simplified impression of the process and can lead to too much opinion-led, random, or scattergun type testing.
Ben Co, United Kingdom
Ben is a Conversion Rate Optimisation specialist currently employed by N Brown Group PLC. He utilises his expertise in web development, UI/UX design and behavioural psychology to optimise websites from a wide variety of sectors to improve their performance.
There are a plethora of articles touting case studies with phenomenal uplifts based on dubious data.
These articles are great for raising awareness of CRO and generating leads but they misrepresent the reality of CRO.
The reality is that some tests fail.
Which is good. Failure is an important part of learning and is integral to the CRO process.
When a test fails we’re given an opportunity to review every aspect of the test and our processes, for example:
- Was the hypothesis misinformed?
- Was more research needed?
- Was the design in line with the hypothesis?
- Did the variation’s code contain errors that caused the test to fail?
- Did the test address a quirk in your users’ behaviour that’s only applicable during seasonal events?
When a test fails we can learn why it failed and use what we’ve learned to create better, more informed tests.
As an industry, we need to stop bragging about the tests that won and start talking about the tests that failed, why they failed and what we can learn to produce better tests in the future.
Lee Preston, United Kingdom
Lee Preston is a specialist in CRO with 10 years’ experience helping businesses understand their website visitors and making impactful changes that grow revenue.
Lee works as a CRO Consultant at Worship, a UK-based digital experience and conversion optimisation agency. The agency specialises in helping financial services businesses improve website experience and increase website conversions, including inbound calls, form completions or enquiries, and online purchases.
Lee’s fond of climbing mountains, bouldering, walking his collie, and making strange sounds with his guitar.
This kind of stuff can be pretty toxic for our industry: it gives potential clients and people new to testing unrealistic expectations of the uplifts A/B tests can generate, as well as of test durations.
Use any decent sample size calculator and you’ll see the minimum detectable uplift as well as the sample size you’ll need in order to detect it – the numbers are higher than you might expect, and anything under that carries greater uncertainty and more chance of error.
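The arithmetic those calculators run is not exotic. A minimal sketch of the standard two-proportion power calculation (a textbook formula with hypothetical inputs, not any specific vendor’s tool):

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_arm(p_base, rel_mde, alpha=0.05, power=0.8):
    """Visitors needed per variant to detect a relative lift `rel_mde`
    over baseline rate `p_base` (two-sided z-test sketch)."""
    p_var = p_base * (1 + rel_mde)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_b = NormalDist().inv_cdf(power)          # power requirement
    var_sum = p_base * (1 - p_base) + p_var * (1 - p_var)
    return ceil((z_a + z_b) ** 2 * var_sum / (p_var - p_base) ** 2)

# 3% baseline, hoping to detect a 10% relative lift:
n = sample_size_per_arm(0.03, 0.10)
print(n)  # tens of thousands of visitors *per arm*
```

For a typical e-commerce baseline and a modest target lift, the requirement lands well above what many short-lived "huge win" case studies could plausibly have collected.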
We see a lot of marketers & CROs get bogged down in the shiny tactics like A/B testing without even considering the bigger picture. Sure, just A/B testing is better than nothing at all, but it runs the risk of lacking a focus around what problems you’re solving for users and can lack longevity.
We’ve seen too many companies A/B testing trending elements like sticky CTAs and button colours rather than taking a step back, doing their research, creating strategies and hypotheses, then testing these hypotheses in a controlled manner.
Remember also that not all changes can be A/B tested. There are bug fixes, there are sample size limitations that make smaller changes unsuitable for testing, and there are changes that clients need to make right away, all of which can be forgotten about if the CRO plan relies solely on A/B testing.
These types of smaller changes can make a significant impact on conversion rate too. I speak more about testing with a strategy here: https://worship.agency/conversion-strategy-and-why-you-shouldnt-start-with-tactics
Sometimes we can’t share a positive test result because of an NDA… there’s not much we can really do about that. Also, some just aren’t the types to humble-brag or share things publicly. They’ll share the result with their team / client then move onto the next test.
What’s really interesting is whilst you might hear about winning tests publicly, you’ll barely hear of people sharing their negative results publicly.
If around 70-90% of tests aren’t winners, there are surprisingly few blogs or LinkedIn posts about the losers. It’s not a universally shared topic in the CRO world just yet, and there’s still some stigma around sharing the reality of testing.
Not everyone wants to be seen making a ‘mistake’, although we all are.
In reality, sharing losing tests can help more marketers and CROs realise it’s a normal part of testing, so that consensus shifts towards CRO being more of a learning exercise.
If CROs started sharing headlines like “What a learning experience: we tested a ‘no-brainer’ and reduced X’s conversions by 20%!”, it would bring more clarity to the industry and reduce the stigma of losing, and others may be encouraged to do the same.
Kurt Philip, Thailand
Kurt Philip is the founder and CEO of Convertica, a done-for-you CRO service. Get your free site audit at convertica.org
What many people don’t tell you is that for every 100% uplift in conversions, there are also many failed experiments. You only read about the magic. But you don’t often hear about the tests that went south.
But here’s where it gets interesting.
These failed tests? They’re precious gems. They’re windows into your customers’ psychology. They give you the insights you need to guide your next conversion optimization campaigns.
CRO is like a new relationship. At the start, you’re trying to find out what makes the other person tick. You’re figuring out what she wants and doesn’t want. The more you know about these things, the more you’re able to predict how she’s going to react. The longer the relationship goes, the better equipped you are. But these insights only reveal themselves after many failed attempts.
It’s the same with conversion optimization. There may be dozens of failed tests. But these tests reveal what types of things your customers don’t respond to. These insights can then guide you as you go forward with further CRO experiments for that site. You are able to do tests that are tailored to your audience and not just based on best practices.
We hardly ever talk about the tests that failed because most people don’t want to hear about them. They just want to see what has worked on one of our tests and then run right to their own sites to copy it. But what one site’s users respond to may not work on your audience. And when that happens, what do most people do? They stop doing CRO because “it doesn’t work.” When in fact, you were just starting to get to know your customers.
At the end of the day, you’ll have to put up with the rain if you want to see a rainbow.
Javier Lipúzcoa, Spain
Javier has worked as a CRO specialist at Rankia since 2018 and has been in the optimization industry for the last 5 years. He has a background as a product designer and loves user-centered products, data-informed methodologies and experimentation.
CRO is still being defined, even though frameworks, processes and methodologies have been designed by great experts for years. As the concept evolves some dilemmas have come up, even around its own name (Conversion Rate Optimization): is it an accurate term to define everything that this industry has created around it (from new features validation strategies to customer experience design)?
During these early years of industry growth, one of the ways of gaining visibility has been to show the most striking results of A/B tests. It’s attractive, it’s sexy, it’s tempting. It’s been, and still is, a way of generating sales, creating trust and promoting the practice. In my opinion, it’s all about maturity levels.
On the one hand it has to do with the maturity level of CRO industry and, on the other hand, it has to do with the maturity level of both business and experimentation culture inside the company (or agency).
The more mature the industry, company or agency, the more difficult it gets to show those sexy results without also showing the statistical analysis behind them; and the easier it is to understand that improving and predicting conversion rates is hard and goes beyond A/B testing.
Tony Grant, United Arab Emirates
Tony leads the CRO programme for Informa Markets, driving a culture of experimentation across the division.
He has 8+ years’ experience in the CRO industry, working with start-ups, SMEs and FTSE 100 companies.
Tony holds an MSc in Digital Marketing, is a CRO-certified practitioner, and is co-author of the digital experience transformation book ‘In Demand In Command’.
‘I tried A/B testing once and it didn’t work.’
One of the more common phrases I’ve heard over the years. This is highly likely because you’ve read dozens of case studies, articles and posts on how one A/B test increased conversions/revenue by 400%, then when you try it, nothing happens. If it was that easy to improve by that margin, surely, we’d all do it. Here’s what it really takes.
A/B testing is one of many disciplines within CRO; other skills include neuromarketing, data analysis, data gathering, design, etc. Essentially, CRO is evidence-based marketing, and it should be part of a wider marketing and business mix, enabling you to systematically collect and analyse data, and better understand your customer.
The benefits of CRO as a methodology will impact wider marketing campaigns and your business’s bottom line. Key benefits include increased speed to market: by getting your ideas live quicker, you can collect data, start learning, identify what worked and what didn’t, and understand the impact on your conversions, revenue or customer experience.
Quite often people read about massive companies like Amazon, Google and Apple who have massive resources and how experimentation is rooted in their DNA. I want to give you another example.
Consider Sir David Brailsford, a pioneer of ‘marginal gains’ and the mastermind behind British Cycling and Team Sky. The principle of ‘marginal gains’ or ‘incremental gains’ is simple: think of everything that goes into your organisation (in this case, riding a bike), and if you improve each aspect by 1%, the combination of those marginal gains will produce a significant increase and impact overall.
Before training cyclists on a bike, Brailsford taught his team how to wash their hands properly. The philosophy behind this was simple: if cyclists were ill, or didn’t feel 100%, how could they be the best cyclists they could be? Building on details like this enabled the team to train harder, longer and more often. Adding a team of technicians, engineers, trainers, psychologists and nutritionists, all working towards a common goal, meant the team went on to achieve immense success, and there is no reason an organisation can’t do this too.
Of course, when asked, no one put the success down to ‘washing hands correctly’, however, this attention to detail, testing/experimenting and analysis coupled with years of dedication from the athletes, led to significant results.
The point is, if you have tried one test, it failed, and you think A/B testing doesn’t work: done that way, of course it will fail.
What those case studies don’t tell you is the amount of research, analysis, testing, design, development and post-test analysis, repeated over a sustained period, before it may eventually lead to a 300% increase from a single A/B test. It’s far more likely that you’ll achieve small 5% marginal gains over a period of time.
This doesn’t mean to say it has to be resource-intensive, in fact, it will save a lot of time, money and heartache. All you need to do is look and listen.
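The compounding arithmetic behind those small marginal gains is easy to check. A quick sketch with illustrative numbers:

```python
# Illustrative only: ten separate +5% wins, each multiplying the
# baseline by 1.05, compound against one lucky +50% spike
single_spike = 1.50
ten_small_wins = 1.05 ** 10

print(f"one +50% spike:  x{single_spike:.2f}")
print(f"ten +5% wins:    x{ten_small_wins:.2f}")  # ~x1.63
```

Ten modest, repeatable wins end up ahead of the one-off spike, which is the whole argument for process over luck.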
Dennis van der Heijden, Spain
Dennis is the Chief Global Happiness at Convert.com
We learned a lot from an analysis we did of 28,304 experiments in the A/B testing software platform we offer to the customers of Convert.com. When we looked at these thousands of experiments across our customer base in 2019, we could draw some conclusions that no one else can draw. One in five (20%) CRO experiments reaches 95% significance, and agencies get even better results.
Of the experiments that did achieve statistical significance, only about 1 in 7.5 showed a lift of more than 10% in the conversion rate, and in-house teams did slightly worse than average: 1 out of every 7.63 in-house experiments (13.1%) achieved a statistically significant conversion rate lift of at least 10%, while agencies achieved 15.84%.
In our research, “winning” experiments – defined as all statistically significant experiments that increased the conversion rate – produced an average conversion rate lift of 61%. Experiments with no wins – just learnings – can negatively impact the conversion rate. Those experiments, on average, caused a 26% decrease in the conversion rate. We all love to say that there’s no losing, only “learning,” but it’s important to acknowledge that even learning from non-winning experiments comes at a cost.
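The ratios above are easy to sanity-check; this is simple arithmetic on the figures quoted, nothing more:

```python
# "1 out of every 7.63 experiments" expressed as a percentage
in_house = 1 / 7.63 * 100
print(round(in_house, 1))   # 13.1

# The agency figure of 15.84% expressed as a "1 in N" ratio
agency_ratio = 1 / 0.1584
print(round(agency_ratio, 2))  # roughly 1 in 6.31
```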
Most agencies advertise only the winners, and we, like every other tool vendor, prefer to show the winning cases on our website. But dive deeper into our blog and you will find us sharing authentic findings like these.
Bragging about your win rate just shows how immature your testing program is. Real CRO experts know that they will be wrong many times; that doesn’t hurt their ego, it shows they test enough to keep moving the organization forward.
The only CRO team with 100% win rate did one test and stopped testing after that to not hurt their win-rate ever after.
Hasnaa Kadaoui, Morocco
Hasnaa is a CRO expert working for one of Morocco’s most trusted agencies.
She has strong experience in CRO, having worked with banks and telecom companies such as Orange.
“We can’t just launch an A/B test, then measure afterwards.”
Some of our clients still think this is a proper way to proceed.
Some website owners and managers still think that applying tips and tricks is a good way to build tests, and integrate without analysis what they read in blog posts like “The 10 Best A/B Test Ideas”. It’s time to stop equating A/B testing with a CRO program; it’s not enough. A/B testing is just one of many processes within an optimization program.
We often notice a gap between the business objectives and the metrics being tracked; in addition, we regularly meet the typical HiPPO (Highest Paid Person’s Opinion) who wants to test according to their feelings or opinions. Unfortunately, ego takes a key role here.
Whichever type of test you run, it is important to have a good process that will improve your chances of success and will earn you more money, so each CRO practitioner should follow the following steps:
- First, define the goals to achieve and the right KPIs to follow; only certain metrics can be considered key performance indicators, namely those that give a status report on the business. Take, for example, the average time a user spends on the site: it is not a monetary measure and will not give us enough visibility.
- A second point that is also important is doing the research analysis, which allows you to understand the real behaviour of your visitors by asking what they are looking for on the site and whether they manage to find it easily (especially through surveys and NPS). Research analysis allows us to pay attention to what they say and what they experience, and allows us to launch the right tests, because it gives us more detail on the elements to be tested on the page and a complete picture of their pain points.
- Finally, a good Hypothesis of test: Research helps us with creating a solid A/B test Hypothesis. We use the Hypothesis kit of Craig Sullivan:
Because we saw (data / feedback) we expect that (change) will cause (impact) we’ll measure this using (data metric)
It is important to follow this process because once we have specified the right metric to follow and understood the problem, we can easily remove these barriers and frictions for our consumers. Research analysis is no longer a luxury; it is a necessary step in the process.
This approach guarantees good test hypotheses and serves as a reference for all your projects. Running good tests is a skill acquired over time: the more you test, the better your results.
Edouard de Joussineau, Germany
Edouard is the Growth Optimisation Manager at Urban Sports Club.
CRO is hard.
To start with, the level of education required to know what you are doing is insane. To become a doctor or a lawyer you need at least 6 to 10 years of education, and even then you’ll probably not be a good one to start with. CRO professionals need to be trained across a wide knowledge spectrum.
You need an in-depth understanding about User Experience Design, Web Analytics, Copywriting, Psychology and decision-making mechanisms, User Research methodologies, problem solving theories, Scientific hypothesis testing, Business acumen, Web and mobile development, and yes: Statistics. Good luck with that.
We cannot blindly rely on A/B testing tools’ results. They can be misleading, even though they have recently evolved to cater to certain behaviours, like the fact that people can’t help but look at the results every 5 minutes and draw conclusions.
Are we ending A/B tests too early or too late? Should I run this test? What should we consider when running multiple simultaneous tests? How does MVT exponentially increase the risk of error with every variable I introduce? These are some of the questions you should be able to answer.
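The MVT question above has a simple arithmetic core: if you run several independent comparisons, each at significance level alpha, the chance of at least one false positive compounds with every comparison you add. A quick sketch (the comparison counts are illustrative, not from the article):

```python
# Familywise error rate: P(at least one false positive) across k
# independent comparisons, each tested at significance level alpha.
alpha = 0.05

for k in (1, 3, 5, 10):
    familywise = 1 - (1 - alpha) ** k
    print(f"{k} comparisons -> {familywise:.1%} chance of a false positive")
```

With 10 variables tested at the conventional 5% level, the chance of at least one spurious "winner" already exceeds 40%, which is why multivariate tests demand corrections or larger evidence thresholds.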
Take Statistical Significance, for example. If you have looked up a definition, you have probably stumbled upon either an oversimplified, misleading definition or a super complex but accurate one. There is a reason for the latter. Every term in those definitions has a very specific meaning, which enables statisticians to speak a common language and grasp complex concepts.
Statistical Significance does not tell us the probability that B is better than A, nor does it tell us the probability that we will make a mistake in selecting B over A. Yes, these are subtle but fundamental distinctions. It is not a stopping rule either.
Statistical significance indicates the level of certainty that the observed difference between the control group and the treatments in your sample population is not due to chance, but reflective of the true population. Your statistical significance level reflects your risk tolerance and confidence level.
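To make that definition concrete, here is a minimal Python sketch (the conversion counts are made up for illustration) of a one-sided two-proportion z-test, using only the standard library:

```python
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """One-sided p-value for H0: the two conversion rates are equal.

    Uses a pooled z-test on proportions -- a sketch, not production code.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    # Probability of seeing a difference this large if there is no real one
    return 1 - NormalDist().cdf(z)

# Hypothetical data: 500/10000 conversions for control, 560/10000 for variation
p = two_proportion_p_value(500, 10000, 560, 10000)
print(f"p = {p:.3f}")
```

The p-value answers exactly one question: how surprising the observed difference would be if the true rates were identical. It says nothing on its own about the probability that B beats A.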
The discipline of statistics is a rabbit hole in hypothesis testing, and it won't be the one leading to Wonderland, that I can tell you. Even more so when running experiments online. Get some help, invest in your education and then rely on simple principles. Your test produced inconclusive results after 2 weeks? Kill it and move on; there is an opportunity cost attached to it.
Mikko Piippo, Finland
Mikko Piippo, digital analytics and optimisation consultant and partner at Hopkins. Hopkins is an agency helping clients with ambitious goals to improve their digital marketing. Mikko occasionally blogs about analytics in his personal blog.
There is no one best way to communicate anything. Unfortunately, we tend to forget this when working with the client or internal stakeholders.
For effective communication of anything you need to identify your audience, your message and the action you want someone to take. Basically, there are a few main options: you try to convince someone that you or your company are doing a great job, you want to convince someone to act on your recommendations, or you try to teach something to a specialist audience.
Different audiences have different needs. Your client or internal decision-maker rarely wants to hear too many technical or statistical details – they want clear strong recommendations.
When communicating the results for marketing or sales purposes, these are not usually needed either. On the other hand, specialists are not easily convinced without standard statistical (sample size, p value etc.) and business metrics (did it make any real difference).
I have read countless blog posts about A/B tests and heard conference presentations about optimisation of websites and paid acquisition. Most of them are really content marketing published by someone trying to market something, usually an agency’s services. Sometimes, it is a tool or a course.
These blog posts are strongly biased. Most of the time, we hear only about successful single tests and changes. Tests without a statistically significant result are not written about.
The same is true for tests where the control performed significantly better than the treatment version. Thus the published CRO cases are an extremely biased sample of the total population (all CRO projects, recommendations and A/B tests).
What is the result of this? Inflated expectations.
People with limited experience start to believe a winner is found in most A/B tests. The same is true for all marketing services, advertising platforms and marketing automation tools. Everyone promises great wins, but most of the time the reality doesn’t match the expectations.
It is difficult to change this. Everyone (agencies, clients, freelancers) has an incentive to publish only success stories and forget the not so successful experiments, advertising campaigns and other marketing activities.
Can we do anything? The least we could do would be to change the focus from single experiments (in CRO) and campaigns (in advertising) to ongoing processes and long-term success. It would be good for clients, agencies and the industry.
Franco Cedano, USA
Franco Cedano is a Growth Advisor and Conversion Optimization Specialist. He has 11+ years of experience working in tech, with both Fortune 500 companies and angel-backed Silicon Valley startups.
Franco is the founder of BambuSix, a Growth and Business Experimentation agency that helps 8 and 9 figure SaaS and e-commerce brands grow through business experimentation.
Look, we all get it, the math behind correctly running a statistically significant AB test can get quite complicated. In order to fully communicate an AB test setup and the corresponding results, you need to describe all of the following: statistical significance, statistical power, confidence level, confidence interval, sample size, p-value, alpha, z-score, standard error, null hypothesis, one-tailed or two-tailed t-test, frequentist or Bayesian model. If you were to describe all these details to the CEO, their eyes would roll in their head and they'd fall over dizzy from all the jargon you just threw at them.
Obviously it's important to distill the complexity of AB test results into a bite-sized nugget any stakeholder can understand. The problem is, on the flip side, most results are communicated with too little information, e.g. "the test was a 10% win". Although usually done with the best intentions, providing only a % improvement can end up being misleading or inaccurate.
In practical, plain-English terms, below are some ways test results are often communicated, and how accurate and appropriate each one is.
A. “Conversion rate increased 10%.” BAD.
B. “Conversion rate increased 10%; the test is statistically significant at 95%.” Still BAD.
C. “There’s a 95% confidence level that the variation is better than the baseline.” OK.
D. “There’s a 95% probability that the variation is better than the baseline. There’s a 90% probability the improvement is in the range -3.3% to +25.5%.” BETTER.
E. “There’s a 95% probability that the variation is better than the baseline. There’s a 90% probability the improvement is in the range -3.3% to +25.5%. The actual result is most likely at the center of that range, around 10%.” BEST.
A and B are the most common ways Conversion Optimizers communicate results to stakeholders, yet they are both BAD, very very BAD. They can be misleading at best, or turn out to be flat-out inaccurate at worst.
D and E are really the only forms in which you should communicate results, not just to stakeholders but also amongst other growth team members.
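To show where the D/E-style interval can come from, here is a minimal Python sketch (all conversion numbers are hypothetical) that computes an approximate confidence interval for the relative lift via the delta method, rather than naively rescaling an interval for the absolute difference:

```python
from statistics import NormalDist

def relative_lift_ci(conv_a, n_a, conv_b, n_b, confidence=0.90):
    """Approximate CI for the relative lift (p_b / p_a - 1).

    Delta-method sketch assuming independent samples and large n;
    not a substitute for a proper testing tool.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    ratio = p_b / p_a
    var_a = p_a * (1 - p_a) / n_a
    var_b = p_b * (1 - p_b) / n_b
    # Delta-method standard error of the ratio of two proportions
    se = ratio * (var_a / p_a**2 + var_b / p_b**2) ** 0.5
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return (ratio - 1 - z * se, ratio - 1 + z * se)

# Hypothetical data: a 10% observed lift (5.0% -> 5.5% conversion rate)
lo, hi = relative_lift_ci(500, 10000, 550, 10000)
print(f"Observed lift +10.0%, 90% CI [{lo:+.1%}, {hi:+.1%}]")
```

With these sample sizes the interval still spans from slightly negative to strongly positive, which is exactly the nuance option D and E convey and options A and B hide.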
Stop being an AB Optimizer, start being a DE Optimizer.
When I had the idea to gather the opinions of CRO professionals on the topic of communicating CRO results, I never expected to receive so much positive feedback. Yes, those who practice with passion are indeed willing to share, concerned that their work may sometimes be misunderstood.
From the above opinions, and the ones received privately, we can summarise the following key ideas:
- CRO is a complex process encompassing a wide range of activities.
- A/B testing is just one element, used to give us some confidence about hypotheses created from prior investigative work.
- There is no way to understand A/B test results without a thorough understanding of Statistics.
- Although NDAs may prevent us from publishing the full extent of our work, there are ways to communicate effectively and honestly.
- CRO is not only about A/B test results, there’s a lot of value in the things we learn along the way, even when we fail.
- Failing is just part of the CRO process.
- Stop being an AB Optimizer, start being a DE Optimizer.