Daniel Morales

Maker - Data Scientist - Ruby on Rails Fullstack Developer


2020-12-18 17:55:38 UTC

Why are data science competitions important for startups?

Today, large companies have large R&D budgets that allow them to experiment and stay at the cutting edge of new technologies: always adopting the newest ones, trying to adapt them to their own needs, and trying to find the hidden value in each of them.

Naturally, not every new technology fits the needs of a particular company. However, through their R&D process, companies have an "innovation lab" where failure is allowed, and where amazing things also happen.

In recent years, the technology that has gained millions of followers (companies and engineers) is Artificial Intelligence, or, to be specific, Machine Learning. These large companies have found use cases that allow them to optimize all types of internal operations, lower costs, and/or increase sales, income, and profitability.

But where have startups been in this race to find and apply new use cases for this technology?

Unfortunately, not all startups have the financial capacity to experiment with new technologies, either by outsourcing or by hiring talent internally.

In fact, many startup founders see using this technology as a distant possibility. Even worse, we have found that many founders have no idea what machine learning can do for their startups.

This is worrying, because this technology can give them significant advantages over the competition and/or strengthen their key differentiation.

It is often said that data is the new gold, and that makes sense as long as startups know how to turn their data into gold.

Ultimately, when we talk about Machine Learning, we are talking about data processed in the right way, which makes it possible to generate "artificial intelligence" and to make predictions or classifications from that data.

However, there is a whole process behind this: extracting the data correctly, cleaning it, organizing it, getting it ready, feeding it to a machine learning model, experimenting with the model (or models), and finally obtaining scores and deploying the model to production.

This is not an easy process, and this is where startups look from afar at those who have the resources to take advantage of it. If only we could democratize the way we use this technology, we could generate more value for all stakeholders, including, of course, users. But how do we democratize access to this technology and to the solutions that can be built with it?

Democratizing data science competitions

Some time ago, several platforms emerged on which large Silicon Valley tech companies, and even big corporations, try to solve really complex problems with the help of outside people through so-called "data science competitions".

This was because their internal talent couldn't solve these problems for various reasons, such as lack of time or skills. After all, these were really complex problems.

These data science competition platforms give the company access to a global talent pool of data science specialists, ranging from PhDs to self-taught people, who embark on the adventure of solving the challenge posted by the company sponsoring the competition.

The prizes can be exorbitant: Netflix even paid $1 million for a machine learning solution.

On these platforms, prizes typically range from $10,000 USD to $100,000 USD, a privilege that only big tech companies (or big corporations) can afford.

What about startups? Well, if you have raised a Series B or C, you may be able to afford a $10,000 USD competition, or even an in-house team of data scientists to help you explore, experiment with, or solve problems using machine learning.

But what about startups at an earlier stage that don't have these funds or that internal talent? Or startups that are bootstrapping? This is where the idea of helping startups in this situation came up: we decided to rethink data science competitions.

Rethinking competitions in data science

Our approach is to democratize data science competitions. We realized that other data science competition platforms focus on very large companies, very high prizes, and very complex problems.

This translates into competitions that only big tech companies with deep pockets can sponsor, that take months to complete, and that are designed for data scientists and "super-senior" teams.

After all, sponsoring a $50,000 USD (or 1 million USD) competition is not for every type of company. 

That is why we decided to rethink the way data science competitions are built and to focus on startups of any size and from anywhere in the world. These startups can sponsor competitions starting at $499 USD, the competitions are resolved quickly (in 4 weeks), they can afford to launch two, three, or more competitions, and data science talent of any level and from anywhere in the world can compete.

What if I hire a team of data scientists instead of sponsoring competitions?

In this case, you have to figure out how many people you need to hire, go through the hiring process, and pay that talent a considerable salary (since it is worth it!).

But if you do this before experimenting with your first machine learning models, you will be taking a leap of faith without knowing whether you actually need this technology, whether you can really take advantage of it, and whether it will generate value for your startup.

Perhaps hiring someone makes more sense if you have already experimented a little with the technology and know what you want and what you can do. 

If you haven't tested it yet (and don't know what can be built for your particular startup and use case), competitions are the best option to get the most out of it.

In fact, our platform has more than 1,400 data scientists who can compete to solve your problem. That means up to 1,400 people working on it for you! We are not talking about 2 or 3 people, we are talking about more than a thousand!

But ultimately, the greatest benefit of a competition is the machine learning algorithms you receive as solutions (see "Let's make an example" below).

Why sponsor a competition if I already have data scientists on my team?

Having data science talent is great news: you have probably already experimented with the technology, deployed solutions, and noticed its great advantages.

However, you may not be seeing the big picture, because a single data science team working on its own might not have found the best solution to a problem.

Let's say that for the XYZ problem your team achieved a score of 0.75 (out of 1). The right questions to ask yourself are:
  • Is it the best possible solution?
  • Is it the optimal solution?
  • Is it the best algorithm?
  • Are there other algorithms my team is not exploring?
  • Is it the highest score anyone can get?
  • What if someone gets a score of 0.89 on the exact same problem?
  • And the most important of all: is there anyone out there, anywhere in the world, who can achieve a higher score?

In short, you are incurring an opportunity cost. You aren't looking at the big picture.

That difference in scores (0.75 vs. 0.89) may seem small, but it could mean thousands of dollars in savings, or in income (depending on the problem). 
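To see why a small-looking gap can matter, here is a quick back-of-the-envelope calculation. It is only a sketch under two assumptions that the scenario above does not state: that the score is plain accuracy (the share of correct predictions), and a hypothetical volume of 10,000 predictions per month.

```python
def wrong_predictions(accuracy: float, n_predictions: int) -> int:
    """Expected number of wrong predictions, ASSUMING the score is plain accuracy."""
    return round((1 - accuracy) * n_predictions)

N = 10_000  # hypothetical monthly prediction volume, for illustration only

in_house = wrong_predictions(0.75, N)     # score your team achieved
competition = wrong_predictions(0.89, N)  # score someone else might achieve
reduction = (in_house - competition) / in_house

print(f"in-house errors:    {in_house}")        # 2500
print(f"competition errors: {competition}")     # 1100
print(f"fewer errors:       {reduction:.0%}")   # 56%
```

Under these assumptions, going from 0.75 to 0.89 removes more than half of the wrong predictions. Multiplying the difference by what a single wrong prediction costs your startup gives a rough dollar figure; with other metrics (for example AUC or F1) the translation is less direct.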

In some cases, it can even be a life-or-death difference. Just think about this: what if the model predicts whether someone has a rare disease? The gap in scores could determine whether a patient receives the right medical treatment.

This is why competitions are the best option to get the most out of your data. You keep your data science team and run competitions to compare results, or you let the team focus on solving other problems.

And, as before, the greatest benefit of a competition is ultimately the set of machine learning models you receive as solutions, as the following example shows.

Let's make an example

Let's say you have identified a data science problem, framed as a prediction problem of medium difficulty, for which you expect the model's final score to fall between 0 and 1, where 1 is a perfect prediction and 0 a very poor one.

If you had, say, one full-time data scientist in your startup working on that problem, it might take a month to solve it and build a machine learning model, and at the end you might have a score of, say, 0.71.

0.71 is a good score: the maximum is 1, so if the score is accuracy, your model gets 71% of its predictions right. Now the question is:

How much did it cost your startup to reach that solution? 
  • Answer: One month of the data scientist's salary! (Assuming a $100,000 annual salary, that's about $8,333 USD for the solution. That's a lot!) There is also an opportunity cost: not knowing whether better solutions and higher scores exist for that same problem. Could a solution reach a score above 0.8? Probably! But the startup will never know: unless its team member keeps optimizing the model, 0.71 will be the maximum score, and there is nothing to compare the final result against. The same applies if you have a larger team: you'll be limited by the size of your team.

Let's suppose now that you have decided to sponsor a competition and pay the winners $999 USD in cash prizes. 

At the end of the competition, which lasts 8 weeks, you get 20 machine learning models (the top 20 on the leaderboard). Let's say the competitors' solutions scored between 0.58 and 0.86: the winner of the competition got a score of 0.86 and the 20th competitor got 0.58.

If we compare the results, we will see that the winner of the competition was well above the single score that your in-house data scientist got (0.86 vs. 0.71). 

You also get 20 different solutions/models, with different approaches, from which you can learn better ways to approach the problem.

How much did it cost your startup to reach that solution? 
  • Answer: $999 USD plus a 30% fee, and the confidence of knowing that the winning model (for that particular problem) is the best one submitted by a community of more than 1,400 data scientists! There is no opportunity cost. This is open innovation in data science!
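The two cost answers in this example can be put side by side with a few lines of arithmetic. This is a rough sketch using only the figures from the example: it counts salary only (ignoring benefits and overhead, which is an assumption) and assumes the 30% fee is charged on top of the $999 prize pool.

```python
ANNUAL_SALARY = 100_000  # USD, example salary; salary only, no benefits/overhead (assumption)
PRIZE_POOL = 999         # USD in cash prizes
PLATFORM_FEE = 0.30      # ASSUMED to be 30% of the prize pool, charged on top

in_house_month = ANNUAL_SALARY / 12               # one month of one data scientist
competition_total = PRIZE_POOL * (1 + PLATFORM_FEE)

print(f"One month in-house: ${in_house_month:,.2f}")                  # $8,333.33
print(f"Competition:        ${competition_total:,.2f}")               # $1,298.70
print(f"Ratio:              {in_house_month / competition_total:.1f}x")  # 6.4x
```

In this example, one month of a single in-house data scientist costs roughly six times the whole competition, before even counting the 20 models the competition returns.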

This same example applies if you plan to outsource this job to a software house that offers machine learning services (it's only one company, not several competing to deliver the best model!), hire a consulting firm, or hire a freelancer. All of them have the same problem as an in-house data science team: you'll only get limited value.

In short, there is no better way to experiment with this technology than by sponsoring a competition and understanding the results and value they can generate for your startup!

We have a tool that helps you frame your problem and sponsor a competition.

Thanks for reading!