Senior Data Scientist

São Paulo, State of São Paulo, Brazil | Full-time | Allows remote

Who we are

TOTVS Labs is looking for a Senior Data Scientist to work on Carol, our Data & Machine Learning Platform.

Carol is already being used by several companies in different industries, e.g., Education, Manufacturing, Retail, Healthcare, and Agribusiness.


As a Data Scientist at TOTVS Labs you will

  • Identify relevant problems that can be solved by using Machine Learning
  • Design and implement Machine Learning models
  • Research, explore and develop new algorithms
  • Unlock the potential of the immense stream of data that goes through Latin America's largest software company
  • Help us democratize access to Artificial Intelligence for companies of all sizes and from different industries

Ideal Candidate

  • Degree in a technical major (e.g., Computer Science, Engineering, Math, or Physics) or in another relevant quantitative field
  • Strong knowledge of Statistics and Machine Learning
  • Strong knowledge of and working experience with Python
  • Intellectually curious, collaborative, self-motivated, and a fast learner who is comfortable working in a dynamic environment and tackling challenging problems


  • Proficiency with other programming languages such as Scala and Java
  • Experience with Data Science tools
  • Experience with Machine Learning platforms

You will be based either in our offices in São Paulo, SP, Brazil or in Mountain View, CA, USA.

Important: If you believe you have what it takes to join us, please complete and submit the Data Challenge along with your application:

Data Challenge

The purpose of this challenge is to let you demonstrate the way you think and work. The dataset we are providing contains the orders made by customers in one of our applications. Here’s the description of each column:

  • customer_code: unique id of a customer;
  • branch_id: the branch id where this order was made;
  • sales_channel: the sales channel through which this order was made;
  • seller_code: seller that made this order;
  • register_date: date of the order;
  • total_price: total price of the order (sum of all items);
  • order_id: id of this order;
  • quantity: quantity of the item (identified by item_code) that was bought;
  • item_total_price: total price of the item, i.e., quantity * unit_price;
  • unit_price: unit price of this item;
  • group_code: the group this customer belongs to;
  • segment_code: the segment this customer belongs to;
  • is_churn: True if we believe the customer will not come back (for a given `customer_code`, this value is always the same).
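The column descriptions above imply a few consistency rules that can be checked before any modeling. The sketch below is one possible way to do that in plain Python. It assumes one row per order item, and the sample rows, the `check_rows` helper, and the tolerance are made up for illustration; they are not part of the dataset or of any TOTVS tooling.

```python
import math
from collections import defaultdict

# Toy rows invented for illustration only; they are NOT taken from the real dataset.
rows = [
    {"customer_code": 1, "order_id": 10, "quantity": 2, "unit_price": 5.0,
     "item_total_price": 10.0, "total_price": 13.0, "is_churn": False},
    {"customer_code": 1, "order_id": 10, "quantity": 1, "unit_price": 3.0,
     "item_total_price": 3.0, "total_price": 13.0, "is_churn": False},
    {"customer_code": 2, "order_id": 11, "quantity": 3, "unit_price": 4.0,
     "item_total_price": 12.5, "total_price": 12.5, "is_churn": True},
]


def check_rows(rows, tol=1e-6):
    """Report which rows, orders, and customers break the documented rules."""
    item_price_mismatch = []          # row indexes where quantity * unit_price != item_total_price
    order_sum = defaultdict(float)    # order_id -> sum of item_total_price
    order_total = {}                  # order_id -> reported total_price
    churn_values = defaultdict(set)   # customer_code -> distinct is_churn values seen

    for i, r in enumerate(rows):
        expected = r["quantity"] * r["unit_price"]
        if not math.isclose(expected, r["item_total_price"], abs_tol=tol):
            item_price_mismatch.append(i)
        order_sum[r["order_id"]] += r["item_total_price"]
        order_total[r["order_id"]] = r["total_price"]
        churn_values[r["customer_code"]].add(r["is_churn"])

    order_total_mismatch = sorted(
        o for o in order_sum
        if not math.isclose(order_sum[o], order_total[o], abs_tol=tol)
    )
    inconsistent_churn = sorted(c for c, v in churn_values.items() if len(v) > 1)

    return {
        "item_price_mismatch": item_price_mismatch,
        "order_total_mismatch": order_total_mismatch,
        "inconsistent_churn": inconsistent_churn,
    }


report = check_rows(rows)
print(report)
```

Rows flagged by a check like this are candidates for the cleaning step, and how you handle them is one of the design choices worth explaining.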

There are many possible use cases for this data, e.g., product recommendation, churn analysis, and sales forecasting. Pick one use case and build a model for it. We want to test your skills in data manipulation, cleaning, and predictive modeling, so please explain each of your design choices (e.g., EDA, preprocessing, model selection, hyperparameters, and evaluation criteria). You can use Jupyter notebooks for all your exploration and modeling.
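If you pick churn analysis, one evaluation detail may be worth considering: since `is_churn` is constant per `customer_code`, a random split over rows could place orders from the same customer in both the training and the test set. The sketch below shows one possible way to split by customer instead. The `split_customers` helper, the 20% test fraction, the seed, and the toy codes are assumptions for illustration, not requirements of the challenge.

```python
import random


def split_customers(customer_codes, test_fraction=0.2, seed=42):
    """Split the distinct customers (not the rows) into train and test sets."""
    unique = sorted(set(customer_codes))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    return set(unique[n_test:]), set(unique[:n_test])


# Toy customer_code column, one entry per order row (made up for illustration).
codes = [1, 1, 2, 3, 3, 3, 4, 5, 5, 6]

train_customers, test_customers = split_customers(codes)

# Row indexes follow the customer they belong to, so no customer is in both sets.
train_rows = [i for i, c in enumerate(codes) if c in train_customers]
test_rows = [i for i, c in enumerate(codes) if c in test_customers]

print(len(train_rows), len(test_rows))
```

Whichever split you choose, stating the reason for it belongs with the evaluation criteria mentioned above.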

You shouldn’t spend more than 5 hours on the exercise.

You can find the challenge description and the dataset at the following URL:

PS: Ideally, we should be able to replicate your analysis from your submitted source code, so please state the versions of the tools and packages you are using.
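One lightweight way to record versions is to print them from inside the notebook itself. The sketch below uses only the Python standard library. The `report_versions` helper and the package names passed to it are placeholders for illustration; replace them with whatever you actually import.

```python
import sys
from importlib import metadata


def report_versions(packages):
    """Return 'name==version' lines for Python and the given installed packages."""
    lines = [f"python=={sys.version.split()[0]}"]
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment; say so instead of failing.
            lines.append(f"{name}: not installed")
    return lines


# Hypothetical package list; adjust it to the libraries used in your analysis.
for line in report_versions(["numpy", "pandas", "scikit-learn"]):
    print(line)
```

Including this output, or an equivalent requirements file, alongside the notebook should make the analysis easier to rerun.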

Apply for this opening at