Senior Data Scientist

Raleigh, North Carolina, United States · Full-time · Allows remote

Who we are

TOTVS Labs is looking for a Senior Data Scientist to work on Carol, our Data & Machine Learning Platform.

Carol is already being used by several companies in different industries, e.g., Education, Manufacturing, Retail, Healthcare, and Agribusiness.


As a Senior Data Scientist at TOTVS Labs you will

  • Identify relevant problems that can be solved by using Machine Learning
  • Design and implement Machine Learning models
  • Research, explore and develop new algorithms
  • Unlock the potential of the immense stream of data that goes through Latin America's largest software company
  • Help us to evolve our Machine Learning Platform - Carol.

Ideal Candidate

  • Degree in a technical major (e.g., Computer Science, Engineering, Math, Physics) or in another relevant quantitative field
  • Strong knowledge of Statistics and Machine Learning
  • Strong knowledge of and working experience with Python
  • Intellectually curious, collaborative, self-motivated, and a fast learner who is comfortable working in a dynamic environment tackling challenging problems.


  • Proficiency with other programming languages such as Scala and Java
  • Experience with Data Science tools
  • Experience with Machine Learning platforms.


Important: If you believe you have what it takes to join us, please complete and submit the Data Challenge along with your application:

Data Challenge

The purpose of this challenge is to let you demonstrate the way you think and work. You shouldn’t spend more than 8 hours completing the exercise.

The dataset we are providing contains the orders made by customers in one of our applications. Here’s the description of each column:

  • customer_code: unique id of a customer;
  • branch_id: id of the branch where this order was made;
  • sales_channel: the sales channel through which this order was made;
  • seller_code: seller that made this order;
  • register_date: date of the order;
  • total_price: total price of the order (sum of all items);
  • order_id: id of this order;
  • quantity: quantity of the item, identified by item_code, that was bought;
  • item_total_price: total price of the item, i.e., quantity * unit_price;
  • unit_price: unit price of this item;
  • group_code: group this customer belongs to;
  • segment_code: segment this customer belongs to;
  • is_churn: whether this customer is flagged as churned.
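
As a rough illustration of how the price columns relate to each other, the sketch below checks the two identities stated above: item_total_price equals quantity times unit_price, and total_price equals the sum of the item totals of an order. The rows are made up for illustration and are not taken from the real dataset. The sketch also assumes one row per order item, with total_price repeated on every row of the same order, which the description does not state. The function names are hypothetical.

```python
from collections import defaultdict
from math import isclose

# Hypothetical rows, invented for illustration only.
# Assumption: one row per order item, total_price repeated per order.
rows = [
    {"order_id": 1, "quantity": 2, "unit_price": 5.0,
     "item_total_price": 10.0, "total_price": 13.0},
    {"order_id": 1, "quantity": 1, "unit_price": 3.0,
     "item_total_price": 3.0, "total_price": 13.0},
    {"order_id": 2, "quantity": 4, "unit_price": 2.5,
     "item_total_price": 10.0, "total_price": 10.0},
]


def inconsistent_item_rows(rows):
    """Return rows where item_total_price != quantity * unit_price."""
    return [
        r for r in rows
        if not isclose(r["item_total_price"], r["quantity"] * r["unit_price"])
    ]


def inconsistent_orders(rows):
    """Return order ids whose total_price differs from the sum of item totals."""
    item_sums = defaultdict(float)
    order_totals = {}
    for r in rows:
        item_sums[r["order_id"]] += r["item_total_price"]
        order_totals[r["order_id"]] = r["total_price"]
    return [
        oid for oid in item_sums
        if not isclose(item_sums[oid], order_totals[oid])
    ]


if __name__ == "__main__":
    print("bad item rows:", inconsistent_item_rows(rows))
    print("bad orders:", inconsistent_orders(rows))
```

Checks like these are only a starting point for exploring the data; whether the real dataset satisfies them is something to verify rather than assume.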

Question 1 (10 Points)

List as many use cases for the dataset as possible.

Question 2 (10 Points)

Pick one of the use cases you listed in question 1 and describe how building a statistical model based on the dataset could best be used to improve the business this data comes from.

Question 3 (20 Points)

Implement the model you described in question 2, preferably in Python. The code has to retrieve the data, train and test a statistical model, and report relevant performance criteria. Ideally, we should be able to replicate your analysis from your submitted source code, so please state explicitly the versions of the tools and packages you are using.
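
One possible way to report versions, shown here only as a minimal sketch, is to print them from the analysis itself using the Python standard library. The package names passed in below (pandas, scikit-learn) are assumptions chosen for illustration; the challenge does not prescribe any particular packages, and report_versions is a hypothetical helper name.

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Assumed package list; replace with the packages actually used.
    for name, version in report_versions(["pandas", "scikit-learn"]).items():
        print(f"{name} {version}")
```

A pinned requirements file serves the same purpose; the sketch just keeps the version report next to the results.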

Question 4 (60 Points)

  1. Explain each of your design choices (e.g., preprocessing, model selection, hyperparameters, evaluation criteria). You can use Jupyter notebooks. Compare and contrast your choices with alternative methodologies.
  2. Describe how you would improve the model in Question 3 if you had more time.

You can find the dataset for this challenge at the following URL: