As a data scientist, you will develop robust privacy and risk management software by performing core research and literature reviews, and by implementing and evaluating prototypes and algorithms for anonymity, encryption and risk assessment. You will also contribute to building predictive analytics and machine learning applications for Trūata customers using their anonymised data. You will be a member of a highly skilled, cross-disciplinary technical team designing and building a platform that enables enterprises to perform analytics on their customer data in a GDPR-compliant way.
This position reports into Trūata’s Chief Data Scientist.
Key responsibilities:
• Core research on critical topics relating to anonymity, encryption, privacy and risk management, e.g. differential privacy, homomorphic encryption, etc.
• Understanding business objectives and customer requirements
• Development of literature reviews / market landscape reports
• Development of tools to enable automated analysis of data risk reports.
• Identification of risk assessment tests and configurations based on internal analysis and state-of-the-art reviews.
• Development of algorithms to implement required privacy enhancement controls
• Ongoing evaluation of data privacy controls for each customer, and identification of areas of concern
• Development of sample data for testing and evaluation
• Development of machine learning and predictive analytics models using anonymised customer data to support customers' key business requirements.
• Large-scale evaluation of the effectiveness of Trūata's anonymisation techniques and of the machine learning models developed for customers.
Required skills and qualifications:
• Postgraduate degree (MSc minimum) in a related discipline (computer science, statistics, data analytics)
• Applied experience (2+ years) in a relevant programming environment (Python/Jupyter, Scala/Spark, R, etc.)
• Excellent analytical and statistical analysis skills.
• Experience manipulating and analysing data using dataframes, pivot tables and visualisation tools.
• Experience working with and creating data architectures.
• Practical experience using machine learning and analysis techniques (clustering, regression, decision tree learning, artificial neural networks, etc.), and an understanding of their real-world advantages, drawbacks and ideal usage scenarios.
• Direct experience with the entire data science project lifecycle, including requirements gathering, design, implementation, evaluation and presentation of results.
• Ability to present technical concepts and results clearly to different audiences and stakeholders.
Preferred skills and qualifications:
• Strong SQL skills
• Good working knowledge of Big Data technologies including Spark, Hadoop, Cassandra, Kafka, Redis, Hive, Impala
• Experience with cloud infrastructure providers (AWS/Azure)
• Extensive experience with data partitioning and transformation, and with in-memory computation (large-scale joins, group-bys and aggregations)
• Experience designing and building large-scale Spark applications and data pipelines, and monitoring and optimising Spark job performance.
• Customer requirements gathering and KPI reporting / presentation of data science project outputs.
• Industry experience in a relevant vertical such as financial services, travel/hospitality, telecoms, insurance or health
• Experience in data privacy, GDPR compliance and risk management