Curriculum Vitae
Antoine Lucas
Research Engineer · ML Systems & Scientific AI
Pharma · Cosmetics · Biotech · Paris, France
Research Engineer & Data Scientist with 5+ years of experience building production-grade ML systems in regulated and scientific environments (pharma, cosmetics, and biotech). Expertise spans deep learning (computer vision, NLP, LLM fine-tuning with LoRA/QLoRA), end-to-end data pipelines, and statistical modelling (Design of Experiments, mixed models, multi-omics analysis). Proven track record of translating complex scientific questions into deployed systems, from model development and pipeline architecture through to validated MLOps infrastructure and stakeholder delivery. Strong communicator at the interface of wet-lab science, software engineering, and ML research.
Experience
Sep 2025 – present
- Designed and deployed computer vision models (texture classification, colour analysis) and NLP pipelines (olfactory descriptor mining from scientific literature) for fragrance and cosmetics formulation R&D
- Modelled sensory and physico-chemical properties with partial least squares (PLS) regression, Gaussian processes, and Bayesian optimisation to accelerate ingredient screening cycles
- Built design of experiments (DoE) workflows (mixture designs, response-surface methods), cutting the number of experiments required per formulation cycle by 30–40%
- Engineered machine learning operations (MLOps) infrastructure: drift monitoring and fully reproducible pipelines (Docker + GitHub Actions) in a GxP-adjacent (good practice guidelines) environment
- Delivered R Shiny dashboards covering sensory scoring, spectral data visualisation, and interactive experiment planning for R&D scientists
- Managed data science platform tooling (Azure ML, Posit Connect, Databricks): provisioning, access governance, and lifecycle management
- Owned and maintained a catalogue of 20+ apps (R Shiny, Streamlit) used daily by R&D scientists
May – Sep 2025
- Developed reusable internal R packages and Quarto automated reporting templates, cutting analyst turnaround time by ~25% across consulting engagements
- Led internal upskilling programme on tidymodels, causal inference, and reproducible research for 8 statistical consultants
- Scoped and prototyped ML solutions for pre-sales client engagements in pharma and agri-food verticals
- Fine-tuned Microsoft Phi-3-mini-4k-instruct (3.8B) using LoRA (PEFT) on the LIAR dataset for binary claim verification, covering the full pipeline from data preparation to evaluation; +19.5pp accuracy over the zero-shot baseline (54.3% → 73.8%), macro-F1 0.72
Dec 2024 – May 2025
- Contributed to a software-as-a-service (SaaS) product (FastAPI Python backend, Vue.js front-end) at a fast-paced 8-person startup, working with full individual autonomy
- Delivered end-to-end pipelines in R and Python for tabular and large-scale omics data (genomics, transcriptomics, metabolomics)
- Applied permutational multivariate analysis of variance (PERMANOVA), sparse PLS-DA, and co-occurrence network analysis to characterise microbial interactions and identify key functional drivers
- Integrated analysis outputs into the production SaaS platform via a representational state transfer (REST) API
Jul 2023 – Nov 2024
- Architected and delivered a full R Shiny platform (golem) managing pilot manufacturing chains for 100+ GxP clinical projects — 50+ daily active users across 3 departments
- Designed the data model, module architecture, and automated validation report generation to satisfy GxP audit-trail requirements, with Installation/Operational/Performance Qualification (IQ/OQ/PQ) on Posit Connect
- Owned the full delivery lifecycle: user-story workshops → agile sprints → user acceptance testing (UAT) → IQ/OQ documentation → production deployment
- Sole developer on the engineering side, working directly with 5+ senior sponsors across R&D and manufacturing
- Mentored 1 intern on building Shiny application modules
Oct 2021 – Jul 2023
- Delivered 100+ statistical analyses on ingredient efficacy (hydration, anti-ageing biomarkers) and molecular safety (quantitative structure-activity relationship [QSAR], genotoxicity) across cosmetic and dermocosmetic pipelines
- Designed and analysed clinical and semi-clinical studies from protocol writing through regulatory-grade reports — consistent 2-week delivery cycle
- Built internal R tooling (survival analysis, mixed-effects models, automated R Markdown reporting) adopted as standard across the biostatistics team
- Mentored 2 junior analysts and 1 intern; co-authored methodology guides on non-inferiority testing and adaptive designs
Education
2021
2021
Technical Skills
Certifications
GitHub Foundations — GitHub (2024)
Scrum Master — Agilbee (2023)
Languages
French (native) · English (fluent, professional)