duarte harris
Lead Data Scientist with a background in Philosophy and Pol.Sci.

Hi. I’m Duarte 👋

An influential Data Scientist with a record of success leading large-scale data initiatives and delivering results for Governments & Financial companies. I'm passionate about leveraging data to solve business problems and mentoring young Data Scientists, while coordinating and collaborating with cross-functional teams to take projects from inception to deployment.

I’ve mostly focused on Natural Language Processing, Generation and Understanding (NLP, NLG, & NLU). Lately I’ve been working more on Computer Vision, and I’m extremely curious about Reinforcement Learning (RL), Generative Adversarial Networks (GANs), Neural Evolution, and the mathematics of consciousness.


Things I do

 

Data Analysis

I collect, clean and analyse data in order to extract insights and make informed decisions.

Machine/Deep Learning

I conceive, develop and train reliable models for a myriad of tasks, such as prediction, NLP, CV, and more.

 

Data Visualization

I believe all data should be beautiful, clear and understandable, and work to make it so.

 

 (some of) My story

I was born and raised in Lisbon, Portugal, before moving to Bristol, England for college, where I studied Philosophy. It snowed a little.

Back in Lisbon I studied Political Science while embarking on a myriad of jobs, from cleaning air conditioners to managing an SME. It was fine, but it wasn’t love.

I spent the next 6 years working as a writer while trying to figure out what I love. After much contemplation — and a few drinks — I had gotten nowhere.

Eventually my girlfriend took me on a trip through Scotland, where I stumbled on a book about Data Science. I was fascinated and have been studying and practicing it ever since.

Sometimes love is hard to find. Sometimes it is waiting for us in a book tucked away in Scotland.

 

 What I’m Currently Working on…

September 2022

I’ve been working by myself on an algorithm to collect and analyse statistical data on padel while “watching” matches. I confess I underestimated the complexity of the problem when I began, thinking it would be much simpler than it turned out to be.

That said, after successfully negotiating with some freelancers to collect training data, and after designing, training and testing several algorithms, I’ve developed an architecture that shows promise in identifying and distinguishing between 8 different padel strokes, which I’m now iterating on to improve.


November 2021

Since last April I’ve been the Machine Learning Team Leader at Axians, leading a small team of young Data Scientists within the Intelligent Customer Services Unit. In this time we’ve developed a FAQs Retrieval/Suggestion System, and run some experiments on building a Natural Language Search Engine.

This experience has allowed me to rekindle a passion for leading a team, while coordinating and collaborating with cross-functional teams to take a project from inception to deployment.

It was particularly interesting having to negotiate an aggressive deadline, all the while delivering everything on time and within the predefined budget, AND keeping all the stakeholders happy. :)


April 2021

Since September 2020 I’ve been working as a Data Scientist at Axians. Axians is an IT Consultancy specialised in the Government, Finance, and Energy sectors.

As part of the Intelligent Customer Services Unit, I develop models that facilitate the relations between companies or governments and their clients, users or citizens.

These models include a multi-class, semi-hierarchical Text Classification algorithm with over 600 classes. It reduces errors in responses and in forwarding enquiries from Portuguese citizens to the eBalcao system, while speeding up response times, saving thousands of hours of error-induced work, and making the whole process more agile and more pleasant for all involved.

For the same project, I’ve developed a semi-agnostic Automatic Keyword Extraction algorithm inspired by Google’s PageRank. It is semi-agnostic because there was no labeled dataset to learn from, so it had to be completely independent of the content it would encounter. As long as the text is in Portuguese, this TextRank algorithm uses Graph theory and Part-of-Speech (POS) Tagging to identify the most relevant words within any text.
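
For the curious, here is a minimal sketch of the general TextRank idea, not the code used in the project. It assumes the text has already been POS-tagged (the `(word, tag)` input format, the tag names, the window size and the function name `textrank_keywords` are all illustrative assumptions): candidate words become graph nodes, words that appear close together are linked, and a PageRank-style score is iterated over that graph.

```python
from collections import defaultdict


def textrank_keywords(tagged_tokens, keep_tags=("NOUN", "ADJ"), window=2,
                      damping=0.85, iterations=50, top_k=5):
    """Rank candidate words by a PageRank-style score on a co-occurrence graph.

    tagged_tokens: list of (word, pos_tag) pairs from any POS tagger
    (hypothetical input format, chosen for this sketch only).
    """
    # Lower-case every word and flag the ones whose tag makes them candidates.
    words = [(word.lower(), tag in keep_tags) for word, tag in tagged_tokens]

    # Link two candidate words when they occur within `window` tokens of
    # each other in the original text (undirected, unweighted edges).
    neighbours = defaultdict(set)
    for i, (word, is_candidate) in enumerate(words):
        if not is_candidate:
            continue
        for other, other_is_candidate in words[i + 1:i + 1 + window]:
            if other_is_candidate and other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    # Iterate the score: each word shares its score equally among its
    # neighbours. Words with no links never enter the graph.
    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbours[n]) for n in neighbours[word]
            )
            for word in neighbours
        }

    # Highest score first; ties are broken alphabetically for stability.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Illustrative, hand-tagged Portuguese sentence (made up for this example).
example = [
    ("o", "DET"), ("sistema", "NOUN"), ("encaminha", "VERB"),
    ("pedidos", "NOUN"), ("de", "ADP"), ("cidadãos", "NOUN"),
    ("e", "CCONJ"), ("o", "DET"), ("sistema", "NOUN"),
    ("classifica", "VERB"), ("pedidos", "NOUN"), ("urgentes", "ADJ"),
]
print(textrank_keywords(example))
```

No labeled data is needed at any point: the ranking depends only on which candidate words appear near each other, which is what lets this kind of approach work on content it has never seen.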

I’ve also picked up a former colleague’s project on Intelligent Document Automation, refactoring the code, adapting the OCR to also identify handwritten text, and creating the algorithm to calculate the distance between the entities within the documents. The aim of this project was to automatically read digitised cheques, invoices, receipts and bank transfer slips, and to identify certain entities (such as the issuer of a cheque, or the company that issued a receipt) and their position on the document.
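
As a rough illustration of what a distance between entities can look like, here is one simple option: the straight-line distance between the centres of two bounding boxes. This is an assumption made for the example, not a description of the project's actual algorithm, and the `Entity` class, its fields and the pixel coordinates are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Entity:
    """A detected entity and its bounding box, in pixels (hypothetical layout)."""
    label: str
    left: float
    top: float
    width: float
    height: float

    def centre(self):
        # Centre point of the bounding box.
        return (self.left + self.width / 2, self.top + self.height / 2)


def entity_distance(a, b):
    """Euclidean distance between the centres of two entities' boxes."""
    (ax, ay), (bx, by) = a.centre(), b.centre()
    return hypot(bx - ax, by - ay)


# Made-up positions for two entities on a scanned cheque.
issuer = Entity("issuer", left=0, top=0, width=10, height=10)
amount = Entity("amount", left=3, top=4, width=10, height=10)
print(entity_distance(issuer, amount))
```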

I’ve also been mentoring a Data Scientist trainee, and developed the syllabi for two courses to be taught within the company to new hires: Introduction to Machine Learning and Fundamentals of Natural Language Processing.

 
