Jonathan Wang

Data Engineer

London, UK

LinkedIn

GitHub

Experience

Marshall Wace

Data Engineer

Apr 2024 - Present

  • Reduced the P95 response time of the central authentication API 100x, from 5 s to 50 ms. Built an API layer that reconstructs and caches the Active Directory graph in a sharded Redis cluster, shrinking the recursive search space and removing redundant calls.
  • Built an index rebalance insight parser using LLMs. Articles were converted from unstructured HTML to JSON, with correctness and schema validation stages. Delivered structured insights to quant teams' databases within seconds of article publication.

Smarkets

Data Engineer & Scientist

Sep 2022 - Apr 2024

  • Developed an industry-leading recommendation engine. Modelled user activity data with an implicit matrix factorisation model and tuned hyperparameters using a custom metric suite. A/B testing showed a 40% increase in engagement compared to unpersonalised events.
  • Implemented an asynchronous Python job using Kafka that loads archived S3 files for trading backtests without severe I/O bottlenecks. The new system cut processing time for terabytes of CSV data from 2 hours to 12 minutes per day of data.
  • Worked closely with the internal DevOps team to migrate Rust and Nix pricing engine data systems from outdated infrastructure to a unified architecture, reducing engineering maintenance effort by 50%.

HomeX

Applied Data Science Intern

Jun 2021 - Sep 2021

  • Built a backend API with Python's FastAPI that served a regression model to 5 other internal services.
  • Built a frontend BI data visualisation dashboard in TypeScript and React, with interactive street maps and tables.

Education

University of Cambridge

MEng (Hons), Computer and Information Engineering

Oct 2018 - Jun 2022

  • Graduated with Honours with Distinction (1st Class thesis and 1st Class exam results).
  • First Year: 74%; Second Year: Ungraded (due to COVID-19); Third Year: 71%; Fourth Year: 72.5%.
  • Master's Thesis: Developed a large language model to respond to hate speech, using a novel evaluation suite and custom data. Modified cutting-edge models such as BlenderBot and GPT-3 to achieve superior performance. Training was done on HPC clusters.

Skills

Programming

Data

DevOps

Projects

Conference for Truth and Trust Online

Featured Speaker

Oct 2022

  • Gave an on-stage presentation to an audience of 100+ on the novel technical contributions to LLM research from my Master's thesis. (https://truthandtrustonline.com/)