GitHub: github.com/TomMonkeyMan Fediverse: TomMonkeyMan@chinese.lol
EDUCATION
2015.8-2018.7 Tsinghua University Master of Science
– Major: Electrical Engineering Automation GPA: 3.7/4 (94/100, top 5%)
– Awards: First-grade scholarship of Tsinghua University (2015)
2011.8-2015.7 Tsinghua University Bachelor of Science
– Major: Electrical Engineering Automation GPA: 83.2/100
– Awards: Academic Scholarship of Tsinghua University (2012) / Fourth Prize, PLC Contest
WORKING EXPERIENCE
2021.4-present Tesla Senior Data Engineer – Vehicle Software
– Manage and evolve data pipelines for the CN-market fleet while adhering to Chinese data policy.
– Develop highly available vehicle diagnostic services and systems that support engineering teams.
– Developed LLM-based projects, including prompt engineering for text-to-SQL, a document search and Q&A bot, and fine-tuning Llama3-8B on Tesla-specific data.
2020.6-2021.4 Baidu Inc. Technical Product Manager – Feeds Ads
– Responsible for strategy design and optimization of the feed ads search system, including the ad bidding mechanism, targeting strategy, and traffic monetization efficiency.
– Managed projects and requirements through cross-department collaboration.
2018.7-2020.6 IBM Data Scientist – Cognitive & Analytics
– Worked in IBM’s data platform services in China, leading client solution proposals and the implementation of complex Big Data and Analytics projects.
MAIN EXPERIENCE
- Micro-service Development
- API Development: Built internal tools (API/website) for vehicle diagnostics service R&D with Python (Flask/Gunicorn) and Go.
- Frontend: Developed custom dashboard applications with Bokeh server/Panel components and a React front end.
- Data Pipeline Development
- AWS-based data development: Managed AWS EMR/EKS/Glue services for data consumption and processing; built data streams and pipelines with PySpark/SQL and Scala.
- ETL/Streaming: Built multiple production ETL jobs with Apache Airflow/Rundeck/cron. Transferred data from multiple DBs to data warehouses/OLAP. Developed Kafka consumers for stream processing; familiar with alternative queues such as RabbitMQ.
- DB management: Extensive experience with relational databases such as MySQL/PostgreSQL/SQL Server, including performance optimization and migration. NoSQL experience with Hive/HBase/Presto. Also used MongoDB and Redis for caching, Neo4j and TigerGraph as graph databases, and Greenplum for data warehousing.
- Data modeling
- Developed a Rust framework on Spark/K8s for thermodynamic modeling of PCS mission profiles on the Tesla Cybertruck.
- Supercharging time estimation model validation on 2-D algorithms and supercharger availability profile prediction modeling for CN Tesla vehicles.
- LLM
- Prompt Engineering for Text-to-SQL: Developed a system using Baichuan-13B on AWS SageMaker to create and encapsulate prompts for generating SQL queries dynamically, enhancing database interaction efficiency.
- Document Search and Q&A Bot: Created a bot using RAG with LlamaIndex and Llama3-8B on AWS EC2 to improve access to Tesla’s internal operation manuals, utilizing Streamlit for the front-end interface.
- Fine-tuning Llama3-8B: Currently fine-tuning Llama3-8B on a Tesla engineering-related corpus using the PEFT framework, aiming to enhance the model’s performance on specialized engineering tasks.
- DevOps
- K8s: Large-scale deployments on Docker with Docker Compose/Swarm. K8s deployments via Helm, Argo CD, and GitHub Actions.
- System Monitoring: Designed and built a Prometheus + Grafana + sidecar alerting solution. Integrated Prometheus clients into applications in multiple languages (Python/Go/Rust) to expose useful metrics.
- Crash Monitoring: Shipped system crash/panic logs to Sentry/Splunk via Python/Go, tracking issues in Sentry with Jira integration for stack-trace debugging and QA.
- Open-source project: Lemmy (Fediverse) https://github.com/LemmyNet/lemmy
- Owner of the Lemmy instance https://chinese.lol
- Chinese translation for the Lemmy community