Data Engineer
- Alibaba Group
- Full-time
- Tehran
Job Description
We are looking for a skilled Data Engineer to join our team at one of Alibaba Group's businesses and play a key role in building the data infrastructure for our AI-powered Legal Assistant. The ideal candidate will have strong expertise in building data pipelines, preprocessing data, and preparing high-quality training datasets for large language models (LLMs).
You will work closely with AI engineers, MLOps, and product teams to ensure that our legal AI models are trained on clean, structured, and reliable data.
Key Responsibilities:
- Design, build, and maintain scalable ETL/ELT pipelines for legal and judicial data.
- Collect, clean, normalize, and preprocess large volumes of unstructured text data.
- Prepare and manage training datasets for NLP models and LLMs.
- Collaborate with AI Engineers to fine-tune models using annotated datasets.
- Implement automated data quality checks and validation processes.
- Manage databases and data storage, and optimize data access for ML pipelines.
- Ensure compliance with data privacy and security standards.
Requirements:
- Strong programming skills in Python (Pandas, PySpark, etc.).
- Experience with data pipeline frameworks (Airflow, Luigi, Prefect).
- Hands-on experience with SQL and NoSQL databases (PostgreSQL, MongoDB, Elasticsearch).
- Familiarity with Spark NLP or other large-scale NLP frameworks.
- Solid understanding of text data preprocessing (tokenization, normalization, cleaning).
- Familiarity with NLP datasets and annotation workflows.
- Knowledge of data security and handling sensitive information.
The Hiring Process at Alibaba
Every job opportunity is an opportunity for adventure
- Submit your résumé
- Résumé review
- Technical assessment
- HR assessment
- Interview with a senior manager
- Final assessment
- Job offer
- Start of the adventure