On behalf of ActiveFence, SD Solutions is looking for a talented Big Data Engineer with strong experience taking machine learning projects from prototype to production. The ideal candidate is comfortable working across the stack, from building robust data pipelines and database architecture to deploying models in scalable production environments.

SD Solutions is a staffing company operating globally. Contact us to get more details about the benefits we offer.

Responsibilities:

  • Build Scalable Data Infrastructure: Design and set up modern, cloud-based data platforms from the ground up. Ensure they scale from gigabytes to petabytes and support a wide range of data use cases.
  • Design Clean Data Models: Work with stakeholders to turn business needs into well-structured, performant data models. Use smart partitioning and schema design to keep things fast, organized, and cost-effective.
  • Lead Databricks Projects: Manage all things Databricks: Delta Lake tables, Unity Catalog, cluster setup, and access controls. Ensure everything runs smoothly, stays secure, and meets governance requirements.
  • Build Batch ETL Pipelines: Develop high-performance batch ETL workflows using PySpark in Databricks, focusing on reliability, scalability, and maintainability (a brief sketch of such a job follows this list).
  • Publish and Maintain Data Products: Make datasets easy to find, understand, and use. Provide clear documentation and build APIs or libraries to enable self-service access.
  • Ensure Quality, Monitoring & Cost Control: Use CI/CD, testing, alerts, and cost monitoring tools to keep pipelines running reliably and efficiently.
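
For a concrete sense of the day-to-day work, here is a minimal sketch of the kind of batch ETL job described above: reading raw events, cleaning them, and writing a partitioned Delta table with PySpark. It assumes a Databricks-style environment with Delta Lake enabled; the path, table, and column names (s3://example-bucket/raw/events/, analytics.events, event_id, event_ts) are illustrative, not taken from this posting.

    # Batch ETL sketch: raw JSON events -> cleaned, partitioned Delta table.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("events_batch_etl").getOrCreate()

    # Extract: raw events landed by an upstream process (illustrative path).
    raw = spark.read.json("s3://example-bucket/raw/events/")

    # Transform: drop malformed rows, normalize the timestamp, derive a
    # date column for partitioning, and deduplicate on the event key.
    clean = (
        raw.filter(F.col("event_id").isNotNull())
           .withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("event_date", F.to_date("event_ts"))
           .dropDuplicates(["event_id"])
    )

    # Load: write a Delta table partitioned by date so downstream queries
    # prune partitions instead of scanning the full history.
    (clean.write.format("delta")
          .mode("append")
          .partitionBy("event_date")
          .saveAsTable("analytics.events"))

In production, a job like this would typically run on a schedule via Databricks Workflows or Airflow, with the testing, alerting, and cost monitoring mentioned in the responsibilities wrapped around it.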

Requirements:

  • 5+ years of experience in data engineering, with at least 2 years working hands-on with Databricks and PySpark
  • Delivered multiple end-to-end data platform projects, ideally including some built from scratch
  • Comfortable mentoring team members and guiding architectural decisions
  • Databricks & Delta Lake: Hands-on experience with Unity Catalog, Delta Live Tables, cluster management, and job orchestration
  • PySpark: Strong skills in batch data processing, performance tuning, and troubleshooting
  • Data Storage & Modeling: Familiarity with PostgreSQL, Parquet, MongoDB, and lakehouse architecture
  • DevOps & Orchestration: Experience with Git-based workflows, Airflow or Databricks Workflows, Infrastructure-as-Code tools like Terraform or AWS CDK, and container basics (Docker/Kubernetes)
  • Programming: Proficient in Python and SQL
  • Product-oriented – you think of datasets as products with clear users and goals
  • Clear communicator – you can explain your decisions and ideas to any audience
  • Self-driven – you take ownership and work well with minimal oversight
  • Curious and up-to-date – you stay current on new tools and best practices in data and cloud
  • Experience with LLM and RAG workflows (e.g. LangChain, LlamaIndex, vector databases) is a plus
  • API development using FastAPI or Flask is a plus (see the brief sketch after this list)
  • Familiarity with search technologies like Elasticsearch or OpenSearch is a plus
  • Experience with MLOps tools like Feast or Databricks Feature Store is a nice addition
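
On the API nice-to-have: the self-service access to data products mentioned in the responsibilities can be as simple as a small metadata endpoint in front of a catalog. Below is a minimal FastAPI sketch; the dataset names are hypothetical, and the in-memory dictionary stands in for a real catalog lookup such as Unity Catalog.

    # Self-service data-product discovery endpoint (illustrative names only).
    from fastapi import FastAPI, HTTPException

    app = FastAPI(title="data-products")

    # Stand-in for a real catalog lookup (e.g. Unity Catalog) in this sketch.
    DATASETS = {
        "events": {"owner": "data-eng", "table": "analytics.events", "freshness": "daily"},
    }

    @app.get("/datasets/{name}")
    def dataset_info(name: str) -> dict:
        """Return discovery metadata for a published dataset."""
        if name not in DATASETS:
            raise HTTPException(status_code=404, detail="unknown dataset")
        return DATASETS[name]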

About the company:

At ActiveFence, we're dedicated to making the internet safer. Our fight against online threats is powered by data, advanced algorithms, and a talented team of engineers. As online threats grow in number and complexity, we're rapidly scaling our operations to meet these challenges. If you thrive on diverse challenges, can move quickly, and are eager to explore new frontiers with your code, you could be a great fit for our team.

By applying for this position, you agree to the terms outlined in our Privacy Policy. Please take a moment to review it at https://sd-solutions.breezy.hr/privacy-notice and make sure you understand its contents. If you have any questions or concerns, please feel free to contact us.