Data Engineer (Contract)
Engagement: Data Strategy & AI Enablement
Location: Remote with access to client's secure environment (VDI-based)
About the Engagement
Meaningful AI is delivering a phased data strategy and AI enablement engagement for a client with 30+ datasets in Databricks. We're building an AI-driven data platform, moving from data discovery through canonical modeling to an analytics-ready platform.
Phase 1, already underway, focuses on discovery: profiling datasets, documenting schemas, mapping entity relationships, and producing a strategic data roadmap.
Phases 2-4 build on that foundation with ingestion frameworks, workflow integration, and ML pipelines.
What You'll Do
- Profile all 30+ datasets in Databricks: table structures, row counts, data types, distributions, refresh patterns
- Document schemas with inferred relationships and primary/foreign key candidates
- Assess data quality across dimensions: completeness, consistency, accuracy, freshness
- Analyze historical data behavior — determine which datasets use snapshot vs. overwrite patterns
- Support API and integration mapping (test data extraction capabilities)
- Build a standardized ingestion framework and data pipelines (Phase 2)
- Implement data quality gates with automated validation and alerting (Phase 2)
- Support workflow integration, feature engineering pipelines, and ML data products (Phases 3-4)
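To give a flavor of the Phase 1 profiling work listed above, here is a minimal sketch in plain Python. It is illustrative only, not the engagement's actual tooling: the function name `profile_rows`, the sample records, and the column names are all hypothetical, and in Databricks the same checks would more likely run against Spark DataFrames or Spark SQL. It computes row counts, completeness (share of non-null values), distinct counts, observed data types, and a simple primary key candidate flag per column.

```python
from collections import Counter


def profile_rows(rows):
    """Profile a list of dict records and return per-column statistics.

    Hypothetical helper for illustration; assumes values are hashable.
    """
    total = len(rows)
    # Collect every column name seen in any record.
    columns = sorted({key for row in rows for key in row})
    profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        distinct = len(set(non_null))
        # Count the Python type names observed among non-null values.
        types = Counter(type(v).__name__ for v in non_null)
        profile[col] = {
            "row_count": total,
            # Completeness: fraction of rows with a non-null value.
            "completeness": len(non_null) / total if total else 0.0,
            "distinct": distinct,
            "types": dict(types),
            # Key candidate: no nulls and every value unique.
            "key_candidate": total > 0
            and len(non_null) == total
            and distinct == total,
        }
    return profile


# Made-up sample records standing in for a lending dataset.
sample = [
    {"loan_id": 1, "branch": "A", "balance": 100.0},
    {"loan_id": 2, "branch": "A", "balance": None},
    {"loan_id": 3, "branch": "B", "balance": 250.5},
]
report = profile_rows(sample)
```

In this sample, `loan_id` is flagged as a key candidate because it is fully populated and unique, while `balance` shows reduced completeness because of the missing value. A uniqueness flag like this only suggests candidates; confirming real keys and relationships still requires domain review.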
What We're Looking For
Required:
- Strong SQL and Python skills
- Experience with Databricks (notebooks, Spark SQL, Delta Lake)
- Hands-on data profiling, data quality assessment, and technical documentation
- ETL/ELT pipeline development experience
- Comfort working in locked-down enterprise environments with restricted internet access
- Comfort with undocumented, messy data — you'll be making sense of datasets that have limited or no documentation
- Eagerness to learn AI tooling
Strongly Preferred:
- Financial services, lending, or banking data experience
- Experience with Medallion Architecture (bronze/silver/gold patterns)
- Familiarity with Power BI as a downstream consumer
- Experience working within VDI-based access environments
- Experience with modern AI toolsets
Environment
The client's environment is managed with strict security controls. Access is through a Windows-based VDI, with RDP into a dedicated server for Databricks work. Internet access on work servers is limited. You must be comfortable working within these constraints.
