Source, extract, and manipulate large data sets to serve the Machine Learning and Artificial Intelligence needs of Paychex Insurance Agency.
Responsibilities
Utilize existing infrastructure to assemble large, complex data sets that meet business requirements through extraction, transformation, and loading of data from a wide variety of data sources (see the ETL sketch after this list)
Work closely with architects, solution leads, data owners, data scientists, and key stakeholders to facilitate and coordinate the data platform backlog grooming process, triaging new feature requests in preparation for future project activities
Deliver automation & efficient processes to ensure high-quality throughput & performance of the entire data & analytics platform
Ensure data extraction, transformation, and loading meet data security & compliance requirements
Engage with data source platform leads to gain a tactical and strategic understanding of the data sources required by Agency Data Services AI/ML, as well as of Data Office standards
Create data tools that assist data scientist team members in building and optimizing models for use in Paychex products
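To make the ETL responsibilities above concrete, here is a minimal sketch, assuming a Databricks/PySpark environment; the source path, table names, and the masked ssn column are hypothetical and not taken from the posting:

```python
# Minimal PySpark ETL sketch (assumptions: a Databricks/PySpark environment;
# the landing path, table name, and masked column are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("agency-etl-sketch").getOrCreate()

# Extract: read raw policy records from a landing zone.
raw = spark.read.json("/mnt/landing/policies/")  # hypothetical path

# Transform: normalize types, mask a sensitive column to meet data
# security & compliance requirements, and deduplicate on the key.
clean = (
    raw
    .withColumn("effective_date", F.to_date("effective_date"))
    .withColumn("ssn", F.sha2(F.col("ssn"), 256))  # one-way hash of PII
    .dropDuplicates(["policy_id"])
)

# Load: write a curated Delta table for downstream AI/ML consumers.
clean.write.format("delta").mode("overwrite").saveAsTable("curated.policies")
```

The PII hashing step is one way the compliance responsibility above can surface in day-to-day pipeline code.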
Qualifications
4 years of experience working with SQL, Databricks, Spark, and/or other Big Data technologies.
4 years of experience building and optimizing data pipelines, architectures, and data sets to answer specific business questions and identify opportunities for improvement.
2 years of experience with large-scale data processing and storage using Azure Data Factory, Integration Runtime, Data Lake, Databricks, Spark, Azure ML, and Cosmos DB.
4 years of experience with languages such as Python, SQL, PySpark, and R.
2 years of experience with the privacy, compliance, and security aspects of data storage & processing.
2 years of experience delivering data solutions via Agile methodologies.
4 years of demonstrated success manipulating, processing, and extracting value from large, disconnected datasets.
2 years of experience with software development and CI/CD methodologies and tools for automated infrastructure code, including ML Ops (see the MLflow sketch after this list).
2 years of experience designing, implementing, and maintaining automation platforms and tools, including Ansible Tower, Azure, ARM, Terraform Enterprise, Azure DevOps, and GitHub Actions.
2 years of experience with Salesforce FSC and Salesforce Data Cloud.
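The posting names CI/CD and ML Ops but no specific tooling; the following is a minimal sketch assuming MLflow (a common choice alongside Databricks), with a hypothetical experiment path, model, and metric:

```python
# Minimal MLflow tracking sketch (assumption: MLflow as the MLOps tool;
# the posting does not name one). Experiment path, parameters, and
# metrics are hypothetical.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("/Shared/agency-ml-demo")  # hypothetical path

with mlflow.start_run():
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))

    # Log parameters, metrics, and the trained model so a CI/CD job
    # can later promote the model through environments.
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("accuracy", acc)
    mlflow.sklearn.log_model(model, "model")
```

Logging the model as an artifact, rather than saving it ad hoc, is what lets automated pipelines version and redeploy it reproducibly.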
Requirements
Bachelor's Degree in Computer Science, Data Science, Engineering, or equivalent software/services experience - Required
8 to 10 years of overall experience, including 5+ years of relevant data experience.
Experience with SQL, large-scale data processing, ETL, Data Profiling, Databricks, Azure Data Factory, and Snowflake (see the profiling sketch below).
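For the data-profiling item, a minimal PySpark sketch; the table and its columns are hypothetical, as the posting names Data Profiling but no particular tool:

```python
# Minimal data-profiling sketch in PySpark (assumption: the curated table
# and its columns are hypothetical placeholders).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("profiling-sketch").getOrCreate()
df = spark.table("curated.policies")  # hypothetical curated table

# Per-column null counts and distinct cardinality: quick signals of
# completeness and key candidacy before building pipelines on the data.
profile = df.select(
    *[F.count(F.when(F.col(c).isNull(), c)).alias(f"{c}_nulls")
      for c in df.columns],
    *[F.countDistinct(c).alias(f"{c}_distinct") for c in df.columns],
)
profile.show(truncate=False)
```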