Vacancy expired!
Software Guidance & Assistance, Inc. (SGA) is searching for a Big Data QA Engineer for a Contract assignment with one of our premier Information Technology Services clients in San Diego, CA. Responsibilities:
- The Data Engineering team needs a Big Data QA engineering professional who can assist with test automation and QA tasks associated with mission-critical projects.
- This engineer will work closely with the team's primary test engineer and the broader team to stay aligned with ongoing projects.
- Primary focus will be on QA test automation.
- Define a test automation strategy for new data pipelines, or modify the existing strategy to expand QA coverage of an existing pipeline.
- Create data-pipeline-specific input datasets and expected output datasets to implement QA automation.
- Modify existing input datasets and expected datasets when business requirements for existing data pipelines change.
- Create or modify Python scripts that trigger pipeline-specific QA tests and validate results against the expected outputs (see the first sketch after this list).
- Integrate with Jenkins CI/CD automation to run nightly QA tests automatically.
- Perform manual testing where automation is not feasible, or when QA tests need to be run on an ad-hoc basis.
- The role requires an engineer who is data-savvy and has an overall system quality mindset.
- Good understanding of the basics of probability and statistics: for example, the ability to create representative smaller samples from larger datasets by applying various selection methods. Should know what a normal distribution is and how to verify the randomness of a data sample. Able to apply descriptive statistics for data validation and troubleshooting (see the sampling sketch after this list).
- Advanced Python scripting skills are a must. For example, the ability to work comfortably with data formats such as CSV, JSON, and XML in Python and to create modular, reusable code. Should be able to write object-oriented code and understand list comprehensions in Python (see the data-format sketch after this list).
- Comfortable working in a Linux environment with bash, grep, awk, ssh, xargs, etc.
- Think critically about corner cases in data pipelines and create test cases to simulate those conditions. Should understand the significance of test coverage and of fault injection.
- Understands the problems associated with processing large datasets (tens of terabytes) and is conceptually familiar with the technologies available to solve those problems.
- Knowledge of Hadoop or Spark is not expected but would be a plus.
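The validation scripts mentioned above could take many forms; the following is a minimal sketch of triggering a pipeline run and comparing its output against an expected dataset. The trigger command, file paths, and CSV layout are hypothetical placeholders, not details from this posting.

```python
# Minimal sketch: trigger a pipeline QA run and validate actual vs. expected output.
# run_pipeline.py, the file paths, and the dataset name are hypothetical assumptions.
import csv
import subprocess
import sys

EXPECTED = "expected/orders_daily.csv"   # hypothetical expected dataset
ACTUAL = "output/orders_daily.csv"       # hypothetical pipeline output
TRIGGER = ["python", "run_pipeline.py", "--pipeline", "orders_daily"]  # hypothetical


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def main():
    # Trigger the pipeline run; a real harness would also poll for completion.
    subprocess.run(TRIGGER, check=True)

    expected = load_rows(EXPECTED)
    actual = load_rows(ACTUAL)

    # Row-count check first, then a row-by-row comparison.
    if len(expected) != len(actual):
        sys.exit(f"FAIL: expected {len(expected)} rows, got {len(actual)}")

    mismatches = [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]
    if mismatches:
        sys.exit(f"FAIL: {len(mismatches)} mismatched rows, first at index {mismatches[0]}")

    print("PASS: actual output matches expected dataset")


if __name__ == "__main__":
    main()
```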
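For the statistics requirement, here is a sketch of drawing a simple random sample from a larger dataset and comparing descriptive statistics against the full population. The synthetic data, sample size, and drift check are illustrative assumptions rather than project specifics.

```python
# Minimal sketch: representative sampling plus descriptive statistics for validation.
# The synthetic 'population' column and thresholds are illustrative assumptions.
import random
import statistics


def representative_sample(values, k, seed=42):
    """Draw a simple random sample of size k from a larger dataset."""
    rng = random.Random(seed)
    return rng.sample(values, k)


def stats_summary(values):
    """Descriptive statistics used to sanity-check a dataset."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Pretend this is a numeric column pulled from a much larger dataset.
    population = [random.gauss(100, 15) for _ in range(100_000)]
    sample = representative_sample(population, k=1_000)

    pop_stats = stats_summary(population)
    sample_stats = stats_summary(sample)

    # A sample whose mean drifts far from the population mean suggests the
    # selection method is not representative.
    drift = abs(pop_stats["mean"] - sample_stats["mean"])
    print(pop_stats, sample_stats, f"mean drift = {drift:.3f}", sep="\n")
```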
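For the Python scripting requirement, here is a sketch of modular, object-oriented code that loads CSV, JSON, or XML inputs into a common record shape and applies a list-comprehension validation rule. The file names and the "amount" field are hypothetical.

```python
# Minimal sketch: modular, object-oriented handling of CSV/JSON/XML inputs
# with a list-comprehension validation rule. Paths and fields are hypothetical.
import csv
import json
import xml.etree.ElementTree as ET


class RecordLoader:
    """Load records from CSV, JSON, or XML into a uniform list of dicts."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if self.path.endswith(".csv"):
            with open(self.path, newline="") as fh:
                return list(csv.DictReader(fh))
        if self.path.endswith(".json"):
            with open(self.path) as fh:
                return json.load(fh)
        if self.path.endswith(".xml"):
            root = ET.parse(self.path).getroot()
            # Each child element's attributes become one record.
            return [dict(el.attrib) for el in root]
        raise ValueError(f"Unsupported format: {self.path}")


def invalid_amounts(records):
    # List comprehension selecting rows that fail a simple validation rule.
    return [r for r in records if float(r.get("amount", 0)) < 0]


if __name__ == "__main__":
    records = RecordLoader("input/transactions.csv").load()  # hypothetical file
    bad = invalid_amounts(records)
    print(f"{len(bad)} records with negative amounts out of {len(records)}")
```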