Production Incident/Support Engineer
Experience :  5+ Years
Location :  Sydney
Hiring Mode :  Permanent/Contract
Contract :  Full Time/Long Term
Salary :  Open
Description : 

Production P1/P2 Incidents, Infrastructure issues

Digital P1/P2 incidents in the Production environment: own each issue through to service restoration, identify the root cause, and drive it to a strategic fix.

 

Production Environment Health Check: Potential P1/P2 Incidents

Address all health check issues found in Production through monitoring and alerts: own each issue through to service restoration/resolution, and escalate alerts to other portfolios and teams when necessary.

 

(P3/P4 incidents)

  1. Keep track of P3 and P4 incidents; restore service and resolve them through Problem tickets
  2. Keep the Snow Incident count < 50
  3. Keep the Problem count (including the Known Error count) < 50, updated weekly (update it before the weekly Problem Management meeting)
  4. Update the JIRA tickets for newly created Snow tickets that require a code fix: move them to the backlog and prioritise them based on customer/business impact
  5. Keep the backlog ready for each sprint; each sprint must include 2 value adds and 2 technical debt items from the BSAs
  6. Any data breach related incidents/emails/queries: escalate immediately to the Operations Lead and Manager
  7. Keep adding PVT documents in Confluence, at least one per week, for Digital Integration systems
  8. Check the weekend PVT Confluence page for any critical PVT activities and clarify them with the Release/Change Manager before the weekend
  9. Any DDC application issues should be assigned to David after initial L2 analysis and added to JIRA as well; document the relevant incident analysis details in Confluence
  10. L3: code analysis and small enhancements are to be fixed/owned by the BSAs if they need less than 2 days of dev effort
  11. Strong ITIL knowledge
  12. Continuous Improvements:
  13. Handling Minor Enhancements
  14. Suggest Process Improvements
  15. Improve monitoring and alerting capabilities (Dynatrace, Kibana, Splunk, and other alert and monitoring mechanisms from SOC)
  16. Suggest/automate repeated manual tasks in Digital Operations
  17. Take part in cross-functional training
  18. Document artefacts: Ops Handbook, Runbook, onboarding documents, etc.
  19. Maintain the KEDB (Known Error Database)
  20. Keep track of capacity and system organic growth management
  21. Keep track of post-release incidents and minimise their impact