Building a cloud sourcing system that pulls data from tens of thousands of on-premises tables and hundreds of data source systems (RDBMS, file servers, Kafka, ...)
Designing a change data capture method for each source system
Designing and implementing a Big Data management system based on Lambda/Delta/Kappa architecture concepts
Implementing data replication/projection, and managing the associated lag, from Techcombank on-premises systems to the Techcombank data lake on the cloud, in real-time or batch mode, using high-performance, scalable AWS services such as Kinesis, Firehose, Glue (Spark), Lambda, ...
Serving analytics data on ten million Techcombank customers to other internal Techcombank teams, delivering hundreds of millions of data elements daily over high-throughput, low-latency protocols (WebSocket, message queue, REST, GraphQL, ...)
Collaborating with the Data Architect team on a regular basis to design and review data models and application architectures
Maintaining clean software architecture, clean code, and high quality. Participating in code reviews, pair programming, and mob programming, and coaching other team members
Providing data feature infrastructure to support ML engineering across the bank's business lines
Skills & Expertise
Must have:
Bachelor’s or Master’s degree in Computer Science, Software Engineering, Information Technology, or a related technical field
6+ years of experience as a Data Engineer or Software Engineer
Highly proficient in at least one programming language (Java, Scala, Python)
Strong experience in systems architecture – particularly in complex, scalable, and fault-tolerant distributed systems
Strong skills in multi-threading, atomic operations, computation frameworks such as Spark (DataFrame, SQL, ...), distributed storage, and distributed computing
Understanding of design for resilience, fault tolerance, high availability, and high scalability
Tools: CI/CD, GitLab, ...
Good communication and teamwork skills
Open-minded and willing to learn new things
Bonus points:
Cloud experience (AWS, GCP, etc.); AWS is a plus
Experience in performance tuning and optimization of Big Data programs
Knowledge of distributed query engines such as Presto, Hive, ...