From June 6 to August 19, I worked as a backend development assistant in the Data Intelligence Department of the Accommodation Business Unit at the Trip.com Group Shanghai headquarters. I mainly focused on the maintenance and iteration of a data management product that provides internal services. In this report, I will first give some background on the company, the product, and the technical framework. Then I will describe my specific work in chronological order. Finally, I will give a brief evaluation of my internship.
Trip.com Group is a Chinese multinational online travel agency that provides services including accommodation reservation, transportation ticketing, packaged tours, and corporate travel management. Within the Data Intelligence Department, I worked in a data development team that builds services to facilitate data analytics. The processed data, including accommodation tags, prices, and other information, is eventually shown to customers when they search for hotels or homestays on Trip.com.
Our department uses Apache Hive as the data warehouse, which relies on Zeus scripts for the modification of Hive tables and data. However, modifications through Zeus scripts are relatively cumbersome and inefficient. Therefore, our data development team developed the Data Warehouse Tools (hereinafter referred to as DW Tools), an intranet website that simplifies data upload, maintenance, and analysis. Rather than writing Zeus scripts directly, users can upload their data in Excel files and manipulate tables through a web user interface. The DW Tools consist of six panels: data upload, data maintenance, whitelists, tracking-event (buried point) analysis, data lineage analysis, and permission management. My work mainly touched the first three panels.
The web application comprises a Vue.js front end and a Spring Boot backend. I mainly focused on developing the backend of the DW Tools. The backend contains four layers: web, SOA, business, and DAO. The Data Access Object (DAO) layer manages access to the database; the web layer provides API endpoints to the front end; the service-oriented architecture (SOA) layer provides endpoints in a unified form to other applications inside and outside the company; and the business layer is where the main logic lives. The code is hosted on a privately deployed GitLab, and development follows a trunk-based Git workflow in which everyone works on their own feature branch, which is merged into the release branch when finished. Additionally, developers need to write unit tests with PowerMockito for each new function, with a required coverage above 50%. A new iteration of the product is first deployed to a feature acceptance test (FAT) environment for testing and then to the production environment. Furthermore, the company also uses iDev, a self-developed product development management tool similar to Jira, to plan and track the development of new features.
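The layering described above can be sketched in plain Java. This is a minimal illustration of the separation of concerns, not the actual DW Tools code; all class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// DAO layer: owns database access (hypothetical interface; the real
// project uses MyBatis-Plus mappers here).
interface ProjectDao {
    List<String> listProjectNames();
}

// Business layer: holds the main logic and depends only on the DAO interface.
class ProjectService {
    private final ProjectDao dao;
    ProjectService(ProjectDao dao) { this.dao = dao; }
    List<String> listProjects() {
        List<String> names = new ArrayList<>(dao.listProjectNames());
        names.sort(String::compareTo);  // stand-in for a business rule
        return names;
    }
}

// Web layer: exposes the endpoint to the front end and delegates downward.
class ProjectController {
    private final ProjectService service;
    ProjectController(ProjectService service) { this.service = service; }
    List<String> listProjectsEndpoint() { return service.listProjects(); }
}
```

Keeping each layer behind an interface like this is also what makes the PowerMockito unit tests mentioned above practical, since the DAO can be stubbed out.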
I started with a minor enhancement that allows users to sort data maintenance entries by id in either ascending or descending order in the data maintenance panel. To implement this function, I added a new parameter indicating the sort order to the list-project endpoint's parameter list. In the business layer, I used MyBatis-Plus's LambdaQueryWrapper to build the query that sorts the entries. This first enhancement, though simple, gave me an overview of the project and the workflow and helped me understand how data flows through all the layers.
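Conceptually, the sort-order flag simply flips the direction of an ORDER BY on the id column. The self-contained sketch below reproduces that behavior in memory (the real code instead passes the flag to the query wrapper, along the lines of MyBatis-Plus's `orderBy(condition, isAsc, column)`; the class name here is made up):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// In-memory illustration of the new sort-order parameter: true = ascending
// by id, false = descending. The production code expresses the same choice
// as an ORDER BY clause built by LambdaQueryWrapper.
class EntrySorter {
    static List<Long> sortIds(List<Long> ids, boolean ascending) {
        Comparator<Long> byId = Comparator.naturalOrder();
        if (!ascending) {
            byId = byId.reversed();
        }
        return ids.stream().sorted(byId).collect(Collectors.toList());
    }
}
```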
After familiarizing myself with the codebase, I was assigned another task: to record and display all user operations, such as table uploads and synchronizations, for every data upload entry. To persist the records, I created a new operation log table on the company's PaaS development service platform. Then, I created the corresponding POJOs and mappers in the DAO layer using MyBatis-Plus. The recording itself exposes no new API; instead, it periodically retrieves the job running status from the Zeus APIs and records whether an operation succeeded once the job finishes. In addition, I provided an API to list all the logs of an entry so that users can view its history easily. This function supports pagination and filtering by entry id and operation time. The task helped me understand how to build new functionality from scratch and how to write to and read from the database.
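The listing API described above combines filtering and pagination. The following is a simplified in-memory version of that logic with hypothetical names; the real service pushes these conditions down into SQL through MyBatis-Plus rather than filtering in Java:

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical operation log record: which entry was acted on, when, and how.
class OperationLog {
    final long entryId;
    final LocalDateTime time;
    final String action;
    OperationLog(long entryId, LocalDateTime time, String action) {
        this.entryId = entryId;
        this.time = time;
        this.action = action;
    }
}

class LogQuery {
    // Filter by entry id and an inclusive operation-time window, then take
    // one page of results (1-based page index, as pagination APIs often use).
    static List<OperationLog> list(List<OperationLog> logs, long entryId,
                                   LocalDateTime from, LocalDateTime to,
                                   int page, int pageSize) {
        return logs.stream()
                .filter(l -> l.entryId == entryId)
                .filter(l -> !l.time.isBefore(from) && !l.time.isAfter(to))
                .skip((long) (page - 1) * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```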
After that, I collaborated with another team member to create a new whitelist system for the DW Tools. The whitelist system exempts selected Zeus jobs from the alerting middle platform so that these jobs will not trigger alerts even if they do not finish within the designated time. An administrator can create whitelist types and approve user-submitted whitelist applications, and a user can apply for a whitelist under a whitelist type. Each whitelist may have different dimensions, such as Hive table, Hive account, Zeus job id, and employee id, and each dimension can have multiple values. There is also an operation log function similar to the one I built for data upload. I was mainly responsible for the CRUD of whitelist types and for displaying whitelist items and logs. The pages showing whitelists, whitelist types, and operation logs all support pagination and filtering by multiple parameters. The CRUD of whitelist types goes through validation that ensures only administrators can perform these operations and that no duplicate types exist. Creations and updates use LambdaUpdateWrapper. I implemented 11 of the 23 functions of the whitelist system, which took me weeks to design, develop, and test. Yet it was a worthwhile experience, because the task guided me through the whole process of building a nearly new product.
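The validation step for whitelist-type CRUD can be sketched as follows. This is a hedged, self-contained illustration of the two checks described above (administrator-only access, no duplicate type names); the names, signatures, and error messages are invented for the example, and the real code performs the duplicate check against the database via MyBatis-Plus:

```java
import java.util.Set;

// Illustrative validation for creating a whitelist type.
class WhitelistTypeValidator {
    private final Set<String> admins;         // employee ids with admin rights
    private final Set<String> existingTypes;  // names of types already created

    WhitelistTypeValidator(Set<String> admins, Set<String> existingTypes) {
        this.admins = admins;
        this.existingTypes = existingTypes;
    }

    /** Returns an error message, or null when the creation is allowed. */
    String validateCreate(String operator, String typeName) {
        if (!admins.contains(operator)) {
            return "only administrators can manage whitelist types";
        }
        if (existingTypes.contains(typeName)) {
            return "whitelist type already exists: " + typeName;
        }
        return null;  // validation passed; the caller then issues the insert
    }
}
```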
The internship gave me a deep understanding of the software development tools used in industry. The sophistication and rigor of industrial code go far beyond those of in-class assignments and research projects. In the Internet industry, a product may have an indefinite lifespan, so maintainability is highly prioritized; we must follow rigorous rules and workflows so that our products keep working now and in the future. Additionally, any product in an Internet company depends on services provided by other departments, so it is vital to increase cohesion and reduce coupling. We must remember that any change upstream affects our functionality, and everything downstream relies on our products; thus, we must be cautious when making any change. Another lesson is that monitoring and logging are essential for Internet companies: because the development cycle of a new feature is very short, the development team must rely on the product's vast number of end users to surface and fix unidentified bugs, and monitoring is the foundation of that process.
Despite the significant difference between industrial and academic projects, I still found the knowledge taught in class very useful during the internship, because it helped me understand the underlying logic and principles of the technical tools. For example, relational algebra helped me design a more efficient database, and object-oriented programming helped me learn the Spring framework more smoothly. This knowledge also helped me identify some shortcomings of our project. For example, the trunk-based workflow our team was using generated hundreds of feature branches, making the Git graph of the release branch uninterpretable; perhaps OneFlow, which was taught in the software engineering course, would work better.
The internship at Trip.com was a great experience that broadened my horizons, enabled me to apply theory to practice, and helped me learn industrial tools and ideas.
— Aug 22, 2022