Pluggable Search solution for legacy Enterprise Application
PROBLEM STATEMENT :
A company had been using a 15-year-old legacy platform for its business. Over the years the application became slow, especially during peak season. On closely examining the system, we found that search queries against the SQL database were taking more than an hour under peak load, since the data had grown to large volumes over the years.
SOLUTION :
After much deliberation, the team came up with a solution: supplement the existing system with AWS CloudSearch. The idea was that the legacy database remains the single source of truth, while AWS CloudSearch handles all search operations. In other words, the legacy database would no longer serve search queries; the data had grown too large to be searched efficiently in the SQL database, so AWS CloudSearch would take over and enhance the system's search capabilities.
To explain the modified system in more detail :
1. Legacy Application :
- The left side of the diagram above represents the legacy application, which had a client-server architecture. This was an on-premise solution.
- Earlier, all data related to new and existing orders was stored in the SQL database, and stored procedures were used to search records in the DB. The performance of these stored procedures deteriorated over the years as the volume of data increased; tuning indexes and search queries no longer helped.
- Under the new design, any data related to a new or existing order is saved in the legacy SQL database and also pushed out to AWS CloudSearch. In addition, an offline batch job pushes all existing data in the SQL DB to AWS CloudSearch. Going forward, all search queries are redirected to AWS CloudSearch.
- If new data cannot be written to AWS CloudSearch, it is entered in the failure log and retried later.
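The dual-write path above can be sketched as follows. This is a minimal sketch: `db_save` and `push_to_cloudsearch` are hypothetical callables standing in for the real SQL and CloudSearch clients, and the log file name is an assumption.

```python
import json

def save_order(order, db_save, push_to_cloudsearch,
               failure_log_path="cloudsearch_failures.log"):
    """Write the order to the source-of-truth DB, then best-effort to CloudSearch."""
    db_save(order)  # the legacy SQL database remains the single source of truth
    try:
        push_to_cloudsearch(order)
    except Exception:
        # CloudSearch push failed: append the order to the failure log so the
        # Failure Recovery Service can retry it later, instead of failing the save.
        with open(failure_log_path, "a") as f:
            f.write(json.dumps(order) + "\n")
```

The key design point is that a CloudSearch outage never blocks an order save; the write to the legacy database always succeeds or fails on its own.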
2. Data Integration Web Server :
- The legacy application server posts new data to this cluster of Data Integration Web Servers, which sit behind a load balancer.
- Once the Data Integration Web Servers receive the data, they push it out to the Kafka messaging platform and the Redis cache.
- The Data Integration Web Servers also provide a search API backed by AWS CloudSearch. The legacy application server uses this API for all search operations.
3. Messaging System :
- Kafka is used as the messaging system here. It queues the messages for bulk update to AWS CloudSearch.
- Apache Kafka brokers fit well here, as this is a streaming data-outflow system.
4. Message Readers :
- These are Kafka consumers that stream messages from the messaging system and perform bulk updates to AWS CloudSearch.
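The consumer side groups consumed messages into CloudSearch upload batches. CloudSearch accepts batches of `add` operations keyed by document id, with a per-batch cap of 1,000 documents (and 5 MB). A minimal batching sketch, with assumed field names such as `order_id`:

```python
def to_sdf_batches(orders, max_batch=1000):
    """Group order records into CloudSearch 'add' document batches."""
    docs = [{"type": "add", "id": str(o["order_id"]), "fields": o}
            for o in orders]
    # CloudSearch limits each upload batch (1,000 docs / 5 MB), so the
    # consumer uploads one chunk at a time rather than one giant request.
    return [docs[i:i + max_batch] for i in range(0, len(docs), max_batch)]
```

Each returned batch would then be serialized to JSON and sent via the CloudSearch domain's document-upload endpoint.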
5. AWS Cloud Search :
- It is used to index and search the order data.
- It is a managed service in the AWS Cloud that makes it simple and cost-effective to set up, manage, and scale a search solution. Adding it made the overall system a hybrid (on-premise plus cloud) solution.
6. Cache Servers :
- Redis is used to cache data for new orders and updates to existing orders.
- The search API gets the most recent data from the cache server and merges it with the data from CloudSearch before sending results back to the client.
- Amazon CloudSearch has latency in indexing batches (2-10 seconds observed). The cache server therefore holds the most recent order updates and serves them when a search request arrives. Entries in the cache expire after a pre-configured time.
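The merge step above can be sketched as follows. Field names like `id` are assumptions; the point is simply that a fresh cache entry overrides a possibly stale CloudSearch hit with the same id:

```python
def merge_results(cloudsearch_hits, cached_recent):
    """Overlay recent cached order updates on top of CloudSearch results."""
    by_id = {hit["id"]: hit for hit in cloudsearch_hits}
    # Cache entries win: they reflect writes newer than the
    # 2-10 second CloudSearch indexing lag.
    by_id.update({entry["id"]: entry for entry in cached_recent})
    return list(by_id.values())
```

Because cache entries expire after roughly the indexing lag, the overlay is only ever applied to orders that CloudSearch may not have indexed yet.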
7. Failure Recovery Services :
- It periodically tests that the data in CloudSearch and the master SQL database are in sync, and synchronizes them if they are inconsistent.
- It also reads the failure log files and retries pushing the failed order data to CloudSearch.
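The periodic consistency check can be sketched as a set difference over document ids. This is a minimal sketch; in practice the two id lists would come from a query against the master SQL database and from CloudSearch, both hypothetical here:

```python
def find_out_of_sync(db_ids, cloudsearch_ids):
    """Return ids to re-push and ids to delete to bring CloudSearch in sync."""
    db, cs = set(db_ids), set(cloudsearch_ids)
    missing = db - cs  # in the master DB but not indexed yet: re-push these
    stale = cs - db    # indexed but no longer in the master DB: delete these
    return missing, stale
```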
8. Failure Log Files :
- In case there are failures in posting order data from the legacy application server to the Data Integration server, the data is logged to a failure log file.
- This log file can be read later and the failed order data re-sent to CloudSearch.
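The replay step can be sketched as follows, assuming each failed order was appended to the log as one JSON object per line; `push_to_cloudsearch` is a hypothetical callable, and orders that fail again are kept in the log for the next pass:

```python
import json
import os

def replay_failure_log(path, push_to_cloudsearch):
    """Retry every logged order; rewrite the log with the ones that still fail."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        orders = [json.loads(line) for line in f if line.strip()]
    still_failing = []
    for order in orders:
        try:
            push_to_cloudsearch(order)
        except Exception:
            # Keep the order for the next recovery pass.
            still_failing.append(order)
    with open(path, "w") as f:
        for order in still_failing:
            f.write(json.dumps(order) + "\n")
```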
Author : Mayank Garg, Technology Enthusiast and Georgia Tech Alumnus
(https://in.linkedin.com/in/mayankgarg12)