Every state in the USA has millions of dollars of unclaimed money every year. This money or property is initially held by organizations that have not been able to contact its owner. After a period set by state rules, it is turned over to the state, which then acts as its custodian until the owner claims it.

Unclaimed Money is a USA-based organization that helps citizens claim this money. Its website consolidates information from various state databases and federal resources into a central site, which makes the search for unclaimed money quicker and smoother.

For this project, the client's primary requirement was a basic form to demonstrate the data scraping capability. The form lets users enter their first and last names, state, and business name (if any). The program then searches the respective state site and fetches a list of matching records. Later, this program will be used as an API.
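The form-to-search step described above can be sketched roughly as follows. This is an illustration only, not the project's actual code: the `SearchRequest` class, the `build_search_url` helper, the query parameter names, and the `example.com` URLs are all hypothetical placeholders, since the real state search endpoints are not given here.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

# Hypothetical mapping from state code to search endpoint (placeholder URLs).
STATE_SEARCH_URLS = {
    "CA": "https://example.com/ca/search",
    "NY": "https://example.com/ny/search",
}


@dataclass
class SearchRequest:
    """The fields collected by the form."""
    first_name: str
    last_name: str
    state: str
    business_name: Optional[str] = None  # optional, as in the form


def build_search_url(req: SearchRequest) -> str:
    """Validate the form input and build the URL for the chosen state."""
    state = req.state.strip().upper()
    if state not in STATE_SEARCH_URLS:
        raise ValueError(f"unsupported state: {req.state!r}")
    if not req.first_name.strip() or not req.last_name.strip():
        raise ValueError("first and last name are required")

    # Parameter names are assumptions; each real site would define its own.
    params = {"first": req.first_name.strip(), "last": req.last_name.strip()}
    if req.business_name:
        params["business"] = req.business_name.strip()
    return f"{STATE_SEARCH_URLS[state]}?{urlencode(params)}"
```

Keeping validation and URL construction in one small function means the same logic can serve both the demo form and a later API endpoint.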
We faced several challenges while working on this project:
- Scraping data from websites that have captcha validations
- Scraping data from US government sites without getting blocked
- Searching the data at runtime and showing the latest and most relevant results
- Building the site like an API so that it can be exposed as a REST API at a later stage
We addressed these challenges as follows:
- For scraping data, we used curl calls and simple HTML DOM libraries
- We used captcha-solving services to scrape data from websites with captcha validations
- To avoid getting blocked, we used a proxy network. As a bonus, it could handle thousands of requests.
- We implemented a REST API in the site and added further functionalities
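As a rough illustration of two of these ideas, the sketch below extracts rows from an HTML results table and rotates through a pool of proxies. It uses Python's standard library as a stand-in for the curl calls and HTML DOM libraries mentioned above, so it is not the project's implementation. The table layout, the `parse_results` and `next_proxy` helpers, and the proxy addresses are assumptions made for the example.

```python
from html.parser import HTMLParser
from itertools import cycle


class TableRowParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows that contain only <th> cells
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_results(html: str) -> list:
    """Return the data rows of a results page as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Placeholder proxy addresses; a real pool would come from the proxy provider.
_PROXY_POOL = cycle(["http://proxy-a.example:8080", "http://proxy-b.example:8080"])


def next_proxy() -> str:
    """Pick the next proxy in round-robin order for the next outgoing request."""
    return next(_PROXY_POOL)
```

Round-robin rotation is the simplest policy; a production setup would more likely also retire proxies that start returning errors or blocks.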
We were able to overcome all the challenges with effective solutions that fulfilled the purpose of the project.