Every business owner and entrepreneur will agree that managing invoices manually can be a huge nuisance. Sending out incorrect bills, or failing to generate a bill at all, can sink a business before it even gets started. “Around 82% of finance departments are overwhelmed by the high numbers of invoices they are expected to process on a daily basis and the variety of formats they’re received in.”
We were approached by one such client looking to automate its entire invoice management process: an accounting firm that managed the invoices of a photography company.
The photography firm shot real estate photos and supplied them to various clients. The only way to know how many of those photos were put to commercial use was to manually sift through each client’s website and count them one by one. The count needed to be accurate to prevent invoicing disputes. The manual process was dull, repetitive, time-consuming, and prone to errors.
The task was to automate matching and counting the exact number of the photography firm’s photos used on its clients’ websites, so that the firm could raise accurate invoices.
We designed a complete crawler solution for end-to-end process automation. It followed the same steps a human reviewer would take to find the exact set of the photographer’s images used on any website.
We designed the following workflow to arrive at high-quality, accurate results within an automated framework:
- We built advanced website crawlers that could crawl websites with dynamically generated content and find the exact pages based on information present on the page, such as text and meta tags.
- Next, we created an algorithm that could find the photographer’s images despite manipulations such as cropping or contrast changes.
- Then, we created an intuitive interface where users could upload the original images, the website to be scraped, and any other information useful for finding the appropriate content.
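To illustrate the crawling step, here is a minimal sketch of collecting image URLs from a page's rendered HTML. It is not the project's actual code: the names `ImageCollector` and `extract_image_urls` are hypothetical, and the sketch uses Python's standard-library `html.parser` to stay dependency-free, whereas a crawler like the one described might parse the page with Beautiful Soup instead. It also assumes that lazy-loaded images keep their real URL in a `data-src` attribute, which is a common convention but not a universal one.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute image URLs from <img> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Assumption: lazy-loading pages keep the real URL in data-src
        # and put a placeholder in src, so data-src takes priority.
        src = attr_map.get("data-src") or attr_map.get("src")
        if not src:
            return
        # Resolve relative paths against the page URL.
        url = urljoin(self.base_url, src)
        # Record each image once, preserving first-seen order.
        if url not in self.image_urls:
            self.image_urls.append(url)


def extract_image_urls(html, base_url):
    """Return the de-duplicated absolute image URLs found in the HTML."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.image_urls
```

For pages with dynamically generated content, the `html` argument would typically be the markup after scripts have run, for example Selenium's `driver.page_source`, rather than the raw HTTP response.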
The crawler scraped images even from websites with complex architecture and matched them against the originals regardless of any manipulations. Clients thus saved many valuable hours otherwise spent on manual effort, along with the money lost to human error.
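The case study does not name the matching algorithm that was used. As one possible illustration of matching that tolerates contrast changes, the sketch below uses a simple average hash, a well-known perceptual hashing technique. All function names and the `max_distance` threshold are hypothetical. It assumes each image has already been decoded into a grayscale grid (a list of rows of 0 to 255 values) that is at least 8 pixels wide and tall.

```python
def shrink(pixels, size=8):
    """Downscale a grayscale grid to size x size by averaging blocks."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels, size=8):
    """Return one bit per block: 1 if the block is brighter than the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_probable_match(original, candidate, max_distance=5):
    """Treat two images as the same photo if their hashes nearly agree.

    max_distance is an illustrative threshold, not a tuned value.
    """
    distance = hamming_distance(average_hash(original), average_hash(candidate))
    return distance <= max_distance
```

Because each bit only records whether a block is brighter or darker than the image's own mean, uniform brightness and contrast adjustments leave most bits unchanged. A plain average hash does not handle cropping, so a matcher covering that case would need an additional technique such as local feature matching.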
Technologies used: Python, Selenium, Beautiful Soup, spaCy.