As Dag Kittlaus from TechCrunch notes, the very idea of a web resource is going to be drastically rethought in the near future. The time isn’t far off when multiple online services will merge to give users rapid access, from any web location, to the content or feature they are interested in. The network will no longer resemble an archipelago of separate islands but will become a unified structure built on the collaborative efforts of its owners.
Just think of it. Imagine you have an interactive, flexible platform that combines the tools and services you constantly use. You don’t have to struggle to keep everything in order. Although this idea still seems surreal, it has multiple precursors that never looked so unusual. For instance, scraping the web and combining the data within a set structure has become a common approach for many purposes, including ecommerce.
Safety ring: helping small business
About two years ago we were invited to take part in a project whose mission was to support small retail vendors in the United Kingdom and Ireland.
“Trying to climb the first threshold of retail sales and present yourself to the world is the stage when 90% of startups fail.”
The common reason is obvious: a tight budget that is never enough to run a successful marketing campaign. That’s why the idea behind our new project was to build an efficient platform where small vendors, including enthusiasts crafting their first products in a garage, could present themselves to picky consumers.
Riding the big guys: how to break through to customers?
No offense to anybody, but customers are extremely conservative when it comes to choosing something new. Even quality and an affordable price are not solid enough arguments to compete with famous brands. Most likely you hesitate between iPhone and Android when choosing your next mobile device and rarely consider BlackBerry as an option. The same goes for clothing, sports equipment, etc.
So the objectives of our project were the following:
- To engage big market players in our project;
- To let minor players leverage the brand awareness of famous companies;
- To build a convenient platform.
To illustrate the idea, imagine that a brand new grocery store has just opened on the corner of your street. The location is surprisingly convenient, since you only have to walk a couple of blocks instead of driving for 15 minutes. When you happily visit the new store to work through your regular shopping list and wander around the shelves for a while, you realize that there isn’t a single item you recognize. Although you see milk, bread, chocolate, and cookies, all of them were produced by unfamiliar companies. No Lays chips, no M&Ms, no Coca-Cola, and yet some chips, some candies, and some soft drinks. If you’re not in the mood for gastronomic experiments that day, you’ll probably choose to drive.
To avoid this feeling of brand oblivion, you have to recognize at least some of the items. Then you might consider experimenting and trying something new.
With this in mind, we decided to build an ecommerce-like resource where the products of both big vendors and small companies would be displayed as advertisements. Each vendor would have an account that allows them to publish ads within our marketplace. At the same time, the software we use would parse advertisements from existing websites and present them automatically. In a nutshell, it was supposed to be an advertisement aggregator.
As our partner was in charge of negotiating with the managers of recognizable brands to get them engaged in the project, our task was to provide a solid technical foundation for it.
Roleplay: users, merchants, admins, guests
Before starting any development, we had to outline the user roles. Although we had no doubts about guests, admins, and regular users, there were some considerations regarding merchants.
Guest. A user who hasn’t registered on the website. A guest can browse the items available in the catalog and see prices and sales.
User. Once you’ve signed up, you can purchase goods by getting access to the source websites where the ads were originally published.
Merchant. A retailer who has a profile with ads presented. As the project wasn’t actually a charity for small vendors, we offered them several payment plans. The big vendors, by contrast, had the same profiles but were not charged any fees, so as to keep them on the platform. Ads could be managed and edited within profiles.
Admin. An admin’s role was to manually create accounts for big vendors and notify them.
Structure: logics and technologies
1. Symfony 2.0. As the base technology, we decided to use Symfony 2.0, which seemed customizable enough for our project. It was used for the project core and the REST API.
2. Scrapy/CasperJS. These two were used for the spider scripts that parsed ads from the target platforms.
3. Supervisord/Twisted. Supervisord was used to manage tasks and processes. As we had a dedicated spider script for each target website, the Python-based multithreaded supervisor (built on Twisted) made sure that each spider found the right access point to the catalog it worked with and did so at the right time. It was also in charge of updating the database.
4. ORM lite. The out-of-the-box Doctrine ORM was powerful, yet it proved too heavy in terms of SQL relations and overloaded the memory. That’s why we developed a diet version of it, customized for our project only.
5. Ext JS. This library was used to build the admin panel. Its main function was the manual import of businesses and the creation of their profiles. Contacts could be imported from either CSV or Microsoft Excel files.
6. JSON. JSON data interchange was needed to connect with the separately developed mobile application.
Scraping: with Scrapy
The major challenge we faced in the early stages of development was how to aggregate advertisements from our partners’ online stores and present them in the catalog. The market coverage was supposed to be rather broad, so composing ads manually wasn’t even considered.
We followed the path of least resistance and took the Scrapy framework. More than 50 spider scripts were assigned to crawl the partner websites and gather data.
Collectibles: data which were aggregated
Scrapy seemed convenient enough in the first stages of development. We were collecting:
- Name of the item
- Advertisement image
- Description
- Price
- Reduced price
- Relative discount (the discount value in percent)
Since some of these fields could be missing, a missing field simply wasn’t displayed. If we had only a price and a reduced price, the spider supervisor automatically calculated the relative discount. Conversely, it calculated the reduced price from the price and the relative discount.
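To make the discount logic concrete, here is a minimal Python sketch of how the missing value could be derived. It is an illustration rather than our production code: the `Ad` record, the `fill_discount_fields` helper, and the rounding to two decimal places are all assumptions made for this example.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

CENT = Decimal("0.01")  # assumed precision: two decimal places


@dataclass
class Ad:
    """Hypothetical record for one scraped advertisement."""
    name: str
    price: Optional[Decimal] = None
    reduced_price: Optional[Decimal] = None
    discount_percent: Optional[Decimal] = None


def fill_discount_fields(ad: Ad) -> Ad:
    """Derive whichever of reduced_price / discount_percent is missing."""
    if ad.price is None or ad.price <= 0:
        return ad  # nothing can be derived without a positive base price
    if ad.reduced_price is not None and ad.discount_percent is None:
        # price + reduced price -> relative discount in percent
        ratio = (ad.price - ad.reduced_price) / ad.price
        ad.discount_percent = (ratio * 100).quantize(CENT, rounding=ROUND_HALF_UP)
    elif ad.discount_percent is not None and ad.reduced_price is None:
        # price + relative discount -> reduced price
        factor = 1 - ad.discount_percent / Decimal(100)
        ad.reduced_price = (ad.price * factor).quantize(CENT, rounding=ROUND_HALF_UP)
    return ad
```

Using `Decimal` instead of floats avoids rounding surprises on prices; fields that remain `None` after this step are the ones that would not be displayed.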
Reality hits: making use of CasperJS
Once we’d tested our Scrapy solution, harsh reality hit us. As more and more sites enhanced their designs with dynamic AJAX-based elements, Scrapy, while effective for static structures, couldn’t process those dynamic DOM trees. Scrapy didn’t get along with JavaScript, and jQuery was just knocking our spiders down.
“We had to find an elegant solution, so a number of possible approaches were discussed. The option of using cURL and the Symfony integrated HTTP client was rejected as too bulky. Symfony isn’t the best tool for long-running tasks because Doctrine still has memory leaks.
Following the principle of fighting fire with fire, we decided to use PhantomJS. I personally found bare PhantomJS rather unfriendly, so I took the CasperJS suite. Now I could write synchronous-style JS and had access to the native PhantomJS methods via the proxy. These capabilities were the decisive arguments.
With the new spiders, we weren’t reinventing the wheel, just switching the platform. For the spiders, we created a manager on Twisted that tracked the database and ran the scripts in several threads to optimize the process. What pleased me the most was that we gained access to the user’s JS environment. We could inject third-party scripts and basically had full access to the browser JS console, so DOM trees weren’t a problem anymore.
Once I got the hang of writing those spider scripts, it took me from 15 minutes to an hour to complete one.”
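The idea of a manager that runs spider scripts in several threads can be sketched in a few lines of Python. This is only an approximation under stated assumptions: it uses the standard library’s `ThreadPoolExecutor` rather than Twisted, and `run_spiders` and the `run_one` callback are hypothetical names introduced for the example. In a real setup, `run_one` would launch the spider script for a given site and return the number of ads collected.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_spiders(sites: Iterable[str],
                run_one: Callable[[str], int],
                max_workers: int = 4) -> Dict[str, object]:
    """Run one spider per site concurrently.

    Returns a mapping of site -> item count, or the exception that
    the spider raised for that site.
    """
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_one, site): site for site in sites}
        for future in as_completed(futures):
            site = futures[future]
            try:
                results[site] = future.result()
            except Exception as exc:
                # one failing spider must not stop the others
                results[site] = exc
    return results
```

Collecting exceptions per site instead of re-raising them keeps one broken target website from blocking the update of all the others.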
Controlled randomness: sophisticated entities relations
Since our project wasn’t a charity, small vendors had accounts in which they managed their terms of payment. A merchant could publish advertisements according to the number of credits purchased.
Once a merchant bought a subscription, the advertisements would be displayed. The problem was the ordering of these pre-paid ads. Since we wanted to present both famous brands and small vendors, we had to decide on the proportions of “big” and “small” ads. To gain consumers’ trust, the advertisements from small vendors must not be overwhelming, and yet there had to be enough of them to be noticed among the recognizable brands.
After some consideration, we came up with several principles for displaying those “small” ads. The proportions were defined specifically for each user status. We composed a set of general and role-specific rules on how the ads should be displayed.
General rule. To imitate randomness, the pre-paid ads had to be spread across the catalog grid. Moreover, they should never be next to each other in a row or a column.
Guest. A visitor who hasn’t registered yet would see no more than 5 pre-paid ads per grid.
User. A registered user would see no more than 3 pre-paid ads per grid.
Merchant. Merchants weren’t seeing any ads of this kind unless they turned on a user mode to see how the ads were distributed.
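These rules can be expressed as a small placement routine. The Python sketch below is one possible interpretation, not our actual implementation: the `pick_prepaid_cells` function, the role names used as dictionary keys, and the reading of “next to each other” as horizontally or vertically adjacent cells are all assumptions made for illustration.

```python
import random
from typing import Optional, Set, Tuple

# Per-role caps taken from the rules above; merchants see none by default.
MAX_PREPAID = {"guest": 5, "user": 3, "merchant": 0}

Cell = Tuple[int, int]  # (row, column) in the catalog grid


def pick_prepaid_cells(rows: int, cols: int, role: str,
                       rng: Optional[random.Random] = None) -> Set[Cell]:
    """Choose grid cells for pre-paid ads so that no two of them
    touch horizontally or vertically, up to the cap for the role."""
    rng = rng or random.Random()
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(cells)  # random order imitates a random spread
    limit = MAX_PREPAID.get(role, 0)
    chosen: Set[Cell] = set()
    for r, c in cells:
        if len(chosen) >= limit:
            break
        neighbours = {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)}
        if not neighbours & chosen:  # skip cells adjacent to a chosen one
            chosen.add((r, c))
    return chosen
```

For example, `pick_prepaid_cells(4, 5, "guest")` returns at most five non-adjacent positions in a 4 by 5 grid; the remaining cells would be filled with ads from the big brands.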
ORM lite: dieting Doctrine
Once we’d outlined the principles, a new challenge arose. These complicated database entity relations, combined with fuzzy data filtering, could overload the memory if we used the Doctrine ORM that comes out of the box with Symfony.
Although Doctrine is a robust tool, it required a lot of memory while we weren’t actually using most of its capabilities. So we had two options: either look for an existing solution or write a custom mapper tailored exactly to our needs.
Well, we found a third path. ORM lite is a cut-down version of Doctrine that didn’t bring so many relations into the MySQL database and, accordingly, didn’t overload the available memory. It took us about 30 hours to customize Doctrine the way we liked, test it, and debug it. Basically, we turned a high-speed train into a golf cart and named it ORM lite.
Here is a part of it.
Inside merchant’s cockpit: advertisement editor
Even though ads could be pulled directly from the merchants’ sites, we developed a convenient tool for posting advertisements within our environment. The editor allowed merchants to add any relevant information and post an advertisement.
Ads hub: to simplify the e-commerce
As has been said, the cornerstone of our project was to make online customers’ lives easier and to build a convenient hub for both customers and merchants.
The outcome has been rather impressive. Currently, we have more than 70 “big” retailers operating via the platform as well as more than 100 “small” businesses leveraging its capacities to promote their products. Our partner has raised the overall business value by 350% over the last 7 months.
As development continues, we still intend to enhance a number of features:
- full-scale marketplace to trade products solely via the platform, including shipments;
- optimized mobile integration which brings all the capacities of the resource to smartphones and tablets;
- the expansion of the network to cover more regions;
- the development of several mobile applications each tailored to different product categories to acquire users who seek specific purchases.
Stay tuned for updates.