New Project Launched: Spectacular.Gift
It began like this: Amazon.com’s vast product catalog contains many clever and unique items, the sort that you may not know you wanted until you’ve heard of them. Alternatively, these items might make ideal gifts when shopping for the person “who already has everything”. So I figured it would be a neat idea to curate a collection of these items and build a gift recommendation site around them. Doing so would allow me to explore some new server-side technologies and help keep my skills fresh.
Technologies used:
- Ubuntu VPS from Linode
- Apache2 HTTP server
- Apache Tomcat 7
- Apache ActiveMQ
- Spring 3.1 with Spring MVC
- PostgreSQL 9.3
- HTML5 + CSS3
- jQuery, flot, TinyMCE
- Java libraries such as Joda Time, Jsoup, Jackson, Logback, and Commons DBCP
- Amazon Product Advertising API
- Reddit REST API
To get some data populated in the database as a starting point, I set up a scheduled task to pull data from several Reddit forums where Amazon links are shared. Reddit conveniently makes this data available via their REST API. All products discovered in this way are set to unapproved status pending manual review.
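As a rough sketch of one piece of that task (not the site's actual code), here is how an Amazon product identifier might be pulled out of a shared link. It assumes links use the common /dp/ASIN or /gp/product/ASIN form with a ten-character ASIN; the class and method names are hypothetical, and the Reddit request itself is left out.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AmazonLinks {
    // Assumption: product URLs contain /dp/<ASIN> or /gp/product/<ASIN>,
    // where the ASIN is ten uppercase letters or digits.
    private static final Pattern ASIN =
            Pattern.compile("/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?#]|$)");

    /** Returns the ASIN found in an Amazon product URL, or empty if there is none. */
    static Optional<String> extractAsin(String url) {
        // Crude host check; a stricter version would parse the URL and compare hosts.
        if (url == null || !url.contains("amazon.")) {
            return Optional.empty();
        }
        Matcher m = ASIN.matcher(url);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Links that yield no identifier can simply be skipped, since everything discovered this way goes through manual review anyway.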
Next, I set up another scheduled task to populate and refresh metadata about the products via Amazon’s Product Advertising API. Per Amazon’s terms, this data has to be refreshed hourly in order to display price information. For efficiency, I request data in batches of ten products at a time, which is the maximum allowed.
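The batching step could look something like the following. This is a minimal sketch: the helper name is hypothetical, and the actual call to the Product Advertising API is not shown.

```java
import java.util.ArrayList;
import java.util.List;

final class Batches {
    // Ten products per request is the limit mentioned above.
    static final int MAX_ITEMS_PER_REQUEST = 10;

    /** Splits a list into consecutive batches of at most the given size. */
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With 23 product IDs, this produces three batches, and so three requests instead of 23.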
I created a manual process for reviewing and approving products to be shown. This process includes writing a custom description, adding relevant tags (e.g. “For Kids” or “Gadget Lovers”), and setting an age range and gender specificity (if applicable).
The UI is written in JSP and outputs HTML5. Some features are powered by JavaScript, such as the price history button, which uses flot to render a graph of historical price data.
Spring 3.1 ties it all together. Spring MVC handles the front-end. Spring JDBC is used for interacting with PostgreSQL. I could have used Spring’s event system, but I wanted to get some experience with ActiveMQ. There are a number of message senders and listeners set up for events such as “price changed” or “product suggested”.
I’ll probably think of a snappier name eventually, but for now I registered http://spectacular.gift (new “.gift” TLD). Have a look if you like! It’s basically in beta, and I’m still adding new products and tags.
Web Page Optimization – Changes I Made for Page Loading Speed
It has been years since I optimized the page loading speed at GoToQuiz. At that time, image sprites and CSS/JavaScript minification were the state of the art. In the intervening years, though, further progress has been made. Of particular note are the “retirement” of older versions of Internet Explorer and the emergence of CSS3 and HTML5, which allow developers to streamline their sites further. So as I am in the process of revamping the UI, I’m taking advantage of the latest techniques to boost page loading speed. Be aware that this is nothing revolutionary; this blog post is simply an overview of which changes I’m making, and why.
Migrating pages over to HTML5 allows the markup to be more semantic while taking fewer bytes–a double win. Using tags such as nav, section and article helps alleviate so-called “div-itis”, allowing cleaner markup and a smaller overall filesize. HTML5 also follows the maxim of “convention over configuration”, letting you eliminate unnecessary attributes like type=”text/javascript” on script tags, for example.
The adoption of CSS3 has provided many possibilities for creating beautiful visual effects without using image files. Here is an example: on an early iteration of the design of the various headings on GoToQuiz, I used this sprite combined with CSS to create the rounded color bar appearance.
How to extract titles from web pages in Java
Let’s say you have a set of URLs and you want the web page titles associated with them. Maybe you’ve data-mined a bunch of links from HTML pages, or acquired a flat file listing URLs. How would you go about getting the corresponding page titles, and associating them with the URLs using Java?
You could use an HTML parser such as Jsoup to request the HTML document associated with each URL and parse it into a DOM document. Once obtained, you could navigate the document and select the text from the title tag, like so:
String titleText = document.select("title").first().text();
Elegant, but a lot of overhead for such a simple task. You’d be loading the whole page into memory and parsing it into a DOM structure just to extract the title. Instead, you could use the Apache HTTP Client library, which provides a robust API for requesting resources over the HTTP protocol. But it would be unnecessary in this case. Let’s keep it simple and rely only on the Java standard library.
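One way this might look with only the standard library is sketched below. It is an illustration rather than a finished implementation: it assumes the title appears within roughly the first 8 KB of the page and that the page is UTF-8 encoded, it does not decode HTML entities, and the class and method names are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TitleExtractor {
    // Matches <title ...>text</title>, case-insensitively and across line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Assumption: stop looking once roughly this many characters have been read.
    private static final int MAX_CHARS = 8192;

    /** Reads only the start of the document; returns the title, or null if none is found. */
    static String extractTitle(Reader reader) throws IOException {
        char[] buf = new char[1024];
        StringBuilder head = new StringBuilder();
        int n;
        while (head.length() < MAX_CHARS && (n = reader.read(buf)) != -1) {
            head.append(buf, 0, n);
            Matcher m = TITLE.matcher(head);
            if (m.find()) {
                // Collapse whitespace so a multi-line title comes back on one line.
                return m.group(1).replaceAll("\\s+", " ").trim();
            }
        }
        return null;
    }

    /** Requests the URL and extracts its title. Assumes UTF-8, which will not hold for every page. */
    static String fetchTitle(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (Reader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return extractTitle(reader);
        }
    }
}
```

Because reading stops as soon as the closing title tag is seen, the rest of the page is never pulled into memory, which is the point of skipping the DOM parser here.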
To sanitize user content, use an HTML parser
It is especially important, if you allow any HTML at all in user-submitted content, to sanitize that content by actually parsing the HTML and filtering it for any tags or attributes you wish to exclude. If you fail to do so, your site may be vulnerable to XSS (cross-site scripting) attacks.
Q: “But isn’t it overkill to parse the HTML? Can’t I use other techniques, such as regular expressions or simple string replacement, to filter out dangerous tags and attributes?” A: No, and I’ll explain why.
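To give one well-known illustration (a sketch of my own for this summary, not necessarily the argument made in the full post), consider a deliberately naive filter that deletes literal script tags by string replacement. The class name is hypothetical.

```java
final class NaiveFilterDemo {
    // Deliberately naive: delete every literal <script> and </script>.
    static String stripScriptTags(String html) {
        return html.replace("<script>", "").replace("</script>", "");
    }

    public static void main(String[] args) {
        // The attacker nests one tag inside another; removing the inner tag
        // splices the outer fragments back together into a working tag.
        String attack = "<scr<script>ipt>alert(1)</scr</script>ipt>";
        System.out.println(stripScriptTags(attack)); // prints <script>alert(1)</script>

        // Case variants and event-handler attributes are not touched at all.
        System.out.println(stripScriptTags("<SCRIPT>alert(1)</SCRIPT>"));
        System.out.println(stripScriptTags("<img src=x onerror=alert(1)>"));
    }
}
```

Each of these gaps can be patched one by one, but the list of variations is long. A parser sidesteps the problem by working from the same structure a browser would see and keeping only the tags and attributes you have explicitly allowed.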