Configure Lucene IndexWriter and IndexSearcher in Spring applicationContext.xml
Problem: you want to define Lucene IndexWriter and IndexSearcher as beans inside your Spring application to be injected/autowired into other beans.
Solution: follow the following steps.
- define the Lucene version as a constant
- define a Lucene analyzer (StandardAnalyzer) as a bean
- define a Lucene directory as a bean, using a factory-method for instantiation
- define an IndexWriter, wiring in the Lucene directory and an IndexWriterConfig set to use your previously-defined analyzer
- define an IndexSearcher, wiring in the Lucene directory
- define also a query parser (StandardQueryParser), wiring in the analyzer bean
You can then wire/autowire these beans into your application beans, for example:
Read more
Simple pagination taglib for JSP
Pagination is a common requirement when writing JSPs. Long sets of data must be broken up across multiple pages. There is no standard way to implement pagination in JSP or JSTL, however. You must use a custom tag to generate the page numbers.
I’ve created a simple pagination taglib that generates the page links. You can customize it to your needs. Usage example below. Here, I’m using it to paginate search results from Lucene:
<c:url var="searchUri" value="/searchResults.html?s=${searchval}&page=##" /> <paginator:display maxLinks="10" currPage="${page}" totalPages="${totalPages}" uri="${searchUri}" />
The paginator:display tag produces this output:
Convert between Java enums and PostgreSQL enums
PostgreSQL allows you to create enum types using the following syntax:
CREATE TYPE animal_type AS ENUM('DOG', 'CAT', 'SQUIRREL');
You can now use ‘animal’ as a datatype in your tables, for example:
create table pet ( pet_id integer not null, pet_type animal_type not null, name varchar(20) not null );
In Java, you’d have a corresponding enum type:
public enum AnimalType { DOG, CAT, SQUIRREL; }
Converting between Java and PostgreSQL enums is straightforward. For example, to insert or update an enum field you could use the CAST syntax in your SQL PreparedStatement:
INSERT INTO pet (pet_id, pet_type, name) VALUES (?, CAST(? AS animal_type), ?); --or INSERT INTO pet (pet_id, pet_type, name) VALUES (?, ?::animal_type, ?);
Postgres will also let you insert/update an enum just by passing its value as a string.
Whether casting or not, the Java side is the same. You would set the fields like this:
stmt.setInt(1, 1); stmt.setString(2, AnimalType.DOG.toString()); stmt.setString(3, 'Rex');
Retrieving the enum from a SELECT statement looks like this:
AnimalType.valueOf(stmt.getString("pet_type"));
Take into consideration that enums are case-sensitive, so any case mismatches between your Postgres enums and Java enums will have to be accounted for. Also note that the PostgreSQL enum type is non-standard SQL, and thus not portable.
Also, FYI, to view the set of values in a given Postgres enum type, you can use the following SQL query:
SELECT enumlabel FROM pg_enum WHERE enumtypid = 'your_enum'::regtype ORDER BY oid;
Web scraping in Java with Jsoup, Part 2 (How-to)
Web scraping refers to programmatically downloading a page and traversing its DOM to extract the data you are interested in. I wrote a parser class in Java to perform the web scraping for my blog analyzer project. In Part 1 of this how-to I explained how I set up the calling mechanism for executing the parser against blog URLs. Here, I explain the parser class itself.
But before getting into the code, it is important to take note of the HTML structure of the document that will be parsed. The pages of The Dish are quite heavy–full of menus and javascript and other stuff, but the area of interest is the set of blog posts themselves. This example shows the HTML structure of each blog post on The Dish:
Read more
Disable URL session IDs (JSESSIONID) in Tomcat 7, Glassfish v3
URL-based session tracking is intended for web clients that do not support session cookies. Every browser worth mentioning supports these cookies, and almost nobody surfs with them disabled. Most web sites either state explicitly or assume that a user’s browser supports session cookies. URL rewriting schemes that add the session ID as a parameter on every URL thus provide very little benefit, if any at all. Session IDs showing up in URLs is just bad form, and may confuse search engine spiders. Thankfully the Servlet 3.0 standard gives you two ways to disable URL session rewriting. This works in Tomcat 7, Glassfish v3, and any other Servlet 3.0-compliant servlet container.
First, you can add this to your web.xml web-app config:
<session-config> <tracking-mode>COOKIE</tracking-mode> </session-config>
Or programmatically, you can use:
servletContext.setSessionTrackingModes(EnumSet.of(SessionTrackingMode.COOKIE));
I’ve used the web.xml method in Tomcat 7, and it works. No jsessionid in the URLs when using <c:url …> in my JSPs.
Web scraping in Java with Jsoup, Part 1
In order to obtain the data to feed into my blog analyzer, content must be parsed from the pages of the blog itself. This is called “web scraping”. Jsoup will be used to parse the pages, and because this is a Spring project, Spring scheduling will be used to invoke the parser.
The following classes were created:
- BlogRequest – invokes the parser on a given blog URL, passes parsed content to service layer
- BlogRequestQueue – queues up and executes blog requests
- BlogParser – interface with parseURL method
- DishBlogParser – implements BlogParser, used to parse the blog The Dish
Each of these (aside from the interface) is configured as a Spring-managed bean. The code for BlogRequest:
Read more
A Blog Analyzer Project
In the coming days, I will be writing about a project I’m working on which will perform analysis on Andrew Sullivan’s The Dish blog, which is one of the most popular blogs on American politics. The intent of the project, which will utilize such technologies as Spring 3, JSP/JSTL, JDBC, PostgreSQL, and jQuery/Ajax, is to web scrape the blog, extract key data elements, and reorganize and present this data in new and interesting ways. Additionally, I will create a bookmarklet that will add value to the blog site itself.
Development tools used include Netbeans 7.0, Firefox with Firebug, and the always handy psql Postgres command line tool.
There are many interesting technical challenges involved, and I will write about them on this blog. Additionally, there is the question of copyright law, which is an unavoidable concern when building off of content from a third party. Copyright law was not meant to stifle innovation, though, provided certain criteria are met: the content originator must not be harmed in the marketplace, repurposed content must be transformed into a novel work, and small portions must be used. I believe my project fits these criteria.
Profile SQL statements in Java / Spring
Wouldn’t it be nice if there were a way to time your application’s SQL statements unobtrusively? This information could give you insight into the performance of your queries and updates and help you identify slow, poorly-performing SQL. Of course, there is a way to add such SQL profiling to your Spring application, by using AspectJ.
I use Spring JDBC and wanted to identify slow SQL queries in my application so that I could tune them in order to improve overall performance. Capturing the execution times whenever SQL is executed can be done by creating a pointcut on the methods of JdbcTemplate. Here is what we need:
- aspectjrt.jar and aspectjweaver.jar from here.
- An aspect with a pointcut on the JdbcOperations interface, which JdbcTemplate implements.
- @Around advice that times the execution of JdbcTemplate methods and stores this data for later retrieval.
- Configuration of Spring applicationContext.xml to get it working.
The pointcut looks like this, with the String argument being the SQL statement:
Read more
Add custom annotation to Spring MVC controller
The question of how to add custom annotations to my Spring MVC controllers puzzled me for some time, because the documentation in this area is lacking. Even the bible of Spring 3.0, Spring Recipes by Gary Mak, et. al., did not address the topic.
But then I found this great blog post detailing exactly what I was interested in. In a nutshell, you need to implement WebArgumentResolver and set your class as a customArgumentResolver of the AnnotationMethodHandlerAdapter bean. What I was interested in was adding a @RequestAttribute annotation that would work like @RequestParam, but would obviously pull the value from a request attribute rather than a request parameter.
Read more
JSON / Ajax in Spring MVC
How do you configure your Spring MVC web application to serve JSON for Ajax? It is not difficult. You probably have a view resolver in your dispatcher-servlet.xml that looks like this:
<bean id="viewResolver" class="org.springframework.web.servlet.view.InternalResourceViewResolver" p:viewClass="org.springframework.web.servlet.view.JstlView" p:prefix="/WEB-INF/jsp/" p:suffix=".jsp"></bean> |
In order to serve JSON content, you need to use the ContentNegotiatingViewResolver. The configuration looks like this:
Read more
Highlight words in text with jQuery
On the right is an example of word-highlighting in a blog post, done with jQuery (my javascript library of choice). I needed to highlight certain words in blocks of formatted text, so I went searching for an existing solution. I found a pretty good starting point, posted by bjarlestam on Stackoverflow. His highlighting function is below. It wraps the word inside a span tag with a class of your choosing.
jQuery.fn.highlight = function (str, className) { var regex = new RegExp(str, "gi"); return this.each(function () { this.innerHTML = this.innerHTML.replace(regex, function(matched) { return "<span class=\"" + className + "\">" + matched + "</span>"; }); }); }; |
This is nice, because it uses a case-insensitive regular expression to find the word to highlight. However, there is a flaw. If you call this function on an element with non-text child elements, and the tag/attribute names or attribute values match the word to highlight, this function will mess up your HTML. We’d like it to apply only to text nodes.
Read more
Combine and cache multiple CSS files for performance
This is a simple way to get a performance boost if you’re using PHP + Apache. I am using it to boost performance on GoToQuiz. If you are not using Apache, you can still use the PHP portion of this performance enhancement.
Some quick background: maintaining your site’s CSS in multiple files is frequently useful and sometimes necessary. But including multiple CSS files results in a performance hit when your pages load, because each one requires a separate request from the user’s browser to your server. Each request has associated overhead, which can be minimized if you send all CSS in one request.
This is how you can combine your CSS files with PHP and cache them, relying on a clever bit of .htaccess modification: