Les Hill
Posted June 03, 2009

We recently needed to show a truncated version of existing HTML content. Although there are several issues[1] to consider when dealing with HTML content, our specific concern was maintaining the integrity of the HTML. Some quick googling led to a nice helper written by Henrik Nyh last year. We tweaked the original a bit so that it appends the ellipsis inside the tag at the truncation point and truncates at a word (or tag) boundary. The full helper is listed below; enjoy.
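To see why plain string truncation does not preserve the integrity of the HTML, here is a small sketch in plain Ruby. The sample markup and the `unbalanced_tags?` method are made up for illustration, and the regex-based check is deliberately crude (it is not an HTML parser and would miscount self-closing tags).

```ruby
# Sample markup, invented for this illustration.
html = '<p>Some <strong>very important</strong> text</p>'

# Naive truncation: slice the raw markup by character count.
# Markup characters count toward the limit, and the cut lands inside
# the <strong> element, so both closing tags are lost.
naive = html[0, 20] + '...'
puts naive
# → <p>Some <strong>very...

# Hypothetical helper: compare the number of opening and closing tags.
# Good enough to show the problem, not meant for real validation.
def unbalanced_tags?(fragment)
  opens  = fragment.scan(/<[a-z][^>]*>/i).length
  closes = fragment.scan(%r{</[a-z][^>]*>}i).length
  opens != closes
end

puts unbalanced_tags?(html)   # → false
puts unbalanced_tags?(naive)  # → true
```

The helper in this post avoids both problems by walking the parsed document and counting only the visible text.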

    # By Henrik Nyh <http://henrik.nyh.se> 2008-01-30.
    # Free to modify and redistribute with credit.
    # Word truncation and fixes by Les Hill <http://blog.leshill.org> 2009-06-02
    #

    require "rubygems"
    require "hpricot"

    module TextHelper

      # Like the Rails _truncate_ helper but doesn't break HTML tags or entities.
      # max_length counts visible text only; markup is not counted.
      def truncate_html(text, max_length = 30, ellipsis = "...")
        return if text.nil?
        doc = Hpricot(text.to_s)
        # Only rebuild the document when the visible text is too long.
        doc.inner_text.chars.length > max_length ? doc.truncate(max_length, ellipsis).inner_html : text.to_s
      end

      # Truncate plain text at the last space that leaves room for the ellipsis.
      # If there is no such space, only the ellipsis is returned.
      def self.truncate_at_space(text, max_length, ellipsis = '...')
        l = [max_length - ellipsis.length, 0].max
        stop = text.rindex(' ', l) || 0
        (text.length > max_length ? text[0...stop] + ellipsis : text).to_s
      end
    end

    module HpricotTruncator
      # Documents and elements: copy the node, then add children
      # until the text budget is used up.
      module NodeWithChildren
        def truncate(max_length, ellipsis)
          return self if inner_text.chars.length <= max_length
          truncated_node = dup
          truncated_node.name = name
          truncated_node.raw_attributes = raw_attributes
          truncated_node.children = []
          each_child do |node|
            break if max_length <= 0
            # Measure the child before truncating it, so the budget is
            # reduced by its original text length.
            node_length = node.inner_text.chars.length
            truncated_node.children << node.truncate(max_length, ellipsis)
            max_length -= node_length
          end
          truncated_node
        end
      end

      # Text nodes: shorten the content in place at a word boundary.
      module TextNode
        def truncate(max_length, ellipsis)
          self.content = TextHelper.truncate_at_space(content, max_length, ellipsis)
          self
        end
      end

      # Nodes that are passed through unchanged.
      module IgnoredTag
        def truncate(max_length, ellipsis)
          self
        end
      end
    end

    # Mix the truncation behavior into the Hpricot node classes.
    Hpricot::Doc.send(:include,       HpricotTruncator::NodeWithChildren)
    Hpricot::Elem.send(:include,      HpricotTruncator::NodeWithChildren)
    Hpricot::Text.send(:include,      HpricotTruncator::TextNode)
    Hpricot::BogusETag.send(:include, HpricotTruncator::IgnoredTag)
    Hpricot::Comment.send(:include,   HpricotTruncator::IgnoredTag)
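
For readers who want to follow the budget logic without installing Hpricot, here is a rough standalone sketch of the same walk. `SketchText` and `SketchElem` are hypothetical stand-ins invented for this example, not Hpricot classes. Unlike the listing above, the sketch builds new nodes instead of modifying text nodes in place, and it ignores attributes, comments, and entities.

```ruby
# Same word-boundary rule as TextHelper.truncate_at_space in the listing:
# cut at the last space that still leaves room for the ellipsis.
def truncate_at_space(text, max_length, ellipsis = '...')
  return text if text.length <= max_length
  limit = [max_length - ellipsis.length, 0].max
  stop  = text.rindex(' ', limit) || 0
  text[0...stop] + ellipsis
end

# Hypothetical text node (stand-in for a parsed text node).
SketchText = Struct.new(:content) do
  def inner_text
    content
  end

  def to_html
    content
  end

  def truncate(max_length, ellipsis)
    SketchText.new(truncate_at_space(content, max_length, ellipsis))
  end
end

# Hypothetical element node with a tag name and child nodes.
SketchElem = Struct.new(:name, :children) do
  def inner_text
    children.map(&:inner_text).join
  end

  def to_html
    "<#{name}>#{children.map(&:to_html).join}</#{name}>"
  end

  def truncate(max_length, ellipsis)
    return self if inner_text.length <= max_length
    kept = []
    children.each do |child|
      break if max_length <= 0
      child_length = child.inner_text.length # measured before truncation
      kept << child.truncate(max_length, ellipsis)
      max_length -= child_length
    end
    SketchElem.new(name, kept)
  end
end

tree = SketchElem.new('p', [
  SketchText.new('The quick '),
  SketchElem.new('em', [SketchText.new('brown fox jumps')]),
  SketchText.new(' over the lazy dog')
])

puts tree.truncate(20, '...').to_html
# → <p>The quick <em>brown...</em></p>
```

The first text node fits within the budget of 20 and is kept whole, which leaves 10 characters for the `em` element. Its text is cut at the last space that leaves room for the ellipsis, so the ellipsis ends up inside the `em` tag, and the remaining sibling is dropped because the budget is exhausted.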

[1] For example: preventing XSS attacks and maintaining coherent styling.

