We recently needed to show a truncated version of existing HTML content. Although truncating HTML raises several issues [1], our specific concern was maintaining the integrity of the HTML, so that no tag or entity is cut in half or left unclosed. Some quick googling led to a nice helper written by Henrik Nyh last year. We tweaked the original a bit so that it appends the ellipsis inside the tag at the truncation point and truncates at a word (or tag) boundary. Here it is, enjoy.

# By Henrik Nyh <http://henrik.nyh.se> 2008-01-30.
# Free to modify and redistribute with credit.
# Word truncation and fixes by Les Hill <http://blog.leshill.org> 2009-06-02
#

require "rubygems"
require "hpricot"

module TextHelper

  # Like the Rails _truncate_ helper but doesn't break HTML tags or entities.
  def truncate_html(text, max_length = 30, ellipsis = "...")
    return if text.nil?
    doc = Hpricot(text.to_s)
    doc.inner_text.chars.length > max_length ? doc.truncate(max_length, ellipsis).inner_html : text.to_s
  end

  def self.truncate_at_space(text, max_length, ellipsis = '...')
    l = [max_length - ellipsis.length, 0].max
    stop = text.rindex(' ', l) || 0
    (text.length > max_length ? text[0...stop] + ellipsis : text).to_s
  end
end

module HpricotTruncator
  module NodeWithChildren
    def truncate(max_length, ellipsis)
      return self if inner_text.chars.length <= max_length
      truncated_node = dup
      truncated_node.name = name
      truncated_node.raw_attributes = raw_attributes
      truncated_node.children = []
      each_child do |node|
        break if max_length <= 0
        node_length = node.inner_text.chars.length
        truncated_node.children << node.truncate(max_length, ellipsis)
        max_length = max_length - node_length
      end
      truncated_node
    end
  end

  module TextNode
    def truncate(max_length, ellipsis)
      self.content = TextHelper.truncate_at_space(content, max_length, ellipsis)
      self
    end
  end

  module IgnoredTag
    def truncate(max_length, ellipsis)
      self
    end
  end
end

Hpricot::Doc.send(:include,       HpricotTruncator::NodeWithChildren)
Hpricot::Elem.send(:include,      HpricotTruncator::NodeWithChildren)
Hpricot::Text.send(:include,      HpricotTruncator::TextNode)
Hpricot::BogusETag.send(:include, HpricotTruncator::IgnoredTag)
Hpricot::Comment.send(:include,   HpricotTruncator::IgnoredTag)
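
To see how the length budget and the word-boundary rule interact without installing Hpricot, here is a rough sketch of the same idea in plain Ruby. The tree representation (a String for a text node, a Hash with :tag and :children for an element) and the helper names inner_text, truncate_node and to_html are made up for this illustration; they are not part of Hpricot or of the helper above.

```ruby
# Same rule as TextHelper.truncate_at_space: cut at the last space that
# still leaves room for the ellipsis.
def truncate_at_space(text, max_length, ellipsis = "...")
  return text.to_s if text.length <= max_length
  limit = [max_length - ellipsis.length, 0].max
  stop = text.rindex(" ", limit) || 0 # no space found: nothing but the ellipsis survives
  text[0...stop] + ellipsis
end

# Hypothetical stand-in for an HTML tree: a String is a text node,
# a Hash with :tag and :children is an element.
def inner_text(node)
  node.is_a?(String) ? node : node[:children].map { |c| inner_text(c) }.join
end

# Walk the children in order, spending the remaining budget as we go.
# The child that exhausts the budget is truncated recursively, so the
# ellipsis lands inside the innermost tag at the cut; later siblings
# are dropped.
def truncate_node(node, max_length, ellipsis = "...")
  return truncate_at_space(node, max_length, ellipsis) if node.is_a?(String)
  return node if inner_text(node).length <= max_length

  kept = []
  node[:children].each do |child|
    break if max_length <= 0
    kept << truncate_node(child, max_length, ellipsis)
    max_length -= inner_text(child).length
  end
  { tag: node[:tag], children: kept }
end

# Serialize the toy tree back to markup (attributes are ignored here).
def to_html(node)
  return node if node.is_a?(String)
  "<#{node[:tag]}>#{node[:children].map { |c| to_html(c) }.join}</#{node[:tag]}>"
end

tree = { tag: "p", children: ["Hello ", { tag: "b", children: ["big wide world"] }, " and more"] }
puts to_html(truncate_node(tree, 15))
# prints: <p>Hello <b>big...</b></p>
```

One consequence of the word-boundary rule worth knowing: a text node that has to be cut but contains no space within the budget collapses to just the ellipsis, so truncate_at_space("Supercalifragilistic", 10) gives "...".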

[1] For example: preventing XSS attacks and maintaining coherent styling.
