Extract Formatted Text From Excel Cell With C# (Rich Text Format)

I was writing an application that needed to convert text in a cell in an Excel workbook to HTML. It is fairly trivial to get formatting for the entire cell, but each individual character in the cell could have different formatting itself, so I needed something more specific than cell-level formatting info.

At first, I used the Excel.Range.get_Characters(pos, len) method to get the formatting out of the cell. The code looped through the characters one at a time and checked the formatting of each. For example:

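// Cell is assumed to be an object that references a single worksheet cell.
Microsoft.Office.Interop.Excel.Range Range = (Microsoft.Office.Interop.Excel.Range)Cell;
int TextLength = Range.Text.ToString().Length;
// Character positions in Excel are 1-based, so the loop runs from 1 to TextLength inclusive.
for (int CharCount = 1; CharCount <= TextLength; CharCount++)
{
    // Each get_Characters call is a separate COM call into Excel.
    Microsoft.Office.Interop.Excel.Characters charToTest = Range.get_Characters(CharCount, 1);
    bool IsBold = (bool)charToTest.Font.Bold;
    bool IsItalic = (bool)charToTest.Font.Italic;
    // other formatting tests here
}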

However, that method proved to be incredibly slow for cells that have more than a few characters. For cells with 1000+ characters, it would take several minutes to test every character. I tried different ways to speed up the process, but it became apparent that calling into Excel once per character was never going to be acceptable.

Finally, I think I’ve found the solution. It is possible to copy the text from a cell to the clipboard, and then use the Clipboard class to retrieve the formatted text, and parse it with C#. I ended up using the System.Windows.DataFormats.Rtf format to extract the data from the clipboard in the following way:

string rtf = (string)System.Windows.Clipboard.GetData(System.Windows.DataFormats.Rtf);
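
One thing to watch for: the clipboard classes throw an exception unless the calling thread is in single-threaded apartment (STA) mode, and the clipboard may not hold RTF at all. The following is a minimal sketch of a defensive read, assuming the code may run on a non-STA thread. ClipboardRtf and ReadRtf are hypothetical names of my own, not part of any library.

```csharp
using System;
using System.Threading;
using System.Windows; // WPF Clipboard and DataFormats (PresentationCore)

static class ClipboardRtf
{
    // Hypothetical helper: returns the RTF on the clipboard, or null if there is none.
    public static string ReadRtf()
    {
        string rtf = null;
        var worker = new Thread(() =>
        {
            // Only read the data if the clipboard actually offers the RTF format.
            if (Clipboard.ContainsData(DataFormats.Rtf))
            {
                rtf = Clipboard.GetData(DataFormats.Rtf) as string;
            }
        });
        // Clipboard access requires an STA thread.
        worker.SetApartmentState(ApartmentState.STA);
        worker.Start();
        worker.Join();
        return rtf;
    }
}
```

If the application already runs its main thread as STA (for example a WinForms or WPF app), the extra thread is unnecessary and the ContainsData check alone should be enough.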

Then, I create a System.Windows.Forms.RichTextBox, and use that to parse the data. The following is a sample of the solution, and it is reasonably quick.

Microsoft.Office.Interop.Excel.Range Range = (Microsoft.Office.Interop.Excel.Range)Cell;
Range.Copy(System.Reflection.Missing.Value);
            
string rtf = (string)System.Windows.Clipboard.GetData(System.Windows.DataFormats.Rtf);
System.Windows.Forms.RichTextBox rtb = new System.Windows.Forms.RichTextBox();
rtb.Rtf = rtf;
            
int CharCount = rtb.Text.Length;
 
for (int CharNum = 0; CharNum < CharCount; CharNum++)
{
   rtb.Select(CharNum, 1);
   System.Drawing.Font Font = rtb.SelectionFont;
   bool IsCharBold = Font.Bold;
   bool IsCharUnderline = Font.Underline;
   bool IsCharItalic = Font.Italic;

   // other code here
}
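
Since my goal was HTML, the per-character flags still have to be turned into tags. Below is a rough sketch of one way to do that, not the exact code from my application: it groups consecutive characters that share the same formatting into runs and wraps each run once. CharFormat and HtmlRuns are hypothetical names, and I am assuming the list is filled from rtb.SelectionFont inside the loop above.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical per-character record, filled from rtb.SelectionFont in the loop.
struct CharFormat
{
    public char Ch;
    public bool Bold, Italic, Underline;
}

static class HtmlRuns
{
    // Groups consecutive characters with identical flags and wraps each group in tags.
    public static string ToHtml(IList<CharFormat> chars)
    {
        var html = new StringBuilder();
        int i = 0;
        while (i < chars.Count)
        {
            CharFormat first = chars[i];
            var run = new StringBuilder();
            // Extend the run while the formatting flags stay the same.
            while (i < chars.Count
                   && chars[i].Bold == first.Bold
                   && chars[i].Italic == first.Italic
                   && chars[i].Underline == first.Underline)
            {
                run.Append(chars[i].Ch);
                i++;
            }
            // Escape characters such as < and & before adding any tags.
            string text = WebUtility.HtmlEncode(run.ToString());
            if (first.Underline) text = "<u>" + text + "</u>";
            if (first.Italic) text = "<i>" + text + "</i>";
            if (first.Bold) text = "<b>" + text + "</b>";
            html.Append(text);
        }
        return html.ToString();
    }
}
```

For example, a bold "H" and a bold "i" followed by a plain "!" produce `<b>Hi</b>!`. Grouping into runs keeps the output much smaller than wrapping every character in its own tags.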

In the comments, I was also asked how to get the color. With a character selected via rtb.Select, you can use:

System.Drawing.Color color = rtb.SelectionColor;

There are also other properties of rtb dealing with selection, such as SelectionAlignment, SelectionBackColor, etc. See the RichTextBox class for more info.
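
If the color is headed for HTML as well, it needs to be written as a CSS value. Here is a small sketch, assuming a plain #RRGGBB string is what you want and ignoring the alpha channel. ColorCss and ToCssHex are hypothetical names.

```csharp
using System;
using System.Drawing;

static class ColorCss
{
    // Hypothetical helper: formats a Color as a CSS hex string such as "#FF0000".
    public static string ToCssHex(Color color)
    {
        // R, G and B are bytes; X2 writes each as two uppercase hex digits.
        return string.Format("#{0:X2}{1:X2}{2:X2}", color.R, color.G, color.B);
    }
}
```

Inside the loop, that would look something like ColorCss.ToCssHex(rtb.SelectionColor), with the result placed in a style attribute on the run.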
