Tuesday 20 July 2010

Use SSIS Script component to pull data from MS Office Word File

Well some time we face some requirements which really frustrate you..
Same scenario I got end up few days back when I have to pull data from a Word Files.
Yup our Microsoft Office Word Files. I have no choice to make it work so I have started looking into it,
Finally I was manage to write a Script task which can pull data from MS Word.
So here we are.. Following is script to pull data from MS Office Word Files…

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Collections.Generic;
using System.Windows.Forms;
using Microsoft.Office.Interop;
using System.Reflection;
using Office = Microsoft.Office.Core;
using Microsoft.Office.Interop.Word;
using Microsoft.VisualStudio.Tools.Applications.Runtime;
using Excel = Microsoft.Office.Tools.Excel;
using Word = Microsoft.Office.Interop.Word;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
public override void PreExecute()
{
base.PreExecute();
}
public override void PostExecute()
{
base.PostExecute();
}
public override void CreateNewOutputRows()
{
Microsoft.Office.Interop.Word.ApplicationClass wordApp = new Microsoft.Office.Interop.Word.ApplicationClass();
object file = "D:\\AESMetricsforMarch2010.doc"; // Specify path for word file
object nullobj = System.Reflection.Missing.Value;
Microsoft.Office.Interop.Word.Document doc = wordApp.Documents.Open(ref file, ref nullobj, ref nullobj,
ref nullobj, ref nullobj, ref nullobj,
ref nullobj, ref nullobj, ref nullobj,
ref nullobj, ref nullobj, ref nullobj,
ref nullobj, ref nullobj, ref nullobj, ref nullobj);
int count = 0;
int i = 1;
foreach (Microsoft.Office.Interop.Word.Table table in doc.Tables)
{
Microsoft.Office.Interop.Word.Table firstTable = doc.Tables[i];
foreach (Microsoft.Office.Interop.Word.Row row in firstTable.Rows)
{
List<string> cellValues = new List<string>();
foreach (Microsoft.Office.Interop.Word.Cell cell in row.Cells)
{
string cellContents = cell.Range.Text;
// add the cell contents to the array, but remove the strange termination character on the end of the data
cellValues.Add(cellContents.Remove(cellContents.Length - 1));
}
// the first row is the column header
if (count == 0)
{
// form.AddHeader(cellValues);
}
else
{
Output0Buffer.AddRow();
Output0Buffer.Column = cellValues[0].ToString();
Output0Buffer.Column1 = cellValues[1].ToString();
Output0Buffer.Column2 = cellValues[2].ToString();
Output0Buffer.Column3 = cellValues[3].ToString();
Output0Buffer.Column4 = cellValues[4].ToString();
}
count++;
}
i++;
}
}
}
Please note that you have to include some of the office assembly in your SSIS Script Component reference.

1 comment:

  1. Hi, thanks for the code. Can you please confirm,
    do we need to install MS office on server where we run this script?
    after adding Assembly references to Microsoft.Office.Interop.Word still throwing some COM class missing error.
    Thanks,
    Deepika.

    ReplyDelete