Basics of Powershell Excel & JSON Parsing

Author by Leo Ling

The Powershell scripting language is designed to help system administrators automate tasks and manage OSs: Windows is naturally supported, but so is Linux and Mac with the newest versions. It allows you to access data stored across the file system, including Excel and JSON files. Thanks to the integration on the Microsoft side, we do not need to download additional modules or files before working with these.

For those unfamiliar with Powershell, it plays pretty loose with it's objects, allowing you to mix and match its members without rigorous definition beforehand like in Java or Python. This easily allows for custom objects as outputs. In Powershell, object are made up of members: these member are functions or data structure associated with the objects.

2019-07-26-09_12_39-Clipboard.png

Excel

We begin parsing Excel files by calling a custom object:

$Excel = new-object -comobject excel.application

Using this excel object, we can open a specific Excel workbook for parsing by passing the path string as an argument.

$Workbook = $Excel.Workbooks.Open(...)

Since excel files are generally made up of multiple sheets, we are require to select a specific sheet before individual cells can be parsed. Sheets can be selected either by index or by name. Then, a sheet should be marked as the active worksheet. Marking a worksheet as active allows for the range property to be used without referencing a specific sheet beforehand.

$worksheet = $Workbook.Worksheets.Item(...)

$worksheet.activate()

At this point we are ready to directly with the sheet, I have included some helpful parsing functions below. The range property allows the specification of one or more cells. This allows for the value of multiple cells to be set simultaneously using familiar Excel syntax. This is fairly powerful, but it can difficult to directly manipulate individual cells within a range. Thus, for more custom applications, loops can still be a valuable tool.

Some useful Excel Powershell Functions

Select range of cells  $worksheet.range("A1:B30")
Select entire column  $worksheet.range("A1").EntireColumn
Select entire row $worksheet.range("A1").EntireRow
Copy range (Will return true if successful) $range.Copy()
Paste Range (Requires new range selection after copy) $worksheet.Paste($range)
Access Range Values $range.Value2
Access Individual cell $worksheet.Cells(1,1).Value2
Find last used row (Can be used to find number of rows) $worksheet.UsedRange.rows.count
Find last used column $worksheet.UsedRange.columns.count
Save Workbook $workbook.Save()
Quit Excel $Excel.Quit()

One of the stranger parts of this syntax is the Value2 used to access cell values. This is due to the each cell or range being a custom object, thus the Value2 is a member of the cell or range object. 

For my purposes, I exported the data from these excel sheets into a powershell custom object. This allows for later flexibility for what we want to do with this data. Generally, adding members to custom objects is trivial. First the object is created, like I mentioned before no properties need to be declared ahead of time...

$newObj = New-Object -TypeName psobject

Then, we can add members to the object using the following function.

$newObj | Add-Member -MemberType NoteProperty -Name ... -Value

JSON

For the last section, I concluded on constructing custom objects in powershell for Excel files. When parsing JSON using Powershell, the opposite happens. By default, the ConvertFrom-JSON function in Powershell converts the contents of a JSON file into a custom object. Each field in the JSON file is converted to a custom object property. Due to this largely one to one mapping, we are also able to convert text based custom objects to JSON. 

Get-Content -Raw -Path $JSONFile | ConvertFrom-Json

From here onwards, parsing will depend on the specifics of the JSON file. If the contents of the file is predictable, you could directly reference properties and subproperties based on known field names. For more complex, nested JSON fields, it is easier to assign variables to sections of the JSON object rather than using the path from the base object each time.

At this point it helps to take advantage of the custom object's inheritance from the PsObject class. One of the disadvantage of working with a custom object  is that it is not obvious how to get the name of a property. Take for example that you want to always store the first field in your JSON file, regardless of what it's named. Trying to index the JSON object will yield not yield any individual properties, since the object is not a list. Instead...

$JSONObj.PsObject.Properties.Name

This will yield a list of the names of the properties, or in this case, the name of the fields. Alternatively, a list of the Property values can also be generated using...

$JSONObj.PsObject.Properties.Value

One final Powershell feature I wold like to share in this guide is the Here-String. Here-Strings allow for easier formatting of text block while supporting string interpolation. In the example below, the spacing will be preserved between the first and the second line and the current date will be printed on the second line. This allows you to generate simple repeated code blocks or other block for similar text. For those of us using text editors, Powershell will complain about indents with here-strings. The solution is to leave no indents to the left of each line, in particular the opening and closing symbols @" and "@.

@"

  Today is...

  $(Get-Date) 

"@