JSON output from df

So I'm adding more capabilities to my sysinfo.py program. The next thing that I want to do is get a JSON result from df. This is a command whose man page describes it as "report file system disk space usage".

Here is a sample of the output of df for one of my systems:


Filesystem                1K-blocks    Used Available Use% Mounted on
/dev/mapper/flapjack-root 959088096 3802732 906566516   1% /
udev                        1011376       4   1011372   1% /dev
tmpfs                        204092     288    203804   1% /run
none                           5120       0      5120   0% /run/lock
none                        1020452       0   1020452   0% /run/shm
/dev/sda1                    233191   50734    170016  23% /boot


So I started by writing a little Python program that used the subprocess.check_output() method to capture the output of df.
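For anyone who hasn't used subprocess.check_output(), here is a minimal sketch of what it hands back. To keep it runnable on any machine, the command is a stand-in (the Python interpreter printing two lines) rather than df itself; that substitution and the variable names are mine, not part of sysinfo.py.

```python
import subprocess
import sys

# Stand-in command (an assumption so the sketch runs anywhere): the Python
# interpreter printing two lines. On a Unix system, swap in ["df"].
command = [sys.executable, "-c", "print('line one'); print('line two')"]

# universal_newlines=True asks for text rather than raw bytes.
# check_output raises CalledProcessError if the command exits non-zero.
output = subprocess.check_output(command, universal_newlines=True)

# The captured text ends with a newline, so trim it before splitting.
lines = output.rstrip().split('\n')
print(lines)
```

The whole output arrives as one string, which is why the next step is all about cutting it into lines and tokens.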

This went through various iterations and ended up as a single line of Python code, which requires eleven lines of comments to explain it:


#
# this next line of code is pretty tense ... let me explain what
# it does:
# subprocess.check_output(["df"], universal_newlines=True) runs the
# df command and returns the output as a text string
# rstrip() trims off the trailing whitespace, which is a '\n'
# split('\n') breaks the string at the newline characters ... the
# result is an array of strings
# the list comprehension then applies shlex.split() to each string,
# breaking each into tokens
# when we're done, we have a two-dimensional array with rows of
# tokens and we're ready to make objects out of them
#
df_array = [shlex.split(x) for x in
            subprocess.check_output(["df"], universal_newlines=True)
            .rstrip().split('\n')]
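To see what each stage of that one-liner contributes, here is a sketch that performs the same steps one at a time on a canned sample instead of live df output. The sample string and the intermediate variable names are illustrative only.

```python
import shlex

# Canned text standing in for what df prints (an assumption for
# illustration; real output differs from system to system).
sample = (
    "Filesystem 1K-blocks Used Available Use% Mounted on\n"
    "udev 1011376 4 1011372 1% /dev\n"
    "/dev/sda1 233191 50734 170016 23% /boot\n"
)

trimmed = sample.rstrip()        # drop the trailing newline
lines = trimmed.split('\n')      # one string per line of output
rows = [shlex.split(line) for line in lines]   # one token list per line

print(rows[0])   # header tokens
print(rows[2])   # tokens for /dev/sda1
```

Note that the header row comes out as seven tokens, because "Mounted on" splits in two, while each data row has six. That is one reason the header row gets skipped rather than used as a list of keys.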


My original df.py code constructed the JSON result manually, a painfully finicky process. After I got it running I remembered a lesson I learned from my dear friend the late David Nochlin, namely that I should construct an object and then use a rendering library to create the JSON serialization.
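A small sketch of why building the object first is less finicky. This is not my original hand-rolled code, just a contrived comparison: the record and its awkward mount point are made up to expose the escaping problem, and Python's standard json module plays the part of the rendering library.

```python
import json

# Contrived record; the embedded double quotes are there to stress escaping.
fs = {"name": "/dev/sda1", "mount_point": '/mnt/my "backup" disk'}

# Hand-built JSON: the embedded quotes are not escaped, so the text is invalid.
manual = '{"name": "%s", "mount_point": "%s"}' % (fs["name"], fs["mount_point"])
try:
    json.loads(manual)
    manual_ok = True
except ValueError:
    manual_ok = False

# Library-built JSON: json.dumps escapes the quotes itself.
rendered = json.dumps(fs, sort_keys=True)
round_trip = json.loads(rendered)

print(manual_ok)           # did the hand-built text parse?
print(round_trip == fs)    # did the library version survive a round trip?
```

The hand-built string fails to parse, while the serialized object comes back unchanged.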

So I did some digging around and discovered that the Python json module includes a fairly sensible serialization function, json.dumps(), that supports pretty-printing of the result. The result was a much cleaner piece of code:


# df.py
#
# parse the output of df and create JSON objects for each filesystem.
#
# $Id: df.py,v 1.5 2014/09/03 00:41:31 marc Exp $
#

# now let's parse the output of df to get filesystem information
#
# Filesystem 1K-blocks Used Available Use% Mounted on
# /dev/mapper/flapjack-root 959088096 3799548 906569700 1% /
# udev 1011376 4 1011372 1% /dev
# tmpfs 204092 288 203804 1% /run
# none 5120 0 5120 0% /run/lock
# none 1020452 0 1020452 0% /run/shm
# /dev/sda1 233191 50734 170016 23% /boot

import subprocess
import shlex
import json

def main():
    """Main routine - call the df utility and print a JSON structure."""

    # this next line of code is pretty tense ... let me explain what
    # it does:
    # subprocess.check_output(["df"], universal_newlines=True) runs the
    # df command and returns the output as a text string
    # rstrip() trims off the trailing whitespace, which is a '\n'
    # split('\n') breaks the string at the newline characters ... the
    # result is an array of strings
    # the list comprehension then applies shlex.split() to each string,
    # breaking each into tokens
    # when we're done, we have a two-dimensional array with rows of
    # tokens and we're ready to make objects out of them
    df_array = [shlex.split(x) for x in
                subprocess.check_output(["df"], universal_newlines=True)
                .rstrip().split('\n')]

    df_json = {}
    df_json["filesystems"] = []
    # skip row 0, which is the header line
    for row in df_array[1:]:
        df_json["filesystems"].append(df_to_json(row))
    # print() with a single argument works in both Python 2 and Python 3
    print(json.dumps(df_json, sort_keys=True, indent=2))

def df_to_json(tokenList):
    """Take a list of tokens from df and return a python object."""
    # If df's output format changes, we'll be in trouble, of course.
    # the 0 token is the name of the filesystem
    # the 1 token is the size of the filesystem in 1K blocks
    # the 2 token is the amount used of the filesystem
    # the 5 token is the mount point
    result = {}
    fsName = tokenList[0]
    fsSize = tokenList[1]
    fsUsed = tokenList[2]
    fsMountPoint = tokenList[5]
    result["filesystem"] = {}
    result["filesystem"]["name"] = fsName
    result["filesystem"]["size"] = fsSize
    result["filesystem"]["used"] = fsUsed
    result["filesystem"]["mount_point"] = fsMountPoint
    return result

if __name__ == '__main__':
    main()


Running df.py, in turn, produces a rather nice rendering of the df output in JSON:


{
  "filesystems": [
    {
      "filesystem": {
        "mount_point": "/",
        "name": "/dev/mapper/flapjack-root",
        "size": "959088096",
        "used": "3802632"
      }
    },
    {
      "filesystem": {
        "mount_point": "/dev",
        "name": "udev",
        "size": "1011376",
        "used": "4"
      }
    },
    {
      "filesystem": {
        "mount_point": "/run",
        "name": "tmpfs",
        "size": "204092",
        "used": "288"
      }
    },
    {
      "filesystem": {
        "mount_point": "/run/lock",
        "name": "none",
        "size": "5120",
        "used": "0"
      }
    },
    {
      "filesystem": {
        "mount_point": "/run/shm",
        "name": "none",
        "size": "1020452",
        "used": "0"
      }
    },
    {
      "filesystem": {
        "mount_point": "/boot",
        "name": "/dev/sda1",
        "size": "233191",
        "used": "50734"
      }
    }
  ]
}
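One thing to notice in that output: the sizes are JSON strings, because the tokens from df are passed through untouched. Anything that consumes the result has to convert them before doing arithmetic. Here is a minimal sketch of a consumer, using a trimmed one-entry copy of the output; the percentage calculation is my own illustration and not something sysinfo.py does.

```python
import json

# A trimmed copy of the JSON output, keeping one entry only.
text = '''
{
  "filesystems": [
    {
      "filesystem": {
        "mount_point": "/boot",
        "name": "/dev/sda1",
        "size": "233191",
        "used": "50734"
      }
    }
  ]
}
'''

report = json.loads(text)
for entry in report["filesystems"]:
    fs = entry["filesystem"]
    # size and used arrive as strings, so convert before dividing
    pct = 100.0 * int(fs["used"]) / int(fs["size"])
    print("%s: %.1f%% of 1K blocks used" % (fs["mount_point"], pct))
```

Storing the values as integers in df_to_json would spare every consumer that conversion, at the cost of failing loudly if df ever prints something non-numeric in those columns.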


Quite a lot of fun, really.
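The comment in df_to_json worries about df's output format changing. Here is a hedged sketch of one way to make the parsing a little sturdier, and a variation on df.py instead of a replacement: ask df for its POSIX format with -P, which keeps each filesystem on one line, and split each line into at most six fields so a mount point containing spaces stays whole. The function names parse_df and run_df, the flat record layout, and the sample text are all my own inventions for illustration.

```python
import json
import subprocess

def parse_df(text):
    """Turn df-style text into a list of dicts, skipping the header row."""
    filesystems = []
    for line in text.rstrip().split('\n')[1:]:
        # Split into at most six fields so a mount point that contains
        # spaces stays in one piece.
        fields = line.split(None, 5)
        filesystems.append({
            "name": fields[0],
            "size": int(fields[1]),    # 1K blocks, as a number
            "used": int(fields[2]),
            "mount_point": fields[5],
        })
    return filesystems

def run_df():
    """Run df in POSIX format (-P) and return its output as text."""
    return subprocess.check_output(["df", "-P"], universal_newlines=True)

# Canned sample so the sketch runs without calling df (hypothetical data).
sample = (
    "Filesystem 1024-blocks Used Available Capacity Mounted on\n"
    "/dev/sda1 233191 50734 170016 23% /boot\n"
    "/dev/sdb1 1000 10 990 1% /mnt/usb stick\n"
)
print(json.dumps({"filesystems": parse_df(sample)}, sort_keys=True, indent=2))
```

On a real system, parse_df(run_df()) would take the place of the canned sample. This still assumes the filesystem name itself contains no whitespace, so it narrows the problem without removing it.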
