JSON output from DF
So I'm adding more capabilities to my sysinfo.py program. The next thing that I want to do is get a JSON result from df. This is a utility whose description, from the man page, says "report file system disk space usage". Here is a sample of the output of df for one of my systems:
Filesystem                1K-blocks    Used Available Use% Mounted on
/dev/mapper/flapjack-root 959088096 3802732 906566516   1% /
udev                        1011376       4   1011372   1% /dev
tmpfs                        204092     288    203804   1% /run
none                           5120       0      5120   0% /run/lock
none                        1020452       0   1020452   0% /run/shm
/dev/sda1                    233191   50734    170016  23% /boot
So I started by writing a little Python program that used the subprocess.check_output() method to capture the output of df. This went through various iterations and ended up with this single line of Python code, which requires eleven lines of comments to explain it:
#
# this next line of code is pretty tense ... let me explain what
# it does:
# subprocess.check_output(["df"]) runs the df command and returns
# the output as a string
# rstrip() trims off the last whitespace character, which is a '\n'
# split('\n') breaks the string at the newline characters ... the
# result is an array of strings
# the list comprehension then applies shlex.split() to each string,
# breaking each into tokens
# when we're done, we have a two-dimensional array with rows of
# tokens and we're ready to make objects out of them
#
df_array = [shlex.split(x) for x in
            subprocess.check_output(["df"]).rstrip().split('\n')]
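For what it's worth, under Python 3 check_output() returns bytes rather than a string, so the same idea needs a decode() step. Here's a sketch of that variant; it factors the parsing into a little helper so it can be exercised against a captured sample of df output instead of a live call:

```python
import shlex

def parse_df(output):
    """Split df output into a list of token lists, one per line."""
    # splitlines() also sidesteps the trailing-newline rstrip() dance
    return [shlex.split(line) for line in output.splitlines()]

# On Python 3 the live call would be:
#   import subprocess
#   df_array = parse_df(subprocess.check_output(["df"]).decode())

# A captured sample lets us test the parsing without running df:
sample = """Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 233191 50734 170016 23% /boot"""
df_array = parse_df(sample)
```

The helper name parse_df is just my own label for the list comprehension; the shlex and subprocess calls are the same ones the one-liner uses.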
My original df.py code constructed the JSON result manually, a painfully finicky process. After I got it running I remembered a lesson I learned from my dear friend the late David Nochlin, namely that I should construct an object and then use a rendering library to create the JSON serialization. So I did some digging around and discovered that the Python json library includes a fairly sensible serialization method that supports pretty-printing of the result. The result was a much cleaner piece of code:
# df.py
#
# parse the output of df and create JSON objects for each filesystem.
#
# $Id: df.py,v 1.5 2014/09/03 00:41:31 marc Exp $
#
# now let's parse the output of df to get filesystem information
#
# Filesystem                1K-blocks    Used Available Use% Mounted on
# /dev/mapper/flapjack-root 959088096 3799548 906569700   1% /
# udev                        1011376       4   1011372   1% /dev
# tmpfs                        204092     288    203804   1% /run
# none                           5120       0      5120   0% /run/lock
# none                        1020452       0   1020452   0% /run/shm
# /dev/sda1                    233191   50734    170016  23% /boot

import subprocess
import shlex
import json


def main():
    """Main routine - call the df utility and print a JSON structure."""
    # this next line of code is pretty tense ... let me explain what
    # it does:
    # subprocess.check_output(["df"]) runs the df command and returns
    # the output as a string
    # rstrip() trims off the last whitespace character, which is a '\n'
    # split('\n') breaks the string at the newline characters ... the
    # result is an array of strings
    # the list comprehension then applies shlex.split() to each string,
    # breaking each into tokens
    # when we're done, we have a two-dimensional array with rows of
    # tokens and we're ready to make objects out of them
    df_array = [shlex.split(x) for x in
                subprocess.check_output(["df"]).rstrip().split('\n')]
    df_json = {}
    df_json["filesystems"] = []
    # skip row 0, which holds df's column headers
    for row in df_array[1:]:
        df_json["filesystems"].append(df_to_json(row))
    print(json.dumps(df_json, sort_keys=True, indent=2))


def df_to_json(tokenList):
    """Take a list of tokens from df and return a python object."""
    # If df's output format changes, we'll be in trouble, of course.
    # the 0 token is the name of the filesystem
    # the 1 token is the size of the filesystem in 1K blocks
    # the 2 token is the amount used of the filesystem
    # the 5 token is the mount point
    result = {}
    fsName = tokenList[0]
    fsSize = tokenList[1]
    fsUsed = tokenList[2]
    fsMountPoint = tokenList[5]
    result["filesystem"] = {}
    result["filesystem"]["name"] = fsName
    result["filesystem"]["size"] = fsSize
    result["filesystem"]["used"] = fsUsed
    result["filesystem"]["mount_point"] = fsMountPoint
    return result


if __name__ == '__main__':
    main()
which, in turn, produces a rather nice df output in JSON.
{
  "filesystems": [
    {
      "filesystem": {
        "mount_point": "/",
        "name": "/dev/mapper/flapjack-root",
        "size": "959088096",
        "used": "3802632"
      }
    },
    {
      "filesystem": {
        "mount_point": "/dev",
        "name": "udev",
        "size": "1011376",
        "used": "4"
      }
    },
    {
      "filesystem": {
        "mount_point": "/run",
        "name": "tmpfs",
        "size": "204092",
        "used": "288"
      }
    },
    {
      "filesystem": {
        "mount_point": "/run/lock",
        "name": "none",
        "size": "5120",
        "used": "0"
      }
    },
    {
      "filesystem": {
        "mount_point": "/run/shm",
        "name": "none",
        "size": "1020452",
        "used": "0"
      }
    },
    {
      "filesystem": {
        "mount_point": "/boot",
        "name": "/dev/sda1",
        "size": "233191",
        "used": "50734"
      }
    }
  ]
}
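Since the result is plain JSON, any consumer can load it straight back with json.loads() and walk the structure. A quick sketch, run against a trimmed copy of the output above, pulling out the mount points:

```python
import json

# a trimmed copy of the df.py output shown above
df_json = """{
  "filesystems": [
    {"filesystem": {"mount_point": "/", "name": "/dev/mapper/flapjack-root",
                    "size": "959088096", "used": "3802632"}},
    {"filesystem": {"mount_point": "/boot", "name": "/dev/sda1",
                    "size": "233191", "used": "50734"}}
  ]
}"""

data = json.loads(df_json)
# each entry wraps its fields in a "filesystem" object
mount_points = [fs["filesystem"]["mount_point"] for fs in data["filesystems"]]
```

This is the payoff of serializing an object instead of hand-building strings: the reader's code is as simple as the writer's.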
Quite a lot of fun, really.