- October 6, 2016
- Posted by: Nagi Bombhore
- Category: Thought Leadership
Last time, I shared what role SQL plays in automation (You can read it here). This time around, let me discuss on what Shell brings to the table in terms of automation for analytics.
In the healthcare sector, there are quite a few specialists who cater to a particular kind of ailment. And though they are extremely important when we have that particular concern, the general physician is the most common doctor people go to. The reason for this is fairly simple, they help solve all the minor problems and, if the concern needs specialized service they can refer you one. In a nutshell, ‘Shell’ is the General Physician for automation concerns.
Since Shell is a basic language that allows you to do a lot of stuff you would do manually on a system, it can cater to a plethora of minor concerns. And in case you need specialized service, you can use Shell to run specialized programs and tie everything together. It would be impossible to exhaustively list the use cases of Shell; depending on your need, you can find a custom solution. I will try to list a few things that I have done, where Shell came in handy.
- Data transfer:
Shell can be used easily to copy files from one location to another. This transfer can happen within the local machine, between two machines connected via intranet/internet, between a machine and server or between two servers. e.g. ‘cp’, ‘scp’, ‘mv’ commands with various settings can be used in a basic setup
- Data Extraction:
Shell can be used to run APIs, get data from another machine/server, an email attachment, etc. Due to the environment, Shell may not always be able to do it by itself. In such cases we have other options, such as packages that help us do that, other programs that Shell can run, which extract the data for us, etc. e.g. In Cygwin you can use ‘curl’, once it’s installed, to run an API which is in the URL format. Shell is used to initiate the Cygwin environment and Cygwin takes it from there.
- Reading data:
In case the data needs to be read, Shell is pretty handy. It can read various kinds of text files and other data formats. e.g. ‘cat’ command can read text files and you can push the content in a variable.
- Cleaning data:
There are times when you need to filter out rows and/or columns from the data. At other times, you may need to remove or replace specific characters. There may be blank rows, or special characters that may need to be weeded out, before the data can be used. ‘awk’, ‘head’, ‘tail, etc. are commands that can be used to easily achieve these things.
- Modifying data:
Sometime some fields may not be in the format that you need them to be in. e.g. Date field can be in the MM-DD-YY format, whereas you need it in the YYYY-MM-DD format. You can use Shell to modify date field by using various inbuilt date functions. You can also pick up the data as text and modify it using text functions. Even in situations where the data needs to be transposed, a small Shell program can be written to achieve that.
- Iterative processes:
If a process needs to run multiple times, e.g. if you need to go over multiple files and perform the same action on each; you can do that by using various looping methods in Shell. Shell has various types of loops you can run based on your particular need, such as ‘while’, ‘for’, ‘until’, ‘select’.
- Writing data:
The data, once you have it in your desired form, may need to be written to a file for you to use later on. In other cases you may need to push the data to an SQL table. In either of the cases, Shell can be used to perform the required action. Whereas writing data to a CSV file can be quite simple, you may need to install command line packages of the SQL tool to push data directly to an SQL table.
- Running SQL commands in command line:
Like in the previous step, you may need to install the command line packages of your SQL tool, but you can set them up to work from your Shell interface. Once done, you can not only upload or delete data tables, but can run complicated SQL commands to join/modify/clean data as per your needs.
- Change data formats:
A particular concern we faced was changing the excel data to a CSV format. Since excel files are not text files, the data can’t be read directly from them. However, a quick solution that one of my colleagues found was using the command package ‘xlsx2csv’ for Cygwin. All I had to give it was the name of the excel file, and the name of the output CSV. The reason I use this example is because there is so much work already done in Shell that, more often than not, we don’t need to reinvent the wheel, and can quite often build on top of what others have already achieved.
- Mathematical modeling:
In case we need to run complex mathematical models on the data, we can use R to run them. R can be easily installed and run from Shell. Once there, you can run various functions available in R to run your mathematical models.
- Act as aggregator for different modules of your automation:
I would take an example here. In one particular instance, I had some data coming in my email attachment, and I used Power-Shell to extract it, another data source required a Python program to extract data, while a third dataset was in an excel file on a remote machine. Shell acted as the pipeline between various modules that were built for specific tasks. You can easily run Python, Batch, SQL, VBScript and various other programs from Shell.
If used correctly, Shell can act as the backbone to your automation, and every other module can be connected to it. There can be many more applications of Shell in automation that I have not listed here. I would urge you to share your experience of Shell use cases in automation, and what other tools technologies you have found helpful in automation. If you would like to know more about how Shell can help in automation in your process setup, reach out to us.