Capture and stream the standard output of a process with Python on Linux and Windows
Invoking and interacting with a program from a Python script can get complicated. In this article, I share solutions that could make your life easier. Read until the end if you are working on code that has to run on both Linux and Windows.
alfred is an open source tool that I develop. Its development confronted me with problems I had been avoiding for years. Handling standard output and error output is hard. I wanted to:
- display standard output and error output in real time for a better user experience
- capture standard output and error output to process them at the end of execution
- display standard output and error output in real time on both Windows and Linux
- preserve the ability to interact with the terminal
In this article, I share answers to the first 3 problems. The last point remains unsolved: going further requires exploring pseudo terminals (pty). I am looking for an open source, cross-platform pty library that works well on both Linux and Windows. If you know one, please share it in the comments.
Handling standard outputs
A program has 3 standard streams for input and output. Standard input passes data entered by the user to the program. Standard output displays data to the user. Error output (standard error) displays error messages to the user.
These streams are sequential: once the parent process has read data from a pipe, that data is consumed and cannot be read again. A process accesses them through file descriptors (0 for standard input, 1 for standard output, 2 for error output). A subprogram can write to standard output and error output, and it can read from standard input, as in this Python example:
import sys
sys.stdout.write("hello world")
sys.stderr.write("error message")
value = sys.stdin.read()
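To make the file descriptor part concrete, here is a small sketch (not from the original article): a child Python process writes directly to descriptors 1 and 2 with os.write, and the parent receives each stream separately through subprocess.run(capture_output=True). That keyword argument exists since Python 3.7 and is unrelated to the capture_output class used later in this article. Running the child with sys.executable is an assumption made so the example works on any OS.

```python
import subprocess
import sys

# Child code: write straight to file descriptor 1 (stdout) and 2 (stderr),
# bypassing the sys.stdout / sys.stderr wrappers.
CHILD_CODE = "import os; os.write(1, b'to stdout'); os.write(2, b'to stderr')"

result = subprocess.run([sys.executable, "-c", CHILD_CODE], capture_output=True)

# Each descriptor arrives in its own attribute, as raw bytes
print(result.stdout)  # b'to stdout'
print(result.stderr)  # b'to stderr'
```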
Figure: I/O architecture of a process run through Popen (source)
Reading from a stream such as a pipe is a blocking operation in Python by default. Remember this; it will matter later. As long as no bytes are available and the stream remains open, the program stays stuck on the read instruction. There is no common API between Linux and Windows for non-blocking reads, and this is the main obstacle to a portable solution.
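A quick way to observe this blocking behavior, as a sketch: the child below (a Python one-liner used as a stand-in for a real program) waits one second before printing, and readline() in the parent does not return until that line arrives.

```python
import subprocess
import sys
import time

# Child: wait one second, then print a single line
CHILD_CODE = "import time; time.sleep(1); print('finally', flush=True)"

p = subprocess.Popen([sys.executable, "-c", CHILD_CODE], stdout=subprocess.PIPE)
start = time.monotonic()
line = p.stdout.readline()  # blocks until the child writes a full line
elapsed = time.monotonic() - start
p.stdout.close()
p.wait()
print(line.strip(), f"after {elapsed:.1f}s")
```

The measured time is at least one second: the script did nothing else while waiting.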
write a subprogram
We will work on Linux first and run the tests on Windows at the end of the article. A shell script plays the role of the executable for the rest of the article. You need to make it executable with the chmod +x ./subprogram command.
./subprogram
#!/bin/sh
echo "What is your name? "
read my_name
RED='\033[0;31m'
NC='\033[0m' # No Color
# printf interprets the \033 escapes in every sh; echo only does in some shells (e.g. dash)
printf "Hello ${RED}%s${NC}\n" "$my_name"
sleep 3
echo "hello world"
run this subprogram and let the parent shell handle standard output
The simplest case is to run a program and not interact with it from our script.
import subprocess
# stdout=None and stderr=None are the defaults: the child writes straight to the terminal
p = subprocess.Popen(["./subprogram"], stdout=None, stderr=None)
result_code = p.wait()
print("return to the python workflow")
The subprogram displays its outputs directly in the terminal, and you can type keyboard input. However, the Python script cannot retrieve what appears on screen. If you don't need it, this is the best solution for running a subprogram.
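For this simplest case, subprocess.run wraps Popen and wait() in a single call. A minimal sketch, using a Python one-liner in place of ./subprogram so it runs on any OS:

```python
import subprocess
import sys

# stdout and stderr are not redirected: the child writes straight to the
# terminal inherited from the parent, and the script only gets the exit code
completed = subprocess.run([sys.executable, "-c", "print('hello from the child')"])
print("return code:", completed.returncode)
print("return to the python workflow")
```

Since nothing was captured, completed.stdout is None, which matches the limitation described above.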
run a program and capture standard output
We capture standard output and send error output to a separate pipe that we never read. The Python script reads standard output line by line, prints each line as it arrives and keeps a copy. Error output stays invisible.
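import subprocess
import sys

p = subprocess.Popen(["./subprogram"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout_content = []
while True:
    # readline() blocks until a full line is available or the pipe is closed
    line = p.stdout.readline().decode("utf-8")
    stdout_content.append(line)
    # echo the line immediately so the user sees it in real time
    sys.stdout.write(line)
    sys.stdout.flush()
    # an empty string means EOF: stop once the process has exited
    if line == '' and p.poll() is not None:
        break
p.wait()
stdout_content = ''.join(stdout_content)
print("return to the python workflow")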
I inspected the stdout_content variable in the PyCharm debugger: it contains the content of the standard output.
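The same loop can also be written with iter() and a sentinel, which stops exactly at end of file. The sketch below assumes a Python one-liner in place of ./subprogram. It leaves stderr unpiped on purpose: a pipe that nobody reads eventually fills its buffer, and the child then blocks on its next write to stderr.

```python
import subprocess
import sys

CHILD_CODE = "print('line 1'); print('line 2')"

p = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdout=subprocess.PIPE,
    text=True,  # decode bytes to str and normalize newlines to "\n"
)
stdout_content = []
# iter(callable, sentinel) calls readline() until it returns '' (EOF)
for line in iter(p.stdout.readline, ""):
    sys.stdout.write(line)  # real-time echo
    sys.stdout.flush()
    stdout_content.append(line)
p.stdout.close()
return_code = p.wait()
stdout_content = "".join(stdout_content)
```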
If we want to capture standard output and error output at the same time, this pattern will fail.
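If you do not need to tell the two streams apart, one way around the problem is to redirect error output into the standard output pipe with stderr=subprocess.STDOUT, so a single loop reads both. This is a sketch with a Python one-liner as the child; the trade-off is that the captured text no longer says which line came from which stream.

```python
import subprocess
import sys

CHILD_CODE = (
    "import sys; print('out', flush=True); "
    "print('err', file=sys.stderr, flush=True)"
)

p = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # send stderr into the same pipe as stdout
    text=True,
)
merged = []
for line in iter(p.stdout.readline, ""):
    sys.stdout.write(line)  # both streams are echoed in arrival order
    merged.append(line)
p.stdout.close()
return_code = p.wait()
```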
reads on standard output and error output block each other
Do you remember that the read() method is blocking? If the subprogram writes only to stdout, reading from stdout works, but the next read from stderr blocks because stderr has no data. The rest of the stdout content stays invisible while the script waits on stderr.
The situation may unblock temporarily when the subprogram writes data to stderr. But as soon as that data has been read, the script gets stuck on the stderr read again.
import subprocess

p = subprocess.Popen(["./subprogram"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
while True:
    stdout_line = p.stdout.readline().decode("utf-8")
    stderr_line = p.stderr.readline().decode("utf-8")  # BLOCKING while stderr is empty
    # ...
    if stdout_line == '' and stderr_line == '' and p.poll() is not None:
        break
Python does not offer a non-blocking read mechanism that works on both Linux and Windows. I'll let you explore for yourself the solution based on select, which works on Linux (on Windows, select only accepts sockets, not pipes).
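For reference, the select-based approach can be sketched with the selectors module, a higher-level wrapper around select. This is a POSIX-only sketch and the function name run_with_selectors is hypothetical. It reads raw chunks with os.read instead of readline(), because readline() on a buffered file may pull several lines into Python's internal buffer, where select can no longer see them.

```python
import codecs
import os
import selectors
import subprocess
import sys


def run_with_selectors(cmd):
    """POSIX-only sketch: stream a child's stdout and stderr in real time
    from a single thread, and return both captured outputs."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    captured = {"stdout": [], "stderr": []}
    sel = selectors.DefaultSelector()
    for name, pipe, echo in (("stdout", p.stdout, sys.stdout),
                             ("stderr", p.stderr, sys.stderr)):
        # one incremental decoder per stream, so a multi-byte UTF-8
        # character split across two chunks is still decoded correctly
        decoder = codecs.getincrementaldecoder("utf-8")()
        sel.register(pipe, selectors.EVENT_READ, (name, echo, decoder))
    while sel.get_map():
        # wait until at least one pipe has data or has reached EOF
        for key, _ in sel.select():
            name, echo, decoder = key.data
            # select() reported the fd as readable, so os.read returns
            # what is available right now instead of blocking
            chunk = os.read(key.fd, 4096)
            text = decoder.decode(chunk, final=not chunk)
            if text:
                echo.write(text)
                echo.flush()
                captured[name].append(text)
            if not chunk:  # EOF: the child closed this stream
                sel.unregister(key.fileobj)
                key.fileobj.close()
    sel.close()
    p.wait()
    return "".join(captured["stdout"]), "".join(captured["stderr"])
```

Usage: out, err = run_with_selectors(["./subprogram"]) returns both outputs while echoing them as they arrive.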
capture both standard output and error output and display them in real time
Threads let us capture and display both streams in real time: each stream gets its own reading thread, so a blocking read on one stream never stalls the other. I encapsulated the logic in the capture_output class to make the code easier to understand.
import subprocess
import sys

p = subprocess.Popen(["./subprogram"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout_capture = capture_output(p, capture_stream=p.stdout, output_stream=sys.stdout)
stderr_capture = capture_output(p, capture_stream=p.stderr, output_stream=sys.stderr)
p.wait()
stdout = stdout_capture.output()
stderr = stderr_capture.output()
print("get back control in main flow")
The capture_output class starts a thread. The thread reads the stream specified in capture_stream and rewrites it to the stream specified in output_stream. It stops when the subprogram closes the stream, which normally happens when the subprogram ends.
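from threading import Thread
from typing import IO, Optional


class capture_output:
    def __init__(self, process, capture_stream: Optional[IO] = None, output_stream: Optional[IO] = None):
        self.capture_logs = []
        self.subprocess = process
        self.capture_stream = capture_stream
        self.output_stream = output_stream
        self.thread = Thread(target=self._run_capture)
        self.thread.start()

    def _run_capture(self):
        if self.capture_stream is None:
            return
        # readline() returns b'' only at EOF, once the subprogram has closed
        # the stream; reading until then keeps the lines still buffered in the
        # pipe when the process exits (breaking on poll() could drop them)
        for raw_line in iter(self.capture_stream.readline, b''):
            line = raw_line.decode('utf-8')
            if self.output_stream is not None:
                self.output_stream.write(line)
                self.output_stream.flush()
            self.capture_logs.append(line)

    def output(self):
        # wait until the thread has read the whole stream
        self.thread.join()
        # every captured line already ends with its own newline
        return ''.join(self.capture_logs)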
Here is the demo of the script.
This method works on Linux. It’s time to test it on Windows.
modify the subprogram for Windows
We will use a batch script instead of a shell script, because I want to run this code in a CMD terminal. I could have used a shell port like Git Bash, but that seemed like cheating to me.
./subprogram.bat
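@echo off
set /p name=What is your name ?
echo Hello %name%
rem sleep is not a built-in CMD command: timeout waits 3 seconds instead
timeout /t 3 /nobreak > nul
echo Hello World !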
Instead of calling ./subprogram, we call subprogram.bat. The rest of the code is identical.
import os
import subprocess
import sys

p = subprocess.Popen(["subprogram.bat"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=os.getcwd())
stdout_capture = capture_output(p, p.stdout, sys.stdout)
stderr_capture = capture_output(p, p.stderr, sys.stderr)
p.wait()
stdout = stdout_capture.output()
stderr = stderr_capture.output()
print("get back control in main flow")
Here is the demo. It’s exactly the same as on Linux.
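Since only the command changes, a single script can choose it at runtime. A small sketch, assuming the file names used in this article; the helper name subprogram_command is hypothetical:

```python
import os


def subprogram_command():
    """Return the command that launches this article's subprogram on the current OS."""
    if os.name == "nt":
        # Windows: the batch version, started directly as in the article
        return ["subprogram.bat"]
    # Linux and other POSIX systems: the executable shell script
    return ["./subprogram"]
```

The Popen call then becomes subprocess.Popen(subprogram_command(), stdout=subprocess.PIPE, stderr=subprocess.PIPE) on both platforms.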
To conclude
You’ve reached the end. If you have any remaining questions, come share them in the comments. I will enjoy reading and responding to them.
This solution opens up new possibilities for alfred. Captured output is easier to process in build scripts. It will be possible to run a Python server and a Node.js server with a single command and show both outputs in the same terminal.
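One way this could look, as a sketch built on the same thread-per-stream idea: each process gets a reader thread that prefixes its lines with a name. The two commands below are hypothetical stand-ins (Python one-liners) for the real Python and Node.js servers, and stderr is merged into stdout to keep the example short.

```python
import subprocess
import sys
import threading


def stream_with_prefix(prefix, process, lines, lock):
    """Echo each line of process.stdout prefixed with its name, and keep a copy."""
    for line in iter(process.stdout.readline, ""):
        with lock:  # one writer at a time, so lines never interleave mid-line
            sys.stdout.write(f"[{prefix}] {line}")
            sys.stdout.flush()
            lines.append((prefix, line))
    process.stdout.close()


# Hypothetical stand-ins for the two servers
commands = {
    "python": [sys.executable, "-c", "print('python server ready')"],
    "node": [sys.executable, "-c", "print('node server ready')"],
}
lock = threading.Lock()
lines = []
processes, threads = [], []
for name, cmd in commands.items():
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    t = threading.Thread(target=stream_with_prefix, args=(name, p, lines, lock))
    t.start()
    processes.append(p)
    threads.append(t)
for t in threads:
    t.join()
return_codes = [p.wait() for p in processes]
```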
I still need to improve this solution. Interactive programs, such as those with color-rich prompts, run in a degraded mode through Popen: their output goes to a pipe rather than a terminal, so many of them disable colors and interactive features. I want to address this point, but the challenge is on another level. I suspect it requires looking into pseudo terminals (pty).
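To illustrate why a pty matters, here is a POSIX-only sketch with the standard library pty module, which does not exist on Windows and therefore does not meet the cross-platform need. The child writes to the slave side of a pseudo terminal, so sys.stdout.isatty() returns True in the child, which is the check many programs use before enabling colors.

```python
import os
import pty
import subprocess
import sys

# POSIX-only: the pty module is not available on Windows
CHILD_CODE = "import sys; print('isatty:', sys.stdout.isatty())"

master_fd, slave_fd = pty.openpty()
p = subprocess.Popen([sys.executable, "-c", CHILD_CODE], stdout=slave_fd, stderr=slave_fd)
os.close(slave_fd)  # the child keeps its own copy; close ours so EOF can be detected
chunks = []
while True:
    try:
        data = os.read(master_fd, 1024)
    except OSError:  # Linux raises EIO once the child side is closed
        break
    if not data:  # some platforms return b'' at EOF instead
        break
    chunks.append(data)
os.close(master_fd)
p.wait()
# the terminal line discipline turns "\n" into "\r\n"
output = b"".join(chunks).decode()
print(output)
```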
Are you looking for an alternative to make and bash scripts to automate actions in your project? Come try alfred. The tool might surprise you.