Compiling Python

None of my Python books, including Algorithms for Dummies, 2nd Edition; Beginning Programming with Python For Dummies, 3rd Edition; Machine Learning for Dummies, 2nd Edition; Machine Learning Security Principles; and Python for Data Science for Dummies, shows how to compile a Python program. This is because the interpreted nature of Python makes it easier to work with scripts for these reasons:

  • The interpreter provides instant results to make learning faster.
  • It’s easier and faster to fix errors.
  • The use of notebooks, as found in all of these books, makes creating output easier.
  • The use of literate programming techniques helps create an environment where acquired knowledge is more likely to remain acquired.
  • Using literate programming techniques also makes it possible to document the code in a manner that’s more like reading a textbook than looking at source code.
  • The use of scripts promotes experimentation, which leads to new ideas and techniques.

These are all great reasons to use scripts in books. In fact, I’m sure that many people will have other reasons to use scripts. The one thing you should note is that Python does automatically compile some files to do things like reduce loading time. Anytime you see a .pyc file, Python has compiled that file to bytecode through various means, including importing the script. It’s also possible to pre-compile a script by running the standard library’s py_compile or compileall module with the interpreter’s -m command line switch. The resulting output appears in the __pycache__ folder with a .pyc extension. You can further modify the compilation process by using the -O and -OO command line switches, which apply optimizations, such as removing assert statements and docstrings, that make the resulting files smaller and a bit faster to load. The problem with these outputs is that they’re only mildly obfuscated, so if your intent is to hide your code from prying eyes, this isn’t the best option.
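As a quick sketch of what that pre-compilation looks like from inside a script, the same py_compile and compileall modules can be called directly (the file and folder names here are just placeholders):

    import py_compile
    import compileall

    # Compile one script; the bytecode lands in __pycache__ with a .pyc extension.
    py_compile.compile("myscript.py")

    # Compile every script under a folder in one pass.
    compileall.compile_dir("myproject", quiet=1)

The command line equivalents are python -m py_compile myscript.py and python -m compileall myproject.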

Another built-in compilation option is the compile() function, which turns a string of source code into a code object directly in your code. The purpose of using this function is to speed up code that runs often within your application. For example, you might use it to compile code that executes within a loop so that Python doesn’t have to parse it on every iteration. Obviously, you get no obfuscation advantage using this approach, but you do get a speed advantage. If you don’t want to go through the bother of using the compile() function, you could always use a third-party product like Numba, which reduces the task to one of adding a decorator to your code.
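Here’s a minimal sketch of both approaches. The source string, the function, and the numbers are hypothetical examples, and the second snippet assumes the Numba package is installed:

    # Built-in compile(): parse the source string once, then execute the
    # resulting code object repeatedly without re-parsing it.
    source = "total = sum(x * x for x in range(100))"
    code_obj = compile(source, "<string>", "exec")

    namespace = {}
    for _ in range(10_000):
        exec(code_obj, namespace)  # no parsing cost on each pass
    print(namespace["total"])

With Numba, the same idea reduces to a decorator; the function is compiled to machine code the first time it’s called:

    from numba import njit

    @njit
    def total_squares(n):
        total = 0
        for x in range(n):
            total += x * x
        return total

    print(total_squares(100))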

None of the solutions discussed so far does anything more than turn your Python script into bytecode, which is still interpreted (albeit much faster than interpreting the raw source). There is also an option for turning your Python code into actual machine code through various intermediate steps. A Python compiler usually turns your Python script into an intermediate language, which is then compiled into machine code that is native to the host platform. However, some of these products simply run your script online rather than producing a binary, so you need to know in advance whether you’ll actually end up with an executable file. An executable file can offer these advantages:

  • The source code is fully obfuscated, protecting your development investment.
  • The code typically runs significantly faster than interpreted bytecode.
  • Instead of a host of script files, you usually end up with just a few executable files, perhaps even just one.
  • Because it’s harder to modify, an executable file can be more secure and reliable than a script.

If your goal is exclusively to create an executable output, then a product like auto-py-to-exe might be your best option. This way, you get to use your interpreter of choice to develop the application and then use another product to turn the result into an .exe file. The idea is to get the best of both worlds. The point of all this is that you don’t have to interact with Python code in just one way, using an interpreter; you have a great many options at your disposal. Let me know your thoughts about working with compiled Python code at [email protected].

A History of Microprocessors

Every once in a while, someone will send me a truly interesting link. Having seen a few innovations myself and possessing a strong interest in history, I read CPU DB: Recording Microprocessor History on the Association for Computing Machinery (ACM) site with great interest. The post is a bit long, but essentially, the work by Andrew Danowitz, Kyle Kelley, James Mao, John P. Stevenson, and Mark Horowitz does something that no other site does: it provides you with a comprehensive view of 790 different microprocessors created since the introduction of Intel’s 4004 in November 1971. The CPU DB is available for anyone to use and should prove useful for scientists, developers, and hobbyists alike.

Unlike a lot of the work done on microprocessors, this effort wasn’t commissioned by a particular company. In fact, you’ll find processors from 17 different vendors. The work also spans a considerable number of disciplines. For example, you can discover how the physical scaling of devices has changed over the years and the effects of software on processor design and development.

A lot of the information available in this report is also available from the vendor or a third party in some form. The problem with vendor specification sheets and third-party reports is that they vary in composition, depth, and content, making any sort of comparison extremely difficult and time-consuming. This database makes it possible to compare the 790 processors directly, using the same criteria. A researcher can now easily see the differences between two microprocessors, making it considerably easier to draw conclusions about microprocessor design and implementation.

Not surprisingly, it has taken a while to collect this sort of information at the depth provided. According to the site, this database has been a work in progress for 30 years now. That’s a long time to research anything, especially something as esoteric as the voltage and frequency ranges of microprocessors. The authors stated their efforts were hampered in some cases by the age of the devices and the unavailability of samples for testing. I would imagine that trying to find a usable copy of a 4004 for testing would be nearly impossible.

You’ll have to read the report to get the full scoop on everything that CPU DB provides. The information is so detailed that the authors resorted to using tables and diagrams to explain it. Let’s just say that if you can’t find the statistic you need in CPU DB, it probably doesn’t exist. In order to provide a level playing field for all of the statistics, the researchers used standardized testing. For example, they rely on the Standard Performance Evaluation Corporation (SPEC) benchmarks to compare the processors. Tables 1 and 2 in the report provide an overview of the sorts of information you’ll find in CPU DB.

This isn’t a resource I’ll use every day. However, it is a resource I plan to use when trying to make sense of performance particulars. Using the information from CPU DB should remove some of the ambiguity in trying to compare system designs and determine how they affect the software running on them. Let me know what you think of CPU DB at [email protected].