Friday, August 11, 2017

Summer of code 2017: Python, Day 55: Python Jargon


As explained in my Summer of code 2017: Python post I decided to pick up Python

This is officially day 55. Today I decided to look at some Python Jargon

Here are some of the more interesting ones


Dunder
Dunder (Double Underscore) is a way to pronounce the names of special methods and attributes: __len__ is pronounced "dunder len" and __init__ is pronounced "dunder init".
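
To make the term a bit more concrete, here is a minimal sketch of a class that defines both of those dunder methods. The Playlist class and its song names are made up for illustration only.

```python
class Playlist:
    """Hypothetical container, used only to illustrate dunder methods."""

    def __init__(self, songs):
        # "dunder init" runs when Playlist(...) is called
        self.songs = list(songs)

    def __len__(self):
        # "dunder len" is what the built-in len() ends up calling
        return len(self.songs)


favorites = Playlist(['Song A', 'Song B', 'Song C'])
print(len(favorites))  # 3
```

You normally never call __len__ yourself; you call len() and Python looks up the dunder method for you.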

BDFL
Benevolent Dictator For Life, a.k.a. Guido van Rossum, Python’s creator.

CPython
The canonical implementation of the Python programming language, as distributed on python.org. The term “CPython” is used when necessary to distinguish this implementation from others such as Jython or IronPython.


EAFP
Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.

LBYL
Look before you leap. This coding style explicitly tests for pre-conditions before making calls or lookups. This style contrasts with the EAFP approach and is characterized by the presence of many if statements.
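
Here is a small sketch that puts the two styles side by side, using a dictionary lookup. The ages dictionary and both function names are invented for this example.

```python
ages = {'Denis': 200}  # hypothetical sample data


def age_lbyl(name):
    # LBYL: test the pre-condition before doing the lookup
    if name in ages:
        return ages[name]
    return None


def age_eafp(name):
    # EAFP: just try the lookup and catch the exception if it fails
    try:
        return ages[name]
    except KeyError:
        return None


print(age_lbyl('Denis'), age_eafp('Denis'))  # 200 200
print(age_lbyl('Guido'), age_eafp('Guido'))  # None None
```

Both functions return the same results; the difference is only in whether the check happens before the lookup or the failure is handled after it.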

f-string
String literals prefixed with 'f' or 'F' are commonly called “f-strings” which is short for formatted string literals. See also PEP 498.
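
A quick sketch of what an f-string looks like in practice (this needs Python 3.6 or later; the variable names are just for illustration):

```python
day = 55
topic = 'Python Jargon'

# Expressions inside the curly braces are evaluated and formatted in place
message = f'Day {day}: {topic}'
print(message)  # Day 55: Python Jargon

# A format spec can follow the expression after a colon
print(f'{day / 7:.1f} weeks in')  # 7.9 weeks in
```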


__future__
A pseudo-module which programmers can use to enable new language features which are not compatible with the current interpreter.

By importing the __future__ module and evaluating its variables, you can see when a new feature was first added to the language and when it becomes the default
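
As a small sketch of that, here is how you could inspect the division feature. I picked division only as an example; any other feature name listed in the __future__ module should work the same way.

```python
import __future__

feature = __future__.division

# First release in which "from __future__ import division" was accepted
print(feature.getOptionalRelease())   # a version tuple starting with (2, 2, ...)

# Release in which the feature became the default behavior
print(feature.getMandatoryRelease())  # a version tuple starting with (3, 0, ...)
```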


Python 3000
Nickname for the Python 3.x release line (coined long ago when the release of version 3 was something in the distant future.) This is also abbreviated “Py3k”.

Pythonic
An idea or piece of code which closely follows the most common idioms of the Python language, rather than implementing code using concepts common to other languages. For example, a common idiom in Python is to loop over all elements of an iterable using a for statement. Many other languages don’t have this type of construct, so people unfamiliar with Python sometimes use a numerical counter instead:

for i in range(len(food)):
    print(food[i])

As opposed to the cleaner, Pythonic method:

for piece in food:
    print(piece)
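
If you do need the position as well as the element, a common approach is the built-in enumerate rather than going back to a counter. This is a small sketch; the contents of food are made up.

```python
food = ['spam', 'eggs', 'ham']  # hypothetical sample list

# enumerate yields (position, element) pairs; start=1 makes counting begin at 1
for position, piece in enumerate(food, start=1):
    print(position, piece)
```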




Friday, August 4, 2017

Summer of code 2017: Python, Day 48 Rounding numbers


As explained in my Summer of code 2017: Python post I decided to pick up Python

This is officially day 48. Today I was playing around with numeric data types. I found some interesting side effects/features with the round function in Python

For example if you round 2.675 to 2 digits in SQL Server, you get back 2.68




Now let's do that in Python shall we......

>>> round(2.675,2)
2.67

Ugh what? The issue in Python is that 2.675 is stored as a float and it can't be represented exactly in binary. The value that is actually stored is slightly below 2.675, so it rounds down to 2.67
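
One way to see this for yourself is to hand the float to the decimal module, which shows the value exactly as it is stored. This is just a sketch; the variable name stored is mine.

```python
from decimal import Decimal

# Decimal(float) converts the binary value exactly, without any rounding
stored = Decimal(2.675)
print(stored)

# The stored value sits just below the decimal literal 2.675
print(stored < Decimal('2.675'))  # True
```

Note the difference between Decimal(2.675), which starts from the float, and Decimal('2.675'), which starts from the text and is exact.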

Here is some more fun stuff

>>> round(0.5)
0
>>> round(1.5)
2
>>> round(2.5)
2
>>> round(3.5)
4
>>> round(4.5)
4
>>> round(5.5)
6


WTF is going on here? Here is what the documentation has to say about round()

Return number rounded to ndigits precision after the decimal point. If ndigits is omitted or is None, it returns the nearest integer to its input.
For the built-in types supporting round(), values are rounded to the closest multiple of 10 to the power minus ndigits; if two multiples are equally close, rounding is done toward the even choice (so, for example, both round(0.5) and round(-0.5) are 0, and round(1.5) is 2). Any integer value is valid for ndigits (positive, zero, or negative). The return value is an integer if called with one argument, otherwise of the same type as number.

So there you have it, Python rounds to the closest even number if two multiples are equally close
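
If you would rather have ties rounded up, the decimal module lets you choose the rounding mode explicitly. Below is a minimal sketch; the helper name round_half_up is mine, and passing the number as a string is an assumption made so that the value is exact to begin with.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_half_up(text, places='0.01'):
    # quantize rounds to the same number of decimal places as `places`
    return Decimal(text).quantize(Decimal(places), rounding=ROUND_HALF_UP)


print(round_half_up('2.675'))     # 2.68
print(round_half_up('2.5', '1'))  # 3

# The same tie, rounded toward the even choice like the built-in round()
print(Decimal('2.5').quantize(Decimal('1'), rounding=ROUND_HALF_EVEN))  # 2
```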


Monday, July 31, 2017

Summer of code 2017: Python, Day 44 Frequency of words in the book Moby Dick


As explained in my Summer of code 2017: Python post I decided to pick up Python

This is officially day 44. Today I wanted to see if I could get a python script to run and return me all the words and their occurrences in the book Moby Dick

Some interesting things you might want to know:

How many times is Moby used in the book?
How many distinct words in total?
How many words are used only once?
What are the top 20 most used words?


If you are interested in how many times Moby is in the book... here is the answer


As you can see Moby is in the book 90 times.

Ok so let's get started. If you want to follow along, you will need the Moby Dick book. Since Moby Dick is in the public domain, you can download the book for free. You can get it from Project Gutenberg, the link is here:  http://www.gutenberg.org/ebooks/2701

Make sure to grab the  Plain Text UTF-8 version

In Python, what do we need to get the words and their counts? First we need to read the text of the book into a variable

It will look like the following

with open(r'C:\Downloads\MobyDick.txt', 'r', encoding="utf8") as myfile:
    doc=myfile.read().replace('\n', ' ')

As you can see, we are also stripping off the line feeds by replacing each \n with a space. Make sure to pass encoding="utf8", otherwise you will get errors like the one below

Traceback (most recent call last):
  File "c:\MapReduce.py", line 11, in <module>
    doc=myfile.read().replace('\n', ' ')
  File "C:\Program Files\Python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7237: character maps to <undefined>

Now that we have the file in a variable, we need to count the words, here is what that function will look like

def CountWords(text):
    output =''.join(c.lower() if c.isalpha() else ' ' for c in text)
    frequencies = {}
    for word in output.split():
        frequencies[word] = frequencies.get(word, 0) + 1
    return frequencies

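The function lowercases every letter and replaces every non-alphabetic character with a space (apostrophes included, which is why 's' and 't' show up as words in the results). After that it loops through all the words produced by split and increments the counter for each one. The function then returns the dictionary of word and count pairs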
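
As an aside, the standard library has collections.Counter, which can do the counting loop for you. This is only an alternative sketch, not the function used in the rest of this post; the name count_words_counter and the sample sentence are made up.

```python
from collections import Counter


def count_words_counter(text):
    # Same clean-up as CountWords: lowercase letters, everything else becomes a space
    cleaned = ''.join(c.lower() if c.isalpha() else ' ' for c in text)
    # Counter tallies each word produced by split()
    return Counter(cleaned.split())


sample = "Call me Ishmael. Call me, call!"
counts = count_words_counter(sample)
print(counts.most_common(2))  # [('call', 3), ('me', 2)]
```

A Counter is a dict subclass, so the rest of the code (items(), sorting and so on) would work the same way with it.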


In order to print the output on more than 1 line, we will use pprint. I already posted about pprint here: Summer of code 2017: Python, Pretty printing with pprint in Python

from pprint import pprint as pp

Finally, we need to sort the output by count in descending order (ties are broken alphabetically by word) and limit the output to the first n items. I have chosen 500 here

pp(sorted(CountWords(doc).items(), key=lambda x: (-x[1], x[0]))[:500])
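
To see what that sort key does, here is a small sketch on a hand-made dictionary (the four counts are copied from the results further down). Negating the count sorts from high to low, and the word itself breaks ties alphabetically.

```python
# Tiny stand-in for the output of CountWords
frequencies = {'was': 1646, 'the': 14718, 'for': 1646, 'of': 6743}

# Sort key is (-count, word): highest count first, then alphabetical on ties
ranked = sorted(frequencies.items(), key=lambda x: (-x[1], x[0]))

print(ranked[:3])  # [('the', 14718), ('of', 6743), ('for', 1646)]
```

Notice that 'for' comes before 'was' even though both have the same count.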


All of this together will look like this, make sure to change the path to the file to match your computer's path

def CountWords(text):
    output =''.join(c.lower() if c.isalpha() else ' ' for c in text)
    frequencies = {}
    for word in output.split():
        frequencies[word] = frequencies.get(word, 0) + 1
    return frequencies
 
 
with open(r'C:\Downloads\MobyDick.txt', 'r', encoding="utf8") as myfile:
    doc=myfile.read().replace('\n', ' ')
 
 
from pprint import pprint as pp
 
pp(sorted(CountWords(doc).items(), key=lambda x: (-x[1], x[0]))[:500])

Now let's ask those questions again, but this time we will have the answers as well

What are the 20 most used words?
Here we go

('the', 14718),
 ('of', 6743),
 ('and', 6518),
 ('a', 4807),
 ('to', 4707),
 ('in', 4242),
 ('that', 3100),
 ('it', 2536),
 ('his', 2532),
 ('i', 2127),
 ('he', 1900),
 ('s', 1825),
 ('but', 1823),
 ('with', 1770),
 ('as', 1753),
 ('is', 1751),
 ('for', 1646),
 ('was', 1646),
 ('all', 1545),
 ('this', 1443)

How many distinct words in total?
17,148 distinct words (you need to remove the limit in order to get the full set back, just remove [:500])


How many words are used only once?
There are 7416 words used only once, here are some of them starting with the letter z (you need to remove the limit in order to get the full set back, just remove [:500])

('zag', 1),
 ('zay', 1),
 ('zealanders', 1),
 ('zephyr', 1),
 ('zeuglodon', 1),
 ('zig', 1),
 ('zip', 1),
 ('zogranda', 1),
 ('zoroaster', 1)


How many times does Captain Ahab's name appear in the book?
Captain Ahab's name appears 517 times in the book
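
If you would rather have Python answer these questions than scroll through the printed output, something like the sketch below should work on the dictionary returned by CountWords. The summarize helper and the tiny sample dictionary are made up for illustration; they are not the numbers from the book.

```python
def summarize(frequencies, word):
    # Number of distinct words is simply the number of keys
    distinct = len(frequencies)
    # Words used only once are the entries whose count is exactly 1
    used_once = sum(1 for count in frequencies.values() if count == 1)
    # get() returns 0 when the word never appears
    return distinct, used_once, frequencies.get(word, 0)


sample = {'the': 3, 'whale': 2, 'zephyr': 1, 'zig': 1}
print(summarize(sample, 'whale'))  # (4, 2, 2)
```

Remember that CountWords lowercases everything, so you would look up 'ahab' and 'moby' rather than 'Ahab' and 'Moby'.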


And I will leave you with the first 100 most used words in the book

 ('the', 14718),
 ('of', 6743),
 ('and', 6518),
 ('a', 4807),
 ('to', 4707),
 ('in', 4242),
 ('that', 3100),
 ('it', 2536),
 ('his', 2532),
 ('i', 2127),
 ('he', 1900),
 ('s', 1825),
 ('but', 1823),
 ('with', 1770),
 ('as', 1753),
 ('is', 1751),
 ('for', 1646),
 ('was', 1646),
 ('all', 1545),
 ('this', 1443),
 ('at', 1335),
 ('whale', 1244),
 ('by', 1227),
 ('not', 1172),
 ('from', 1105),
 ('on', 1073),
 ('him', 1068),
 ('so', 1066),
 ('be', 1064),
 ('you', 964),
 ('one', 925),
 ('there', 871),
 ('or', 798),
 ('now', 786),
 ('had', 779),
 ('have', 774),
 ('were', 684),
 ('they', 670),
 ('which', 655),
 ('like', 647),
 ('me', 633),
 ('then', 631),
 ('their', 620),
 ('are', 619),
 ('some', 619),
 ('what', 619),
 ('when', 607),
 ('an', 600),
 ('no', 596),
 ('my', 589),
 ('upon', 568),
 ('out', 539),
 ('man', 527),
 ('up', 526),
 ('into', 523),
 ('ship', 519),
 ('ahab', 517),
 ('more', 509),
 ('if', 501),
 ('them', 474),
 ('ye', 473),
 ('we', 470),
 ('sea', 455),
 ('old', 452),
 ('would', 432),
 ('other', 431),
 ('been', 415),
 ('over', 409),
 ('these', 406),
 ('will', 399),
 ('though', 384),
 ('its', 382),
 ('down', 379),
 ('only', 378),
 ('such', 376),
 ('who', 366),
 ('any', 364),
 ('head', 348),
 ('yet', 345),
 ('boat', 337),
 ('long', 334),
 ('time', 334),
 ('her', 332),
 ('captain', 329),
 ('here', 325),
 ('do', 324),
 ('very', 323),
 ('about', 318),
 ('still', 312),
 ('than', 311),
 ('chapter', 308),
 ('great', 307),
 ('those', 307),
 ('said', 305),
 ('before', 301),
 ('two', 298),
 ('has', 294),
 ('must', 293),
 ('t', 291),
 ('most', 285)




Saturday, July 29, 2017

Summer of code 2017: Python, Day 42 args and kwargs


As explained in my Summer of code 2017: Python post I decided to pick up Python

This is officially day 42.  Today I looked at args and kwargs, these notes are mostly for me but who knows, they might be helpful for someone else in the future as well


Args and kwargs? WTF is that? I had the same thought, it turns out these are magic variables  :-)

From the docs:
args
A tuple of positional arguments values. Dynamically computed from the arguments attribute.
kwargs
A dict of keyword arguments values. Dynamically computed from the arguments attribute.

Basically this is a way to pass an unknown number of arguments into a function.

Args are prefixed with an asterisk, kwargs are prefixed with 2 asterisks

The names of these variables do not have to be *args and **kwargs, you can name them anything

For example, here is an args named bars

def test_args(foo, *bars):

And here is a kwargs also named bars

def test_kwargs(foo,**bars):

The only difference between these two is the single and double asterisk



Let's make a very simple function, that will loop through the *args and print them out. This function accepts a normal variable foo and an args variable named *bars

def test_args(foo, *bars):
    print ('first normal argument:', foo)
    for bar in bars:
        print ("another looping through all the *bars :", bar)
 
    print ('all *bars on one line', bars)

Now let's call this function like this

>>> test_args('args','Denis','likes','playing','with','Python')

Here is the output

>>> test_args('args','Denis','likes','playing','with','Python')
first normal argument: args
another looping through all the *bars : Denis
another looping through all the *bars : likes
another looping through all the *bars : playing
another looping through all the *bars : with
another looping through all the *bars : Python
all *bars on one line ('Denis', 'likes', 'playing', 'with', 'Python')
>>> 

Let's now call this function like this

>>> test_args('args','enough','Python')

Here is the output of that call

>>> test_args('args','enough','Python')
first normal argument: args
another looping through all the *bars : enough
another looping through all the *bars : Python
all *bars on one line ('enough', 'Python')
>>> 

As you can see, you can pass a variable number of values into the function by using args
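
Here is one more small sketch of the same idea, this time doing something with the values instead of just printing them. The function name total is made up for this example.

```python
def total(*numbers):
    # numbers arrives as a tuple, however many values were passed in
    result = 0
    for number in numbers:
        result += number
    return result


print(total())            # 0
print(total(1, 2, 3))     # 6
print(total(10, 20, 30, 40))  # 100
```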

Now let's take a look at kwargs, here is our function, we now named the variable **bars to denote that this is a variable of type kwargs

def test_kwargs(foo,**bars):
    print ('first normal argument:', foo)
    for bar in bars:
        print ("another looping through all the **bars :", bar)
 
    print ('all **bars on one line', bars)

Calling this function requires a change

If you try calling it like we did with args you will get an error

>>> test_kwargs('args','enough','Python')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: test_kwargs() takes 1 positional argument but 3 were given
>>> 

What you have to do instead is use named arguments

If we change it to this it will work fine

>>> test_kwargs('kwargs',name ='Denis1', age = 200)

Here is the output

>>> test_kwargs('kwargs',name ='Denis1', age = 200)
first normal argument: kwargs
another looping through all the **bars : name
another looping through all the **bars : age
all **bars on one line {'name': 'Denis1', 'age': 200}
>>> 

Here is another example

>>> test_kwargs('kwargs',month ='July', day = 29)

Here is the output

>>> test_kwargs('kwargs',month ='July', day = 29)
first normal argument: kwargs
another looping through all the **bars : month
another looping through all the **bars : day
all **bars on one line {'month': 'July', 'day': 29}
>>> 

You might have noticed that we didn't print the value of the month or the value of the day. Let's change our function so it looks a little different, now we will print both the key and the value

def test_kwargs(foo, **bars):
    print ('first normal argument:', foo)
    if bars is not None:
        for key, value in bars.items():
            print ("%s : %s" %(key,value))
 
    print ('all *bars on one line', bars)

Now we can make the same call

test_kwargs('kwargs',name ='Denis1', age = 200)

And here is the output

>>> test_kwargs('kwargs',name ='Denis1', age = 200)
first normal argument: kwargs
name : Denis1
age : 200
all *bars on one line {'name': 'Denis1', 'age': 200}
>>> 

As you can see, we now have both the key and its value printed


If you want to use args and kwargs in a function,  you need to have the args before the kwargs



def test_args(foo, **bars, *namedbars):
                               ^
SyntaxError: invalid syntax

This is an error because we have kwargs before args

def test_args(**bars, foo, *namedbars):
                            ^
SyntaxError: invalid syntax

This is also an error because we still have kwargs before args

This is how it should be: first normal variables, then args and finally kwargs

def test_args(foo, *bars, **namedbars):

Here is an example of such a signature

def test_args(foo, *bars, **namedbars):
    print ('first normal argument:', foo)
    for bar in bars:
        print ("another looping through all the *bars :", bar)
 
    print ('all *bars on one line', bars)
    print ('all **namedbars on one line', namedbars)


Calling the function above gives us the following output

>>> test_args('args','Denis','likes','playing','with','Python')
first normal argument: args
another looping through all the *bars : Denis
another looping through all the *bars : likes
another looping through all the *bars : playing
another looping through all the *bars : with
another looping through all the *bars : Python
all *bars on one line ('Denis', 'likes', 'playing', 'with', 'Python')
all **namedbars on one line {}

As you can see, there is nothing printed for kwargs, this is because we did not pass anything in.
Let's make a change and add a value for the kwargs

Here is the output from that call

>>> test_args('args','enough','Python', namedbars='Denis')
first normal argument: args
another looping through all the *bars : enough
another looping through all the *bars : Python
all *bars on one line ('enough', 'Python')
all **namedbars on one line {'namedbars': 'Denis'}
>>> 

There you have it.. a rather simple blog post that explains the difference between args and kwargs