Friday, August 11, 2017

Python: function introspection with help and dir


As explained in my Summer of code 2017: Python post I decided to pick up Python

This is officially day 55. Today I decided to look at function introspection in Python

To see what methods are in an object, you can use the dir function

dir([object])
Without arguments, return the list of names in the current local scope. With an argument, attempt to return a list of valid attributes for that object.

If I just open an interactive shell and type dir(), here is what I see

>>> dir()
['__builtins__', '__doc__', '__file__', '__loader__', '__name__', '__package__'
 , '__spec__', 'test_kwargs']

To see what is available in a module, for example sys, I first need to import sys and then pass sys into the dir function. Here is the output

>>> import sys
>>> dir(sys)
['__displayhook__', '__doc__', '__excepthook__', '__interactivehook__',
'__loader__', '__name__', '__package__', '__spec__', '__stderr__', 
'__stdin__', '__stdout__', '_clear_type_cache', '_current_frames', 
'_debugmallocstats', '_enablelegacywindowsfsencoding', '_getframe', 
'_home', '_mercurial', '_xoptions', 'api_version', 'argv', 'base_exec_prefix',
'base_prefix', 'builtin_module_names', 'byteorder', 'call_tracing',
'callstats', 'copyright', 'displayhook', 'dllhandle', 'dont_write_bytecode', 
'exc_info', 'excepthook', 'exec_prefix', 'executable', 'exit', 'flags',
'float_info', 'float_repr_style', 'get_asyncgen_hooks', 'get_coroutine_wrapper',
'getallocatedblocks', 'getcheckinterval', 'getdefaultencoding',
'getfilesystemencodeerrors', 'getfilesystemencoding', 'getprofile',
'getrecursionlimit', 'getrefcount', 'getsizeof', 'getswitchinterval',
'gettrace', 'getwindowsversion', 'hash_info', 'hexversion', 'implementation',
'int_info', 'intern', 'is_finalizing', 'maxsize', 'maxunicode', 'meta_path',
'modules', 'path', 'path_hooks', 'path_importer_cache', 'platform', 'prefix',
'ps1', 'ps2', 'set_asyncgen_hooks', 'set_coroutine_wrapper', 'setcheckinterval',
'setprofile', 'setrecursionlimit', 'setswitchinterval', 'settrace', 'stderr',
'stdin', 'stdout', 'thread_info', 'version', 'version_info', 'warnoptions', 
'winver']
>>> 
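The list above is long, so it can help to filter it. Here is a minimal sketch, assuming you only care about names that do not start with an underscore; the variable names public_names and trace_names are my own and the exact names returned will differ between Python versions and platforms.

```python
import sys

# Keep only the "public" names: those that do not start with an underscore.
public_names = [name for name in dir(sys) if not name.startswith("_")]

# Narrow the list down further to names that contain "trace".
trace_names = [name for name in public_names if "trace" in name]

print(len(public_names))
print(trace_names)
```

Since dir returns a plain list of strings, any list operation (sorting, slicing, filtering) works on it.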


To get some details about any of the names returned above, we can use the help function

help([object])
Invoke the built-in help system. (This function is intended for interactive use.) If no argument is given, the interactive help system starts on the interpreter console. If the argument is a string, then the string is looked up as the name of a module, function, class, method, keyword, or documentation topic, and a help page is printed on the console. If the argument is any other kind of object, a help page on the object is generated.


To get some details about settrace, we can execute the following:

>>> help(sys.settrace)
Help on built-in function settrace in module sys:
 
settrace(...)
    settrace(function)
    
    Set the global debug tracing function.  It will be called on each
    function call.  See the debugger chapter in the library manual.
 
>>> 
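help also works on your own functions, as long as they have a docstring. The sketch below uses a made-up function called area, and it uses pydoc.render_doc to get roughly the same text that help would print, but as a string, which is handy outside an interactive session.

```python
import pydoc


def area(width, height):
    """Return the area of a rectangle."""
    return width * height


# render_doc returns the documentation page as a string instead of paging it.
# The plaintext renderer avoids the overstrike characters used for bold text.
text = pydoc.render_doc(area, renderer=pydoc.plaintext)
print(text)
```

In an interactive shell, help(area) is still the quickest way to read the same information.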


Here is another example

>>> help(sys.builtin_module_names)
Help on tuple object:
 
class tuple(object)
 |  tuple() -> empty tuple
 |  tuple(iterable) -> tuple initialized from iterable's items
 |  
 |  If the argument is a tuple, the return value is the same object.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __getnewargs__(...)
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __len__(self, /)
 |      Return len(self).
 |  
 |  __lt__(self, value, /)
 |      Return self<value.
 |  
 |  __mul__(self, value, /)
 |      Return self*value.n
 |  
 |  __ne__(self, value, /)
 |      Return self!=value.
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  __rmul__(self, value, /)
 |      Return self*value.
 |  
 |  count(...)
 |      T.count(value) -> integer -- return number of occurrences of value
 |  
 |  index(...)
 |      T.index(value, [start, [stop]]) -> integer -- return first index of value.
 |      Raises ValueError if the value is not present.
 
>>> 

You can also use help against a whole module, here is what the lengthy output looks like

>>> help(sys)
Help on built-in module sys:
 
NAME
    sys
 
MODULE REFERENCE
    https://docs.python.org/3.6/library/sys
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.
 
DESCRIPTION
    This module provides access to some objects used or maintained by the
    interpreter and to functions that interact strongly with the interpreter.
    
    Dynamic objects:
    
    argv -- command line arguments; argv[0] is the script pathname if known
    path -- module search path; path[0] is the script directory, else ''
    modules -- dictionary of loaded modules
    
    displayhook -- called to show results in an interactive session
    excepthook -- called to handle any uncaught exception other than SystemExit
      To customize printing in an interactive session or to install a custom
      top-level exception handler, assign other functions to replace these.
    
    stdin -- standard input file object; used by input()
    stdout -- standard output file object; used by print()
    stderr -- standard error object; used for error messages
      By assigning other file objects (or objects that behave like files)
      to these, it is possible to redirect all of the interpreter's I/O.
    
    last_type -- type of last uncaught exception
    last_value -- value of last uncaught exception
    last_traceback -- traceback of last uncaught exception
      These three are only available in an interactive session after a
      traceback has been printed.
    
    Static objects:
    
    builtin_module_names -- tuple of module names built into this interpreter
    copyright -- copyright notice pertaining to this interpreter
    exec_prefix -- prefix used to find the machine-specific Python library
    executable -- absolute path of the executable binary of the Python interpreter
    float_info -- a struct sequence with information about the float implementation.
    float_repr_style -- string indicating the style of repr() output for floats
    hash_info -- a struct sequence with information about the hash algorithm.
    hexversion -- version information encoded as a single integer
    implementation -- Python implementation information.
    int_info -- a struct sequence with information about the int implementation.
    maxsize -- the largest supported length of containers.
    maxunicode -- the value of the largest Unicode code point
    platform -- platform identifier
    prefix -- prefix used to find the Python library
    thread_info -- a struct sequence with information about the thread implementation.
    version -- the version of this interpreter as a string
    version_info -- version information as a named tuple
    dllhandle -- [Windows only] integer handle of the Python DLL
    winver -- [Windows only] version number of the Python DLL
    _enablelegacywindowsfsencoding -- [Windows only] 
    __stdin__ -- the original stdin; don't touch!
    __stdout__ -- the original stdout; don't touch!
    __stderr__ -- the original stderr; don't touch!
    __displayhook__ -- the original displayhook; don't touch!
    __excepthook__ -- the original excepthook; don't touch!
    
    Functions:
    
    displayhook() -- print an object to the screen, and save it in builtins._
    excepthook() -- print an exception and its traceback to sys.stderr
    exc_info() -- return thread-safe information about the current exception
    exit() -- exit the interpreter by raising SystemExit
    getdlopenflags() -- returns flags to be used for dlopen() calls
    getprofile() -- get the global profiling function
    getrefcount() -- return the reference count for an object (plus one :-)
    getrecursionlimit() -- return the max recursion depth for the interpreter
    getsizeof() -- return the size of an object in bytes
    gettrace() -- get the global debug tracing function
    setcheckinterval() -- control how often the interpreter checks for events
    setdlopenflags() -- set the flags to be used for dlopen() calls
    setprofile() -- set the global profiling function
    setrecursionlimit() -- set the max recursion depth for the interpreter
    settrace() -- set the global debug tracing function
 
FUNCTIONS
    __displayhook__ = displayhook(...)
        displayhook(object) -> None
        
        Print an object to sys.stdout and also save it in builtins._
    
    __excepthook__ = excepthook(...)
        excepthook(exctype, value, traceback) -> None
        
        Handle an exception by displaying it with a traceback on sys.stderr.
    
    call_tracing(...)
        call_tracing(func, args) -> object
        
        Call func(*args), while tracing is enabled.  The tracing state is
        saved, and restored afterwards.  This is intended to be called from
        a debugger from a checkpoint, to recursively debug some other code.
    
    callstats(...)
        callstats() -> tuple of integers
        
        Return a tuple of function call statistics, if CALL_PROFILE was defined
        when Python was built.  Otherwise, return None.
        
        When enabled, this function returns detailed, implementation-specific
        details about the number of function calls executed. The return value is
        a 11-tuple where the entries in the tuple are counts of:
        0. all function calls
        1. calls to PyFunction_Type objects
        2. PyFunction calls that do not create an argument tuple
        3. PyFunction calls that do not create an argument tuple
           and bypass PyEval_EvalCodeEx()
        4. PyMethod calls
        5. PyMethod calls on bound methods
        6. PyType calls
        7. PyCFunction calls
        8. generator calls
        9. All other calls
        10. Number of stack pops performed by call_function()
    
    exc_info(...)
        exc_info() -> (type, value, traceback)
        
        Return information about the most recent exception caught by an except
        clause in the current stack frame or in an older stack frame.
    
    excepthook(...)
        excepthook(exctype, value, traceback) -> None
        
        Handle an exception by displaying it with a traceback on sys.stderr.
    
    exit(...)
        exit([status])
        
        Exit the interpreter by raising SystemExit(status).
        If the status is omitted or None, it defaults to zero (i.e., success).
        If the status is an integer, it will be used as the system exit status.
        If it is another kind of object, it will be printed and the system
        exit status will be one (i.e., failure).
    
    get_asyncgen_hooks(...)
        get_asyncgen_hooks()
        
        Return a namedtuple of installed asynchronous generators hooks (firstiter, finalizer).
    
    get_coroutine_wrapper(...)
        get_coroutine_wrapper()
        
        Return the wrapper for coroutine objects set by sys.set_coroutine_wrapper.
    
    getallocatedblocks(...)
        getallocatedblocks() -> integer
        
        Return the number of memory blocks currently allocated, regardless of their
        size.
    
    getcheckinterval(...)
        getcheckinterval() -> current check interval; see setcheckinterval().
    
    getdefaultencoding(...)
        getdefaultencoding() -> string
        
        Return the current default string encoding used by the Unicode 
        implementation.
    
    getfilesystemencodeerrors(...)
        getfilesystemencodeerrors() -> string
        
        Return the error mode used to convert Unicode filenames in
        operating system filenames.
    
    getfilesystemencoding(...)
        getfilesystemencoding() -> string
        
        Return the encoding used to convert Unicode filenames in
        operating system filenames.
    
    getprofile(...)
        getprofile()
        
        Return the profiling function set with sys.setprofile.
        See the profiler chapter in the library manual.
    
    getrecursionlimit(...)
        getrecursionlimit()
        
        Return the current value of the recursion limit, the maximum depth
        of the Python interpreter stack.  This limit prevents infinite
        recursion from causing an overflow of the C stack and crashing Python.
    
    getrefcount(...)
        getrefcount(object) -> integer
        
        Return the reference count of object.  The count returned is generally
        one higher than you might expect, because it includes the (temporary)
        reference as an argument to getrefcount().
    
    getsizeof(...)
        getsizeof(object, default) -> int
        
        Return the size of object in bytes.
    
    getswitchinterval(...)
        getswitchinterval() -> current thread switch interval; see setswitchinterval().
    
    gettrace(...)
        gettrace()
        
        Return the global debug tracing function set with sys.settrace.
        See the debugger chapter in the library manual.
    
    getwindowsversion(...)
        getwindowsversion()
        
        Return information about the running version of Windows as a named tuple.
        The members are named: major, minor, build, platform, service_pack,
        service_pack_major, service_pack_minor, suite_mask, and product_type. For
        backward compatibility, only the first 5 items are available by indexing.
        All elements are numbers, except service_pack and platform_type which are
        strings, and platform_version which is a 3-tuple. Platform is always 2.
        Product_type may be 1 for a workstation, 2 for a domain controller, 3 for a
        server. Platform_version is a 3-tuple containing a version number that is
        intended for identifying the OS rather than feature detection.
    
    intern(...)
        intern(string) -> string
        
        ``Intern'' the given string.  This enters the string in the (global)
        table of interned strings whose purpose is to speed up dictionary lookups.
        Return the string itself or the previously interned string object with the
        same value.
    
    is_finalizing(...)
        is_finalizing()
        Return True if Python is exiting.
    
    set_asyncgen_hooks(...)
        set_asyncgen_hooks(*, firstiter=None, finalizer=None)
        
        Set a finalizer for async generators objects.
    
    set_coroutine_wrapper(...)
        set_coroutine_wrapper(wrapper)
        
        Set a wrapper for coroutine objects.
    
    setcheckinterval(...)
        setcheckinterval(n)
        
        Tell the Python interpreter to check for asynchronous events every
        n instructions.  This also affects how often thread switches occur.
    
    setprofile(...)
        setprofile(function)
        
        Set the profiling function.  It will be called on each function call
        and return.  See the profiler chapter in the library manual.
    
    setrecursionlimit(...)
        setrecursionlimit(n)
        
        Set the maximum depth of the Python interpreter stack to n.  This
        limit prevents infinite recursion from causing an overflow of the C
        stack and crashing Python.  The highest possible limit is platform-
        dependent.
    
    setswitchinterval(...)
        setswitchinterval(n)
        
        Set the ideal thread switching delay inside the Python interpreter
        The actual frequency of switching threads can be lower if the
        interpreter executes long sequences of uninterruptible code
        (this is implementation-specific and workload-dependent).
        
        The parameter must represent the desired switching delay in seconds
        A typical value is 0.005 (5 milliseconds).
    
    settrace(...)
        settrace(function)
        
        Set the global debug tracing function.  It will be called on each
        function call.  See the debugger chapter in the library manual.
 
DATA
    __stderr__ = None
    __stdin__ = None
    __stdout__ = None
    api_version = 1013
    argv = ['']
    base_exec_prefix = r'C:\Program Files\Python36'
    base_prefix = r'C:\Program Files\Python36'
    builtin_module_names = ('_ast', '_bisect', '_blake2', '_codecs', '_cod...
    byteorder = 'little'
    copyright = 'Copyright (c) 2001-2016 Python Software Foundati...ematis...
    dllhandle = 490799104
    dont_write_bytecode = False
    exec_prefix = r'C:\Program Files\Python36'
    executable = r'C:\Program Files\Python36\pythonw.exe'
    flags = sys.flags(debug=0, inspect=0, interactive=0, opt...ing=0, quie...
    float_info = sys.float_info(max=1.7976931348623157e+308, max_...epsilo...
    float_repr_style = 'short'
    hash_info = sys.hash_info(width=64, modulus=2305843009213693...iphash2...
    hexversion = 50725104
    implementation = namespace(cache_tag='cpython-36', hexversion=507...in...
    int_info = sys.int_info(bits_per_digit=30, sizeof_digit=4)
    last_value = NameError("name 'elp' is not defined",)
    maxsize = 9223372036854775807
    maxunicode = 1114111
    meta_path = [<class '_frozen_importlib.BuiltinImporter'>, <class '_fro...
    modules = {'__main__': <module '__main__' (built-in)>, '_ast': <module...
    path = ['', r'C:\Program Files\Python36\Lib\idlelib', r'C:\Program Fil...
    path_hooks = [<class 'zipimport.zipimporter'>, <function FileFinder.pa...
    path_importer_cache = {r'C:\Program Files\Python36': FileFinder('C:\\P...
    platform = 'win32'
    prefix = r'C:\Program Files\Python36'
    stderr = <idlelib.run.PseudoOutputFile object>
    stdin = <idlelib.run.PseudoInputFile object>
    stdout = <idlelib.run.PseudoOutputFile object>
    thread_info = sys.thread_info(name='nt', lock=None, version=None)
    version = '3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1...
    version_info = sys.version_info(major=3, minor=6, micro=0, releaseleve...
    warnoptions = []
    winver = '3.6'
 
FILE
    (built-in)


Finally, you can also use the __doc__ (dunder doc) attribute to get a documentation string back

Here is an excerpt

sys.__doc__
This module provides access to some objects used or maintained by the\n
interpreter and to functions that interact strongly with the interpreter.\n
\nDynamic objects:\n\nargv -- command line arguments; argv[0] is the script 
pathname if known\npath -- module search path; path[0] is the script directory,
else ''\nmodules -- dictionary of loaded modules\n\ndisplayhook 
-- called to show results in an interactive session\nexcepthook 
-- called to handle any uncaught exception other than SystemExit\n  
To customize printing in an interactive session or to install a custom\n
top-level exception handler, assign other functions to replace these.\n\nstdin
-- standard input file object; used by input()\nstdout -- 
standard output file object; used by print()\nstderr -- standard error object;
used for error messages\n  By assigning other file objects 
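The \n sequences show up because the shell displays the raw string. Passing the string to print renders the line breaks. As a small sketch with a made-up function called greet, inspect.getdoc can also be used to get a docstring with the indentation cleaned up.

```python
import inspect


def greet(name):
    """Return a greeting.

    The name is used as-is.
    """
    return f"Hello, {name}"


# The raw attribute may still contain the source indentation,
# depending on the Python version.
print(repr(greet.__doc__))

# inspect.getdoc removes the common indentation and surrounding blank lines.
clean = inspect.getdoc(greet)
print(clean)
```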

As you can see dir and help are helpful if you need some info about a class, function or method in Python

Summer of code 2017: Python, Day 55: Python Jargon


As explained in my Summer of code 2017: Python post I decided to pick up Python

This is officially day 55. Today I decided to look at some Python Jargon

Here are some of the more interesting ones


Dunder
Dunder (Double Underscore) is a way to pronounce names of special methods and attributes, __len__ is pronounced dunder len, __init__ is pronounced dunder init

BDFL
Benevolent Dictator For Life, a.k.a. Guido van Rossum, Python’s creator.

CPython
The canonical implementation of the Python programming language, as distributed on python.org. The term “CPython” is used when necessary to distinguish this implementation from others such as Jython or IronPython.


EAFP
Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.

LBYL
Look before you leap. This coding style explicitly tests for pre-conditions before making calls or lookups. This style contrasts with the EAFP approach and is characterized by the presence of many if statements.
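To make the difference between EAFP and LBYL concrete, here is a minimal sketch that looks up a key in a dictionary both ways. The settings dictionary, the function names and the default of 12 are all made up for illustration.

```python
settings = {"theme": "dark"}  # hypothetical data


def font_size_lbyl(config):
    # LBYL: test for the key first, then do the lookup.
    if "font_size" in config:
        return config["font_size"]
    return 12


def font_size_eafp(config):
    # EAFP: just do the lookup and handle the failure if it happens.
    try:
        return config["font_size"]
    except KeyError:
        return 12


print(font_size_lbyl(settings))
print(font_size_eafp(settings))
```

Both functions return the same result; the difference is only in where the check happens.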

f-string
String literals prefixed with 'f' or 'F' are commonly called “f-strings” which is short for formatted string literals. See also PEP 498.
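Here is a short example of an f-string, assuming Python 3.6 or later. The variable names are just for illustration.

```python
name = "Python"
day = 55
ratio = 2 / 3

# Expressions inside the braces are evaluated and inserted into the string.
message = f"Day {day} of learning {name}"

# A format specification can follow the expression after a colon.
rounded = f"{ratio:.2f}"

print(message)
print(rounded)
```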


__future__
A pseudo-module which programmers can use to enable new language features which are not compatible with the current interpreter.

By importing the __future__ module and evaluating its variables, you can see when a new feature was first added to the language and when it becomes the default
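As a quick sketch of what that looks like, the division feature can be inspected like this. Each feature object has two methods that return version tuples.

```python
import __future__

feature = __future__.division

# The release in which the feature first became available via a __future__ import.
optional = feature.getOptionalRelease()

# The release in which the feature became (or will become) the default.
mandatory = feature.getMandatoryRelease()

print(optional)
print(mandatory)
```

For division this shows that the feature was first available in Python 2.2 and became the default in Python 3.0.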


Python 3000
Nickname for the Python 3.x release line (coined long ago when the release of version 3 was something in the distant future.) This is also abbreviated “Py3k”.

Pythonic
An idea or piece of code which closely follows the most common idioms of the Python language, rather than implementing code using concepts common to other languages. For example, a common idiom in Python is to loop over all elements of an iterable using a for statement. Many other languages don’t have this type of construct, so people unfamiliar with Python sometimes use a numerical counter instead:

for i in range(len(food)):
    print(food[i])

As opposed to the cleaner, Pythonic method:

for piece in food:
    print(piece)




Friday, August 4, 2017

Summer of code 2017: Python, Day 48 Rounding numbers


As explained in my Summer of code 2017: Python post I decided to pick up Python

This is officially day 48. Today I was playing around with numeric data types. I found some interesting side effects/features with the round function in Python

For example if you round 2.675 to 2 digits in SQL Server, you get back 2.68




Now let's do that in Python shall we......

>>> round(2.675,2)
2.67

Ugh what? The issue in Python is that 2.675 is stored as a float and it can't be represented exactly in binary
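One way to see this for yourself is to convert the float to a Decimal, which shows the exact value that is stored. This is just a small sketch using the decimal module from the standard library.

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, without rounding.
exact = Decimal(2.675)
print(exact)

# The stored value is slightly below 2.675, so rounding to 2 digits goes down.
is_below = exact < Decimal("2.675")
print(is_below)
print(round(2.675, 2))
```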

Here is some more fun stuff

>>> round(0.5)
0
>>> round(1.5)
2
>>> round(2.5)
2
>>> round(3.5)
4
>>> round(4.5)
4
>>> round(5.5)
6


WTF is going on here? Here is what the documentation has to say about round()

Return number rounded to ndigits precision after the decimal point. If ndigits is omitted or is None, it returns the nearest integer to its input.
For the built-in types supporting round(), values are rounded to the closest multiple of 10 to the power minus ndigits; if two multiples are equally close, rounding is done toward the even choice (so, for example, both round(0.5) and round(-0.5) are 0, and round(1.5) is 2). Any integer value is valid for ndigits (positive, zero, or negative). The return value is an integer if called with one argument, otherwise of the same type as number.

So there you have it, Python rounds to the closest even number if two multiples are equally close
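If you want ties to round away from zero instead, similar to the 2.68 result from the SQL Server example, one option is the decimal module with ROUND_HALF_UP. This is only a sketch; the helper round_half_up is my own name, and it takes the number as a string so that no float is involved.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(text, places="0.01"):
    # Build the Decimal from a string so the value is exactly what was typed.
    return Decimal(text).quantize(Decimal(places), rounding=ROUND_HALF_UP)


halves = [0.5, 1.5, 2.5, 3.5]

# The built-in round sends ties to the even neighbour.
builtin_results = [round(x) for x in halves]
print(builtin_results)

# With ROUND_HALF_UP, ties go up.
print(round_half_up("2.675"))
print(round_half_up("2.5", "1"))
```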


Monday, July 31, 2017

Summer of code 2017: Python, Day 44 Frequency of words in the book Moby Dick


As explained in my Summer of code 2017: Python post I decided to pick up Python

This is officially day 44. Today I wanted to see if I could get a Python script to run and return all the words and their occurrences in the book Moby Dick

Some interesting things you might want to know:

How many times is Moby used in the book?
How many distinct words in total?
How many words are used only once?
What are the top 20 most used words?


If you are interested in how many times Moby is in the book... here is the answer


As you can see Moby is in the book 90 times.

Ok so let's get started, if you want to follow along, you will need the Moby Dick book. Since Moby Dick is in the public domain, you can download the book for free. You can get it from Project Gutenberg, the link is here: http://www.gutenberg.org/ebooks/2701

Make sure to grab the  Plain Text UTF-8 version

In Python, what do we need to get the words and their counts? First we need some code that will store the text of the book in a variable

It will look like the following

with open(r'C:\Downloads\MobyDick.txt', 'r', encoding="utf8") as myfile:
    doc=myfile.read().replace('\n', ' ')

As you can see we are also stripping off the line feeds by replacing \n with a space (' '). Make sure the encoding is set to utf8, otherwise you will get errors like the one below

Traceback (most recent call last):
  File "c:\MapReduce.py", line 11, in <module>
    doc=myfile.read().replace('\n', ' ')
  File "C:\Program Files\Python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7237: character maps to <undefined>

Now that we have the file in a variable, we need to count the words, here is what that function will look like

def CountWords(text):
    output =''.join(c.lower() if c.isalpha() else ' ' for c in text)
    frequencies = {}
    for word in output.split():
        frequencies[word] = frequencies.get(word, 0) + 1
    return frequencies

The function lowercases the text and replaces all non-alphabetic characters with spaces. After that it loops through all the words created by the split function and increments the counter for each word. The function then returns a dictionary that maps each word to its count


In order to print the output on more than 1 line, we will use pprint. I already posted about pprint here: Summer of code 2017: Python, Pretty printing with pprint in Python

from pprint import pprint as pp

Finally, we need to sort the output by count in descending order (and alphabetically for words with the same count) and limit the output to the first n items. I have chosen 500 here

pp(sorted(CountWords(doc).items(), key=lambda x: (-x[1], x[0]))[:500])
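To see what the sort key does, here is the same expression on a tiny made-up dictionary. Negating the count sorts the highest counts first, and the word itself is used to break ties alphabetically.

```python
# Hypothetical counts, just to show the ordering.
counts = {"whale": 3, "sea": 5, "ahab": 3, "ship": 1}

# Sort by count descending, then by word ascending.
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# The slice keeps only the first n items, here n is 2.
top_two = ranked[:2]

print(ranked)
print(top_two)
```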


All of this together will look like this, make sure to change the path to the file to match your computer's path

def CountWords(text):
    output =''.join(c.lower() if c.isalpha() else ' ' for c in text)
    frequencies = {}
    for word in output.split():
        frequencies[word] = frequencies.get(word, 0) + 1
    return frequencies
 
 
with open(r'C:\Downloads\MobyDick.txt', 'r', encoding="utf8") as myfile:
    doc=myfile.read().replace('\n', ' ')
 
 
from pprint import pprint as pp
 
pp(sorted(CountWords(doc).items(), key=lambda x: (-x[1], x[0]))[:500])

Now let's ask those questions again, but this time we will have the answers as well

What are the 20 most used words?
Here we go

('the', 14718),
 ('of', 6743),
 ('and', 6518),
 ('a', 4807),
 ('to', 4707),
 ('in', 4242),
 ('that', 3100),
 ('it', 2536),
 ('his', 2532),
 ('i', 2127),
 ('he', 1900),
 ('s', 1825),
 ('but', 1823),
 ('with', 1770),
 ('as', 1753),
 ('is', 1751),
 ('for', 1646),
 ('was', 1646),
 ('all', 1545),
 ('this', 1443)

How many distinct words in total?
17,148 distinct words (you need to remove the limit in order to get the full set back, just remove [:500])


How many words are used only once?
There are 7416 words used only once, here are some of them starting with the letter z (you need to remove the limit in order to get the full set back, just remove [:500])

('zag', 1),
 ('zay', 1),
 ('zealanders', 1),
 ('zephyr', 1),
 ('zeuglodon', 1),
 ('zig', 1),
 ('zip', 1),
 ('zogranda', 1),
 ('zoroaster', 1)


How many times does Captain Ahab's name appear in the book?
Captain Ahab's name appears 517 times in the book
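The questions above can also be answered directly in code instead of reading through the printed list. Below is a minimal sketch that uses collections.Counter on a short made-up sample string rather than the whole book. Note that it splits words with the regular expression [a-z]+, which is an assumption on my part and is not identical to the isalpha approach in CountWords (for example, it ignores accented letters).

```python
import re
from collections import Counter

# A short made-up sample standing in for the text of the book.
sample = "Call me Ishmael. Call me what you will; the whale, the WHALE!"

# Lowercase the text and pull out runs of letters as words.
words = re.findall(r"[a-z]+", sample.lower())
counts = Counter(words)

# How many distinct words in total?
distinct = len(counts)

# Which words are used only once?
used_once = [word for word, n in counts.items() if n == 1]

# How many times does a given word appear?
whale_count = counts["whale"]

print(distinct)
print(used_once)
print(whale_count)
```

To run this against the book, the sample string would be replaced by the doc variable read from the file.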


And I will leave you with the first 100 most used words in the book

 ('the', 14718),
 ('of', 6743),
 ('and', 6518),
 ('a', 4807),
 ('to', 4707),
 ('in', 4242),
 ('that', 3100),
 ('it', 2536),
 ('his', 2532),
 ('i', 2127),
 ('he', 1900),
 ('s', 1825),
 ('but', 1823),
 ('with', 1770),
 ('as', 1753),
 ('is', 1751),
 ('for', 1646),
 ('was', 1646),
 ('all', 1545),
 ('this', 1443),
 ('at', 1335),
 ('whale', 1244),
 ('by', 1227),
 ('not', 1172),
 ('from', 1105),
 ('on', 1073),
 ('him', 1068),
 ('so', 1066),
 ('be', 1064),
 ('you', 964),
 ('one', 925),
 ('there', 871),
 ('or', 798),
 ('now', 786),
 ('had', 779),
 ('have', 774),
 ('were', 684),
 ('they', 670),
 ('which', 655),
 ('like', 647),
 ('me', 633),
 ('then', 631),
 ('their', 620),
 ('are', 619),
 ('some', 619),
 ('what', 619),
 ('when', 607),
 ('an', 600),
 ('no', 596),
 ('my', 589),
 ('upon', 568),
 ('out', 539),
 ('man', 527),
 ('up', 526),
 ('into', 523),
 ('ship', 519),
 ('ahab', 517),
 ('more', 509),
 ('if', 501),
 ('them', 474),
 ('ye', 473),
 ('we', 470),
 ('sea', 455),
 ('old', 452),
 ('would', 432),
 ('other', 431),
 ('been', 415),
 ('over', 409),
 ('these', 406),
 ('will', 399),
 ('though', 384),
 ('its', 382),
 ('down', 379),
 ('only', 378),
 ('such', 376),
 ('who', 366),
 ('any', 364),
 ('head', 348),
 ('yet', 345),
 ('boat', 337),
 ('long', 334),
 ('time', 334),
 ('her', 332),
 ('captain', 329),
 ('here', 325),
 ('do', 324),
 ('very', 323),
 ('about', 318),
 ('still', 312),
 ('than', 311),
 ('chapter', 308),
 ('great', 307),
 ('those', 307),
 ('said', 305),
 ('before', 301),
 ('two', 298),
 ('has', 294),
 ('must', 293),
 ('t', 291),
 ('most', 285)