Adding More Data Structures to Python
A set of python modules for data structures. All data structures are written in C and imported into python using cython. No python code involved to make the data structure pretty fast.
The idea is to extend python data structures to add more flexibility and to include more data structures that can help in different situations.
Github link: https://github.com/MNoorFawi/struct-pie
There is also a StructPie project on sourceforge.net that contains all these data structures as C shared libraries that can readily be used in C projects. The libraries can be downloaded from here.
Quick Start
To install the library
pip install structpie
Testing the different data structures from within python
Hash Table
Separate Chaining hash map that enables fast lookup and search. It facilitates searching by “value” O(1), in contrary to python dictionary which enables O(1) search when key is known.
For full comparison between list, dict and HashTable, try to run the example.py script in /hash_table/ folder in the github repo.
To initiate a hash table, size and type should be specified. It supports ‘int’, ‘float’ and ‘str’
from structpie import * size = 10 type = "str" ht = HashTable(size, type) arr = ["Hello", "Salam", "Hola", "Ola", "Ciao", "Konnichiwa", "Merhaba", "Privetstviye"]
If the table is of “str” type, input string should be converted into bytes for Cython to be able to convert it into C char*.
# insert data into the table for i in arr: ht.insert_val(i.encode()) # print the table ht.display() # index(0): Hello # index(1): # index(2): # index(3): Konnichiwa # index(4): Ola # index(5): Ciao # index(6): # index(7): Hola # index(8): Merhaba Salam # index(9): Privetstviye
Check the size of the table and how many indices are occupied.
print("size of the table is %d while number of filled indices is %d" % (ht.length(), ht.occupied())) # size of the table is 10 while number of filled indices is 7
Search for a value in the table, remove and search by index:
print("try now to search for and delete some values: (0 means False while 1 is True)") # try now to search for and delete some values: (0 means False while 1 is True) print("search for value Merhaba --> %d" % ht.search_value("Merhaba".encode())) # search for value Merhaba --> 1 print("search for value Namaste --> %d" % ht.search_value("Namaste".encode())) # search for value Namaste --> 0 print("search by index 3 --> value %s found" % ht.search_by_index(3)) # search by index 3 --> value Konnichiwa found print("remove value Ola --> %d" % ht.val_del("Ola".encode())) # remove value Ola --> 1 print("search again for value Ola --> %d" % ht.search_value("Ola".encode())) # search again for value Ola --> 0 print("try again to remove value Ola --> %d" % ht.val_del("Ola".encode())) # try again to remove value Ola --> 0
Print again
ht.display() # index(0): Hello # index(1): # index(2): # index(3): Konnichiwa # index(4): # index(5): Ciao # index(6): # index(7): Hola # index(8): Merhaba Salam # index(9): Privetstviye
Get index of a specific value
ht.get_index(b"Hello") # 0 ht.get_index(b"Salam") # 8
We also can get all the values in a particular index
ht.search_index(8) # ['Merhaba', 'Salam'] ht.search_index(1) # [] ht.search_index(ht.get_index(b"Salam")) # ['Merhaba', 'Salam']
Displaying first certain number of indices with display method. Acts like head in python dataframes:
ht.display(5) ##specifying how many indices to display # index(0): Hello # index(1): # index(2): # index(3): Konnichiwa # index(4):
There are some empty unused indices that’s because the hash_func used with string is so simple for now but better implementation will be available in future versions.
Meanwhile in HashTable this issue will be seen less.
LIFO & FIFO Stack
from structpie import * # initiate a stack of type "int" st = Stack("int") # push values into the stack st.push(13) st.push(11) st.push(19) st.push(91) # print the stack st.display() # [13 11 19 91 ] # pop from right (LIFO) st.pop() # 91 # pop from left (FIFO) st.popleft() # 13 # check length st.length() # 2 # print again st.display() # [11 19 ]
The stack can also contain data of type ‘float’ and ‘str’
Binary Search Tree
Binary search tree is a great data structure for quick searching and dynamic sorting while inserting new data. It is widely used in so many applications like indexing in databases for fast lookup.
Struct Pie provides a C-written binary search tree for usage in python.
Insertion is a little bit slower than appending to a python list, but searching is so much faster.
Struct Pie BST nodes can hold 3 data pieces; (int index, char* info1, float info2). This will be extended soon to hold more data.
Let’s try the binary search tree on some sample data about Uefa Champions League all-time goalscorers. Each row represents (number of goals, name, goal/game ratio).
# initiate bstree bst = BSTree() import pandas as pd data = pd.read_csv("./binary_search_tree/example_data.csv", header = None) data # 0 1 2 #0 130 Cristiano Ronaldo 0.76 #1 115 Lionel Messi 0.80 #2 71 Raul Gonzalez 0.50 #3 68 Robert Lewandowski 0.76 #4 65 Karim Benzema 0.54 #5 56 Ruud van Nistelrooy 0.77 #6 50 Thierry Henry 0.45 #7 49 Alfredo Di Stefano 0.84 #8 48 Andriy Shevchenko 0.48 #9 48 Zlatan Ibrahimovic 0.40 #10 46 Thomas Muller 0.40
Let’s insert the data into the tree
for i in range(data.shape[0]): bst.insert(data.iloc[i, 0], data.iloc[i, 1].encode(), data.iloc[i, 2]) # check that all data was inserted data.shape[0] == bst.node_count() # True
Some operations on the tree; (inorder) printing, searching and deleting
# print inorder bst.inorder() # 46 -> 48 -> 48 -> 49 -> 50 -> 56 -> 65 -> 68 -> 71 -> 115 -> 130 -> # search for an index bst.search(40) # "node doesn't exist" bst.search(48) # (48, b'Andriy Shevchenko', 0.48) bst.search(46) # (46, b'Thomas Muller', 0.4) bst.search(49) # (49, b'Alfredo Di Stefano', 0.84) # delete a node bst.remove(48) # new order after deletion bst.inorder() # 46 -> 48 -> 49 -> 50 -> 56 -> 65 -> 68 -> 71 -> 115 -> 130 -> bst.node_count() # 10 bst.remove(71) bst.node_count() # 9 bst.inorder() # 46 -> 48 -> 49 -> 50 -> 56 -> 65 -> 68 -> 115 -> 130 -> bst.search(65) # (65, b'Karim Benzema', 0.54)
Hash Table with Binary Search Tree
Implementing hash table data structure that uses separate chaining to avoid collisions. While in each indiex it uses a binary search tree instead of linked lists to store data that has similar indices produced by the hashing function to achieve faster lookup.
Insertion is slower than insertion in linked lists but searching is faster.
from structpie import * size = 5 type = "int" hbst = HashBSTree(size, type) arr = [13, 11, 19, 91, 22, 12, 19, 94] for i in arr: hbst.insert_val(i) hbst.display() # index(0): # index(1): 11 -> 91 -> # index(2): 12 -> 22 -> # index(3): 13 -> # index(4): 19 -> 19 -> 94 -> (hbst.length(), hbst.occupied()) # (5, 4) hbst.search_value(94) # 1 hbst.search_value(15) # 0 hbst.search_by_index(3) # '13' hbst.val_del(91) hbst.search_value(91) # 0 hbst.val_del(91) # The value does not exist hbst.get_index(19) # 4 hbst.get_index(11) # 1
Priority Queue
Priority queue is a very good data structure for inserting new values and ordering the queue according to certain priority so that when poping the more important ones get popped first even if they were inserted later.
pq = PQ() pq.display() # The queue is Empty
PQ takes two inputs, for now, (char * name, int priority). This will definitely be expanded soon.
C char* is represented in python as bytes by cython. So while inserting strings, they should be encoded into bytes first.
Let’s insert some tasks and priorities:
pq.push(b"exercise", 3) pq.push(b"sleep", 2) pq.push(b"repeat", 1) pq.push(b"code", 5) pq.display() # ( # {code, 5} # {exercise, 3} # {sleep, 2} # {repeat, 1} # )
Queue was dynamically ordered according to priorities.
Now let’s pop:
pq.pop() # (code) with priority (5) has been removed # 'code'
Note that pop method prints a confirmation message and returns the task name as a string
some = pq.pop() # (exercise) with priority (3) has been removed print(some) # exercise pq.display() # ( # {sleep, 2} # {repeat, 1} # ) pq.pop() # (sleep) with priority (2) has been removed # 'sleep' pq.pop() # (repeat) with priority (1) has been removed # 'repeat' ## when trying to pop from an empty queue, an empty string '' is returned. pq.pop() # The queue is empty # '' pq.display() # The queue is Empty
More data structures will be supported soon.
Leave a Reply