HRT codility

https://app.codility.com/c/feedback/E79GKP-SNG/ app link

Q1 count frequency of fractions def solution(X, Y) where X were the numerators, and Y were the denominators. Had to used GCD on num and denominator and then count the tuples and return the max values

something like:

from fractions import gcd from collections import defaultdict def solution(X, Y): freq = defaultdict(int) for a, b in zip(X, Y): divisor = gcd(a, b) a //= divisor b //= divisor freq[a, b] += 1 return max(freq.values() or [0])

Q2 Dp problem for balance of vowel and cosonant string. Return 0 if length is 0, 1 if length is 1. Else the abs diff between vowel and consonant plus the balance of floor(N/ 2) prefix, and balance of the N - floor(N/2) suffix

something like:

from functools import lru_cache

@lru_cach(maxsize=None) def dp(left, right, s): if left >= right: return 1 if left == right else 0

vowels = ['a', 'e', 'i', 'o', 'u']
balance = abs(sum(1 if s[x] in vowels else -1 for x in range(left, right + 1)))
N = right - left + 1
left_end = left + N // 2 - 1
right_end = right - (N - N // 2) + 1

return balance + dp(left, left_end, s) + dp(right, right_end, s)

Q3 Monsterous sucky question. Given S, Y, Z where S is the input string, Y is the index of the error and Z is the look ahead and behind of the error. Writer a parser that prints the line before the error or up to Z characters behind the error which is shorter. That prints the line of the error and to the right of the error or Z right of error (whichever is shorter). A carret with a newline followed after the line with an error to point to the error of the carret. Finally the line below the error or up to Z whicever is shorter (taking into account the Z characters you already looked at to the right of the error on the line of the error itself).

Code was monstrous and ugly.

OA:

  • If 2 consecutive characters are the same, shrink the string until you can’t anymore. Output final result. “ABBCDDC” “ABBCC” “ABB” “A” “AABBAA” “AAAA” “AA” ""

  • Construct an n-ary tree given a list of parents/values and k. Find max k-length path sum of the tree.

HRT Phone Screen:

Describe a recent interesting bug that you have resolved.


So without giving away any propeitary information, at my current work there was an old python library that was used for API logging. I was using it to check what kind of load patterns to expect for different jobs to this database I was setting up. One of the quality of life features in this library was fairly trivial in that it kept a list of all the API calls so you can quickly see the calling pattern. I wanted to see the load pattern when there were two of these jobs running concurrently, not something I expected often but something I could see happening for this particular ticket. So when I went to look at the list of API calls, it was fairly odd that the load pattern blew up twice in size and was exactly the same between the two processes. So I dug into the library and I realize this was written a long time ago and someone had actually instatiated the class for API calls with a mutable list, so the same list was being passed between all object instantiations. I think because it’s not a very popular library, and usually people haven’t instiated multiple instances together it went unnoticed for so long until I was playing around with it. So that was fairly interesting to see in the wild so to speak.


What’s the purpose of ‘yield’ keyword?

Yield is used when defining a generator function.


How does hashtable works?

A hashtable takes in a key and then passes that key into a hash function which hashes said key and maps it to an index of an array where the value will be stored. The hash function itself should give a uniform distrubtion to decrease the number of collisions, and possibly avoid clustering if we’re using open addressing. So when we do a lookup our hash function hashes the key, finds the corresponding index and then looks at that index - assuming there’s no collisions - to see if there is a value for the provided key in our lookup.


What happens in a lookup/insert operation when hashes of 2 different keys are the same?

So for collision resolution there are two popular models. The first being separate chaining in which there is a linked list in each bucket or index of our array and in collisions we add to the linked list. The other is open addressing, in which during a collision we do some probe sequence until we find an empty slot, and on the other hand if we’re searching we do the probe sequence until we find the value in our key value pair or we find an empty slot indicating that the value isn’t in our hash table.


What’s the time complexity of hashtable ops (lookup/delete/insert)?

The average or amortized case for these are O(1) assuming a good hash function. However, they can be degraded to O(N) where N is the nubmer of elements we’re inserting into our hash table, because all they keys we’re inserting into our table will be hashed to the same key.


When does hashtable upsize/downsize? Runtime complexity?

Hashtables upsize when the load factor is too high because it may degrade the hash function and get too many collisions. A rule of thumb for load factor I’ve seen is 75% for upsize, in which case we double it. In terms of down sizing well if memory is cheap then there is no need, however if memory is expensive a rule of thumb I’ve seen is 25% in which case we halve it. Also, the time complexity for the up and down sizing should be an amortized time of O(1).


How does computer memory work?

Memory maps to a physical address in which generally speaking there is some physical persistence that let’s us know it’s on or off. So could be electrical signals, and then the collection of those physical addresses are read in as binary which can be translated to whatever data we are attempting to store.


How does OS allocate memory to each process?

Well the MMU segregates the memory into pages and then allocates each process or application a number of pages.


Virtual mem vs physical mem?

Virtual memory is the memory granted to applications in which they have total control over the virtual memory address given to them. The applications can then not have to worry about fragmented memory and simply access memory continguously for example if that’s how it loads memory in the granted memory. Then the MMU translates the virtual memory to the corresponding page, and access the actual physical address for that memory. So the MMU can take care of memory management such as fragmenting for example.


What happens when virtual mem exceeds physical mem?

So the excess is stored in disk, and loaded into the RAM as required when there is a page fault.


Stack vs heap. What stores in which?

Stack is the space set aside for thread execution, so for example when a function is called it uses the top most block of the stack to store the function details as well as local variables, further calls to other functions take up more memory from top of the stack and when the function exits it is automatically deallocated to the top of the stack in typical LIFO fashion. Where as heap is for dynamic allocation and is generally speaking in global scope, no particular pattern for allocating and deallocating memory, and in some languages like cpp you have to do this manually. The heap space is allocated generally at application startup there’s no pattern in allocating or deallocating it, so it is typically slower than the stack in which it’s trivial to allocate or deallocate memory, such as just moving the pointer up and down.


How does GIL work? What’s the adv vs. disadv of GIL?

So the GIL is the Global Interpreter Lock it’s a mutex that protects access to python objects. It basically means the PVM the python virtual machine prevents multiple threads from executing the python bytecode at once, so you can have multi threading but it will not be in parallel.

Advantages of it would be increased speed on single threaded programs, also I believe it made for easier integration of C libraries since they are usually not thread-safe, and I believe in recent years of discussions it’s mentioned it makes the maintainabilty easier for python. Well disadvantage is simply we can’t use multi threading in parallel, and instead have to use multi processing libraries which are heavier and not as light weight as multi threading.


Difference between process and thread?

A process is a program in execution and a thread is a portion of the process, as more than one thread can exist as part of a process. Threads can share resources such as memory whereas process don’t.


How dictionary works under the hood in python?

So from Python 3.6 onwards it uses a sparse table like a normal hash table, in which there is a contiguous block of memory and then at each index it stores the index to a dense array. The index at the dense array corresponds to the triplet <hash, key, value>. It stores the triplet because when checking if a key exists it compares the hash and key as there may be a collision. Speaking collisions it also uses open addressing to resolve hash collisions, and I believe it uses random probing for the probing sequence. Finally, I believe the load factor is 2/3 so it will resize when 2/3 full.


DNS?

Domain Name System takes domain names and translates them to IP addresses. So when you enter a website address in your browser your ISP forwards the request to a DNS resolver, which forwards to a root nameserver and grabs the mapping between the website domain and the IP address.


What is the difference between hard linkage and soft linkage?

Hard link points to the specific physical address of the data, whereas a soft link or a symbolic link points to the pointer that is pointing the physical address of the data.


TCP/UDP What’s the difference?

Transport Control Protocol - User Data Protocol

I believe TCP is connection-oriented and UDP is connectionless. Meaning UDP is faster, but can suffer from packet loss, also it can’t guarantee sequencing. Whereas TCP is slower, but it can resend packets if they are lost, and it can guarante the sequencing of those packets. So for example UDP would be great for video conferencing, whereas TCP would be for downloading a zip file for example.


Describe a Garbage Collection System?

So in python the garbage collection has the reference counting collector and the generational garbage collector. The reference counting collector keeps track of how many references there are to an object, and can free up that memory slot if there are no references to an object. This is very efficient but it can’t detect cyclical reference dependencies and that’s where the generational garbage collector comes in and finds those. The GC can be disabled but reference counting is fundamental and can’t be.


Checking Endianness of a platform?

Create an integer and read its first byte, and if that byte is 1 then it’s little endian otherwise it’s big endian.

Endianness is the order of bytes of in computer memory.