Preface #
This problem is quite simple and does not allow for many algorithmic improvements. In fact it would be difficult to come up with a non-optimal (bigger than \(O(n)\)) solution. This makes this problem a good candidate for benchmarking.
I used the following tools:
- pytest-benchmark to generate candlestick charts to determine test cases with adverse behavior
- line_profiler to profile methods line by line
CAVE: All tests and benchmarks are run while exporting this page to HTML. Since performance tests are not reliably reproducible numbers inside the text (… took \(X {\mu}s\)) are from previous runs and can be slightly different from the actual numbers in the benchmark results.
Without much further ado …
Description #
Implement
atoi
which converts a string to an integer.The function first discards as many whitespace characters as necessary until the first non-whitespace character is found. Then, starting from this character, takes an optional initial plus or minus sign followed by as many numerical digits as possible, and interprets them as a numerical value.
The string can contain additional characters after those that form the integral number, which are ignored and have no effect on the behavior of this function.
If the first sequence of non-whitespace characters in str is not a valid integral number, or if no such sequence exists because either str is empty or it contains only whitespace characters, no conversion is performed.
If no valid conversion could be performed, a zero value is returned.
Note:
- Only the space character ’ ’ is considered as whitespace character.
- Assume we are dealing with an environment which could only store integers within the 32-bit signed integer range: [−2^31, 2^31 − 1]. If the numerical value is out of the range of representable values,
INT_MAX
(2^31 − 1) orINT_MIN
(−2^31) is returned.Example 1:
Input: "42" Output: 42
Example 2:
Input: " -42" Output: -42
Explanation: The first non-whitespace character is ‘-’, which is the minus sign. Then take as many numerical digits as possible, which gets 42.
Example 3:
Input: "4193 with words" Output: 4193
Explanation: Conversion stops at digit ‘3’ as the next character is not a numerical digit.
Example 4:
Input: "words and 987" Output: 0
Explanation: The first non-whitespace character is ‘w’, which is not a numerical digit or a +/- sign. Therefore no valid conversion could be performed.
Example 5:
Input: "-91283472332" Output: -2147483648
Explanation: The number “-91283472332” is out of the range of a 32-bit signed integer. Thefore INT_MIN (−2^31) is returned.
(From the problem description)
Solutions #
The problem itself is quite easy to solve. What bit me on the first run was that I did not read the problem description thouroughly:
[…] Then, starting from this character, takes an optional initial plus or minus sign followed by as many numerical digits as possible, and interprets them as a numerical value. […]
I forgot to implement the leading ‘+’. C’est la vie …
The first working version was split into two steps: detect the number and
calculate the value. Since this is not the most optimal solution (one loop for
detecting the number, one for calculating the result) I only got to beat ~29% of
the other implementations speed wise. Merging the value calculation brought me
to “better than 66%”. Good, but not perfect. Pushing to use more python
build-ins (lstrip()
, isdigit()
), and caching the result of len(string)
got
me faster than 93%. Good enough.
Problem API #
class Solution:
def myAtoi(self, string: str) -> int:
Tests #
Testing is split into functional and performance testing.
import os
import pytest
class BaseTestClass:
def test_positive_numbers_are_detected(self):
assert 42 == self.myAtoi("42")
def test_positive_numbers_with_plus_are_detected(self):
assert 42 == self.myAtoi("+42")
def test_positive_numbers_with_leading_zeroes_are_detected(self):
assert 42 == self.myAtoi("00000000000042")
def test_negative_numbers_are_detected(self):
assert -42 == self.myAtoi("-42")
def test_numbers_with_words_after_are_detected(self):
assert 4193 == self.myAtoi("4193 with words")
def test_words_before_yield_zero(self):
assert 0 == self.myAtoi("words and 987")
def test_caps_at_MIN_INT(self):
assert -2147483648 == self.myAtoi("-91283472332")
def test_caps_at_MAX_INT(self):
assert 2147483647 == self.myAtoi("91283472332")
def test_finds_above_at_MIN_INT(self):
assert -2147483647 == self.myAtoi("-2147483647")
def test_finds_below_MAX_INT(self):
assert 2147483646 == self.myAtoi("2147483646")
def test_just_text_fails(self):
assert 0 == self.myAtoi("x")
def test_just_whitespace_fails(self):
assert 0 == self.myAtoi(" ")
def test_just_plus_fails(self):
assert 0 == self.myAtoi("+")
def test_just_minus_fails(self):
assert 0 == self.myAtoi("-")
def test_leading_whitespaces_are_discarded(self):
assert 42 == self.myAtoi(" 42"), "one whitespace"
assert 42 == self.myAtoi(" 42"), "two whitespaces"
The performance tests use pytest-benchmark to generate candlestick charts to detect data that leads to longer runtime.
import os
import pytest
class BaseBenchmarkClass:
def test_benchmark_small_number(self, benchmark):
assert 42 == benchmark(self.myAtoi, string="42")
def test_benchmark_medium_number(self, benchmark):
assert 12345 == benchmark(self.myAtoi, string="12345")
def test_benchmark_long_number(self, benchmark ):
assert 2**31 == benchmark(self.myAtoi, string="4298847327493274927349723947329")
def test_benchmark_negative_number(self, benchmark):
assert -42 == benchmark(self.myAtoi, string="-42")
def test_benchmark_many_leading_space(self, benchmark):
long_string = " "*100 +"124"
assert 124 == benchmark(self.myAtoi, string=long_string)
def test_benchmark_detect_non_number(self, benchmark):
assert 0 == benchmark(self.myAtoi, string="words and 987")
Solution 1: Iterative \(O(n)\) #
class Solution:
INT_MAX = 2147483647
INT_MIN = -2147483648
def myAtoi(self, string: str) -> int:
i = 0
# Eat whitespaces
while i < len(string) and string[i] == ' ':
i+=1
if i == len(string):
return 0
if string[i] == '-':
sign = -1
i +=1
elif string[i] == '+':
sign = 1
i +=1
else:
sign = 1
if i == len(string):
return 0
msd = i
# TODO: test if '0' <= string[i] <= '9' is faster
val = 0
ord_0 = 48
while i < len(string) and (0 <= ord(string[i]) - ord_0 <= 9):
digit = ord(string[i]) - ord_0
if val:
val = val * 10 + digit
else:
val = digit
# Early baily out for large numbers
if val > Solution.INT_MAX:
break
i += 1
val *= sign
if val > Solution.INT_MAX:
return Solution.INT_MAX
elif val < Solution.INT_MIN:
return Solution.INT_MIN
else:
return val
Test Results #
With the following test result:
============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 15 items
test_1a-solution.py ............... [100%]
============================ slowest test durations ============================
(0.00 durations hidden. Use -vv to show these durations.)
========================== 15 passed in 0.04 seconds ===========================
Tests ran in 0.18480283400276676 seconds
Analysis #
Although the solution is correct and reasonably fast (better than 66%), the length of the input seems to have a big impact on the runtime (\(8.8 {\mu}s\) for skipping 100 spaces), \(3.2 {\mu}s\) for a 32 bit number.
Solution 2: Iterative, more built-ins \(O(n)\) #
The algorithm itself is hardly improvable: Nothing is touched twice, all
shortcuts are taken. The next step was to improve on caching constant values
(len(string)
), to tackle space elimination via string.lstrip()
, and to use
string[i].isdigit()
instead of '0' <= string[i] <= '9'
.
class Solution:
INT_MAX = 2147483647
INT_MIN = -2147483648
def myAtoi(self, string: str) -> int:
i = 0
# Eat whitespaces
string = string.lstrip()
len_string = len(string)
if i == len_string:
return 0
if string[i] == '-':
sign = -1
i +=1
elif string[i] == '+':
sign = 1
i +=1
else:
sign = 1
if i == len_string:
return 0
val = 0
ord_0 = 48
while i < len_string and string[i].isdigit():
digit = ord(string[i]) - ord_0
if val:
val = val * 10 + digit
else:
val = digit
# Early baily out for large numbers
if val > Solution.INT_MAX:
break
i += 1
val *= sign
if val > Solution.INT_MAX:
return Solution.INT_MAX
elif val < Solution.INT_MIN:
return Solution.INT_MIN
else:
return val
Test Results #
With the following test result:
============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 15 items
test_1b-solution.py ............... [100%]
============================ slowest test durations ============================
(0.00 durations hidden. Use -vv to show these durations.)
========================== 15 passed in 0.04 seconds ===========================
Tests ran in 0.1848075570014771 seconds
Analysis #
Using built ins improved runtime by ~22% from 36ms down to 28ms. That is good enough (while not perfect, but what is, ever?).
Profiling allows deeper insight into the performance characteristics. Let’s first start with cProfile and gprof2dot to detect high level bottlenecks for the slowest case, parsing the large number from the previous tests 5.000 times:
Not much surprisingly most of the time is spend in myAtoi
. Sadly cProfile does
not trace on a per line level by default. Fortunately there is a solution:
line_profiler (not shown: some magic to inject @profile
into the solution).
The test should benchmark the average case, not only the worst case. Since the test cases are unknown I rolled the dice and came out with 80% short numbers and 20% very long numbers.
kernprof -l -v test_1b-profile-lines.py
Wrote profile results to test_1b-profile-lines.py.lprof
Timer unit: 1e-06 s
Total time: 0.054652 s
File: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi/solution1b.py
Function: myAtoi at line 5
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 def myAtoi(self, string: str) -> int:
6 5000 1241.0 0.2 2.3 i = 0
7 # Eat whitespaces
8 5000 1462.0 0.3 2.7 string = string.lstrip()
9 5000 1626.0 0.3 3.0 len_string = len(string)
10 5000 1462.0 0.3 2.7 if i == len_string:
11 return 0
12
13 5000 1637.0 0.3 3.0 if string[i] == '-':
14 sign = -1
15 i +=1
16 5000 1434.0 0.3 2.6 elif string[i] == '+':
17 sign = 1
18 i +=1
19 else:
20 5000 1387.0 0.3 2.5 sign = 1
21
22 5000 1309.0 0.3 2.4 if i == len_string:
23 return 0
24
25 5000 1332.0 0.3 2.4 val = 0
26 5000 1263.0 0.3 2.3 ord_0 = 48
27 22000 7844.0 0.4 14.4 while i < len_string and string[i].isdigit():
28 18000 6183.0 0.3 11.3 digit = ord(string[i]) - ord_0
29 18000 4859.0 0.3 8.9 if val:
30 13000 4059.0 0.3 7.4 val = val * 10 + digit
31 else:
32 5000 1359.0 0.3 2.5 val = digit
33 # Early baily out for large numbers
34 18000 5467.0 0.3 10.0 if val > Solution.INT_MAX:
35 1000 250.0 0.2 0.5 break
36 17000 5016.0 0.3 9.2 i += 1
37
38 5000 1423.0 0.3 2.6 val *= sign
39
40 5000 1615.0 0.3 3.0 if val > Solution.INT_MAX:
41 1000 295.0 0.3 0.5 return Solution.INT_MAX
42 4000 1160.0 0.3 2.1 elif val < Solution.INT_MIN:
43 return Solution.INT_MIN
44 else:
45 4000 969.0 0.2 1.8 return val
Much to my surprise the premature optimization (if val > Solution.INT_MAX: ...
) is responsible for roughly 10% of the total runtime!
This leads me to the question, if there is a faster way to detect a 32 bit overflow?
@profile
def benchmark_this():
INT_MAX = 2**31-1
FROM = INT_MAX - 1_000
TO = INT_MAX + 1_000 + 2
dummy_counter = 0
# The expexted way: use '>'
for i in range(FROM, TO):
if i > INT_MAX: # compare
dummy_counter += 1
else:
dummy_counter -= 1
BIT_31 = 2**31
dummy_counter = 0
# The C way: use a bitwise test
for i in range(FROM, TO):
if i & BIT_31: # compare
dummy_counter += 1
else:
dummy_counter -= 1
for i in range(0,5_000):
benchmark_this()
There is not much difference between comparison and bitwise test:
kernprof -l -v benchmark_overflow_test.py | grep '# compare'
10 10010000 2646044.0 0.3 16.4 if i > INT_MAX: # compare
20 10010000 2823236.0 0.3 17.5 if i & BIT_31: # compare
In fact comparing with >
is faster than testing bitwise. No luck there.
Solution 3: (with a bug) Iterative, more built-ins, pre-calculate \(O(n)\) #
Solution 2 showed that the test to bail out on an overflow takes roughly 10% of
the total runtime for numbers bigger/smaller than INT_MAX/INT_MIN
.
The reason for the guard condition is break out early for really big numbers (e.g. \(10^{1000}\)). A best effort approach is good enough there.
The main loop looks like this:
while i < len_string and string[i].isdigit():
# calculate val
val = ...
if val > Solution.INT_MAX:
break
The longest 31 bit number is \(\lceil log_{10}(2^{31}) \rceil = 10\) digits in base 10.
This lets us pull the bail out condition before the loop:
# 10^10 > 2^31. Any 10 digit number is bigger than 2^31
# Add 1 because the loop stops at len_string-1
len_string = min(len_string, 10+1)
while i < len_string and string[i].isdigit():
# calculate val
val = ...
Remark: This will not work for numbers with leading zeroes, e.g. “0001”.
Test Results #
With the following test result:
============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 15 items
test_1c-solution.py ..F............ [100%]
=================================== FAILURES ===================================
________ Test1c.test_positive_numbers_with_leading_zeroes_are_detected _________
self = <test_1c-solution.Test1c object at 0x7f04cbaad0d0>
def test_positive_numbers_with_leading_zeroes_are_detected(self):
> assert 42 == self.myAtoi("00000000000042")
E AssertionError: assert 42 == 0
E + where 0 = <bound method Test1c.myAtoi of <test_1c-solution.Test1c object at 0x7f04cbaad0d0>>('00000000000042')
E + where <bound method Test1c.myAtoi of <test_1c-solution.Test1c object at 0x7f04cbaad0d0>> = <test_1c-solution.Test1c object at 0x7f04cbaad0d0>.myAtoi
test_solution.py:13: AssertionError
============================ slowest test durations ============================
(0.00 durations hidden. Use -vv to show these durations.)
===================== 1 failed, 14 passed in 0.05 seconds ======================
Tests ran in 0.21932784400996752 seconds
Analysis #
The new optimization got worse results than the second solution. That was unexpected!
- Parsing long numbers was slightly slower (\(~2.6 {\mu}s\) vs. \(~2.9 {\mu}s\)). Intuitively this is not completely unexpected since we now use a more coarse comparison (\(10^{10}\) in the looping condition vs. \(2^{31}\) before the looping condition)
- Parsing short (2 digit) numbers also got worse(\(~1 {\mu}s\) vs. \(~1.1 {\mu}s\)). This is unexpected because in essence we just moved one comparison from within the loop outside of the loop.
Let’s first validate the assumption that large numbers that will get capped are slower because the loop runs more often (keep in mind that the parser ran 5.000 times to smooth out the results):
Wrote profile results to test_1bc-profile-lines-long.py.lprof
Timer unit: 1e-06 s
Total time: 0.12127 s
File: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi/solution1b.py
Function: myAtoi at line 5
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 def myAtoi(self, string: str) -> int:
6 5000 1443.0 0.3 1.2 i = 0
7 # Eat whitespaces
8 5000 1707.0 0.3 1.4 string = string.lstrip()
9 5000 1708.0 0.3 1.4 len_string = len(string)
10 5000 1719.0 0.3 1.4 if i == len_string:
11 return 0
12
13 5000 1678.0 0.3 1.4 if string[i] == '-':
14 sign = -1
15 i +=1
16 5000 1602.0 0.3 1.3 elif string[i] == '+':
17 sign = 1
18 i +=1
19 else:
20 5000 1465.0 0.3 1.2 sign = 1
21
22 5000 1501.0 0.3 1.2 if i == len_string:
23 return 0
24
25 5000 1459.0 0.3 1.2 val = 0
26 5000 1397.0 0.3 1.2 ord_0 = 48
27 50000 18845.0 0.4 15.5 while i < len_string and string[i].isdigit():
28 50000 18160.0 0.4 15.0 digit = ord(string[i]) - ord_0
29 50000 14345.0 0.3 11.8 if val:
30 45000 16059.0 0.4 13.2 val = val * 10 + digit
31 else:
32 5000 1422.0 0.3 1.2 val = digit
33 # Early baily out for large numbers
34 50000 16217.0 0.3 13.4 if val > Solution.INT_MAX:
35 5000 1416.0 0.3 1.2 break
36 45000 14063.0 0.3 11.6 i += 1
37
38 5000 1775.0 0.4 1.5 val *= sign
39
40 5000 1787.0 0.4 1.5 if val > Solution.INT_MAX:
41 5000 1502.0 0.3 1.2 return Solution.INT_MAX
42 elif val < Solution.INT_MIN:
43 return Solution.INT_MIN
44 else:
45 return val
Total time: 0.127051 s
File: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi/solution1c.py
Function: myAtoi at line 5
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 def myAtoi(self, string: str) -> int:
6 5000 1556.0 0.3 1.2 i = 0
7 # Eat whitespaces
8 5000 1834.0 0.4 1.4 string = string.lstrip()
9 5000 1901.0 0.4 1.5 len_string = len(string)
10 5000 1828.0 0.4 1.4 if i == len_string:
11 return 0
12
13 5000 1832.0 0.4 1.4 if string[i] == '-':
14 sign = -1
15 i +=1
16 5000 1724.0 0.3 1.4 elif string[i] == '+':
17 sign = 1
18 i +=1
19 else:
20 5000 1590.0 0.3 1.3 sign = 1
21
22 5000 1668.0 0.3 1.3 if i == len_string:
23 return 0
24
25 5000 1587.0 0.3 1.2 val = 0
26 5000 1562.0 0.3 1.2 ord_0 = 48
27
28 # 10^10 > 2^31. Any 10 digit number is bigger than 2^31
29 # Add 1 because the loop stops at len_string-1
30 5000 2200.0 0.4 1.7 len_string = min(len_string, 10+1)
31
32 60000 24514.0 0.4 19.3 while i < len_string and string[i].isdigit():
33 55000 21510.0 0.4 16.9 digit = ord(string[i]) - ord_0
34 55000 17489.0 0.3 13.8 if val:
35 50000 19293.0 0.4 15.2 val = val * 10 + digit
36 else:
37 5000 1560.0 0.3 1.2 val = digit
38 55000 17927.0 0.3 14.1 i += 1
39
40 5000 1890.0 0.4 1.5 val *= sign
41
42 5000 1811.0 0.4 1.4 if val > Solution.INT_MAX:
43 5000 1775.0 0.4 1.4 return Solution.INT_MAX
44 elif val < Solution.INT_MIN:
45 return Solution.INT_MIN
46 else:
47 return val
We can clearly see that the second (“optimized”) version loops more often than the “non-optimized” version, explaining the discrepancy.
Let us now compare the performance for parsing short numbers (“42”):
Wrote profile results to test_1bc-profile-lines-short.py.lprof
Timer unit: 1e-06 s
Total time: 0.045616 s
File: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi/solution1b.py
Function: myAtoi at line 5
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 def myAtoi(self, string: str) -> int:
6 5000 1428.0 0.3 3.1 i = 0
7 # Eat whitespaces
8 5000 1766.0 0.4 3.9 string = string.lstrip()
9 5000 1817.0 0.4 4.0 len_string = len(string)
10 5000 1735.0 0.3 3.8 if i == len_string:
11 return 0
12
13 5000 1748.0 0.3 3.8 if string[i] == '-':
14 sign = -1
15 i +=1
16 5000 1703.0 0.3 3.7 elif string[i] == '+':
17 sign = 1
18 i +=1
19 else:
20 5000 1477.0 0.3 3.2 sign = 1
21
22 5000 1534.0 0.3 3.4 if i == len_string:
23 return 0
24
25 5000 1484.0 0.3 3.3 val = 0
26 5000 1487.0 0.3 3.3 ord_0 = 48
27 15000 5922.0 0.4 13.0 while i < len_string and string[i].isdigit():
28 10000 3799.0 0.4 8.3 digit = ord(string[i]) - ord_0
29 10000 3157.0 0.3 6.9 if val:
30 5000 1729.0 0.3 3.8 val = val * 10 + digit
31 else:
32 5000 1479.0 0.3 3.2 val = digit
33 # Early baily out for large numbers
34 10000 3380.0 0.3 7.4 if val > Solution.INT_MAX:
35 break
36 10000 3312.0 0.3 7.3 i += 1
37
38 5000 1659.0 0.3 3.6 val *= sign
39
40 5000 1822.0 0.4 4.0 if val > Solution.INT_MAX:
41 return Solution.INT_MAX
42 5000 1783.0 0.4 3.9 elif val < Solution.INT_MIN:
43 return Solution.INT_MIN
44 else:
45 5000 1395.0 0.3 3.1 return val
Total time: 0.046275 s
File: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi/solution1c.py
Function: myAtoi at line 5
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 def myAtoi(self, string: str) -> int:
6 5000 1565.0 0.3 3.4 i = 0
7 # Eat whitespaces
8 5000 1739.0 0.3 3.8 string = string.lstrip()
9 5000 1942.0 0.4 4.2 len_string = len(string)
10 5000 1897.0 0.4 4.1 if i == len_string:
11 return 0
12
13 5000 1745.0 0.3 3.8 if string[i] == '-':
14 sign = -1
15 i +=1
16 5000 1751.0 0.4 3.8 elif string[i] == '+':
17 sign = 1
18 i +=1
19 else:
20 5000 1655.0 0.3 3.6 sign = 1
21
22 5000 1518.0 0.3 3.3 if i == len_string:
23 return 0
24
25 5000 1587.0 0.3 3.4 val = 0
26 5000 1561.0 0.3 3.4 ord_0 = 48
27
28 # 10^10 > 2^31. Any 10 digit number is bigger than 2^31
29 # Add 1 because the loop stops at len_string-1
30 5000 2159.0 0.4 4.7 len_string = min(len_string, 10+1)
31
32 15000 6227.0 0.4 13.5 while i < len_string and string[i].isdigit():
33 10000 4028.0 0.4 8.7 digit = ord(string[i]) - ord_0
34 10000 3353.0 0.3 7.2 if val:
35 5000 1808.0 0.4 3.9 val = val * 10 + digit
36 else:
37 5000 1483.0 0.3 3.2 val = digit
38 10000 3471.0 0.3 7.5 i += 1
39
40 5000 1678.0 0.3 3.6 val *= sign
41
42 5000 1770.0 0.4 3.8 if val > Solution.INT_MAX:
43 return Solution.INT_MAX
44 5000 1830.0 0.4 4.0 elif val < Solution.INT_MIN:
45 return Solution.INT_MIN
46 else:
47 5000 1508.0 0.3 3.3 return val
In multiple runs the line-by-line profiler the optimized version sometimes even ran faster. So not much help there.
Solution 4: Unroll first loop \(O(n)\) #
Moving the “bail out” before the loop did not help and introduced a bug. Next
try: Move the if
condition that tests for a zero val before the loop.
In essence transform this:
# Original version
while i < len_string and string[i].isdigit():
digit = ord(string[i]) - ord_0
# Move this before the loop
if val:
val = val * 10 + digit
else:
val = digit
i += 1
into this:
# Partially unroll the loop
if not string[i].isdigit():
return 0
val = ord(string[i]) - ord_0
i += 1
while i < len_string and string[i].isdigit():
digit = ord(string[i]) - ord_0
val = val * 10 + digit
# Early baily out for large numbers
if val > Solution.INT_MAX:
break
i += 1
Test Results #
With the following test result:
============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 15 items
test_1d-solution.py ............... [100%]
============================ slowest test durations ============================
(0.00 durations hidden. Use -vv to show these durations.)
========================== 15 passed in 0.03 seconds ===========================
Tests ran in 0.1924969839892583 seconds
Analysis #
Although the if
was moved before the loop, the solution ran slower. Let’s try
something else.
Solution 5: NO premature optimizations \(O(n)\) #
It seems that all premature optimizations made it only worse.
class Solution:
INT_MAX = 2147483647
INT_MIN = -2147483648
def myAtoi(self, string: str) -> int:
i = 0
# Eat whitespaces
string = string.lstrip()
len_string = len(string)
if i == len_string:
return 0
if string[i] == '-':
sign = -1
i +=1
elif string[i] == '+':
sign = 1
i +=1
else:
sign = 1
if i == len_string:
return 0
val = 0
ord_0 = 48
while i < len_string and string[i].isdigit():
digit = ord(string[i]) - ord_0
# The "if val:" check is removed
val = val * 10 + digit
# Early baily out for large numbers
if val > Solution.INT_MAX:
break
i += 1
val *= sign
if val > Solution.INT_MAX:
return Solution.INT_MAX
elif val < Solution.INT_MIN:
return Solution.INT_MIN
else:
return val
Test Results #
With the following test result:
============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 15 items
test_1e-solution.py ............... [100%]
============================ slowest test durations ============================
(0.00 durations hidden. Use -vv to show these durations.)
========================== 15 passed in 0.04 seconds ===========================
Tests ran in 0.19208616401010659 seconds
Analysis #
Who would have thought it? Ditching the last premature optimization did not worsen the execution time but made the code more readable.
Summary #
Solution | Correct | Status | Runtime | Memory | Language |
---|---|---|---|---|---|
#1, parse left to right | No | Submission | Forgot to parse leading ‘+’ | Python3 | |
#1, parse left to right | Yes | Submission | 44 ms, faster than 29.9% | 12.8 MB, less than 100% | Python3 |
#1, parse right to left | Yes | Submission | Runtime: 36 ms, faster than 66.28% | 12.8 MB, less than 100% | Python3 |
#2, use more built ins | Yes | Submission | Runtime: 28 ms, faster than 93.82% | 12.7 MB, less than 100% | Python3 |
#5, use more built ins, no optimizations | Yes | Submission | Runtime: 28 ms, faster than 93.43% | 12.7 MB, less than 100% | Python3 |