But we didn't say anything about Forecasting!

Let's fix that. In this post, I'll introduce some very basic forecasting methods. They might seem too simple, but they are still used in practice. Furthermore, they are useful for building a baseline to compare more sophisticated methods against.

I strongly recommend this methodology of building simple baselines against which to evaluate further, more sophisticated models. Not just in Time Series Analysis, but in any branch of Machine Learning and Statistics.

So, let's get started!

Think about the dumbest method to forecast future values.

In just a minute you might discover the amazing Naive Forecasting Method! It simply consists of predicting that the next value of the Series will be equal to the last recorded one.

For example, if our Series goes

$$X_1, X_2, \ldots, X_t$$

our prediction for the next value will be

$$X_{t+1} = X_t$$

Pretty naive, isn't it?

Actually, it can be good enough when the Series is a Random Walk (when the data doesn't follow any pattern and looks like noise). So, don't underestimate it.

I won't include any code in Python for this one because you can implement it in many ways and it is not necessary to use any third-party libraries.
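If you do want a concrete reference anyway, one of those many possible implementations, using pandas for convenience, could look like this (the helper name is mine):

```
import pandas as pd

def naive_forecast(series, forecast_horizon):
    # Every future value is predicted to be the last recorded one
    return pd.Series([series.iloc[-1]] * forecast_horizon)

sales = pd.Series([266.0, 145.9, 183.1])
print(naive_forecast(sales, 2).tolist())  # [183.1, 183.1]
```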

When the data shows a strong seasonal pattern, we can slightly improve the previous naive method.

Let's suppose we have monthly data, and we want to predict the value for next month (let's call it February). Assuming that our data shows a strong monthly seasonal pattern, we'll predict that the value will be equal to the one observed last February (the same month of the previous year).

More formally, if we define the seasonal period `p` as the number of samples between one observation and the next one that lies in the same season, we can predict

$$X_{t + h} = X_k$$

where \(k = t + h - p\), assuming that every value before \(X_{t + h}\) is defined.

Examples of values for the seasonal period `p` are:

- 1, for *yearly* seasons
- 4, for *quarterly* seasons
- 12, for *monthly* seasons
- 52, for *weekly* seasons

And so on.

This is a simple Python code example of the Naive Seasonal Method:

```
import numpy as np
import pandas as pd

def seasonal_naive_forecast(series, seasonal_period, forecast_horizon):
    # We assume 'series' to be a pandas.Series object
    if len(series) < seasonal_period:
        raise ValueError("There must be at least 'seasonal_period' observations")
    prev_season = series.iloc[-seasonal_period:]
    season_number = int(np.ceil(forecast_horizon / seasonal_period))
    next_seasons = np.tile(prev_season, season_number)
    # We were only asked to predict the next 'forecast_horizon' values
    return pd.Series(next_seasons[:forecast_horizon])
```

We improved the naive method to consider seasonality. Then, what about trending?

It would be great to allow the forecasting method to increase or decrease over time in cases when the data shows a trending behavior. That's what the drift method does.

The general formula would be:

$$ X_{t+h} = X_t + h \cdot \frac{X_t - X_1}{t - 1} $$

Notice that the term $$\frac{X_t - X_1}{t - 1}$$ is the slope of the line from the first observation to the last one. So, the drift method extrapolates that line into the future and assumes the data follows that trend.

Translating the previous formula into Python is straightforward.
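For reference, here's a minimal sketch of that translation (assuming, as before, a `pandas.Series` input; the helper name is mine):

```
import pandas as pd

def drift_forecast(series, forecast_horizon):
    # Slope of the line joining the first and last observations
    slope = (series.iloc[-1] - series.iloc[0]) / (len(series) - 1)
    last = series.iloc[-1]
    return pd.Series([last + h * slope for h in range(1, forecast_horizon + 1)])

sales = pd.Series([10.0, 12.0, 14.0, 16.0])
print(drift_forecast(sales, 3).tolist())  # [18.0, 20.0, 22.0]
```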

In this post, we have explored the most basic forecasting methods. Although they are pretty simple, they might be good enough in many situations.

But remember that even when they are not a good enough solution, they are a great starting point to build our more sophisticated methods on top of.

In the upcoming posts of this series, we'll be talking about residual analysis, how to measure the performance of our models, and of course, how to build and use more robust and sophisticated models.

If you liked this post, consider subscribing to my newsletter so you don't miss an article from my blog. Do you prefer a simpler format and real-time interaction? Then, follow me on Twitter. I like to share more content there.

Stay tuned!

(Cover photo by Javier Esteban on Unsplash)

What do those examples have that makes them different from other datasets?

There is an implicit order relationship in the data: we can't just shuffle it and work with it. It is extremely important which record was collected before and which one after.

Another important characteristic (that supports the previous one) is that we assume there is some relationship between previous entries and future ones. In other words, we can write a data sample as `Xn = f(Xn-1, Xn-2, ..., X1, S) + e`, where `S` represents other parameters that can be useful to predict the value of `Xn` and `e` represents a random error. We call such a relation, in which a term can be determined as a function of the previous ones, a **recurrence**.
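As a toy illustration, here is one such recurrence simulated in Python: a first-order linear recurrence where each value is 80% of the previous one plus a random error (the coefficient, the noise scale, and the starting value are arbitrary choices for the example):

```
import numpy as np

rng = np.random.default_rng(seed=0)
x = [100.0]  # X_1, an arbitrary starting value
for _ in range(99):
    e = rng.normal(scale=5.0)   # the random error term 'e'
    x.append(0.8 * x[-1] + e)   # X_n = 0.8 * X_{n-1} + e

print(len(x))  # 100 observations of a synthetic Time Series
```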

This post is the first in a series about Time Series and Forecasting. Here, we are going to explain the main components of a Time Series, how to measure the relationships between previous and future data of the Time Series, and the first steps to analyze a forecasting problem.

I include a practical exercise to illustrate what we'll be talking about. All using Python.

What do you think? Let's get started!

Let's define the most important features to look at in a Time Series.

Because they are sequences of observations over a time span, Time Series can present some patterns that determine the way we analyze them.

Those patterns are:

- **Seasonality**: When the data is affected by some seasonal factors such as holidays, seasons of the year, sports events, and so on. Those seasonal effects have a fixed frequency.
- **Trending**: When the data seems to have a long-term increase or decrease in its values. Although most of the examples present this trending quality as a linear modification of the data, this increasing/decreasing behavior might not be linear.
- **Cycling**: When there are other external factors with no fixed frequency that affect the data over the long term. This could be due to economic conditions, for example. This pattern is not equivalent to the Seasonality pattern: it usually has a longer period and, as I already said, its frequency is not fixed.

The following image illustrates how all the previous patterns can appear in a Time Series. The image is taken from Forecasting: Principles and Practice, which is an excellent resource to get started with Time Series.

The top-left example shows a strong seasonality within each year, as well as some strong cyclic behavior with a period of about 6-10 years. There is no apparent trend in the data **over this period**.

The top-right example shows no seasonality, but an obvious downward trend.

The bottom-left example shows a strong increasing trend, with strong seasonality. There is no evidence of any cyclic behavior.

The bottom-right one has no trend, seasonality, or cyclic behavior. There are random fluctuations that do not appear to be very predictable, and no strong patterns that would help with developing a forecasting model.

It is important to note that the patterns identified above correspond with the time period that is represented in the graphs. If we had a longer period, other patterns could emerge.

We'll be working with a dataset that represents the monthly sales of shampoo over three years. You can download it here

Let's explore the data!

```
import pandas as pd
import matplotlib.pyplot as plt
shampoo_df = pd.read_csv('shampoo.csv')
print(shampoo_df.head())
print(shampoo_df.shape)
# Month Sales
# 0 1-01 266.0
# 1 1-02 145.9
# 2 1-03 183.1
# 3 1-04 119.3
# 4 1-05 180.3
#
# (36, 2)
```

As you can see, the dataset contains 36 rows corresponding to the sales of 36 consecutive months.

Let's see what patterns are present in our data:

```
plt.plot(shampoo_df.Month, shampoo_df.Sales)
plt.show()
```

Here you can see a strong increasing trend and a perceptible seasonal pattern. There is no evidence of any cyclic behavior.

Cool! But... What's next?

How we use the components of a Time Series is left for the next installment of this series. Now I want to talk about another important feature, one that allows us to determine how predictable the Series we are working on might be.

As I mentioned in the first part of this article, we hope there are some recurrent dependencies between the elements of the Series (i.e. we can predict future values from previous ones). So, it is important to measure the relationship between the elements of the Series.

In other statistical problems, we are usually interested in the **correlation** between some pair of variables. That gives us a measure of the strength of the linear relationship between those variables. Besides the correlation between two different variables, there is another metric that we are interested in when dealing with Time Series: **autocorrelation**.

It is a very similar concept but this one is used to measure the relationship that exists between the previous observations and the most recent ones. So, instead of measuring a linear relationship between two different variables, we measure the relationship between the values of the same variable through time.

We can define multiple autocorrelation coefficients. What determines each coefficient is how many steps back we look. For example, the coefficient `r1` is a measure of the linear relationship between every sample and the previous one; `r2` relates every sample with the observation two steps before it, and so on. In general, the coefficient `rk` can be calculated as:
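The coefficient in question is the standard sample autocorrelation. For a Series \(X_1, \dots, X_T\) with mean \(\bar{X}\), the lag-\(k\) coefficient is:

$$ r_k = \frac{\sum_{t=k+1}^{T} (X_t - \bar{X})(X_{t-k} - \bar{X})}{\sum_{t=1}^{T} (X_t - \bar{X})^2} $$

The numerator compares each observation with the one \(k\) steps before it, while the denominator normalizes by the total variability, so \(r_k\) always lies between -1 and 1.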

Then, we can plot every autocorrelation coefficient in what is called an *autocorrelogram*. Let's do it with our data!

```
pd.plotting.autocorrelation_plot(shampoo_df.Sales)
plt.show()
```

The continuous horizontal lines correspond to the values `2/sqrt(T)` and `-2/sqrt(T)`, and they enclose a region where the strength of the linear relationship is insignificant.

As you can see, there seems to be a strong linear relationship between the sales of a month and the sales of the previous one or two months, while there is not too much evidence of a linear relationship when we look at more than five lagged months.
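If you want to check those coefficients numerically instead of reading them off the plot, the lag-`k` coefficient can be computed directly. A minimal sketch with NumPy (the `autocorr` helper and the toy data are mine, just for illustration):

```
import numpy as np

def autocorr(x, k):
    # Sample autocorrelation at lag k (k >= 1)
    x = np.asarray(x, dtype=float)
    xm = x - x.mean()
    return np.sum(xm[k:] * xm[:-k]) / np.sum(xm ** 2)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(round(autocorr(x, 1), 2))  # 0.4
```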

Let's stop here. So far we have seen what a Time Series is, what the main components of a Time Series are, how to interpret them, and how to determine if there is a linear relationship between lagged observations.

In the next post, we'll learn how to decompose a Time Series into its multiple components, how to determine the best models to explain a Time Series, and much more.

So, be sure to stay tuned! You can subscribe to my newsletter and receive a notification so you don't miss any of my posts.

Want to get my content in a simpler format and be able to interact in a different fashion? Follow me on Twitter!

See you soon!

(Cover photo by Javier Esteban on Unsplash)

Today I want to start a little series of threads about one of my biggest passions: Gödel's theorems. One of the greatest discoveries of all time that I'm sure will blow your mind! I'll structure the series as follows.

First: The history that preceded the theorems

Then: The theorems. Explanation and some common misconceptions

And finally: Shallow overview of the demonstrations.

Let's start with the history that preceded Gödel's results. It is pretty much the history of math, so let's see if we can fit it in a thread.

The study of math as a science started in ancient Greece. Although math was applied earlier in other ancient civilizations, philosophers like Thales were the first to study mathematical abstractions like shapes without asking for the practical purpose of that study. Math started to have its very own questions that could only be answered inside Math itself.

Math became a science. The Pythagoreans were a philosophical school that defended the idea that the essence of the universe was numbers.

But their theory was based on the hypothesis that all numbers could be expressed as a fraction of two integer numbers. It seemed an acceptable supposition until...

A member of the Pythagorean school discovered some numbers that cannot be expressed that way.

For example, `sqrt(2)` and `sqrt(5)`. Those are presumably the first known irrational numbers. So all the Pythagorean theory was reduced to ashes. That was the first big crisis of math.

What was wrong with the Pythagorean theory? They assumed as true a proposition that is false **inside that theory**. You can prove some numbers cannot be expressed as a fraction using just the theorems, elements, and operations of the theory.

Greeks realized they needed to change the way math had been developed so far. And then Euclid wrote one of the most important books in the history of science: "The Elements". Almost all the geometry we learn up to high school was written by Euclid 2500 years ago. But the best part was the method Euclid used to formulate his geometry.

"The Elements" showed the first example of an axiomatic theory. An axiomatic theory is built on top of very simple propositions that are assumed as true (axioms). Every other proposition needs to be demonstrated from the axioms by following a set of rules that state how we can go from proposition A to proposition B.

The process of going from the set of axioms to some proposition A is called demonstration. When we demonstrate proposition A we say that A is a theorem in our theory. Let's see how history continues.

Euclid built his geometry on top of five axioms. The first four of them seemed pretty simple but the fifth was trickier. It's well known that Euclid himself tried to demonstrate the fifth axiom from the other four.

More than 2000 years after the first publication of "The Elements", mathematicians were still figuring out how to remove the fifth axiom by demonstrating it. The result of those studies unveiled an astonishing fact: Euclid just defined one of the many possible geometries🤯. If you change the fifth axiom a little bit, you can end up with a perfectly defined (although very crazy) geometry

As a side note, the General Theory of Relativity demonstrated that the geometry of our universe is non-Euclidean (it's one of the new crazy ones).

But this was a big problem! Mathematicians had thought the fifth axiom could be demonstrated someday, and they had built the entire Math building on top of the robust and unique Euclidean geometry. This meant the second deep crisis of Math. We need to rebuild the whole thing again!

But what do we mean by building the entire Math on top of something?

It is defining some axioms in a way that any mathematical proposition can be either proved or refuted by a demonstration process. Many of the greatest mathematicians of all time worked hard on that problem. And then, in 1931, a 25-year-old man destroyed that intention. Kurt Gödel proved that such a system was impossible to build. He proved that there are true propositions that cannot be proved in some theories. He proved that some things cannot be proved! 🤯🤯🤯

First things first. Let's talk about some important concepts.

We saw what an axiomatic theory is. Well, there are two properties that you'd like to have if you were an axiomatic theory:

Consistency and Completeness

**Consistency**: A consistent theory is one in which a proposition can be either true or false but not both. In other words, a theory without contradictions. **Inconsistent** theories are useless because you can prove anything from them... Yeah, *anything*. There's a funny story of Bertrand Russell proving that if 2+2=5 then he was the Pope😆

**Completeness**: A complete theory is one in which all the *true* propositions are provable inside the theory. Gödel's doctoral thesis was the demonstration of the completeness of first-order logic (at 23 years old).

Now we can continue with the history.

So, mathematicians were trying to build math on top of ground other than geometry. They picked number theory (arithmetic, natural numbers) as the new foundations. The main reason: it had been axiomatized some years before. To give you an idea of the magnitude of Gödel's discovery, I'm going to mention some of the mathematicians trying to rebuild math:

- David Hilbert💪
- Bertrand Russell💥
- Ackermann
- John von Neumann🔥😱🤯💫

They were trying to prove that number theory was both Consistent and Complete. That way, Math would be safe: the entire Math would be free of contradictions and everything true could be proven. It seemed to be a matter of time before the proof arrived. Actually, some sub-theories of arithmetic were proven to be both consistent and complete. Gödel himself was working on that, but then he realized this:

**Theorem 1**: About incompleteness.

For any axiomatic theory that includes a certain part of arithmetic, if it is consistent, then it is incomplete.

This means that all theories that include number theory contain true propositions that we'll never be able to prove inside those theories!

All the work of some of the greatest mathematicians of all time was in vain. John von Neumann never worked in Logic again.

But for those who have some hope in their hearts. I remind you that there are two theorems.

**Theorem 2**: About consistency.

For any consistent theory that contains a certain part of arithmetic, the consistency of the theory is not provable.

Precisely one of those true but not provable propositions is the consistency of the theory itself! So, 0 out of 2: no provable consistency and no completeness. Math can't be built that way. We have to live with that. There are true propositions out there we'll never prove😔. End of story.

Now, let's talk about some misconceptions generated by the theorems. First, I'd like you to note that both theorems say "a certain part of arithmetic".

We will be talking about that part in the next section. For now, just suppose a theory containing arithmetic.

Misconception number one:

Gödel said: for any sufficiently complex theory, if it is consistent, then it is incomplete

There is this idea that anything more complex than number theory meets the conditions to apply Gödel's incompleteness theorem. But the theory of real numbers is complete, and real numbers are at least as complex as natural numbers. It is not about complexity. It is about how natural numbers are defined. That definition carries the "poison".

Misconception number two:

The truth is unreachable for scientists

Ok, some true propositions can't be proven in some theories. But maybe there are other alternative theories. Furthermore, experiments and observations are other methods to discover the truth about our universe.

Misconception number three:

There is no philosophic system that can explain the universe

The explanation of the universe doesn't necessarily have to do with natural numbers. And Gödel's theorems don't apply when there is no arithmetic in the theory.

Of course, there are lots more misconceptions about Gödel's results. But I'll stop here🥵. What about some demonstrations?

Let's try to understand how Gödel was able to prove that there are unprovable propositions, and let's do it as smoothly as we can🙄.

By the end, we'll have proved one of the most mind-blowing results ever.

From now on, we denote by T an axiomatic theory that contains number theory (the theory of natural numbers).

Gödel's first theorem states that: If T is consistent, then it is incomplete.

It means that there are true propositions in T that can't be proven.

Think about this proposition: "This proposition is not a theorem of T", or equivalently, "This proposition is not provable in T".

If that proposition were true, then it would be a true but unprovable proposition in T. Problem solved. End of the thread. WAIT

Well, there is a little problem with that last proposition. It talks about the theory T, but it is not a proposition *in* T. There's a difference between talking about something and being part of it. Let's see how Gödel did it.

He created a code for every proposition and proof in T, in such a way that each of those propositions and proofs gets its own unique natural number that identifies it. That way, we can talk about the theory in the language of numbers. That's what we call Gödel numbering. But remember that T contains number theory. And that's the fact Gödel took advantage of.
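To give a flavor of how such a coding can work, here is a toy Gödel-style numbering in Python: each symbol gets a (made-up) number, and a sequence of symbols is encoded as a product of prime powers. By unique factorization, the original sequence can always be recovered from the code:

```
# Toy Gödel numbering: a sequence of symbols becomes a single natural number
PRIMES = [2, 3, 5, 7, 11, 13, 17]
SYMBOLS = {'0': 1, 'S': 2, '=': 3, '+': 4}  # hypothetical symbol codes

def godel_number(formula):
    n = 1
    # The i-th symbol contributes the i-th prime raised to the symbol's code
    for prime, symbol in zip(PRIMES, formula):
        n *= prime ** SYMBOLS[symbol]
    return n

# "S0=S0" reads as "the successor of 0 equals the successor of 0"
print(godel_number('S0=S0'))  # 2^2 * 3^1 * 5^3 * 7^2 * 11^1 = 808500
```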

Expressing propositions with numbers is a way to talk about T from within T itself. But how?

By saying: "N is not the code of any theorem in T", we are talking about T. But being the code of a theorem in T is an arithmetic property, and T contains the arithmetic.

So now we can state the proposition: "The code of this proposition is not the code of any theorem in T".

But that proposition still can't be formulated in T. That's invalid syntax: a proposition can't talk about itself in that way.

We need to achieve that in a more subtle way.

For that, we'll use the method proposed by Quine, which is called "quining". Let's see the following statement:

"yields a proposition with property P when appended to its own quotation." yields a proposition with property P when appended to its own quotation.

We can substitute the quotation for any other sentence. But when using the same sentence the statement starts to talk about itself!

Gödel proposed another method, but it is trickier.
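A concrete cousin of this construction in programming is the *quine*: a program that prints its own source code, built by using one piece of text both as data and as code. A classic Python example:

```
# The string s appears twice: once quoted (as data) and once executed (as code)
s = 's = %r\nprint(s %% s)'
print(s % s)  # prints these two statements, reproducing the source
```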

Let's denote the previous sentence by the letter G. So, if G is true then G has the property P, *and* if G has the property P, then G is true. Let's do the last twist!

Let's make P = "its code is not the code of any theorem in T". Now, if G were false, then its code would be the code of a theorem, so G would be provable, and T would prove a false proposition, making it inconsistent. So, if T is consistent, then G is true, then P is also true, and G is not provable!

We did it!

G is true but not provable in T!!!

What about the second theorem?

"The consistency of T is not provable in T"

That's a "direct" result from the previous demonstration!

We just proved that if T is consistent, then G is true. So, if T could prove its own consistency, it could also prove G. But G is not provable! Hence (and precisely because of the previous demonstration), the consistency of T is not provable either.

And that's it. We proved Gödel's theorems!

Of course, these demonstrations are not fully rigorous, but they comprise the main ideas behind those mind-blowing results.

So, this is the end of the series😢. Gödel's theorems started a revolution. A revolution that resulted in the birth of Computer Science. But that's another story. Maybe I'll write my own version of that history in the form of threads in the future😉.

In the next section, we'll be talking about what Predictive Analytics is and which fields it comprises. Then, we'll show different Predictive Analytics approaches. Finally, we'll describe the stages that are usually part of the Predictive Analytics pipeline.

So, what do we mean by Predictive Analytics?

Predictive Analytics is an application of many techniques from Statistics and Artificial Intelligence, especially Machine Learning. It uses current and past facts to predict what to expect from the future.

For example, large industries can use Predictive Analytics to predict what the demand for some products will be and then adjust production. Insurance companies use Predictive Analytics to determine whether an accident is likely to occur, and then charge more money.

Those analyses are carried out with statistical methods and Machine Learning techniques like Data Mining, Hypothesis Testing, and Predictive Modeling. The boom of these techniques is also motivated by technological advances that allow us to make more computations in much less time.

So, the idea is to accurately predict the future based on current and past data. That is why data has become one of the most valuable assets these days. Big companies like Google and Facebook are big mostly because of all the data they own. There is nothing to do without data. Data is the fuel of the Predictive Analytics engine.

If we have data, there are many ways to go, depending on what we want to do. The next section is about those different ways.

Usually, we refer to Predictive Analytics as a set of techniques to calculate the likelihood of a future event. That can suggest that Predictive Analytics is all about predictive models that calculate some probabilities and that kind of stuff. While that is often the way to go, other approaches can be considered part of the Predictive Analytics spectrum. Furthermore, for some difficult tasks, it is possible to combine two or more approaches.

As stated above, predictive modeling is the most frequent approach. We have many samples of past data, which are called training samples. For example, if we are trying to predict whether a certain transaction could be a fraud, we should have samples of past transactions. Every sample might have a date, a cardholder, the amount of money, the credit card number, and whether that specific transaction was fraudulent or not.

With those samples, we can train our model so it is able to predict whether a future transaction will be a fraud by knowing the rest of the attributes (cardholder, credit card number, etc.). That prediction can be done by calculating a probability: how likely a certain transaction is to be fraudulent.

We can use predictive models to make other kinds of predictions. For example, instead of calculating a probability, we can calculate the earnings for a future month, based on previous months. Our samples would now be the previous months, each with attributes like the amount of money invested in advertising, the cost of production, and the earnings. Then we can train a model to predict future earnings. This can help businesses make important decisions.

Examples of predictive models are Linear and Logistic Regression, Support Vector Machines, and Neural Networks.

Sometimes we just want to discover some patterns or relationships between the samples. For example, a streaming company or an online retailer wants to recommend similar things to similar people. Then, we might need to identify some categories for customers. There will be customers that like technology, others that can have a low budget to spend, etc.

We can also use descriptive models to help us understand the data better. This can even be just a preliminary step, after which we apply predictive or decision modeling.

Examples of Descriptive Models are Clustering and Principal Component Analysis.

The last approach we'll be talking about is decision modeling. This one is used when we want to find the relations between all the elements that influence a decision.

For example, in medicine, we want to build methods that help a doctor diagnose a disease. But a black box that receives the symptoms and prints yes or no, or just pneumonia or heart attack, is not useful. We need some explanation of why the model got that result. This way, the doctor can determine whether to trust the algorithm, and it can also help the doctor see other insights about the patient.

In this approach, we find relationships between the attributes of the samples and make decisions. Examples of decision models are Decision Trees, Random Forests, and Bayesian Networks.

After describing the different ways we can encounter Predictive Analytics, let's see a common pipeline of a Predictive Analytics task.

Although there are many ways to approach Predictive Analytics depending on the task to solve, we can split the process into some stages that are present in almost all solutions.

This is the first and most difficult stage. It usually takes about 75-80% of the total time spent.

Data is not always easily available. Sometimes you might need to scrape websites and process thousands of web pages to obtain that precious data. Other methods to obtain data can be even more time-consuming, for example, posting a poll or a survey and waiting for people to fill it out.

Cleaning data is about getting rid of samples and attributes that aren't useful. Maybe we have samples with some missing attributes, or with values that are not correct. This step is extremely important because data often contains noise that can lead to terrible predictions.

Having clean samples doesn't mean we have the data we need. We must almost always process the data. Sometimes we need the numerical values to be in a range between 0 and 1, or the literal values to be numerical values, and so on.

In this processing step, we can do what is called feature engineering. Feature engineering is a group of tricks to make our data even more suitable for the problem. We can make transformations between some attributes and obtain others that are more descriptive or that highlight a pattern.
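For instance, mapping numerical values into the [0, 1] range (the first processing example above) takes only a few lines; the income figures here are made up:

```
def min_max_scale(values):
    # Linearly map the minimum to 0 and the maximum to 1
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20_000, 50_000, 80_000]  # hypothetical attribute values
print(min_max_scale(incomes))  # [0.0, 0.5, 1.0]
```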

After processing our data we can state some hypotheses about that data, and how the future can be predicted from it. For example, we can assume that there is a linear relationship between some attributes of the samples. We can also assume that the behavior of an attribute is independent of the behavior of another attribute.

Those assumptions can be tested. Sometimes we can test whether they hold or not. This formal proof can be done via Hypothesis Testing, which is a powerful method to formally prove the validity of an assumption.

By visualizing some features of the samples we can also determine whether our assumptions seem to hold. Data visualization doesn't give us formal proof, but it is extremely helpful in practice. We can visualize some patterns or determine that some assumptions don't seem to hold. Hence, data visualization is useful both for understanding the data and for assessing whether a hypothesis seems to be correct.

Once we have tested our hypotheses about the data, we are ready to build a model to predict the future.

Our model can be any of the examples mentioned in the previous section or others that were not mentioned. The model can even be a combination of several approaches and models.

The training process varies depending on the model and the approach we are following. But after the process is completed, we end up with a result that must be validated. That means we need to verify whether our model is really solving the problem we want to solve.

The validation is commonly carried out with another set of samples that are not used in the training process. This way, we can determine whether our model can solve the task, or whether it cheated and just performs well on the training data. The validation process can also vary a lot depending on the approach we decided to follow.
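A minimal sketch of that idea is a simple holdout split, reserving the last 20% of the samples for validation (the 80/20 ratio is just a common convention, not a rule):

```
def train_validation_split(samples, validation_fraction=0.2):
    # Keep the first part for training and hold out the rest for validation
    cut = int(len(samples) * (1 - validation_fraction))
    return samples[:cut], samples[cut:]

samples = list(range(10))  # stand-in for real training samples
train, validation = train_validation_split(samples)
print(len(train), len(validation))  # 8 2
```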

After training and validating, we can obtain two results. If our model seems to perform well, then we can proceed to the next stage. But if our model is not performing as we expected, then we need to think about the cause of that underperformance. Maybe our hypotheses are wrong and we didn't test them correctly. Or there could be some problems with the data (including not having enough data). In any case, we need to return to a previous step and repeat part or all of the process.

By productizing, we mean making the solution available to be used. It could be an online service like the recommendation system of Amazon or Netflix. It could be a desktop application that helps a company in the decision-making process. The idea is to make the solution available to the end user.

Cloud services have become a suitable and useful environment to host Predictive Analytics products in recent years.

There is one last stage. After the release of our product, a lot of new data will appear. After some time, the world will change significantly, and our model might become outdated. So, collecting fresh new data and retraining our model is the last step of this process, which we can now view as a cycle.

In this article, we explained what Predictive Analytics is, and how it is used.

Every single successful company these days uses Predictive Analytics. Actually, they are successful partly because they use Predictive Analytics.

This is a weapon that combines the power of Computer Science and Mathematics to predict the future and help to make that future better or worse for many people. By understanding the methods and processes involved in Predictive Analytics, we can help to make that future better for everyone.

Similarly, there are problems for which greedy algorithms don't yield the best solution. Actually, they might yield the worst possible solution. But there are other cases in which we can obtain a solution that is good enough by using a greedy strategy.

In this article, I'll write about greedy algorithms and the use of this strategy even when it doesn't guarantee to find an optimal solution.

The next section is an introduction to greedy algorithms and well-known problems that are solvable using this strategy. Then I'll talk about problems in which the greedy strategy is a really bad option, and finally, I'll show you an example of a good approximation through a greedy algorithm.

Note: Most of the algorithms and problems I discuss in this article involve graphs. It would be good if you are familiar with graphs to get the most out of this post.

Greedy algorithms always choose the best available option. In general, they are computationally cheaper than other families of algorithms like dynamic programming, or brute force. That's because they don't explore the solution space too much. And, for the same reason, they don't find the best solution to a lot of problems.

But there are lots of problems that are solvable with a greedy strategy, and that strategy is precisely the best way to go.

One of the most popular greedy algorithms is Dijkstra's algorithm to find the path with the minimum cost from one vertex to the others in a graph. This algorithm finds such a path by always going to the nearest vertex. That's why we say it is a greedy algorithm.

This is the pseudocode of the algorithm. I denote the graph with `G` and the source node with `s`.

```
Dijkstra(G, s):
    distances <- list of length equal to the number of nodes of the graph, initially with all its elements equal to infinity
    distances[s] = 0
    queue = the set of vertices of G
    while queue is not empty:
        u <- vertex in queue with minimum distances[u]
        remove u from queue
        for each neighbor v of u:
            temp = distances[u] + value(u, v)
            if temp < distances[v]:
                distances[v] = temp
    return distances
```

After running the previous algorithm we get a list `distances` such that `distances[u]` is the minimum cost to go from node `s` to node `u`.
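The pseudocode above can be sketched in Python using a binary heap as the priority queue. This is a minimal, illustrative implementation; the adjacency-list format (a dict mapping each node to `(neighbor, cost)` pairs) is an assumption for the example.

```python
import heapq

def dijkstra(graph, s):
    # graph: dict mapping each node to a list of (neighbor, cost) pairs
    distances = {v: float('inf') for v in graph}
    distances[s] = 0
    heap = [(0, s)]  # priority queue of (distance, node)
    while heap:
        d, u = heapq.heappop(heap)  # greedily take the nearest vertex
        if d > distances[u]:
            continue  # stale queue entry: a better path was already found
        for v, cost in graph[u]:
            temp = distances[u] + cost
            if temp < distances[v]:
                distances[v] = temp
                heapq.heappush(heap, (temp, v))
    return distances
```

For example, `dijkstra({'a': [('b', 1), ('c', 4)], 'b': [('c', 2)], 'c': []}, 'a')` returns `{'a': 0, 'b': 1, 'c': 3}`.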

This algorithm is guaranteed to work only if the graph has no edges with negative costs. A negative cost on an edge can make the greedy strategy choose a path that is not optimal.

Another example that is used to introduce the concepts of the greedy strategy is the Fractional Knapsack.

In this problem, we have a collection of items. Each item has a weight `Wi` greater than zero, and a profit `Pi` also greater than zero. We have a knapsack with a capacity `W` and we want to fill it in such a way that we get the maximum profit. Of course, we cannot exceed the capacity of the knapsack.

In the fractional version of the knapsack problem, we can take either an entire object or only a fraction of it. When taking a fraction `0 <= X <= 1` of the i-th object, we obtain a profit equal to `X*Pi` and we need to add `X*Wi` to the bag. We can solve this problem using a greedy strategy. I won't discuss the solution here. If you don't know it, I recommend you try to solve it by yourself and then look for the solution online.

The number of problems that we can solve by using greedy algorithms is huge. But the number of problems that we cannot solve this way is even bigger. The next section is about the latter ones.

In the previous section, we saw two examples of problems that are solvable using a greedy strategy. This is great because those are pretty fast algorithms.

But, as I said, Dijkstra's algorithm doesn't work on graphs with negative edges. And the problem is even bigger: I can always build a graph with negative edges in a way that makes Dijkstra's solution as bad as I want! Consider the following example, which was extracted from Stackoverflow.

Dijkstra's algorithm fails to find the distance between `A` and `C`. It finds `d(A, C) = 0` when it should be -200. And if we decrease the value of the edge `D -> B`, we'll obtain a distance that is even farther from the actual minimum distance.

Similarly, when we can't break objects in the knapsack problem (the 0-1 Knapsack Problem), the solution that we obtain using a greedy strategy can be as bad as we want. We can always build an input to the problem that makes the greedy algorithm fail badly.

Another example is the Travelling Salesman Problem (TSP). Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?

We can greedily approach the problem by always going to the nearest possible city. We select any of the cities as the first one and apply that strategy.

As happened in the previous examples, we can always build an arrangement of the cities such that the greedy strategy finds the worst possible solution.
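The nearest-city strategy described above can be sketched in Python. This assumes the cities are given as a distance matrix; the function name and format are illustrative, not a standard API.

```python
def nearest_neighbor_tsp(dist, start=0):
    # dist: matrix where dist[i][j] is the distance between cities i and j
    n = len(dist)
    visited = {start}
    tour = [start]
    current = start
    while len(visited) < n:
        # greedy step: always go to the nearest unvisited city
        nearest = min((c for c in range(n) if c not in visited),
                      key=lambda c: dist[current][c])
        visited.add(nearest)
        tour.append(nearest)
        current = nearest
    tour.append(start)  # return to the origin city
    return tour
```

The tour it produces can be arbitrarily far from optimal, which is exactly the weakness discussed here.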

In this section, we have seen that a greedy strategy could lead us to disaster. But there are problems in which such an approach can approximate the optimal solution quite well.

We have seen that the greedy strategy can be as bad as we want for some problems. This means that we cannot use it to obtain the optimal solution or even a good approximation to it.

But there are some examples in which greedy algorithms provide us with very good approximations! In these cases, the greedy approach is very useful because it tends to be cheaper and easier to implement.

A vertex cover of a graph is a set of vertices such that every edge of the graph has at least one of its endpoints in the set. The problem is to find a vertex cover of minimum size.

This is a very hard problem. Actually, there is no known efficient exact algorithm for it. But the good news is that we can get a good approximation with a greedy algorithm.

We select any edge `<u, v>` from the graph, and add `u` and `v` to the set. Then, we remove all the edges that have `u` or `v` as one of their endpoints, and we repeat the previous process while the remaining graph has edges.

Here is a possible pseudocode for the previous algorithm.

```
vertexCover(G):
    VertexCover <- {} // empty set
    E' <- edges of G
    while E' is not empty:
        VertexCover <- VertexCover U {u, v} where <u, v> is in E'
        E' = E' - ({<u, v>} U edges incident to u or v)
    return VertexCover
```
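Assuming the graph is given as a list of edges, the procedure above might look like this in Python (an illustrative sketch; instead of deleting covered edges, it simply skips edges that already have an endpoint in the cover, which has the same effect):

```python
def approx_vertex_cover(edges):
    # edges: iterable of (u, v) pairs describing the graph
    cover = set()
    for u, v in edges:
        # an edge with both endpoints outside the cover is still "remaining";
        # pick it and add both endpoints, implicitly removing incident edges
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover
```

On the path graph with edges `(1, 2), (2, 3), (3, 4)` it returns `{1, 2, 3, 4}`, twice the size of the optimal cover `{2, 3}`, exactly the 2-approximation bound discussed below.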

As you can see, this is a simple and relatively fast algorithm. But the best part is that the solution will always be less than or equal to twice the size of the optimal solution! We'll never obtain a set that is bigger than twice the smallest vertex cover, no matter how the input graph was built.

I'm not going to include the proof of this statement in this post, but you can prove it by noticing that for every edge `<u, v>` that we add to the vertex cover, either `u` or `v` is in the optimal solution (i.e. in the smallest vertex cover).

Many computer scientists are working to find more of these approximations. There are more examples, but I'm going to stop here. This is an interesting and very active research field in Computer Science and Applied Mathematics. With these approximations, we can get very good solutions for very hard problems by implementing pretty simple algorithms.

In this post, I made a shallow introduction to greedy algorithms. We saw examples of problems that can be solved using the greedy strategy. Then, I talked about some problems for which the greedy strategy performs very badly, and finally, we saw an example of a greedy algorithm that gets an approximate solution for a hard problem.

Sometimes we can solve a problem using a greedy approach, but it is hard to come up with the right strategy. And proving the correctness of greedy algorithms (for exact or approximate solutions) can be very difficult. So, there is a lot more we could say about greedy algorithms!

If you enjoyed this post and want me to keep this type of content coming let me know by reacting and/or commenting. You can also follow me on Twitter for more CS-related content.

We are always implementing design patterns and abstractions to adapt our programming language to the domain of the problem. We create the classes `Store`, `Product`, and `Customer`. We create functions to represent the operations of selling, buying, updating the inventory, etc. But all those building blocks are always expressed in terms of a general-purpose language that was not designed to fit our very immediate necessities.

In this article, I'll talk about the paradigm that is used in the Lisp programming language. With Lisp, we not only create a program, but we also create a specific language at the same time. Thus, we end up with a Domain-Specific Language (DSL) and a short, elegant, and clean program.

We always create new entities and operators when programming. We implement functions with meaningful names that express part of the domain of our problem. But those functions and classes are created in the same way, with a rigid syntactical structure. No matter how elaborated your abstractions are, you always build them in the same way. For example, creating a class in Python will always be like:

```
class <ClassName>(<ParentClass>):
    <ClassBody>
```

Maybe you can imagine another more expressive way to create classes in the domain you are working in, but there is no way to change that syntax into a kind of shortcut to create those domain-specific classes. You can solve a lot of problems with inheritance and other mechanisms, but a function always needs to be built with the same syntactical structure, and the same happens with classes.

You can redefine operators like `+` and `==`. But you can't modify what the `class` reserved word does. You can't add a new operator to create classes either.

All the programs you create in Python are written with the same vocabulary. Your coding expertise can make the code to be expressive, but no matter the expertise, that expressiveness will always have a limit. A limit imposed by the syntax of the programming language and its reserved words.

Most of the developer community is so used to these limitations that we don't even question them. We learn to code in a specific language, we learn some algorithms and the way we can translate them to that programming language. We get used to building programs in that way.

But there's another way. I just want to make a little detour right now and then dive into the main topic.

If you have programmed in C/C++, Assembly, or some other language that supports macros (but not Lisp-like macros), you might think that the problem from the previous section does not apply to that language.

There is a big difference between C-like macros and what we're going to talk about in the next section.

With C-like macros, we can define a pattern and then substitute any occurrence of that pattern in our program with a piece of code, at compile time. For example, we can do:

```
#define MAX 1000000
...
int array[MAX];
...
```

Before actually compiling the program, the C preprocessor substitutes any occurrence of `MAX` with `1000000`. Extending that functionality, we could modify the syntax of `for` loops and `if` statements if we wanted. But there's still an important limitation.

We don't have all the power of the language to define those macros. We can only define a mapping between string patterns and pieces of code. And macros can't perform any computation at compile time. They're just text substitution, nothing more.

Let's see the real power of Lisp.

As I said before, when you code in Lisp (in the right way), you are doing two things. Firstly, creating a new language that is specific to your domain, and secondly, coding a program in that expressive and very convenient language you've just created.

Actually, there's no order. You create both the language and the program simultaneously. Both of them evolve during the development process and the final product is a highly readable and clean program.

You can create a brand-new operator to define classes that represent stores, for example:

```
(define-store my-store (:capacity 100 :budget 100000 :description "My very first store template"))
```

That could be the equivalent of building a class named `my-store`, which might implicitly inherit from a `store` parent class. We are passing a `capacity`, a `budget`, and a `description`. Then, at compile time, the `define-store` operator can do all the things you can imagine; maybe it is not even a class definition. The point is, we are declaring an entity in a very clear and domain-specific way.

It might be a toy example and it is totally out of context. But the cool thing is that you can imagine any sort of complicated computations running under the hood, and that will be a possible scenario. With enough expertise, you can transform Lisp into the language you need to make the shortest and cleanest program.

In the above example, the operator `define-store` is a macro. A macro in Lisp is like a special function. Macros can receive parameters and they are executed at compile time, in a step called *macroexpansion*. Then, the macro call is replaced by its own result. For example, we can define the following macro:

```
(defmacro hundred () (* 2 50))
```

We defined a macro called `hundred` that receives no parameters. It returns the result of multiplying 2 and 50.

Notice that we write the arithmetic operator first and the operands later. One of the advantages is that we can write `(* 2 50 100 200)` and it will return the product of all those factors. But of course, you can use some advanced Lisp magic to change this syntax ;-)

Now, what happens if we write the following?

```
(+ hundred 30)
```

The first step is *macroexpansion*. The compiler substitutes the macro call with the result of that call, and the code becomes:

```
(+ 100 30)
```

Then, the program is executed and we obtain the result `130`.

Notice the difference with C-like macros. There was more than just text substitution, there was computation and then, code substitution.

If we change the macro definition by adding just one more character (notice the quote):

```
(defmacro hundred () '(* 2 50))
```

Then the *macroexpansion* changes a little bit too:

```
(+ (* 2 50) 30)
```

But the output remains the same. Now the macro returns the code as-is, and there are no further computations at compile time.

Talking in more depth about Lisp macros requires a basic knowledge of the Lisp programming language. In this article, I don't assume any prior knowledge of Lisp. So, the examples I can use are very limited. It wouldn't make any sense to include a complex macro definition. But my point is not to teach you how to define macros in Lisp but to show you another way of thinking about programming.

The idea of building your program and the language in which your program is written at the same time is, at least, beautiful. The elegance of the process can only be compared with that of the final result.

We end up with two products. The desired program, but also a language that we can use totally or just partially in our next journey.

Just knowing about this philosophy is totally worth it.

In this article, I have talked about Lisp's macros.

They are a powerful feature that allows us to define new Domain-Specific Languages for every problem we face. With them, we can create the cleanest programs and get the language in which that program is written at the same time.

That immense power makes us approach programming in a very different way.

This article is part of a series called "Cool things about Lisp". I plan to add more articles to this series in the future. Trust me, there are a lot more cool things to talk about. Let me know if you like this content and want me to keep writing about it. Leave your impression and/or your comment.

You can follow me on Twitter. I'm always writing about Computer Science stuff.

In this post, I explain "the trick" behind NBC and I'll give you an example that we can use to solve a classification problem.

In the next sections, I'll be talking about the math behind NBC. Feel free to skip those sections and go to the implementation part if you are not interested in the math.

In the implementation section, I'll show you a simple NBC algorithm. Then we'll use it to solve a classification problem. The task will be to determine whether a certain passenger on the Titanic survived the accident or not.

Before talking about the algorithm itself, let's talk about the simple math behind it. We need to understand what conditional probability is and how we can use Bayes's Theorem to calculate it.

Think about a fair die with six sides. What's the probability of getting a six when rolling the die? That's easy, it's 1/6. We have six possible and equally likely outcomes but we are interested in just one of them. So, 1/6 it is.

But what happens if I tell you that I have rolled the die already and the outcome is an even number? What's the probability that we have got a six now?

This time, the possible outcomes are just three because there are only three even numbers on the die. We are still interested in just one of those outcomes, so now the probability is greater: 1/3. What's the difference between both cases?

In the first case, we had no prior information about the outcome. Thus, we needed to consider every single possible result.

In the second case, we were told that the outcome was an even number, so we could reduce the space of possible outcomes to just the three even numbers that appear in a regular six-sided die.

In general, when calculating the probability of an event A given the occurrence of another event B, we say we are calculating the conditional probability of A given B, or just the probability of A given B. We denote it by `P(A|B)`.

For example, the probability of getting a six given that the number we have got is even: `P(Six|Even) = 1/3`. Here, we denoted with Six the event of getting a six and with Even the event of getting an even number.

But, how do we calculate conditional probabilities? Is there a formula?

Now, I'll give you a couple of formulas to calculate conditional probs. I promise they won't be hard, and they are important if you want to understand the insights of the Machine Learning algorithms we'll be talking about later.

The probability of an event A given the occurrence of another event B can be calculated as follows:

```
P(A|B) = P(A,B)/P(B)
```

Where `P(A, B)` denotes the probability of both A and B occurring at the same time, and `P(B)` denotes the probability of B.

Notice that we need `P(B) > 0` because it makes no sense to talk about the probability of A given B if the occurrence of B is not possible.
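To make the formula concrete, here is a quick Python check of the die example from above, computing `P(Six|Even)` by direct enumeration (an illustrative sketch, not part of the classifier):

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]  # the six equally likely faces of a fair die

# P(A, B): probability of getting a six that is also even
p_six_and_even = Fraction(len([x for x in outcomes if x == 6 and x % 2 == 0]),
                          len(outcomes))
# P(B): probability of getting an even number
p_even = Fraction(len([x for x in outcomes if x % 2 == 0]), len(outcomes))

# P(A|B) = P(A, B) / P(B)
p_six_given_even = p_six_and_even / p_even
print(p_six_given_even)  # 1/3
```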

We can also calculate the probability of an event A, given the occurrence of multiple events B1, B2,..., Bn:

```
P(A|B1,B2,...,Bn) = P(A,B1,B2,...,Bn)/P(B1,B2,...,Bn)
```

There's another way of calculating conditional probs: the so-called Bayes's Theorem.

```
P(A|B) = P(B|A)P(A)/P(B)
P(A|B1,B2,...,Bn) = P(B1,B2,...,Bn|A)P(A)/P(B1,B2,...,Bn)
```

Notice that we are calculating the probability of event A given event B, by inverting the order of occurrence of the events.

Now we suppose event A has occurred and we want to calculate the prob of event B (or events B1, B2,..., Bn in the second and more general example).

An important fact that can be derived from the definition of conditional probability is the formula to calculate `P(B1, B2,..., Bn, A)`. That's called the chain rule for probabilities.

```
P(B1,B2,...,Bn,A) = P(B1 | B2, B3, ..., Bn, A)P(B2,B3,...,Bn,A)
= P(B1 | B2, B3, ..., Bn, A)P(B2 | B3, B4, ..., Bn, A)P(B3, B4, ..., Bn, A)
= P(B1 | B2, B3, ..., Bn, A)P(B2 | B3, B4, ..., Bn, A)...P(Bn | A)P(A)
```

That's an ugly formula, isn't it? But under some conditions, we can make a workaround and avoid it.

Let's talk about the last concept we need to know to understand the algorithms.

The last concept we are going to talk about is independence. We say events A and B are independent if

```
P(A|B) = P(A)
```

That means that the prob of event A is not affected by the occurrence of event B. A direct consequence is that `P(A,B) = P(A)P(B)`.

In plain English, this means that the prob of the occurrence of both A and B at the same time is equal to the product of the probs of events A and B occurring separately.
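As a quick sanity check of this consequence, here is an illustrative example with the same fair die, where the two events happen to be independent (the event choices are mine, for illustration):

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # getting an even number
B = {1, 2}     # getting a 1 or a 2

def p(event):
    # probability of an event over equally likely outcomes
    return Fraction(len(event & outcomes), len(outcomes))

# P(A, B) = 1/6 equals P(A)P(B) = 1/2 * 1/3, so A and B are independent
print(p(A & B) == p(A) * p(B))  # True
```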

If A and B are conditionally independent given a third event C, it holds that:

```
P(A,B|C) = P(A|C)P(B|C)
```

Now we are ready to talk about Naive Bayes Classifiers!

Suppose we have a vector X of n features and we want to determine the class of that vector from a set of k classes y1, y2,..., yk. For example, if we want to determine whether it'll rain today or not.

We have two possible classes (k = 2): rain, not rain, and the length of the vector of features might be 3 (n = 3).

The first feature might be whether it is cloudy or sunny, the second feature could be whether humidity is high or low, and the third feature would be whether the temperature is high, medium, or low.

So, these could be possible feature vectors.

```
<Cloudy, H_High, T_Low>
<Sunny, H_Low, T_Medium>
<Cloudy, H_Low, T_High>
```

Our task is to determine whether it'll rain or not, given the weather features.

After learning about conditional probabilities, it seems natural to approach the problem by trying to calculate the prob of raining given the features:

```
R = P(Rain | Cloudy, H_High, T_Low)
NR = P(NotRain | Cloudy, H_High, T_Low)
```

If R > NR we answer that it'll rain, otherwise we say it won't.

In general, if we have k classes y1, y2, ..., yk, and a vector of n features `X = <X1, X2, ..., Xn>`, we want to find the class yi that maximizes

```
P(yi | X1, X2, ..., Xn) = P(X1, X2,..., Xn, yi)/P(X1, X2, ..., Xn)
```

Notice that the denominator is constant and it does not depend on the class yi. So, we can ignore it and just focus on the numerator.

In a previous section, we saw how to calculate `P(X1, X2,..., Xn, yi)` by decomposing it into a product of conditional probabilities (the ugly formula):

```
P(X1, X2,..., Xn, yi) = P(X1 | X2,..., Xn, yi)P(X2 | X3,..., Xn, yi)...P(Xn | yi)P(yi)
```

Assuming all the features Xi are independent and using Bayes's Theorem, we can calculate the conditional probability as follows:

```
P(yi | X1, X2,..., Xn) = P(X1, X2,..., Xn | yi)P(yi)/P(X1, X2, ..., Xn)
= P(X1 | yi)P(X2 | yi)...P(Xn | yi)P(yi)/P(X1, X2, ..., Xn)
```

And we just need to focus on the numerator.

By finding the class yi that maximizes the previous expression, we are classifying the input vector. But, how can we get all those probabilities?

When solving these kinds of problems, we need to have a set of previously classified examples.

For instance, in the problem of guessing whether it'll rain or not, we need several examples of feature vectors and their classifications, which would be obtained from past weather records.

So, we would have something like this:

```
...
<Cloudy, H_High, T_Low> -> Rain
<Sunny, H_Low, T_Medium> -> Not Rain
<Cloudy, H_Low, T_High> -> Not Rain
...
```

Suppose we need to classify the new vector `<Cloudy, H_Low, T_Low>`. We need to calculate:

```
P(Rain | Cloudy, H_Low, T_Low) = P(Cloudy | H_Low, T_Low, Rain)P(H_Low | T_Low, Rain)P(T_Low | Rain)P(Rain)/P(Cloudy, H_Low, T_Low)
```

We get the previous expression by applying the definition of conditional probability and the chain rule. Remember we only need to focus on the numerator so we can drop the denominator.

We also need to calculate the prob for NotRain, but we can do this in a similar way.

We can find `P(Rain) = #Rain/Total`. That means counting the entries in the dataset that are classified as Rain and dividing that number by the size of the dataset.

To calculate `P(Cloudy | H_Low, T_Low, Rain)`, we need to count all the entries that have the features H_Low, T_Low, and Cloudy and are also classified as Rain. Then, that number is divided by the number of entries that have H_Low and T_Low and are classified as Rain. We calculate the rest of the factors of the formula in a similar fashion.

Making those computations for every possible class is very expensive and slow. So we need to make assumptions about the problem that simplify the calculations.

Naive Bayes Classifiers assume that all the features are independent from each other. So we can rewrite our formula applying Bayes's Theorem and assuming independence between every pair of features:

```
P(Rain | Cloudy, H_Low, T_Low) = P(Cloudy | Rain)P(H_Low | Rain)P(T_Low | Rain)P(Rain)/P(Cloudy, H_Low, T_Low)
```

Now we calculate `P(Cloudy | Rain)` by counting the number of entries that are classified as Rain and were Cloudy, and dividing it by the number of entries classified as Rain.

The algorithm is called Naive because of this independence assumption. There are dependencies between the features most of the time. We can't say that in real life there isn't a dependency between the humidity and the temperature, for example. Naive Bayes Classifiers are also called Independence Bayes, or Simple Bayes.

The general formula would be:

```
P(yi | X1, X2, ..., Xn) = P(X1 | yi)P(X2 | yi)...P(Xn | yi)P(yi)/P(X1, X2, ..., Xn)
```

Remember you can get rid of the denominator. We only calculate the numerator and answer the class that maximizes it.

Now, let's implement our NBC, and let's use it in a problem.

I will show you an implementation of a simple NBC and then we'll see it in practice.

The problem we are going to solve is determining whether a passenger on the Titanic survived or not, given some features like their gender and their age.

Here you can see the implementation of a very simple NBC:

```
class NaiveBayesClassifier:

    def __init__(self, X, y):
        '''
        X and y denote the features and the target labels respectively
        '''
        self.X, self.y = X, y
        self.N = len(self.X)       # Length of the training set
        self.dim = len(self.X[0])  # Dimension of the vector of features
        self.attrs = [[] for _ in range(self.dim)]  # Here we'll store the columns of the training set
        self.output_dom = {}       # Output classes with the number of occurrences in the training set. In this case we have only 2 classes
        self.data = []             # To store every row [Xi, yi]

        for i in range(len(self.X)):
            for j in range(self.dim):
                # if we have never seen this value for this attr before,
                # then we add it to the attrs array in the corresponding position
                if not self.X[i][j] in self.attrs[j]:
                    self.attrs[j].append(self.X[i][j])
            # if we have never seen this output class before,
            # then we add it to the output_dom and count one occurrence for now
            if not self.y[i] in self.output_dom.keys():
                self.output_dom[self.y[i]] = 1
            # otherwise, we increment the occurrence of this output in the training set by 1
            else:
                self.output_dom[self.y[i]] += 1
            # store the row
            self.data.append([self.X[i], self.y[i]])

    def classify(self, entry):
        solve = None  # Final result
        max_arg = -1  # partial maximum
        for y in self.output_dom.keys():
            prob = self.output_dom[y] / self.N  # P(y)
            for i in range(self.dim):
                cases = [x for x in self.data if x[0][i] == entry[i] and x[1] == y]  # all rows with Xi = xi and class y
                n = len(cases)
                prob *= n / self.output_dom[y]  # P *= P(Xi = xi | y)
            # if we have a greater prob for this output than the partial maximum...
            if prob > max_arg:
                max_arg = prob
                solve = y
        return solve
```

Here, we assume every feature has a discrete domain. That means they take a value from a finite set of possible values.

The same happens with the classes. Notice that we store some data in the `__init__` method so we don't need to repeat some operations. The classification of a new entry is carried out in the `classify` method.

This is a simple example of an implementation. In real-world applications, you don't need to (and it is better if you don't) make your own implementation. For example, the sklearn library in Python contains several good implementations of NBCs.
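As a quick sketch of what using a library implementation might look like, here is sklearn's `CategoricalNB`, which suits discrete features like ours (the toy data below is made up for illustration; `CategoricalNB` expects categories encoded as non-negative integers):

```python
from sklearn.naive_bayes import CategoricalNB

# Toy training data: each feature is an integer code for a discrete category
X = [[0, 1, 0], [1, 0, 1], [0, 0, 2], [1, 1, 0]]
y = ['rain', 'no-rain', 'no-rain', 'rain']

model = CategoricalNB()
model.fit(X, y)
print(model.predict([[0, 1, 0]]))
```

The library handles details our toy implementation ignores, such as Laplace smoothing for feature values that never co-occur with a class in the training set.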

Notice how easy it is to implement it!

Now, let's apply our new classifier to solve a problem. We have a dataset with a description of 887 passengers on the Titanic. We also can see whether a given passenger survived the tragedy or not.

So our task is to determine if another passenger that is not included in the training set made it or not.

In this example, I'll be using the **pandas** library to read and process the data. I don't use any other tool.

The data is stored in a file called titanic.csv, so the first step is to read the data and get an overview of it.

```
import pandas as pd
data = pd.read_csv('titanic.csv')
print(data.head())
```

The output is:

```
Survived Pclass Name \
0 0 3 Mr. Owen Harris Braund
1 1 1 Mrs. John Bradley (Florence Briggs Thayer) Cum...
2 1 3 Miss. Laina Heikkinen
3 1 1 Mrs. Jacques Heath (Lily May Peel) Futrelle
4 0 3 Mr. William Henry Allen
Sex Age Siblings/Spouses Aboard Parents/Children Aboard Fare
0 male 22.0 1 0 7.2500
1 female 38.0 1 0 71.2833
2 female 26.0 0 0 7.9250
3 female 35.0 1 0 53.1000
4 male 35.0 0 0 8.0500
```

Notice we have the Name of each passenger. We won't use that feature for our classifier because it is not significant for our problem. We'll also get rid of the Fare feature because it is continuous and our features need to be discrete.

There are Naive Bayes Classifiers that support continuous features. For example, the Gaussian Naive Bayes Classifier.

```
y = list(map(lambda v: 'yes' if v == 1 else 'no', data['Survived'].values)) # target values as string
# We won't use the 'Name' nor the 'Fare' field
X = data[['Pclass', 'Sex', 'Age', 'Siblings/Spouses Aboard', 'Parents/Children Aboard']].values # features values
```

Then, we need to separate our data set into a training set and a validation set. The latter is used to validate how well our algorithm is doing.

```
print(len(y)) # >> 887
# We'll take 600 examples to train and the rest to the validation process
y_train = y[:600]
y_val = y[600:]
X_train = X[:600]
X_val = X[600:]
```

We create our NBC with the training set and then classify every entry in the validation set.

We measure the accuracy of our algorithm by dividing the number of entries it correctly classified by the total number of entries in the validation set.

```
## Creating the Naive Bayes Classifier instance with the training data
nbc = NaiveBayesClassifier(X_train, y_train)

total_cases = len(y_val)  # size of validation set

# Well classified examples and badly classified examples
good = 0
bad = 0

for i in range(total_cases):
    predict = nbc.classify(X_val[i])
    # print(y_val[i] + ' --------------- ' + predict)
    if y_val[i] == predict:
        good += 1
    else:
        bad += 1

print('TOTAL EXAMPLES:', total_cases)
print('RIGHT:', good)
print('WRONG:', bad)
print('ACCURACY:', good/total_cases)
```

The output:

```
TOTAL EXAMPLES: 287
RIGHT: 200
WRONG: 87
ACCURACY: 0.6968641114982579
```

It's not great but it's something. We can get about a 10% accuracy improvement if we get rid of other features like Siblings/Spouses Aboard and Parents/Children Aboard.

You can see a notebook with the code and the dataset here.

Today, we have neural networks and other complex and expensive ML algorithms all over the place.

NBCs are very simple algorithms that let us achieve good results in some classification problems without needing a lot of resources. They also scale very well, which means we can add a lot more features and the algorithm will still be fast and reliable.

Even when NBCs are not a good fit for the problem we are trying to solve, they might be very useful as a baseline.

We could first try to solve the problem using an NBC with a few lines of code and little effort. Then we could try to achieve better results with more complex and expensive algorithms.

This process can save us a lot of time and gives us immediate feedback about whether complex algorithms are really worth it for our task.

In this article, you read about conditional probabilities, independence, and Bayes's Theorem. Those are the Mathematical concepts behind Naive Bayes Classifiers.

After that, we saw a simple implementation of an NBC and solved the problem of determining whether a passenger on the Titanic survived the accident.

I hope you found this article useful. You can read about Computer Science related topics in my personal blog and by following me on Twitter.

]]>With an array, I can get the i-th element really fast (this will be called the *index* operation: I provide an index `i` and get the `i-th` element of the list), no matter what the value of `i` is. But when it comes to inserting or removing an element from the middle of the array, things change a little bit.

On the other hand, linked lists offer a slower index operation. We need to traverse a long segment of the list to get the element we are looking for. But we can make faster insertions and deletions.

In this article, I will write about a solution I came up with. This is a "new" data structure that somehow combines arrays and linked lists to get a middle point that suited my requirements. I don't know whether this is a truly novel data structure. Furthermore, I don't think it is an awesome idea. I just used some clever and relatively popular techniques to achieve my goal.

The next section is about the differences between arrays and linked lists, their respective strengths, and their weaknesses. If you are just interested in the data structure specifics, you can skip it. Then I'll talk about the core idea, and later, I'll give some implementation details.

IMPORTANT! I assume you have some basic knowledge about algorithmic complexity and big-O notation.

Arrays are contiguous memory slots to store sequences of data. We arrange some elements in such a way that we keep a reference to every single one of them. For example, if we have an array that contains numbers from 1 to 10, we have a reference to every memory slot that contains each of those ten numbers. That allows us to get any element by the index it has in the array really fast.

```
# Example of array
array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
array[3]
# >> 4
array[0]
# >> 1
```

We have an implicit mapping from index to element. No matter the value of the index, the operation will always be equally fast. That means that doing `array[0]` is as fast as doing `array[9]` because we keep all the references at the same time. That's the most useful feature of arrays: we have access to any position in constant time, no matter the size of the array, nor the position itself.

This great feature is also the reason we can't insert or delete elements with the same ease. We need to keep elements in contiguous positions and maintain the references to them. When deleting an element, we create holes in the middle of the array, and that is not allowed, so we need to fix the references manually and fill the holes. The same happens with insertions; in this case, we need to create the hole first and then fill it with the new element.

Thus, to insert/delete an element, we need to rearrange the array. If we have an array with `N` elements, we'll need to rearrange `O(N)` of those elements when doing an insertion/deletion. I needed to do it faster.
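To make that cost concrete, here is a small sketch (my own illustration, not from the original post) that simulates an array insertion and counts how many elements have to be shifted:

```python
def insert_with_shift(array, index, element):
    """Insert `element` at `index`, returning how many elements were shifted.

    This mimics what happens under the hood: every element after the
    insertion point must move one slot to the right.
    """
    array.append(None)  # grow the array by one slot
    shifted = 0
    # Walk backwards, moving each element one position to the right
    for i in range(len(array) - 1, index, -1):
        array[i] = array[i - 1]
        shifted += 1
    array[index] = element
    return shifted

array = list(range(10))
moves = insert_with_shift(array, 3, 99)
# Inserting at index 3 of a 10-element array shifts the 7 elements behind it
```

Inserting at position `i` in an array of `N` elements shifts `N - i` elements, which is `O(N)` in the worst case.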

Linked lists don't maintain all those references in the same way. They are composed of nodes that store an element but also the reference to the next node. This way we only maintain a direct reference to the first node of the list (called the head), this node has a reference to the second one, the second one has a reference to the third one, and so on. In the case of doubly-linked lists, every node has a reference to both the next and the previous node, and we also keep a direct reference to the tail of the list (the last node).

Thus, looking for an element at a certain position implies traversing `O(N)` nodes in the worst case. The good thing is that we can insert or delete an element without worrying about the holes, because we only need to change a few references. But I needed to have access to positions in the list in a faster way.

I needed to make faster insertions/deletions (better than `O(N)`), but still be able to retrieve the `i-th` element of the list in less than `O(N)` operations as well. It's important to note that when I say insertion I mean an operation that receives an element and **a position**, and then inserts the element in that position. The delete operation is defined in the same way.

In the next section, I'll introduce a data structure that lets us fulfill such requirements.

As we saw in the previous section, the "problem" with arrays is that having all those references makes insertion/deletion slow. But the few references of linked lists are a problem as well, because we need to make a lot of operations to get the element at a given position. Thus, it seems reasonable to think about finding a middle point in the number of references.

First of all, I need to say that no data structure lets us make all the index, insert, and delete operations in constant time. So, a "perfect world" is out of the discussion. Let's move on.

I was thinking about having a doubly-linked list with more direct references. Instead of just keeping a reference to the first and the last node, I would also keep references to inner nodes. But how many references would I need? If I maintained just a few of them, the index operation might be slow, but if I maintained too many references, then insertions and deletions would take more time.

After thinking for a while, I came up with the right number of references: `sqrt(N)` (here `sqrt` stands for the square root operation). The idea is to keep a list of references whose length will be `sqrt(N)`. The other important property is that every referenced node will be at a distance of `sqrt(N)` from the next node in the reference list.

For example, suppose we have a doubly-linked list with the numbers `1, 2, 3, 4, 5`. Then our reference list will contain two references because `2 < sqrt(5) < 3`. Also, every node referenced in the list will be at a distance of at most `sqrt(5)` from the other node referenced in the list. So a possible list would be one that contains the second and the fourth nodes. Another possibility would be a list with the third and the fifth nodes. In both cases, the distance between nodes is less than `sqrt(5)`.

When I say `sqrt(N)`, I mean the largest integer `n` such that `n <= sqrt(N)`.

By keeping such a reference list, we can make all the index, insert, and delete operations in `O(sqrt(N))`. Note that we got a slower index operation to achieve a faster insertion/deletion.

By the way, I borrowed some ideas from a technique called sqrt decomposition. You can check it out here.
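To get a feel for the numbers, here is a small sketch of my own (the implementation below maintains its references incrementally and may place them differently; this just computes one valid placement) showing where `floor(sqrt(N))` references spaced `floor(sqrt(N))` apart could sit:

```python
import math

def selected_indices(n):
    """Return indices for floor(sqrt(n)) selected nodes,
    spaced floor(sqrt(n)) apart and ending at the last position."""
    s = math.floor(math.sqrt(n))
    # Walk backwards from the last index in steps of s, then reverse
    return [n - 1 - s * k for k in range(s)][::-1]

print(selected_indices(5))    # -> [2, 4]  (third and fifth nodes, as above)
print(selected_indices(25))   # -> [4, 9, 14, 19, 24]
```

With this layout, any position is at most `floor(sqrt(n))` links away from some selected node, which is what bounds the index operation.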

Now I'm going to explain the insert, delete, and index operations.

As I said, I maintain a list of selected nodes. The length of this list is always `sqrt(N)`, and the distance between two consecutive selected nodes in the actual list will also be `sqrt(N)`.

Every time we insert a new node in the list, we need to check whether the new length of the list has a new integer `sqrt(N)`. In that case, we need to update the selected nodes list. It would be like:

```
import math

class IndexedLinkedListNode:
    def __init__(self, content, prev=None, next=None):
        self.content = content
        self.prev = prev
        self.next = next

    def __eq__(self, other):
        return self.content == other.content

class IndexedLinkedList:
    def __init__(self, elements=None):
        # None instead of a mutable default argument
        self.selected_nodes = []
        self.length = 0
        self.head = None
        self.tail = None
        for e in (elements or []):
            self.insert(e)

    def insert(self, element):
        '''
        Inserts an element at the end of the list
        '''
        new_node = IndexedLinkedListNode(element)
        sqlen = math.floor(math.sqrt(self.length))
        if self.length == 0:
            self.selected_nodes.append(new_node)
            self.head = new_node
            self.tail = new_node
        else:
            self.tail.next = new_node
            new_node.prev = self.tail
            self.tail = new_node
        self.length += 1
        if math.floor(math.sqrt(self.length)) > sqlen:
            self.update_list_cause_insertion(new_node)

    def update_list_cause_insertion(self, new_node):
        '''
        Here we make every selected node become its next node in the list
        and add the new node to the list
        '''
        self.selected_nodes = [node.next for node in self.selected_nodes]
        self.selected_nodes.append(new_node)

    def insert_at(self, element, index):
        '''
        Here we insert the new node before the node that is in the specified
        position. You can implement yourself a very similar method to insert
        the new element AFTER the specified position.
        '''
        new_node = IndexedLinkedListNode(element)
        target = self[index]
        sqlen = math.floor(math.sqrt(self.length))
        self.length += 1
        if index == 0:
            self.head.prev = new_node
            new_node.next = self.head
            self.head = new_node
        else:
            target.prev.next = new_node
            new_node.prev = target.prev
            new_node.next = target
            target.prev = new_node
        if math.floor(math.sqrt(self.length)) > sqlen:
            self.update_list_cause_insertion(self.tail)
```

It can be proved that we always obtain a selected nodes list of length `sqrt(N)`, with at most `sqrt(N)` distance between any pair of consecutive nodes. I'm going to skip the proof to keep the article as simple as possible.

This is a very similar method. It's like doing the inverse operation. We just need to take care of some corner cases, like deleting the only node in the list and deleting the first or the last node.

```
...
    def remove_at(self, index):
        target = self[index]
        sqlen = math.floor(math.sqrt(self.length))
        selected_pos = None
        try:
            selected_pos = self.selected_nodes.index(target)
        except ValueError:
            pass
        if self.length <= 1:
            self.head = None
            self.tail = None
            self.selected_nodes = []
            self.length = max(0, self.length - 1)
            return
        if target.prev:
            target.prev.next = target.next
        else:
            self.head = target.next
        if target.next:
            target.next.prev = target.prev
        else:
            self.tail = target.prev
        self.length -= 1
        if selected_pos is not None:
            # Advance every selected node from the removed position onward.
            # Note: rebinding a loop variable would not update the list itself,
            # so we assign back into the slice.
            self.selected_nodes[selected_pos:] = [
                node.next for node in self.selected_nodes[selected_pos:]
            ]
        if math.floor(math.sqrt(self.length)) < sqlen:
            self.update_list_cause_removal()

    def update_list_cause_removal(self):
        '''
        Here we make every selected node become its prev node in the list
        and remove the last selected node
        '''
        self.selected_nodes = self.selected_nodes[:-1]
        self.selected_nodes = [node.prev for node in self.selected_nodes]
...
```

Up to here, we have achieved the requirements for the insertion and deletion methods. But that is assuming that the index operation (`self[index]`) can also be done in the required time.

Let's see how to implement the index operation. This is the last piece of the puzzle. It is a trickier method that would require a longer explanation to fully understand. As I said, I would like to keep this article as simple as possible, so I'm not going to explain too many details about this operation.

```
...
    def __getitem__(self, index):
        if len(self.selected_nodes) == 0 or index < 0 or index >= self.length:
            return None
        first_index = math.floor(math.sqrt(self.length)) - 1  # index of the first selected node
        first_node = self.selected_nodes[0]
        if index <= first_index:
            return self.prevn(first_node, first_index - index)
        aux_idx = index - first_index
        target_k = self.integer_positive_root(aux_idx)
        target_index = target_k**2 + target_k + first_index
        index -= target_index
        return self.nextn(self.selected_nodes[target_k], index)

    def integer_positive_root(self, index):
        '''
        Solves the equation K^2 + K - index = 0
        '''
        return math.floor((-1 + math.sqrt(1 + 4*index))/2)

    ## Utility functions to make the prev and next operations N times
    def prevn(self, node, n):
        for _ in range(n):
            node = node.prev
        return node

    def nextn(self, node, n):
        for _ in range(n):
            node = node.next
        return node
...
```
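A note on `integer_positive_root`: following the arithmetic in `__getitem__`, for a given offset past the first selected node we need the largest `k` with `k^2 + k <= offset`, and the quadratic formula gives it directly. A standalone sanity check (my own, outside the class):

```python
import math

def integer_positive_root(index):
    # Largest integer k such that k*k + k <= index, obtained from the
    # positive root of k^2 + k - index = 0
    return math.floor((-1 + math.sqrt(1 + 4 * index)) / 2)

# Brute-force check that the closed form picks the right k
for index in range(1000):
    k = integer_positive_root(index)
    assert k * k + k <= index < (k + 1) ** 2 + (k + 1)
```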

That's it. We have completed our Indexed Linked List implementation!

In this article, we defined a new data structure that allows us to do insertions, deletions, and indexing in `O(sqrt(N))`, where `N` is the length of the list.

I didn't want to go into the implementation details and proofs of correctness and time complexities because I wanted the article to be accessible for a wider audience. I know some parts of the code can be obscure right now. But I consider that talking about a problem and sharing the ideas to solve it is very helpful for everybody.

You can see a Lisp implementation of the data structure along with some insights here

If you want me to make a more detailed article with all the formulas and demonstrations just let me know. You can react to this article and leave a comment, or you can follow me or @-me on Twitter to talk about this or any other CS-related topic.

]]>I'd love to have tons of people reading my blog and liking my content. The only social media I'm actively using right now is Twitter. I only use it to get in touch with the community of developers and computer scientists out there. I only tweet about those topics, and it'd be amazing to have a lot of followers to debate with. Yes, having a strong online presence is amazing, but you already knew that. The thing is, I don't write for you, nor for anyone but me. I really appreciate you writing a comment on any of my posts, liking them, and following me on Twitter too. Furthermore, I'm very interested in your opinion and feedback, and I want everyone to be able to enjoy what I'm writing. But I write to myself.

Writing to myself means I'm not going to write about something I'm not interested in. I know that I could write something about a more popular subject and maybe that way I could get more followers and reactions. That's not my goal, at least right now. I really love to write, it's an amazing learning exercise, I could be helping others which is an amazing feeling, and I'm storing my thoughts for my future self at the same time, which could be an amazing advantage for him. But if I begin to take care of what is popular and what isn't, I'd be losing all the fun.

I try to write about different things, but always staying consistent with what I think is interesting. Notice that doesn't imply that I just write about topics inside my comfort zone; it just means I won't write about what I think is not interesting.

But why did I begin to write now?

I must recognize the main reason was the pandemic situation. I love to share my knowledge and to discuss Computer Science, Tech, and Software Engineering with other people. Entering a classroom and spending 90 minutes teaching is almost a therapeutic exercise for me. With the pandemic shutdown, I got far from the University and the classrooms. In the first months, I was finishing my thesis and doing some work. But eventually, I started feeling nostalgic about my routine. So I began to use Twitter actively (I made my account in 2018 or 2019 but I rarely used it) and started to have great debates with people from the community. I must say the Twitter community of devs and computer scientists is amazing!

But I wanted to post differently. I admire people that can make those great, visually attractive, and interesting threads. I try it sometimes; it takes a lot of effort, at least for me. It's easier for me to write an article without worrying about the number of characters.

I just wanted to care about writing. Even my blog is the most minimalistic Hashnode blog you can make. Sometimes I feel curious about what would happen if I added a picture or some styles. It would be interesting to see whether that really increases the impact of my posts, but I'm lazy enough not to do it.

So it's all about the content. But what have I written so far?

My first article was a motivational one. It was about the necessity of testing and the right way to do it. I also wrote about testing here, but with emphasis on some Design Patterns I have used to build testing frameworks.

I wrote about algorithms and problem-solving strategies here and here. Those are two of my favorite topics.

I don't call myself a developer but I have been earning some money by making software for a while. For those who are starting their developer journey, maybe you can encounter some entertainment and advice in this article.

I have written about how to make amazing slides and my thoughts on [competitive programming](https://jj.hashnode.dev/try-competitive-programming-out).

My last article was really special. While I was writing it, I kept telling myself: "nobody will ever care about this!". But as I said, I find it interesting and I really wanted to write about it. So, that's pretty much enough.

Then I decided to check whether people would really like it or not. I told the Discord server I was thinking of writing more about the topic (which is true). Then, I asked people to read it and tell me whether the article seemed interesting to them (this is almost a trick because I'm going to write the articles anyway). The result was nice. The post received a lot of reactions (compared with the previous posts; more than 20 reactions is a lot for me) and some acceptance on social media. You can read it here.

I have been writing at a rate of two articles per week. Yes, I found the 2 articles per week challenge interesting and I accomplished it, but I think I missed something because I haven't earned my badge yet. It's not important anyway; I just wanted to achieve consistency.

So, what's next?

I'm thinking of writing more about Lisp. Not about the language itself but about the paradigm it promotes.

I have been thinking of writing about Probabilities and Statistics. Those are the subjects I teach. There are a lot of awesome things to talk about but I need to sort them out. It would be nice to include some Artificial Intelligence and Machine Learning as well. I have prepared some exercises and projects for the students, but I need to think a little bit to make articles about this great topic.

I really want to write more about theoretical topics like computability, algorithmic complexity, optimization, etc. Although I write for myself, I really want to make my content attractive to a lot of people. That's difficult to achieve when talking about those topics. I'd love to write about my research too, but I need to be cautious about future problems when trying to publish in scientific journals.

And we always need to make room for future, unforeseen experiences and inspirational thoughts!

What have I got from writing so far? Well, I have learned a lot about what I have written and about myself. I have received feedback from very talented people and we've had great debates about the articles. I have also improved my English!

More than 1K people have read at least one paragraph of my content in this first month. That's mind-blowing for me. Really! I wonder whether it would be 2K if I added pictures and styles (bad joke). It's really nice to think I might help at least a fraction of those people.

And all that while staying consistent with my idea of what an interesting topic is! Great experience.

As I always say, leave your comment and/or reaction, I'd be glad to receive it. You also can follow me on Twitter, the link is at the top of the page. Stay tuned if you liked this one! I plan to post periodically.

]]>Common OOP languages have a very limited version of polymorphism. In this post, I'm going to talk about a wider version of polymorphism that is included in Lisp.

Let's see first what can be done with the polymorphism we commonly use.

As a simple example, we can have a class called `Shape` with a method called `Area`. We also can have other classes called `Triangle`, `Rectangle`, `Circle`, etc. Those classes inherit from `Shape`, and each one provides its own implementation of the `Area` method.

We can make a list that only contains shape instances and treat every element as an instance of the `Shape` class. But when we do `element.Area()` we get different implementations every time. No matter that we treat every element as a `Shape` instance, every one of them behaves as a `Triangle`, a `Circle`, or a `Rectangle` according to its class.

That's the kind of polymorphism you might use in your favorite OOP language. But what if I tell you that's a very limited kind of polymorphism?

In the previous example, we had a method inside a class that got overridden by every child class. That was good because we wanted to have different implementations of the `Area` method for every shape.

Let's think about another example. Now we have a class `Drum`, and also let's create the class `Stick`. Where do you include the `Play` method? Inside the `Drum` class? Inside the `Stick` class?

If you choose to include it inside `Drum`, then you can have different implementations of `Play` for every different drum you create. But what happens when you have a different stick? You can't have polymorphism with respect to the `Stick` instance! Exactly the opposite happens if you decide to include the method inside the `Stick` class: you'll have polymorphism on the `Stick` instances but not on the drums.

Where's your polymorphism now?!

Let's think about this issue. What happens when you do `classInstance.ClassMethod(<args>)`?

We can say we are invoking the `ClassMethod` inside the `classInstance`. But let's change the previous code a little bit:

```
ClassMethod(classInstance, <args>)
```

Now we see things slightly differently. We are invoking the `ClassMethod` passing the `classInstance` as the first argument.

Maybe this is familiar to you. For example, in Python, you can invoke class methods using both ways interchangeably. Both of them are equivalent.

Using the second way, we can achieve different `ClassMethod` behavior by changing the `classInstance` argument. That's equivalent to the polymorphism we use every day. That's called *polymorphism on the first argument*.

Let's return to our "drum and stick" example. The solution would be having something like this:

```
Play(drumInstance, stickInstance, <args>)
```

Now we can have different behavior for every Drum-Stick combination. Nice!

But that's not a natural solution in many programming languages. Usually, every method has the "special" first parameter reserved for the instance of the class where it is defined. Hence, we can't have many "special" parameters representing many different instances of different classes. Even more, many OOP languages don't allow you to define methods outside of a class. So yes, you probably have been using a very limited version of polymorphism during your coding life.

Let's see what Lisp has to say!

Lisp is a very old programming language. It's so old that there was no OOP when it was created. But in the '80s, a new update of the language came up with the Common Lisp Object System (CLOS). And then, Lisp became an OOP language.

As a matter of fact, CLOS is implemented in Lisp itself. There are a lot of great things about Lisp. I will definitely write about more of those things shortly.

There are no class methods in Lisp. Methods don't belong to the class. You have classes with properties and methods that can receive instances of those classes as arguments. It's the solution we talked about! That's how Lisp provides us with *polymorphism on multiple arguments*.

Of course, you can solve all kinds of problems with the "regular" polymorphism. You also can solve all kinds of problems without OOP at all! You can think about solving the Drum-Stick problem with a design pattern or some abstraction. But polymorphism in multiple arguments is explicit and simple.
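To make this concrete in the language used elsewhere on this blog, here is a Python sketch that emulates polymorphism on multiple arguments with a hand-rolled dispatch table. All the names here (`play`, `defmethod_play`, the drum and stick classes) are hypothetical; CLOS gives you this natively with `defgeneric`/`defmethod`:

```python
# A minimal sketch of polymorphism on multiple arguments, emulating what
# CLOS generic functions provide for free. Class names are made up for
# the Drum-Stick example.
class Drum: pass
class SnareDrum(Drum): pass
class BassDrum(Drum): pass

class Stick: pass
class WoodStick(Stick): pass
class Brush(Stick): pass

_play_methods = {}  # dispatch table keyed by (drum type, stick type)

def defmethod_play(drum_cls, stick_cls, fn):
    _play_methods[(drum_cls, stick_cls)] = fn

def play(drum, stick):
    # Dispatch on the runtime types of BOTH arguments
    fn = _play_methods[(type(drum), type(stick))]
    return fn(drum, stick)

defmethod_play(SnareDrum, WoodStick, lambda d, s: "crack!")
defmethod_play(SnareDrum, Brush, lambda d, s: "swish...")
defmethod_play(BassDrum, WoodStick, lambda d, s: "boom!")

print(play(SnareDrum(), Brush()))     # swish...
print(play(BassDrum(), WoodStick()))  # boom!
```

A full multimethod system would also walk the inheritance hierarchy when there is no exact match for the argument types; CLOS handles that (and method combination) for you.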

Lisp's CLOS has a lot of other interesting features. This was the first one that really impressed me. It was also the first one I exploited a lot for solving some problems. Talking about other features of CLOS requires some prior knowledge of the language, but I plan to write about topics that don't demand such knowledge. Like this one!

Let's summarize what we have been talking about.

Polymorphism is a cool feature in OOP. It is the ability of classes to behave differently, in different scenarios. But the kind of polymorphism that is present in popular OOP languages is limited.

Yes, there are different types of polymorphism. We commonly use *polymorphism on the first argument*, but we can have *polymorphism on multiple arguments*. To achieve that, we need to separate methods from classes.

Lisp is a multiparadigm (Object-Oriented included) programming language. In Lisp, methods are created outside classes, and we can have polymorphism on multiple arguments.

Using this more general polymorphism you can get rid of some abstractions and design patterns that maybe you have been using to solve problems similar to the Drum-Stick one.

There are a lot of other features about Lisp that are mind-blowing. It really changes the way you think about programming. I plan to write about some of those features with a focus on the change of paradigm instead of the language.

But I recommend you take a look at Lisp, and experience those features and different ways of thinking by yourself.

If you liked the post let me know. You can leave a comment with whatever you want to say. You can also follow me on Twitter for debates and thoughts about related topics.

]]>In this post, I will talk about what I think is good about ICPC and some possible drawbacks.

This is the core activity an ICPC contestant performs. You need to train a lot to qualify for a higher instance of the competition (e.g. a Regional or a World Final). So the first benefit is this one: you get used to working hard.

But the benefits are not limited to the amount of training. The way you train is the main skill a contestant can develop.

Great contestants (not me, although I luckily know a bunch of them) spend entire days solving a lot of challenging problems.

It's not just learning binary search, it's solving a lot of problems using binary search. I have never seen a better preparation.

By the way, you can solve a lot of awesome problems using Binary Search. I could write a post about it in the future but there is a lot of great material all over the internet.

Think about how many times you, or some of your peers, read about something and, right after finishing, claim to have learned something new. You haven't learned anything until you can apply it to solve several problems. You need to aim to turn that new thing into something natural for you.

That's what a good ICPC contestant does.

In ICPC you have two teammates, and all of you need to share a single PC. Contests last about 4-5 hours, which is rarely enough. A good team is not just the sum of the individual value of the three members. Teamwork and partnership can play an important role.

So you don't only train yourself, you need to make training sessions with the rest of the team, sharing a single PC. You need to explain to others your reasoning and debate with them your solutions.

Here the only issue is that you get used to communicating with people who think pretty much like you. Then, when you need to talk about your work with someone who is not a contestant, it can be tricky.

But the important thing here is that you get used to sharing resources and respecting the work of your peers.

In our society, this skill is priceless.

Problems you solve in ICPC are well-posed, and you know that a bunch of people have solved them. If you become a researcher tomorrow, you'll find real life is far different.

You only make little console applications, only use C++ because it's faster, and don't use AI, ML, or Design Patterns. If you've only coded ICPC problems, you are a terrible developer right now. You don't have the most important skills a regular company is looking for.

There are a lot of choices out there to sharpen whatever skill you want. You can dedicate part of your time to learning Object-Oriented Programming or ML. You can contribute to an Open Source project or do some research. It would be a shame to do only one activity. If you have a good chance of achieving awesome results in ICPC, train a lot. But you can also train a little bit and do other activities with the rest of your time.

I see ICPC training as a way to shape your learning process. But the great thing about it is that you actually learn a lot of awesome techniques by training! And you really learn them, you won't forget them tomorrow (if you did it right).

When a contestant performs poorly at a real-life task, that's on the contestant, not on the contest. Training competitive programming doesn't erase the rest of your capabilities, but taking care of them is up to you.

I know World Finals competitors that are good at other activities like AI. They spent months training a lot; they were pros. Now they are a great choice for any company or research department.

I have heard a lot of different opinions about Competitive Programming. I really don't think that activity harms you in any way.

Yes, you need to be aware that the outside world is different. It's up to you to stay in touch with real life.

Being a contestant allowed me to meet very talented people (that's another benefit!). I don't know if I would like to have trained harder, but I'm sure I'm better because of my training.

Of course, I recommend you try competitive programming out. If you are a University student, you can even get involved in ICPC competitions. But you can always try to solve one or two competitive programming problems each week. I assure you it doesn't hurt.

Feel free to leave your comment with any impression, doubt, or suggestion. You can follow me on Twitter for more talks about related topics.

]]>In general, E2E testing is difficult to automate. First of all, we need tools that can interact with the application being tested; we need to fill forms, wait for a page to load completely, and that kind of stuff. We also need to get the results from the user interface: we don't have functions returning objects, but HTML elements containing the information. Mocking a real user can be challenging and might require a lot of maintenance.

In this post, I will talk about my own experience building an E2E Testing Framework. I applied some cool Design Patterns, so I think this could be interesting for you even if you have nothing to do with E2E Testing Automation.

This post is *language and tool agnostic*; it means that I won't refer to a specific programming language nor a specific E2E tool like Selenium, Puppeteer, or Playwright. By the way, those are great tools for automating E2E tests. Furthermore, this post focuses on E2E Testing for websites.

I was required to design a framework to perform different E2E tests on different websites. More precisely, I needed to make some tests over specific React components inside those websites. Every component has the same structure and CSS selectors no matter the website and just changes slightly from one site to another. We needed to make tests for every possible viewport (mobile, tablet, and desktop), and the components change their structure when the viewport changes.

Furthermore, I knew nothing about the developers. Thus, I needed to be prepared to manage some unforeseen changes in the interface relatively easily. In other words, one critical requirement the framework needed to fulfill was being easy to maintain.

How to make an E2E Testing Framework that doesn't care too much whether developers changed the id attribute of some button that is clicked in some test? How to be able to write tests for some component that is not created yet? And, if possible, how to make every test easy to read and understand?

All those goals can be achieved by applying some abstractions and design patterns. Here we go!

The first thing we need to do is to create an abstraction for a page. This is important for several reasons. It will increase readability. For example, you don't want to have a line in your test that reads `tool.getByCssSelector("button.btn.btn-submit").click()`; instead, you want a line like `page.clickSubmitLoginFormButton()` or something similar. You also need to keep all the CSS selectors and DOM-related stuff in a single place. This way, when something in the interface is changed, you only need to modify one single file (or maybe two, but not more ;-) ).

That abstraction is called the **Page Object Model**. You make a class that represents only the elements that you are interested in from the page. You put all the DOM-related stuff in those classes.

In my case, I did it slightly differently. I created two classes for every page: a **PageModel** and a **PageObject**. In the first one, I put the elements of the page. For example, suppose we are testing a login page; then my **LoginPageModel** would look like:

```
class LoginPageModel
    constructor(tool)
        this.tool = tool
    loginUsernameInput()
        return this.tool.getById('username-input')
    loginPasswordInput()
        return this.tool.getById('password-input')
    loginSubmitButton()
        return this.tool.getById('submit-login-button')
```

If any of those elements change in the future we only need to modify the corresponding **PageModel** class.

In the **PageObject** class, I add the actions that you can perform on the page. An example of a **LoginPageObject** class would be:

```
class LoginPageObject
    constructor(pageModel)
        this.model = pageModel
    typeUsername(username)
        this.model.loginUsernameInput().type(username)
    typePassword(password)
        this.model.loginPasswordInput().type(password)
    clickLoginSubmitButton()
        this.model.loginSubmitButton().click()
```

Here we can take advantage of a statically typed language, where all the methods of the model class are known at compile time. That way, an IntelliSense tool can remind us of the name of every method that represents a page element, and we get more compile-time errors and fewer runtime errors, which is very good for us and our mental health.

Why separate page elements from page actions? A single class containing both the elements and the actions can get very large. We could say that by splitting them we are applying the Single Responsibility Principle, and that would be cool, but in this case it doesn't have much practical significance beyond readability and keeping classes simple.

With the **Page Object** abstraction we can write tests that only depend on page objects instead of sprinkling tricky CSS selectors through the test code. We keep all the DOM-related stuff in a single place, and our tests become more expressive and easier to understand.
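To make the two abstractions concrete, here is a minimal Python sketch of a PageModel and a PageObject working together. `FakeTool` and `FakeElement` are stand-ins for whatever driver you use; every name here is illustrative, not part of any real library.

```python
# Minimal stand-ins for a real E2E driver; all names are illustrative.
class FakeElement:
    def __init__(self):
        self.typed, self.clicked = [], False

    def type(self, text):
        self.typed.append(text)

    def click(self):
        self.clicked = True


class FakeTool:
    def __init__(self):
        self.elements = {}

    def get_by_id(self, element_id):
        # Return (and cache) the element for the given id.
        return self.elements.setdefault(element_id, FakeElement())


class LoginPageModel:
    """Knows only WHERE the elements are (the DOM-related stuff)."""
    def __init__(self, tool):
        self.tool = tool

    def login_username_input(self):
        return self.tool.get_by_id("username-input")

    def login_submit_button(self):
        return self.tool.get_by_id("submit-login-button")


class LoginPageObject:
    """Knows only WHAT you can do on the page."""
    def __init__(self, page_model):
        self.model = page_model

    def type_username(self, username):
        self.model.login_username_input().type(username)

    def click_login_submit_button(self):
        self.model.login_submit_button().click()


tool = FakeTool()
page = LoginPageObject(LoginPageModel(tool))
page.type_username("TestUser")
page.click_login_submit_button()
```

If the `id` of the submit button changes, only `LoginPageModel` has to change; the PageObject and every test built on it stay untouched.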

Now we have many classes that contain all the elements and actions of several pages. What we need to do next is build our tests. Those tests will provide a simple interface that exposes a `run` functionality to the client. This functionality returns a test result. The client doesn't have to worry about accessing any element or performing any action; they just need to instantiate the test and run it.

When we provide a simple interface that hides a more complex infrastructure, we are applying the Facade Pattern. I know that's just a fancy name for something we clearly needed to do anyway. Continuing with our login page example, the **LoginTest** would look something like this:

```
class LoginTest
    constructor(loginPageObject)
        this.pageObject = loginPageObject
    run()
        this.pageObject.typeUsername("TestUser")
        this.pageObject.typePassword("TestPassword")
        this.pageObject.clickLoginSubmitButton()
        assert that the login was successful
```

The last line of the `run` method is an assertion. Depending on the complexity of your assertions, you can either define them separately or inside the Page Object. The first option lets you reuse and extend assertions, but if your assertions are very specific to each case and simple enough, it can be overkill, and you will probably be fine with the second one.
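As a sketch of the first option, a reusable assertion can be a small object of its own. `UrlAssertion` and the fake `browser` below are illustrative names, not part of any real tool.

```python
class UrlAssertion:
    """A reusable assertion defined separately from the page objects.
    Checks that the browser landed on the expected URL."""
    def __init__(self, expected):
        self.expected = expected

    def check(self, browser):
        assert browser.url == self.expected, (
            f"expected {self.expected}, got {browser.url}"
        )


class FakeBrowser:
    """Stand-in for whatever object your tool gives you after navigation."""
    def __init__(self, url):
        self.url = url


# The same assertion instance can now close any test that should
# end on the dashboard, regardless of which page object drove it.
UrlAssertion("/dashboard").check(FakeBrowser("/dashboard"))  # passes
```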

We are also injecting the Page Object dependency into the test: we are not doing `this.pageObject = new LoginPageObject()` but receiving the dependency as an argument in the constructor. This is called *Dependency Injection*. That way, we can instantiate the same test for another page. We also inject the Page Model into Page Object instances, so we can have the same Page Object with another model (e.g., the same LoginPageObject with a LoginMobilePageModel instead of a regular LoginPageModel).

But now, to instantiate a test, we need to instantiate one or more Page Models, then one or more Page Objects, and finally the test itself. That seems like too much work. It's precisely one of the drawbacks of using Dependency Injection, but it is solvable!

If it's difficult, let's delegate the responsibility to another abstraction. In this case, we'll write some Factories: classes whose job is to instantiate other classes. Every Factory class will be responsible for instantiating a specific test. That's the Factory Pattern in action!

So we can create a **LoginTestFactory** for our LoginTest:

```
import tool

class LoginTestFactory
    create(config)
        if config.viewport == 'mobile'
            return new LoginTest(new LoginPageObject(new LoginMobilePageModel(tool)))
        else
            return new LoginTest(new LoginPageObject(new LoginPageModel(tool)))
```

Here, `tool` represents any technology you could use to get the elements of a page and interact with them. Maybe you don't pass the imported tool as-is but create some objects using that tool and pass those objects as parameters. The idea is that all the relatively complex logic needed to build a Test instance is encapsulated in a Factory object.

To run our test we only need to do something like this:

```
runLoginTestDesktop()
    factory = new LoginTestFactory()
    config = new ConfigObject(viewport = 'desktop')
    test = factory.create(config)
    test.run()

runLoginTestMobile()
    factory = new LoginTestFactory()
    config = new ConfigObject(viewport = 'mobile')
    test = factory.create(config)
    test.run()
```

Now, in the Conclusions section, we'll check whether we have accomplished our initial goals.

Building our framework in the way I have shown in this post can dramatically decrease the cost of a change in the user interface. All the code that depends on the user interface is isolated in specific classes that abstract the concept of a page.

That abstraction also allows us to write next week's tests! I mean tests for components that have not been created yet. We just write the new PageModels and PageObjects to mock the elements that will be created, and we build the rest of the process in the same way we have seen so far. Once the actual elements exist in the interface, we update the page models and verify whether the application behaves as expected.

We also get tests that are very easy to read and understand, since we use expressive actions like `this.pageObject.clickLoginSubmitButton()`. Thus, our tests can describe the requirements of our application and can be easily maintained.

E2E testing automation is difficult because it is difficult to keep it simple, and a complex test is not a test. In this post, I have shown some design patterns and good practices to make it smoother. I have tried to keep it language and tool agnostic so that you can apply these practices in your project no matter what language or technology you are using. I only assumed an object-oriented programming language.

Even if you are not building an E2E Testing Framework, I think some of these tricks can be applied to a relatively wide variety of problems, and that's why I wrote this post.

Feel free to comment with any suggestions, impressions, or doubts about this article below. You can also follow me on Twitter for debates about related topics.

]]>So, let's get started!

As a rule of thumb, if someone can read your slides and get all the info, then you're doing it wrong.

Your info must be split between the slides and your speech. If one of them is missing, part of the info is missing too.

That's because your slides need to deliver information in a way that makes your speech lighter and easier to understand for the audience (which doesn't care about you).

No matter what is shown on a given slide, you need to add a title to it.

If someone in the audience gets distracted at some point in your presentation and tries to pick your explanation back up, it helps if they can see what you are talking about right now. That's the main purpose of the slide title.

Hence, you need your titles to be concise, and you need them to tell the audience what you are talking about right now. For example: "Prerequisites", "Problem definition", "State of the art", etc.

If you want to explain an algorithm, do it by using animations and images. If you want to explain a process that has many stages, use illustrations for every stage. Your audience will appreciate that.

You know what they say: *A picture is worth a thousand words*. People don't want to read; they don't even want to understand you! You need to make your message important and interesting to them. And you definitely want your message to be as easy to digest as possible.

More than 10-15 seconds without any change on your slide... Baaang!!! Half of the room is sleeping and the other half is on their social media. Yes, your family included.

That doesn't mean you need to move from one slide to another at that frequency; just make something appear or disappear, or add some animation perhaps, but keep moving. Keep the audience engaged.

When including text, write up to 7 lines per slide and up to 7 words per line.

Try not to exceed those limits, but you can exceed them by one or two words/lines in some special cases.

Adding too much text is bad for many reasons. We stated earlier that people don't want to read; we (as an audience) prefer other, smoother channels. But another drawback of adding too much text is explained in the next tip.

People don't want to read, but if you put some text on your slide, they will prefer to read it instead of paying attention to your speech.

If you want people to keep listening to you, show only the text for the point you are talking about right now. After that, show the next line about the next topic, and so on.

When people in the audience read a lot of text from the slide, they get lost when returning to your speech. They read much faster than you speak. Just apply the brake.

*Just for larger presentations*

Your ideas are arranged in some order in your head. Transmit that order relation to your audience explicitly. Before and after talking about some topic, show your Table of Contents.

Make sure to highlight the next topic you will be talking about and differentiate the topics you have explained already. Use different colors and opacities maybe.

You did it quite well! You managed to hold the audience's attention throughout the presentation. They are amazed and wondering how they can reach you. Maybe an email or some social media links? Nope, all they can see is a "Thank you": that is what your last slide says.

Always end your presentation with your very first slide, the one that contains your name, the title of your presentation, and your contact info.

A presentation is not a bunch of slides. A presentation is a conception, a method you elaborate to transmit your ideas and results.

The answer to the question "Have you finished your presentation?" can be "Yes, I just have to finish my slides".

But slides can make the difference between a great and a poor presentation. When used in the right way, they can be a very effective weapon, but they can also make your presentation a total disaster if you use them badly.

I have given you some tips to make your slides an effective weapon.

Remember to approach your presentation as a fight for your audience's attention. With that in mind, try to make your message easy to digest and add dynamism to your slides. That's the way to keep people engaged.

Let me know if you enjoyed this post by reacting. Feel free to comment with any doubts or recommendations. You can also follow me on Twitter for more content like this.

]]>I assume you have some knowledge about big-O notation and algorithm design.

In this post I focus on *time complexity*, so I am going to give you some methods to prove an algorithm is "as fast as can be". Some of those ideas are also applicable to memory complexity analysis.

When I say that an algorithm `A` such that `T(A) = O(f(x))` (i.e., the runtime complexity function of algorithm `A` is `O(f(x))`) cannot be improved any further, I mean that there is no equivalent algorithm `B` such that `T(B) = O(g(x))` and `T(A) != O(g(x))` (i.e., the runtime complexity function of algorithm `A` is worse than `O(g(x))`).

Most of the time you'll be required to design an algorithm that runs in a certain amount of time for a certain bounded input, so in most cases you don't need to find an optimal algorithm. But this post will give you some tricks you can use to decide whether you should keep trying to fulfill the requirements or just change them, because it can be proved that they are impossible to fulfill.

I'm going to talk about some "tricks" in the next sections, but let's talk about the straight way first. Given the problem, and maybe some restrictions, let's prove we can't design an algorithm that runs (asymptotically) faster than a given one.

I'll illustrate just one of those proofs. Let's prove that it's not possible to sort a list using only comparisons between the elements of the list in less than `O(n * log n)`.

A permutation of a set of elements is an arrangement of those elements. For a set of size `n` there are `n!` different arrangements (`n*(n-1)*...*2*1`). So the task is: given a permutation, how do we find the sorted permutation using only comparisons? By "using only comparisons" I mean that the only way to gain knowledge about the input permutation is by comparing two of its elements.

We'll represent such a comparison-based algorithm as a binary tree, where the leaves are the different permutations and every interior node represents a comparison between two specific elements of the input. Here's an example of a tree for an input of length 3.

So we start with the comparison at the root; if the answer is "yes", then we know the permutation we're looking for is in the left subtree, otherwise it's in the right subtree. We keep applying comparisons until we reach a leaf, which represents the sorted permutation.

For example, the leaf `<2,3,1>` represents the permutation where the first element of the input is in the third position, the second element is in the first position, and the third element is in the second position.

Then the length of the path from the root to a leaf is the number of comparisons needed to sort a specific input permutation. So the longest path represents the worst case, since following it requires the largest number of comparisons. Let's prove that the longest path always contains at least `O(n * log n)` comparisons.

We know the number of different permutations is `n!`, so that's the number of leaves in the tree. But if the longest path in the tree has length `h`, then the number of leaves cannot exceed `2^h` (`2` to the power of `h`). Then, since `2^h >= n!`, a bit of algebra gives `h >= log2(n!) >= (n/2) * log2(n/2)`.

And that `n/2 * log(n/2)` is `O(n*log n)`.

So here we represented the algorithm's operations through an abstraction (a binary tree), and based on that we showed that there is always a case in which the algorithm has to make a certain number of operations (`O(n * log n)`); thus we'll never get an equivalent algorithm that does fewer than that.

Does that mean that we can't sort in less than `O(n * log n)`? No! We assumed that the algorithm only makes comparisons to gain knowledge about the input (like bubble sort, quick sort, and merge sort do). But there are sorting algorithms that assume some prior knowledge about the input and can sort in `O(n)`! Take a look at counting sort and radix sort.

The first "trick" I'm going to talk about is known as the "Adversarial Lower Bound Technique". Now, there will be an adversary (or devil) `D`. The adversary has access to all the possible inputs, and we'll see the execution of the algorithm `A` as a game in which `D` is supposed to select one input; then `A` asks `D` for the result of one operation on the input, and `D` must answer. The game ends when `A` has an output for the input `D` has selected.

The trick here is that `D` can always change its mind about the input, as long as, at the end of the game, the input selected by `D` is consistent with the answers `D` gave throughout the game. The task is to design an *adversary strategy* that forces `A` to ask at least as many times as we want.

Let's consider the problem of finding the maximum in a list of length `n`. The algorithm `A` is allowed to ask whether the number at a certain position in the list is greater than the current maximum, assuming that the initial maximum is the number at a certain fixed position in the list. Let's prove that `D` can force `A` to make at least `n-1` operations.

The strategy of `D` is to place, in every position `A` asks about, a bigger number each time, and to always answer "NO" (i.e., the number you're asking about is not greater than the current maximum). So, after `n-1` operations, `A` can state that the maximum is the number in the one position it has not asked about.

Let's consider the problem for a list of length `4`. The initial maximum will be the number in the first position. Then `A` asks `D` whether the second number is greater than the current maximum; `D` answers "NO" and puts the number `1` in the second position of the list. Then `A` asks whether the third number is greater than the current maximum; `D` answers "NO" and puts the number `2` in the third position. Finally, `A` asks about the fourth position; `D` answers "NO" again and puts the number `3` there. Now `A` can state that the maximum is the first number, since no other number is greater than it. Then `D` puts a `4` in the first position and reveals the input to `A`. That input (`4, 1, 2, 3`) is consistent with all the answers `D` gave to `A`, so we found a case where `A` is forced to make `3` operations to find the maximum in a list of `4` elements.

The proof for a list of `n` elements is just a generalization of the method described above.
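The adversary game above can be simulated in a short Python sketch. The `Adversary` class and the function names are mine, not part of any standard formulation; the point is just to watch `D` commit values lazily while staying consistent.

```python
class Adversary:
    """Answers "is position i greater than the current maximum?" with NO,
    committing values lazily so every answer stays consistent."""
    def __init__(self, n):
        self.n = n
        self.values = [None] * n  # nothing chosen yet
        self.next_small = 1       # next value to hand out with a NO answer
        self.questions = 0

    def is_greater_than_max(self, i):
        self.questions += 1
        self.values[i] = self.next_small  # small enough to justify "NO"
        self.next_small += 1
        return False

    def reveal(self):
        # Finally make the never-asked-about position the true maximum.
        return [v if v is not None else self.n for v in self.values]


def find_max(adversary, n):
    """Algorithm A: assumes position 0 holds the initial maximum."""
    best = 0
    for i in range(1, n):
        if adversary.is_greater_than_max(i):
            best = i
    return best


adv = Adversary(4)
pos = find_max(adv, 4)
# A was forced to ask n-1 = 3 questions, and the revealed input
# [4, 1, 2, 3] is consistent with every answer D gave.
```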

This was a simple example. You can try to prove the lower bound of sorting algorithms described in the previous section via Adversary Technique for a more challenging demonstration.

The last "trick" I am going to talk about is the Problem Reduction Technique. The idea in this case is to prove that if we could solve a certain problem in time less than `T(n)`, then we could solve another problem in a time we already know is impossible.

For example, take the Epsilon Distance Problem: telling whether a list of numbers contains two elements `x` and `y` such that `|x-y| < e` for a fixed `e`. This problem is known to require `O(n * log n)` operations, where `n` is the length of the list. We won't prove that in this article.

Instead, we are going to prove that the problem of finding the closest pair among a set of `n` points in a plane cannot be solved with fewer than `O(n * log n)` operations.

Suppose we can solve the Closest Pair of Points Problem in less than `O(n * log n)`. Now think about the Epsilon Distance Problem: if we have a list of `n` numbers `x1, x2, ..., xn`, we can transform that list into `(x1, 0), (x2, 0), ..., (xn, 0)`. That transformation takes `O(n)` operations. Then we can solve the Closest Pair of Points Problem on those points in less than `O(n * log n)`, and if the distance between the closest pair is less than `e` we answer "YES", otherwise we answer "NO". We would have solved the Epsilon Distance Problem with fewer than `O(n * log n)` operations. But that's impossible, since we know `O(n * log n)` is the lower bound for that problem! Hence, we cannot solve the Closest Pair of Points Problem with fewer than `O(n * log n)` operations either.
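The reduction can be sketched in Python. Note that `closest_pair` below is a quadratic stand-in for illustration only; the argument plugs a hypothetical faster-than-`O(n * log n)` solver into the same slot.

```python
from math import hypot, inf

def closest_pair(points):
    """Stand-in solver for the Closest Pair of Points Problem.
    (Quadratic, for illustration; the lower-bound argument assumes a
    hypothetical solver faster than O(n * log n) plugged in here.)"""
    best = inf
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            best = min(best, hypot(x1 - x2, y1 - y2))
    return best

def epsilon_distance(numbers, e):
    """Reduce Epsilon Distance to Closest Pair with O(n) extra work:
    embed each number x as the point (x, 0), then compare distances."""
    points = [(x, 0) for x in numbers]  # O(n) transformation
    return closest_pair(points) < e

print(epsilon_distance([1.0, 5.0, 5.3, 9.0], 0.5))  # → True (5.0 and 5.3)
```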

Before the end of this article, I want to talk about problems that are NP-complete. Oversimplifying a lot, we'll say that an NP-complete problem is a problem for which we haven't found a solution that performs a polynomial number of operations. So, for now, the best known solutions perform a number of operations that is an exponential function of the input size.

Maybe you have heard about NP-complete problems before, and probably you have been told that if you could find a polynomial solution for any of them, you'd be proving that all of them have a polynomial solution as well (and you'd be solving one of the hardest problems in the history of humanity, your name would be written in every single book about computer science, and you'd be a millionaire, among other benefits).

That's because the way we prove that a problem is NP-complete is by proving that if we find a polynomial solution for it, then we can solve another, already known NP-complete problem in polynomial time. In other words, to prove a problem is NP-complete, we use the reduction "trick" we've seen in the previous section.

Note that in this case we are not proving that a better runtime is impossible; we are just observing that humanity hasn't found a polynomial solution for those problems so far.

Maybe you are wondering how the first NP-complete problem was discovered. Well, I can tell you when: 1971. I can also tell you who: Stephen Cook. I can even tell you what: the Boolean satisfiability problem (SAT). But the how is up to you.

Proving a lower bound on the runtime complexity of an algorithm can be a very difficult task. In this post I have talked about some examples and "tricks" for doing so.

I have made a lot of oversimplifications throughout the article. If you want to learn about this topic in a more technical and formal way, you can find other good sources on the internet. You can also leave a comment with any doubt or recommendation, @-me on Twitter to start a fruitful debate, or follow me for some tweets about computer science.

]]>I'll include some side notes like this one. I'll use these notes to briefly introduce some concepts and tools for those who are starting their programming journey right now.

When it comes to choosing a framework or a library, there is often a lot of discussion. At some point in my evolution as a programmer, I thought that flexibility was always good. After a bunch of lines of code, I learned that rigidity is not that bad.

Frameworks, libraries, and tools in general exist because we are bad programmers by nature (among other causes). I really appreciate it when a tool forces me to do things in a unique, specific, and very good way, so I can accomplish my goals by following that way.

In a smooth scenario, when definitions and requirements don't change abruptly within a week, I always choose rigidity. I always choose the straight path.

So here I am, a guy who prefers the straight path, in the middle of this jungle with no path at all. Definitions and requirements change every day in the most abrupt possible way, and every week I am doing a very different project. I had to learn to embrace flexibility, the hard way.

Now I am going to illustrate that hard way through examples.

They decided to build a website using React. I learned React while doing the project, so I started to consume a lot of tutorials. I decided to use `create-react-app` (CRA) to scaffold the project, since it is used in every single tutorial. So far, so good. I had a good working setup, was able to make the production build with a single command, and so on.

I highly recommend learning something while doing a project and, if possible, while getting paid for it. I think it is the best way to learn, at least when it comes to frameworks and tools in general.

CRA is a very useful way to get started with your React project. By just typing `npx create-react-app <project-name>` you get the whole project scaffolding. I think every React developer can take advantage of this tool, novice or senior. I won't complain about CRA, but I think some tutorials should warn us about the possible drawbacks of using it.

Then the boss learned what a Progressive Web App (PWA) is, and after a couple of weeks we were required to make the site progressive. That is not a big deal, since CRA provides us with a service worker.

PWA is a mechanism for making websites work offline and behave like native apps on mobile devices, among other benefits. This is achieved through the collaboration of several components: the browser, a special script called a service worker, a manifest file, and some code to install the service worker in the browser. As I said, CRA provides the required components so that we can make our React site progressive.

Once you have a service worker on your site, you are able to receive push notifications from a server, and of course, after two weeks we were required to implement this too. But to make that possible you need to make some adjustments to the service worker, and guess what: the CRA service worker is not customizable at all!

Maybe you have encountered sites that ask whether you'd like to receive notifications. If you answer yes, they can send you so-called push notifications with whatever content they like. These notifications pop up on your device even when the browser is closed. This is possible because the browser keeps the service worker script running in the background.

So I had to build a new service worker from scratch, plus the code to install it. That is not a big deal, because I would have had to do it whether I had used CRA or not. So let's move on to other issues.

After building the whole payment process, they decided that we needed to use a service that verifies your identity from images of your face and an identity document. So we needed to change a lot of things in the already built payment process. I made it possible for the user to take a selfie and a picture of their ID document. But after that, we were required to use a specific SDK provided by the service so that the images would fulfill the features required for a reliable evaluation.

The SDK was a vanilla JavaScript file that needed to be imported via an HTML script tag. It is intended to be integrated into applications that use vanilla JavaScript. You know, the non-NodeJS kind. So I had to do the impossible to get everything working without running `eject`. After learning about webpack, I'd like to go back in time and start my project from scratch, without CRA or anything, taking care of my own webpack configuration. Believe me, this is just a single example, because I don't want to make the story too long; I just want to make my point.

Webpack is a tool that bundles our NodeJS-style code into a vanilla JavaScript bundle that can be used on our production site. It's hard to master, and that's why tools like CRA manage all the webpack-related issues for you. But I haven't found a smooth enough way to change the CRA webpack configuration, so I think that if you are facing the sort of complexities I am telling you about, you should consider using some more flexible tool (and then tell me what tool that is) or writing your own webpack configuration. CRA allows you to run `npm run eject`, and after that you are on your own with webpack and everything else, but I am not brave enough yet.

My first task in this company, before the React website I wrote about above, was to build a responsive website. After a lot of effort, passing through all sorts of difficulties, the site has never been used. Yes, I got paid, but this post is not about making money; this post is about what I think is wrong.

The problem here is not the change itself. There are always unstable scenarios: businesses that are unstable by nature, or that become unstable due to new events. The problem is the rush in decision making, the lack of connection between developers and business people, and the misconception held by some of those business people that the cost of development is negligible.

I think you only learn about good programming practices when you find yourself repeating the same code everywhere and fixing the same bug in different places. Well, with this experience I have learned the necessity of planning and writing. You need to define your problem and write that definition down. You need to keep a record of your requirements, and even of the infrastructure you need in order to fulfill them. I don't like to overthink, but it's just a lack of common sense to think you can get somewhere by just hitting the keys when the project is complex enough.

The problem is not the need to make big changes in the project per se. But if you can include those changes in the initial plan, then they are not changes but initial requirements, and the development process is faster, cleaner, and no one does fruitless work. The sooner a change is predicted, the smaller and easier the refactoring.

Those have been some of my experiences in this first year as a developer. I'll certainly write about some others; don't think everything has been that painful.

I hope you or your boss aren't suffering from the WIPCA syndrome (Why Isn't Paul Coding Anything?!). So always try to write down your problem definition, the requirements, and everything you think is necessary, like the infrastructure. No matter if you are using Scrum, Waterfall, or whatever: just write a little bit and try to make changes, which will always be needed, less painful. Of course this only applies to sufficiently complex projects, but don't underestimate projects; actually, a good way to assess whether a project is complex enough is to define the problem and the requirements.

If you liked this post hit the like button. Feel free to comment whatever you want. You can also follow me on Twitter for debates about Computer Science.

]]>If you have written code for more than one month, it is almost certain you have experienced the necessity of trade-offs in programming, or at least have heard about it. Sometimes you sacrifice performance in the name of security, security in the name of scalability, beauty and readability in the name of performance, and so on. Don't forget you also sacrifice parties and fun in general in the name of programming, so make it worth it.

In the specific case of algorithms, the main resources are time and memory, so the trade-offs always involve those resources. It's common to find several solutions to the same problem because one of them is faster but the other one is cheaper in terms of storage. Of course, there are other factors like implementation, simplicity, and security. In this post, I'm going to write about combining several solutions to get the one that fulfills our requirements.

First I'm going to show an interesting idea that E.W. Dijkstra proposed to find prime numbers. After that, I will show an idea I came up with to obtain a data structure that combines the power of arrays and linked lists.

I assume you have basic programming skills as well as some knowledge about data structures like arrays, linked lists, and heaps.

It is necessary to have some notions of big-O notation to calculate complexity.

It would be good if you were familiar with the algorithm called the Sieve of Eratosthenes; in case you aren't, you can check this link.

There is no other previous knowledge required to understand what I'm about to say.

The problem is to find all prime numbers from 0 (zero) to N. The kind of problem you learn to solve when you are beginning your journey in programming. And I want to start with the simplest solution.

In the naive algorithm, we iterate over all numbers `x` from 2 to N. Then we check whether `x` has any divisor besides itself and one. For the last step, we can check, for every number `d` between 2 and `x - 1`, whether it is a divisor or not.

There is room for improvement in the last step because we only need to check the divisors that are less than or equal to the square root of `x`. A pseudocode version of the algorithm is written below.

```
primes(N):
    prime_numbers <- []  # empty list
    for x = 2 to N:
        is_prime <- true
        for d = 2 to sqrt(x):
            if d divides x:
                is_prime <- false  # we found a divisor of x, so x is not a prime number
        if is_prime:
            prime_numbers.add(x)  # if we didn't find a divisor then x is prime
    return prime_numbers
```
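As a sketch, the pseudocode above could be written in runnable Python like this:

```python
import math

def primes(n):
    """Return all primes from 2 to n using trial division."""
    prime_numbers = []
    for x in range(2, n + 1):
        is_prime = True
        # checking divisors up to sqrt(x) is enough
        for d in range(2, math.isqrt(x) + 1):
            if x % d == 0:
                is_prime = False  # found a divisor, so x is composite
                break
        if is_prime:
            prime_numbers.append(x)
    return prime_numbers

print(primes(30))  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```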

What's the runtime complexity of the previous algorithm? Well, we take every number from 2 to N and, for each one, we iterate over all its possible divisors, so we make `O(N*sqrt(N))` operations, where `sqrt` stands for the square root function.

What about memory? We only store the primes we find. For a big enough N, the amount of primes is relatively small compared with N. So, let's denote the memory complexity as `O(Primes(N))`, where `Primes(N)` is the number of primes between 0 and N. Note that this is optimal in terms of memory because we need, at the very least, to store all the prime numbers.

You probably already know that there is a better algorithm to find all the prime numbers in a given range: the sieve of Eratosthenes. Let's look at it.

The idea is to maintain an array of length N where every entry is either false or true. If the i-th position in the array is true, then we say that `i` is a prime number; otherwise `i` is not prime.

We start with all positions set to true and then, for every position, starting with `i = 2`, we mark as false every multiple of `i`. We end up with an array in which every position with a true value represents a prime number. The code with some optimizations is presented below.

```
primes(N):
    sieve <- a boolean list of length N+1 with all its elements set to true
    for i = 2 to sqrt(N):  # crossing out multiples of numbers up to sqrt(N) is enough
        if sieve[i]:  # if sieve[i] is true then i is prime
            j <- i*i  # we can start the inner loop at i*i
            while j <= N:
                sieve[j] <- false
                j <- j + i
    prime_numbers <- []  # empty list
    for i = 2 to N:  # collect every position still marked true, including primes above sqrt(N)
        if sieve[i]:
            prime_numbers.add(i)
    return prime_numbers
```
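A runnable Python sketch of the sieve, for comparison:

```python
import math

def sieve_primes(n):
    """Return all primes up to n with the sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]  # 0 and 1 are not prime
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # start at i*i: smaller multiples were crossed out by smaller primes
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i in range(2, n + 1) if sieve[i]]

print(sieve_primes(30))  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```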

It can be proved that the algorithm described above does O(N log(log N)) operations, which is better than the naive approach. It can even be improved to O(N)! But we need to store an array with all the N numbers, and thus we end up with a more expensive algorithm in memory terms. Here we have the trade-off: time in exchange for memory.

Can we design an algorithm that is faster than the naive idea but cheaper in memory terms than the sieve of Eratosthenes?

In the '60s, E.W. Dijkstra wrote in one of his manuscripts [PDF] an algorithm that combines the naive and the sieve ideas. But before we talk about it, let's analyze the differences between the two previous algorithms.

When applying the naive algorithm we focus on analyzing whether every number is prime or not. When applying the sieve we analyze, for every prime number, all its multiples. The difference can be illustrated with the following analogy.

Imagine we want to build several products, like action figures. We have two options: building them one by one, or setting up a production line where, at every stage, we add one component of the action figure. With the latter, we end up producing more in less time, but we need the "line infrastructure".

Dijkstra combined those ideas by analyzing one number at a time while taking advantage of the previous operations. We can maintain a pool of primes that have already been discovered and, for each of those primes, store its least multiple that has not been analyzed yet. For example:

If we are analyzing the number 6, then we have stored the primes 2, 3, and 5, along with their least not-yet-analyzed multiples: 6 for 2, also 6 for 3, and 10 for 5.

Then, when analyzing a new number, we take the smallest multiple stored so far. If that multiple is greater than the new number, we have found a new prime. Otherwise, the new number is composite, and we need to update the multiples of the stored primes that have it as their least multiple.

We begin by storing the prime number `2` with `4` as its least multiple. Then, when analyzing `3`, we find that `4 > 3`, so `3` is prime. We store `3` along with its smallest multiple that has not been analyzed yet (`6`). When analyzing `4`, we find that `4` is stored as a multiple of `2`, so we update the multiple of `2`, which now will be `6`. When analyzing `5`, we find that `6 > 5`, so `5` is a prime number and we store it along with `10`, and so on... The code is presented below.

```
primes(N):
    primes_pool <- Heap( (4, 2) )  # a heap of tuples (least unanalyzed multiple, prime), starting with (4, 2)
    prime_numbers <- [2]  # a list that contains the number 2
    for i = 3 to N:
        tuple <- getMin(primes_pool)  # the tuple with the minimum multiple; the prime attached to it is irrelevant here
        mult <- tuple.first  # the multiple is stored in the first position of the tuple
        if mult > i:
            prime_numbers.add(i)  # i is prime!
            primes_pool.insert( (i*i, i) )  # i's least unanalyzed multiple is 2*i, but inserting i*i also works:
                                            # smaller multiples of i are handled by smaller primes
            continue
        # otherwise i isn't prime, and then...
        for t such that t is in primes_pool and t.first is equal to mult:
            t.first <- t.first + t.second  # advance every tuple in the pool whose least multiple is i
    return prime_numbers
```
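The pseudocode above can be sketched in Python with the standard `heapq` module; this is my own rendering of the idea, not Dijkstra's original code:

```python
import heapq

def dijkstra_primes(n):
    """Find the primes up to n, keeping a heap of (least unanalyzed multiple, prime)."""
    if n < 2:
        return []
    pool = [(4, 2)]        # 2 is prime; its least unanalyzed multiple is 4
    prime_numbers = [2]
    for i in range(3, n + 1):
        if pool[0][0] > i:
            # no stored multiple reaches i, so i is a new prime
            prime_numbers.append(i)
            heapq.heappush(pool, (i * i, i))  # i*i is enough, as in the pseudocode
        else:
            # i is composite: advance every stored multiple equal to i
            while pool[0][0] == i:
                mult, p = heapq.heappop(pool)
                heapq.heappush(pool, (mult + p, p))
    return prime_numbers
```

Tuples compare lexicographically in Python, so the heap is ordered by the multiple first, which is exactly what `getMin` needs.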

The previous method only stores the prime numbers and one extra number for each of those primes, so we have a memory complexity of `O(Primes(N))`, which is the same as the naive idea. If we store the prime numbers along with their multiples in a structure like a heap, we get a time complexity of roughly `O(N*log N)`, close to the sieve's! So we got what we wanted!

The trick here is that we don't need to mark every multiple of the given prime but just the least multiple.

I need to say that this is not a practical idea, in the sense that the memory complexity of the sieve of Eratosthenes is not that bad and it is a very easy algorithm to implement. My point is that sometimes you have several ideas, none of which can be applied because of some flaw; then it may be a good idea to combine their strengths into a hybrid solution that is applicable to your problem. Dijkstra's factory of primes taught me to think that way, although I have never implemented the algorithm in a real-life scenario.

Arrays are simple structures that allow us to get an element by its index in constant time. But we need `O(N)` operations to insert or remove an element to/from the array in the worst case, where N is the length of the array.

On the other hand, linked lists are structures composed of nodes. Every node has a reference to the next one and, in the case of doubly-linked lists, also a reference to the previous one. Here we need `O(N)` operations to reach a node in the worst case, but the insert and remove operations can be done in constant time. I think it's natural to dream of a "perfect data structure" that allows us to do all three operations in constant time. Sadly, such a structure does not exist, but I have found a middle point between the two opposite poles.

The "issue" with arrays is that they maintain a reference to all of their elements. That allows us to retrieve any element with the same amount of operations, no matter where the element is. But maintaining those references is what makes insertions and deletions so costly. When it comes to linked lists, we only maintain a reference to the first and the last node, and each node has a reference to its neighbors. So, when inserting or removing an element, we only need to change a few references. But that lack of references is the reason we have to spend so many operations to get an element in the worst case.

Viewing the problem from that angle, the idea of finding a middle point in the number of references seems natural. What happens if we maintain a reference to `sqrt(N)` nodes of the linked list instead of referencing just the first and the last element? That gives us a list of length `sqrt(N)` such that the distance between each of those nodes in the actual list is `sqrt(N)`. Having that, we can do each operation (index, insertion, and deletion) in `O(sqrt(N))`.
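A minimal Python sketch of this idea (an assumed, simplified block-list rendering of my own, not the Lisp implementation the post links to): keep the elements in blocks of about `sqrt(N)` items, so locating a position walks at most `O(sqrt(N))` blocks and the shifting caused by an insert or delete stays inside a single block.

```python
import math

class SqrtList:
    """A list kept in blocks of about sqrt(N) items, so that indexing,
    insertion, and deletion all take O(sqrt(N)) in the worst case."""

    def __init__(self, items=()):
        self._build(list(items))

    def _build(self, items):
        size = max(1, math.isqrt(len(items)))
        self.blocks = [items[i:i + size]
                       for i in range(0, len(items), size)] or [[]]

    def __len__(self):
        return sum(len(b) for b in self.blocks)

    def _locate(self, index):
        # walk the O(sqrt(N)) blocks to find the one holding `index`
        for block in self.blocks:
            if index < len(block):
                return block, index
            index -= len(block)
        raise IndexError(index)

    def get(self, index):
        block, i = self._locate(index)
        return block[i]

    def insert(self, index, value):
        if index == len(self):
            self.blocks[-1].append(value)  # append at the very end
        else:
            block, i = self._locate(index)
            block.insert(i, value)  # shifts at most one block: O(sqrt(N))
        # rebuild when a block grows too large, keeping sizes near sqrt(N)
        if any(len(b) > 2 * math.isqrt(len(self)) + 2 for b in self.blocks):
            self._build([x for b in self.blocks for x in b])

    def delete(self, index):
        block, i = self._locate(index)
        value = block.pop(i)
        if not block and len(self.blocks) > 1:
            self.blocks.remove(block)  # drop the now-empty block
        return value
```

The occasional rebuild keeps every block near `sqrt(N)` in size, which is the property the whole analysis rests on.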

If you want to see more details about this structure and a Lisp implementation of it, you can find them here.

Update: I have written a post about this data structure. You can read it here

We have seen two examples of combining existing solutions to get another one that has some of the good qualities of each of the previous ones. My purpose was to show you this way of thinking, not just specific examples.

Note that in the case of Dijkstra's idea, we could achieve the time of the fastest solution and the memory complexity of the naive algorithm. In the second example, we just got a middle point, so the fastest solution is still faster and the memory-cheaper solution is still cheaper. The new structure is like a decathlon athlete: good at every event, but not the best at any single one. So don't try to find the silver bullet; remember there is no free lunch. Even Dijkstra's idea has the disadvantage of being harder to implement and to understand.

I hope you have learned something or found this post interesting. Feel free to ask whatever you want or give me some feedback; I will appreciate that a lot. Stay tuned for the next post!

]]>Once the project was finished, it turned out that people couldn't log in with their accounts! Our developer was stunned and sad at the same time. Luckily this was just a school project and not bank management software.

Have you experienced such situations? I certainly have. At some point in the development process he made some changes to the database structure, and those changes made the login logic fail. But he was already logged into the page and didn't remember to retest the login page. It was a hard-to-find error. Damn!

Does that mean that we are bad at making software? Yes, in a way.

Yes, coding that way is certainly wrong. But it is not the error itself that was wrong. We are all humans; we make mistakes in every activity we carry out, and coding is not an exception. Bugs are part of the game.

What we need to do is be prepared to detect that error as soon as possible. We cannot avoid bugs, but we can do a lot of things, beyond just logging in once or twice, to detect them.

I hope at this point you are motivated enough to keep reading about how to get rid of bugs. But maybe you need some examples and figures of how harmful bugs can be. I recommend you take a look at this page.

There will be bugs in the code we write tomorrow. Why not take care of the "pesticide" today? So, start the project that will test your actual project right now!

There are several ways testing can be carried out, and there are several kinds of tests as well. This post is not about the characteristics of each type of test. Instead, I want to give a shallow introduction to the necessity of testing and the right way to test.

Companies that are large enough have entire Quality Assurance (QA) departments for the software they produce. Testing projects are comparable to the actual product in both length and resource consumption. So yes, you may need to code a lot for the sole purpose of "capturing" your bugs. But it's worth every second and line of code when you are working on a complex enough project.

A bug is harmful only if it finds its way to the released product. The cost of fixing a bug decreases dramatically when it is discovered in an early stage.

Although testing can reduce costs a lot, it seems kind of expensive as well. Yes, I know it is orders of magnitude less expensive. But how can we make it cheaper? What is the right way to go?

The first and most important thing to take into account is that testing **can detect the presence of bugs but not their absence**. So you need to focus your testing on finding bugs, instead of trying to prove the correctness of the software.

In order to be relatively easy to test, your project should be built according to good practices and principles. Actually, testing is an effective way to find the best architecture and design patterns. Yeah, you should test even in the earliest stages of the project! Your software should be divided into several modules that fulfill very specific requirements, and we need to be able to test those modules both in isolation and interacting with each other.

We need to automate as much as we can, and the scripts that contain tests should be light enough to be run many times at nearly no cost. Every task that is carried out by humans is prone to errors and is (at least in this context) more expensive. But it might happen that our software requires some test that can't be automated, or that would be more expensive if automated, due to the constant maintenance it would need, among other possible issues.

But then... Don't we need a test to find bugs on the test itself?

Testing is not bug-free. You can certainly introduce some bugs when coding an automated test. It then seems natural to think about a test that tests the test. But then we'll need a test that tests the test that tests the test. And of course, we can come up with the idea that testing is a fruitless effort, since it tries to capture some bugs while introducing others.

Don't worry. I don't like to do fruitless work either, and writing this post would be fruitless if that last idea were true.

I just want to mention another important requirement that every automated test should fulfill: **it needs to be simple**.

Any complex logic needs to stay out of the test itself and, instead, be exercised by a short, readable, and simple piece of test code. Thus, tests will always be almost error-free. Readability is an important feature as well, since it allows us to find any bug introduced in the test very quickly.
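To illustrate, here is a hypothetical example of such a simple test, sketched with Python's built-in `unittest` (the `applied_discount` function and its discount rule are made up for illustration):

```python
import unittest

# hypothetical production function: the "complex" logic lives here, not in the test
def applied_discount(price, is_member):
    """Members get 10% off (a made-up business rule for this example)."""
    return round(price * 0.9, 2) if is_member else price

class TestDiscount(unittest.TestCase):
    # each test is one short, readable assertion: no loops, no branches
    def test_member_gets_ten_percent_off(self):
        self.assertEqual(applied_discount(100.0, is_member=True), 90.0)

    def test_non_member_pays_full_price(self):
        self.assertEqual(applied_discount(100.0, is_member=False), 100.0)
```

Run it with `python -m unittest`. A test this small can hardly hide a bug of its own, and when it fails, the name of the failing method tells you exactly which rule broke.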

This post is an overview of testing as a necessity in software development. I approached the subject as an effective way to discover errors during software construction, which is dramatically cheaper than discovering them when the harm is done. But testing is more than that: it is the most effective way to build good-quality software. It should be present in every stage of development and even be in charge of every stage. Testing is the heart of software quality assurance.

As important as testing is testing the right way. You need your tests to be simple, short, and readable. That last feature is an important one! Not just for the reason mentioned above, but because a test suite can represent the requirements of your product. Yes, you can "define" your product by testing it, but only if that definition can be easily read.

This post is far from being a deep guide to testing or a treatise on every single aspect of it. My sole purpose is to motivate you to learn about it. I hope that purpose was fulfilled. Thanks for reading!

]]>