[SER222] Analytical Analysis (5/5): Input Dependence
Summary
TLDR: This video discusses how the character of the input can influence a method's efficiency. Until now we have assumed that only the size of the input matters, but the ordering of the data matters as well. Linear search, selection sort, and insertion sort serve as examples: some algorithms depend on the state of the input, such as whether it is already sorted. This dependence makes it difficult to write an exact growth function, but an analysis in terms of Big-O notation is still possible.
Takeaways
- 💡 An algorithm's cost can depend on how the input is arranged, not just on its size.
- 📊 Growth functions normally consider only the size of the input, but size is not always the only factor.
- 🔄 A linear search behaves differently depending on how the input is ordered.
- ⏳ The worst case for linear search occurs when the target element is at the end of the list, which requires examining every element.
- ⚠️ It is not always possible to write an exact growth function when the input's arrangement affects the algorithm's behavior.
- 🧠 One alternative is to use an approximation, such as an upper bound, to represent an algorithm's worst case.
- 🔀 Assuming the data is randomly ordered permits an average estimate of performance; for example, linear search will, on average, finish after examining half of the elements.
- 📈 Sorting algorithms such as selection sort and insertion sort can have different running times depending on how the input is ordered.
- ⚙️ Selection sort always takes the same time regardless of the input order, while insertion sort is more efficient when the input is already sorted.
- ⏱️ Although insertion sort has a worst case of O(n²), it runs in O(n) time on sorted input, which makes it more efficient in some cases.
Q & A
Why is it not enough to consider only the size of the input when analyzing a method's performance?
-Because a method's performance can depend not only on the size of the input but also on how the data in the input is arranged. For example, in a linear search the number of comparisons can vary significantly depending on whether the input is sorted.
What problem arises from assuming that all inputs of the same size are equivalent?
-That assumption ignores how the input is structured, which can significantly affect performance. Two inputs of the same size can have different arrangements, which can lead an algorithm to perform a different number of operations.
How does the arrangement of the data in the input affect linear search?
-In a linear search, if the target element is at the start of the array, the algorithm can finish quickly. However, if the element is at the end of the array or not present at all, the algorithm has to traverse the entire array, which increases the number of comparisons.
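As a sketch of this point (in Python, since the video shows no code), a linear search that counts its equality checks makes the input dependence concrete. The function and variable names here are illustrative, not from the video:

```python
def linear_search(pool, target):
    """Linear search that also reports how many equality checks it made.

    Returns (found, comparisons). The comparison count is the cost metric
    discussed in the video: equality checks inside the loop.
    """
    comparisons = 0
    for element in pool:
        comparisons += 1
        if element == target:
            return True, comparisons
    return False, comparisons

# Two inputs of the same size (n = 5), as in the video:
sorted_input = ["A", "B", "C", "D", "E"]
reversed_input = ["E", "D", "C", "B", "A"]

print(linear_search(sorted_input, "A"))    # (True, 1): best case
print(linear_search(reversed_input, "A"))  # (True, 5): worst case
```

Both inputs have n = 5, yet one run costs 1 comparison and the other costs 5, which is exactly why a single growth function f(n) cannot describe this method.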
Is it possible to write an exact growth function for linear search?
-No. An exact growth function cannot be written for linear search because the number of comparisons depends on how the input is arranged. However, an approximation via Big-O notation can be used, where the worst case is O(n).
How could the problem of writing an exact growth function be 'fixed'?
-One way to 'fix' it is to make additional assumptions about the input, such as assuming it is already sorted. Under those assumptions, a more precise growth function can be written.
What is amortized analysis in the context of linear search?
-Amortized analysis is a technique for analyzing an algorithm's average behavior. For linear search, assuming the input is randomly ordered leads to an average growth function of f(n) = 1/2 · n.
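The f(n) = 1/2 · n claim can be checked empirically. The sketch below (hypothetical helper names; the video gives no code) shuffles an input many times and averages how many comparisons linear search needs to find a target that is known to be present:

```python
import random

def comparisons_to_find(pool, target):
    """Number of equality checks linear search makes before finding target."""
    for position, element in enumerate(pool, start=1):
        if element == target:
            return position
    return len(pool)

n = 100
pool = list(range(n))
trials = 10_000
random.seed(42)  # fixed seed so the experiment is repeatable

total = 0
for _ in range(trials):
    random.shuffle(pool)
    total += comparisons_to_find(pool, 0)

average = total / trials
print(average)  # close to n / 2, i.e. about 50 for n = 100
```

Since the target lands in each position with equal probability, the expected position is about n/2, matching the video's average-case growth function.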
Why does no growth function exist for some operations, such as the return in a linear search?
-Because the number of comparisons needed to determine whether an element is in the collection depends on the element's position in the input. In some cases the return is immediate (one comparison), and in others multiple comparisons are needed.
How does the structure of the input affect sorting algorithms like insertion sort and selection sort?
-For insertion sort, if the input is already sorted, the algorithm runs much more efficiently, in O(n) time. For selection sort, however, the number of operations is the same no matter how the input is structured.
What is the key difference between insertion sort and selection sort in terms of input dependence?
-Insertion sort depends on the input's arrangement: on unsorted input it behaves as O(n²), but on sorted input it behaves as O(n). Selection sort, on the other hand, always performs the same number of operations regardless of the input order.
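This contrast can be demonstrated by counting comparisons in both sorts. The following is a minimal sketch (names and the choice to count only comparisons are mine, not from the video):

```python
def selection_sort_ops(a):
    """Selection sort; returns the number of comparisons performed."""
    a = list(a)
    ops = 0
    n = len(a)
    for i in range(n):
        smallest = i
        for j in range(i + 1, n):  # always scans the rest of the array
            ops += 1
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return ops

def insertion_sort_ops(a):
    """Insertion sort; returns the number of comparisons performed."""
    a = list(a)
    ops = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:  # shifts the element left only as far as needed
            ops += 1
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:
                break
    return ops

sorted_in = list(range(10))
reversed_in = list(range(9, -1, -1))

print(selection_sort_ops(sorted_in), selection_sort_ops(reversed_in))   # 45 45
print(insertion_sort_ops(sorted_in), insertion_sort_ops(reversed_in))   # 9 45
```

Selection sort does n(n-1)/2 = 45 comparisons for n = 10 no matter the order, while insertion sort does only n-1 = 9 on sorted input and 45 on reversed input: the input-dependence the video describes.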
Why is worst-case analysis useful for algorithms like insertion sort?
-Worst-case analysis is useful because it provides an upper bound on how long the algorithm can take. Insertion sort can be very fast in some cases, but in the worst case it behaves as O(n²), which is important to know when analyzing its efficiency.
Outlines
🤔 The importance of the input's character to a method's performance
This paragraph discusses how the character of the input influences an algorithm's performance, starting from input size. Until now, input size has been assumed to be the only important factor, but this is not entirely correct. Using linear search as an example, it shows how the ordering of the elements in the input directly affects the number of comparisons required, and therefore the method's performance. It questions the validity of a simple growth function when performance depends on the input's character.
📉 Worst-case analysis and Big-O complexity
This paragraph goes deeper into worst-case analysis, again using linear search as the example. It explains that although an exact growth function cannot be written because of the dependence on the input's character, an analysis is still possible using Big-O notation, such as O(n) for the worst case. It also shows how additional assumptions, such as a randomly ordered input, allow approximate growth functions to be written, and it introduces amortized analysis for the case where the input is assumed to be random.
🔄 Input dependence in sorting algorithms
This paragraph introduces the idea that sorting algorithms also exhibit input dependence. Common algorithms such as insertion sort and selection sort solve the same problem but behave differently depending on how the input is structured. While selection sort always takes the same time regardless of the input's state, insertion sort varies, running much faster if the input is already sorted. This illustrates how the character of the input affects the performance of these algorithms.
Keywords
💡Input size
💡Growth functions
💡Input dependence
💡Linear search
💡Big-O
💡Amortized analysis
💡Selection sort
💡Insertion sort
💡Worst-case analysis
💡Random input
Highlights
The input's structure, not just its size, significantly impacts the performance of algorithms.
The assumption that arrays of the same size behave similarly in terms of algorithm performance is flawed.
In linear search, input order affects performance; sorted inputs can be processed much faster.
The number of comparisons in linear search depends on the input structure, with sorted arrays requiring fewer comparisons.
It's challenging to define a precise growth function for an algorithm when its performance depends on input characteristics, not just size.
Big O notation provides a way to approximate performance, even when a precise growth function can't be determined.
The worst-case scenario for linear search leads to O(n) complexity, where n is the number of elements.
In some cases, an additional assumption, like assuming the input is sorted, can allow for a more accurate growth function.
Randomly shuffled input assumptions help create amortized analysis for performance estimation, predicting average cases.
The behavior of sorting algorithms like selection sort and insertion sort differs based on the input order.
Selection sort performs the same number of operations regardless of input order, making it consistent but not always efficient.
Insertion sort behaves differently depending on input order: in the best case (sorted input), it runs in O(n) time, while in the worst case (unsorted), it runs in O(n²).
Worst-case analysis is crucial for understanding the upper limit of algorithm performance.
Input dependence is a common issue for many algorithms, affecting their time complexity.
When growth functions don't exist, worst-case and amortized analysis can provide a useful approximation for algorithm performance.
Transcripts
- In this video, I wanna discuss the impact
of the character of the input on how well a method runs.
So in the past, basically we've been making this assumption
that the only thing I really care about, in terms of input,
is the size of it.
So any array of size 10, you know, it'll look the same.
And maybe you can get that, right, from the growth function.
The growth functions are written something like f(n).
There's only one thing there, right?
There's only an n there.
So if I have, you know, different arrays, I'll have,
maybe, different stuff inside.
They all, you know, if they're the same size,
then, well, they all look the same to that growth function.
All right, that's what we've been assuming so far.
Let me actually draw a little bit here.
So let's say, for example, we have N equal...
five.
So if we were to go over here and say something like
A...
B...
C...
D...
E...
That's an array of size five.
I give it to a sorting program, right, it'll do its stuff.
But I also have different inputs, right, of that same size.
Maybe I just have the reverse order, E, D...
C...
B.
And there's a little bit of an issue here, right,
in that these things look different
but they're the same size.
Up until now, we've made the assumption that, oh,
they're the same size, so they're equivalent.
And that just really isn't the best assumption to make.
So let's say that we have this linear search program.
So linear search is a program, right, that basically looks
at an input pool of elements, and it tells you, is something
in that collection or not, is it in that array or not.
Well, if we think about maybe searching
for something like...
A, right?
A, in the input, right, and the input
can be the same size here, right?
So I have here, right, two different inputs, okay?
Say they're size five here, right?
Feed them both to this program as the pool, right?
Our target may be something like A.
If I run this program on that first input,
say this is input one, what happens is I finish immediately.
Right, so because the data is set up a certain way,
because it's sorted, right, A, B, C, D,
if I run linear search on that and I'm looking for A,
it's basically done, right?
The first loop, right, looks at the first element,
and the first element matches and returns true, done, right?
Really, really easy, right?
And very little computation to do.
So maybe if we were saying here, you know,
we're doing some analysis on this, and we wanna do analysis,
you know, the right way and choose a good cost metric,
maybe our cost metric could be something like the equality.
Right, equalities are nice because there's something...
Well, here it's nice
because it happens inside a loop, right,
something happens very frequently,
and it's very kind of core to the idea of the algorithm,
right, checking to see whether something's in a collection.
So it'd be reasonable to count up
the number of times that occurs.
But if I have my input of size five that goes from A to E,
what happened is this is only...
gonna cause us to do one comparison,
which is okay, it's nice.
But what happens if I put the second input in there?
If I put the second input in there, in the reverse order,
E, D, C, B, A...
well...
I'm going to need more than one comparison.
In fact, I need five comparisons because my linear search
is gonna go and look at each of those elements in turn
'til it gets to the very end.
And so now I have this over here, this goes to five.
So what is our result?
Our result, right, well, conceptualize our result, here,
is a linear search is dependent on its input size.
Excuse me, a linear search is dependent on
how its input looks.
So if it looks one way, it'll work quickly.
If it works another way, it'll, you know, not work quickly.
But because it has that kind of character,
it's no longer fair game for us
to write an exact growth function, right?
I can't have a function that, you know, I have...
I can't have, you know, this, right?
This is bad, right?
F(5) equal one, f(5) equal...
five.
That's absolutely horrible, don't ever do that,
that should make you very sad if you've paid attention
in math classes, that's not a function.
And again, right, the issue here, right, is that
how the function works depends on how the input data looks.
Generally what this implies
is that we do not have a growth function, right?
It may not be possible, actually, to write a growth function
for a particular program if there's this dependency.
And so, unfortunately, this happens with...
Well, it's uncommon, right?
So you'll see it although it's not, you know, super common.
So there's a lot of things that a growth function
does not exist for.
And, in that case, what we're left with is using something
that's an approximation, something like a bound.
So, in this case, right, my...
If I were to summarize my analysis here, I would say
that my f(n)...
We'll say that does not...
exist.
I'll actually come back to that 'cause there's a way
we can force it to work, but the only thing
we could really say is that this is maybe something
like O(n), right, so this particular function here...
Even though I can't tell exactly how many times
I'll need you to make that comparison,
what I can see is, oh, that comparison happens
within a for loop, that for loop happens up to n times.
Well, that's kinda like a worst case, right?
N comparisons, so I go, hey, that's big O,
that's the worst case right there, so this whole thing
could be O(n).
So this would be a fair analysis, that would be good to go.
But again, the issue is that our straight-up growth function
simply does not exist because the function basically
isn't predictable enough, right?
The amount of work it has to do isn't predictable
from the input size alone or from the size array,
and that is an issue.
One thing we could do...
maybe if we wanted to somehow "fix" this, is say,
okay, I can make an additional assumption
about how my input looks,
and so I can say, okay, I can assume, for example,
that it's already sorted.
If it's already sorted, then, hey, my growth function
is basically gonna be f(n) equal one or f(n) equal zero,
depending on whether it's there or it's not there.
If it's sorted in reverse order, right, I can make
the assumption that it's gonna be f(n).
F(n) equal n or something like that.
Point is, if I make additional assumptions about my input,
I can maybe then use those constraints to actually write
a growth function for that.
But I need additional assumptions.
The size of the input is not alone, is not enough.
One very common thing to do here is basically to assume
that our input is randomly shuffled.
If our input is randomly shuffled,
we can actually say the following.
F(n)...
equal 1/2...
n.
And what that means, is it means we say, okay,
on average, right, I will have seen my input
by the time I get through half of the data, right?
That's that one half doing there, right?
We have to explore one half of those n elements
before we're likely to see something.
And once we see it, well, hey, you know, we can return true.
So this is (murmurs), and it's called amortized analysis,
and we're not gonna talk a lot about that in this class.
The only time we'll use it is...
The only time we'll use the idea is basically
to say, okay, sometimes when you have a pool of elements,
we just randomly assume that they're sorted.
We'll assume that they're randomly ordered, right,
because then we can, don't have to worry
about these (murmurs) of, oh, it's done really quickly
or it's done really slowly.
If it's just all random, then we know there's
for sure gonna be work to do.
So just kinda keep that in the back of your head, right?
That's another way to approach this,
but we need additional assumptions.
So, to answer that question there at the bottom, right,
hopefully you got it, right?
No, it's not possible to write a growth function for that.
It's also not possible to write a growth function
for something like the return.
The return is something that, basically, I have to know,
is the element in the collection or not, right?
If I do, then the growth function for that is one.
Otherwise, it's zero.
And the opposite holds for the return false over here.
Having said that, having worked out an analysis,
let's actually look at sorting algorithms for a second
because sorting algorithms have, basically,
this same issue of input dependence.
So hopefully in your previous courses, like CR200 or CSC205
or CSD200 or many of the other classes that serve
as prereqs for this, you've seen, you know, the algorithms
for insertion sort and selection sort.
So those are two algorithms that, basically, we give them
some pool of elements, and at the end of the day,
they produce for us those, basically, that same data
but now in a sorted order.
So maybe it's ascending, maybe it's descending.
You know, it doesn't really matter.
Point is, it's ordered according to some rule, right,
less than or greater than.
Okay, so think about those sorting algorithms
for a second, right?
Insertion sort, right, is about this idea
of taking an element and basically shifting it
into the place it belongs, whereas selection sort
is about reaching into your collection of things to sort,
taking out the smallest and putting it at the front,
and then repeating it.
So it's basically adjusting one element at a time
versus insertion sort is about shifting over one element,
which may kind of involve touching many other things
along the way, and then getting it into its right position.
So you have those two different algorithms,
and these algorithms are basically an example, right,
of things that solve the same problem,
but one of them has a little bit of a different behavior
depending on how the input looks.
Now, if I have something like a selection sort,
I can run it on...
on any input of a particular size,
and it'll all act the same.
So if I have an input array of maybe, of size 10, right,
it just doesn't matter what is in there.
So it could be sorted already,
it could be completely unsorted, it could be reverse order.
No matter how it looks, selection sort will take
the same amount of time, it'll always apply
the same algorithmic kind of shuffling or de-shuffling
to generate its output.
So that's what I'm trying to basically indicate here
in this graph.
This graph on the right-hand side, what I have here is
this axis over here is supposed to be number of operations.
And, notice here, this is basically constant,
so what I'm saying is, oh, hey, no matter what you give it,
the same number of operations will be executed.
And then over here, on this axis,
which is a little bit tricky, this stuff over here,
these are basically different inputs.
So maybe, you know, this could go from ordered inputs
over here to unordered.
And even if something's really ordered, right,
it still takes the same amount of time as something
that's really unordered, right?
That, how the input looks, just doesn't impact
how selection sort acts.
Okay, so that's pretty nice, and that's actually
one of the advantages of selection sorts,
they're very kind of reliable in terms of how much memory
it uses or how much computation it uses.
On the flip side, insertion sort, which also solves
the same problem, does not have,
kind of, that level of reliability.
It's actually going to vary with what the input looks like.
So worst case for insertion sort is looking at something
that's completely unordered.
So if something's unordered, insertion sort
will look something like a couple of nested loops,
it'll do something that's like O(n²).
It's this line over here.
So this is unordered.
So that's okay, right?
If we were to look at something like a selection sort,
a selection sort is also O(n²),
so maybe there the behaviors are comparable.
Now, what happens, though, is that in insertion sort,
if I already have an input that's sorted,
it acts completely different.
It doesn't actually have nested loops anymore,
it just has one loop, one loop that goes and looks
at every element and checks to see whether that element
is in its proper place.
For an unordered case, right, its insertion sort
will have two loops because it uses the inner loop
to push the unordered elements into their proper position.
But here, right, if we say, oh, hey, things are ordered,
the inner loop basically never needs to run
because that element doesn't need to be shifted
into its proper position, so we're good to go.
So it turns out that if I have things that look sorted,
insertion sort acts much, much more
like a linear time algorithm.
So it looks like O(n) because somehow it turns
from the sorting problem
into the verification problem, right?
Check to see whether this array is already sorted or not,
that's what insertion sort is effectively doing
on an input that's already sorted.
So this is another example of input dependence.
A lot of times, right, when I'm running code
that has this dependency, and it's basically something
for us to, well, we do need to worry about it.
Usually what happens is...
Well, what we hope happens is something like insertion sort,
where we can find a good upper bound like O(n²)
for it, and then in the back of our heads,
we think to ourself, oh, cool, you know what?
That algorithm, there are cases where it'll do better
than that, it'll look more like O(n)
or have a growth function that looks like n.
That's nice, kinda, knowing your back of your head, right?
Not just that it'll only take at most n² time,
but that there is a good chance it'll take a lot less,
so that's a very nice thing to know.
Although, if we're writing this down, we do need
to pay attention, right, to that worst case
because the worst case is generally how it gets us in trouble.
All right, so that should give you guys a general feel
for how, basically, methods may depend on their input.
Try to keep in mind, right, that there's this dependency,
and where the dependency exists, growth functions
typically don't exist, but we still have a chance
of doing some sort of big-O-style analysis instead.