Autoload type loading code, stop loading everything on boot!

@kengey

Ok. My original post was about autoloading/load-on-demand. You’ve replied with a post minimally discussing that, and then discussing a pattern that somehow affects it?

First of all, regarding the concept of creating one source file from many, have you compared the load time of separate files vs the single source file? From your description, I assume you have both available…

Regarding source files, you threw out the number “1,000s”. Might be a little bit high…

On to your other subject…

but the code will only be called when needed

That’s true of any well designed app. How does the pattern you use, and the articles you linked to change anything?

Since SU is an app with a UI, there’s no reason to create objects unless they’re required for loading, or required due to a user action. Some devs may choose to create objects on load, which slows loading but gives a quicker response in the UI, and that’s a reasonable choice.

how having a codebase of 1000’s separate files that only get loaded when needed but together with poor coding practices

Please clarify what you mean by ‘poor coding practices’.

Finally, I would caution against using patterns designed for one type of language with another. Using a pattern without a clear reason for doing so will just make code more complex.

I am not trying to argue against your post. The purpose of your post is to show how it manages to not load unneeded code, and I am questioning whether it is really the amount of code that affects load time, or the way that code is written and the number of objects that are initialized at read time. I also question whether the amount of code you manage not to load at startup is really that significant. Buttons in a toolbar not only invoke some command but also reflect a certain state in your app, which determines whether they are checked or grayed.

Poor code? Use modules for everything… they defeat OO and they are initialized immediately even if you do not need them, or only need parts of them. All my extensions up till around 18 months ago used modules quite a lot for what would be singletons, such as data repositories, because that made them globally accessible. That was slow on boot. Because the entire repo would be created and stored in memory even if the user might never use the app. Then I got involved in a few ASP.NET Core projects and learned about seeing everything as a service, using a service provider and constructor dependency injection, and eventually tried to map those same practices onto my Ruby code. It did solve the issues I was experiencing, at least.

Anyhow, if you prefer, we could continue this discussion elsewhere, so this one can remain about autoloading.

I have done a comparison on loading write_xlsx. This is the largest library I have combined into one file. Here are all the separate files. It has a dependency on zip, so that is merged into that same large file:




All these files are combined into 1 file with 26k lines:

And here is the comparison.
Separate files:


All into 1 with 26k lines:

There is no one here creating extensions that are made up of 26k lines.

I am not so much of a Rubyist myself, so I might be interpreting these numbers wrong. But the one 26k-line file takes 0.07s, while the bunch of separate files takes somewhere around 1.6s. If I am not mistaken, combining the files into 1 makes loading roughly 23 times faster (1.6 / 0.07), an improvement of about 2,200%.

Another edit. Of course, the previous comparison is pretty slow because the path containing all the files is the last one in $LOAD_PATH. For that reason I did a comparison where I reverse the array.

0.17s vs 0.07s. That is still about 2.5 times slower than the single-file variant, but roughly 10 times(!) faster than with the original, non-reversed $LOAD_PATH. That makes me think… What if SketchUp had simply reversed the order of $LOAD_PATH by default? The number of requires for libraries is, I believe, significantly lower than the number of requires for code in your own codebase (unless you happen to combine everything into one file as I do).

But that also makes me think. Libraries are usually split into many more, and smaller, files than extension developers split their own code into. So that might just reverse the issue.

I believe 1 core thing Ruby does wrong: it makes 0 distinction between source code and runtime code. Source code, and the way we prefer to keep things nicely separated in different files, only benefits the developer. There are a few ways to try to solve that problem, such as autoload, or doing your requires inside a certain method so you can require gradually, but I am afraid none of them will ever outperform 1 file per library instead of multiple. If loading separate files gradually is faster, then something is wrong with what that code does upon require when it does not need to do it yet.
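As an illustration of the "requires inside a certain method" option mentioned above, here is a minimal sketch. `ReportExporter` and `to_report` are hypothetical names, and the stdlib `json` library only stands in for a heavier dependency.

```ruby
# Hypothetical exporter: the library is required the first time a report
# is generated, not when this file itself is loaded.
module ReportExporter
  def self.to_report(data)
    # After the first call this is a cheap no-op, because the feature is
    # already listed in $LOADED_FEATURES.
    require "json"
    JSON.generate(data)
  end
end

puts ReportExporter.to_report({ "faces" => 12 })
```

The cost of loading is moved from startup to the first use of the feature, which is the same trade-off discussed earlier for creating objects on load.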

Finally, I have also done a comparison for a library that is present in the stdlib: rexml.
I have combined rexml/document into 1 file and placed it at the same location rexml/document_combined. I will also attach that file here.
edit: other file with added copyright:
document_combined.rb (219.6 KB)

require "benchmark"

time = Benchmark.measure do
  require "rexml/document"
end
puts time

gives:

require "benchmark"

time = Benchmark.measure do
  require "rexml/document_combined"
end
puts time

gives:

@kengey

Thanks for taking the time with your replies.

First, you mentioned the term ‘Rubyist’, which is a term I used in my profile. I should probably change that, but regardless, I’m a coder.

Been coding since the seventies as a teenager, started in CompSci, my ‘Gang of Four Design Patterns’ book is a very early printing, and I’ve written a lot of ‘MSFT’ code, both before and after .NET. I could still be an idiot, but I think most people place me somewhere above that.

First of all, with the multiple files/one file issue, if s is the average size of the files, and n is the number, as the ratio n/s rises, any measurement will be more about timing the OS’s opening of files instead of Ruby’s parsing of them. As to what values of n/s pertain to typical plugins, I don’t know.

.NET languages are essentially strongly typed compiled languages, while Ruby is an ‘untyped’ interpreted language. Ruby is also a ‘console language’, and using a ‘single instance’ of it in a UI based application like SU causes issues not seen when it’s a console app.

So, with SU’s embedded Ruby, one might look at Rubygems and Rails, as both are applications that often run for a long time. Both make use of an ‘autoload’ concept. Multiple files, load on demand.
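For readers who have not used it, this is roughly what the "autoload" concept looks like with Ruby's built-in `Module#autoload`. It is only a sketch: `MyExtension` and `HeavyReport` are made-up names, and the source file is written to a temp directory just to keep the example self-contained.

```ruby
require "tmpdir"

# Write a throwaway source file. In a real extension this would be an
# ordinary file inside the extension folder.
path = File.join(Dir.mktmpdir, "heavy_report.rb")
File.write(path, <<~RUBY)
  module MyExtension
    class HeavyReport
      def title
        "report"
      end
    end
  end
RUBY

module MyExtension; end

# Register the constant. The file is NOT read at this point.
MyExtension.autoload(:HeavyReport, path)
puts MyExtension.autoload?(:HeavyReport).inspect  # the path: still pending

# The first reference to the constant triggers the require.
report = MyExtension::HeavyReport.new
puts report.title
puts MyExtension.autoload?(:HeavyReport).inspect  # nil: now loaded
```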

I think some of the motivation for your ‘one file’ system was based on using gems/std-libs. One issue with them is that they may load a lot more files than needed for a particular task that they perform. That makes CI testing easier, as loading the main file also loads everything else. And since many are open source…

So, since one of the articles you cited was .NET with C# examples, maybe we should consider why .NET has so many assemblies/files instead of just one? Do you load all of them in your .NET/C# code? Why don’t C programs load every Windows dll, instead of only the ones they use?

I believe 1 core thing Ruby does wrong: it makes 0 distinction between source code and runtime code.

I’m having a hard time with that statement. Feel free to cite any articles/blogs, etc. that share that opinion. In what way does the phrase ‘distinction between source code and runtime code’ apply to C#? How does it apply to an interpreted language like Ruby?

Normally, C# compiles several source files into a few runtime files. Ruby does the same in its VM. Where’s the difference?

Poor code? Use modules for everything

All my extensions up till around 18 months ago used modules quite a lot for what would be singletons, such as data repositories, because that made them globally accessible. That was slow on boot. Because the entire repo would be created and stored in memory even if the user might never use the app.

You seem to be conflating what code runs when it’s loaded by Ruby with whether that code is contained in a class or a module. Modules do not need to run any code when loaded. Most Ruby apps that use classes for singleton objects could be re-written with modules and perform exactly the same way. It’s just the norm to use classes…
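A small sketch of that point: a module used as a singleton-style repository that runs no expensive code when its file is loaded. `MaterialRepo` and its data are hypothetical, and the hash stands in for whatever is costly to build.

```ruby
# Loading this file only defines methods. The data is built on first access.
module MaterialRepo
  @build_count = 0

  class << self
    attr_reader :build_count

    # Memoized accessor: builds once, then returns the cached Hash.
    def all
      @all ||= build
    end

    private

    def build
      @build_count += 1
      # Stand-in for an expensive step (reading files, walking a model).
      { "oak" => 0.75, "steel" => 7.85 }
    end
  end
end

puts MaterialRepo.build_count  # 0: nothing was built at load time
MaterialRepo.all
MaterialRepo.all
puts MaterialRepo.build_count  # 1: built once, on first use
```

The same memoized accessor could be written on a class-based singleton; what matters is when `build` runs, not which container holds it.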

Your posts have bridged several topics. Summing up my thoughts:

  1. I believe code should be loaded in an ‘on demand’ fashion.
  2. Using some gems and/or std-libs may load a lot more code than is required. How best to address that is messy.
  3. Combining multiple files into one may be helpful, but may also load a lot of code that is rarely used by the SU user.

One topic I haven’t seen addressed is what should object lifetime be? Even with autoload/load-on-demand, one still has the issue of object lifetime.

So a plugin loads a file that displays/controls/interacts with a user dialog box. After the user closes the dialog, should the dialog instance be destroyed?

If a user has several plugins installed, and opens several dialogs in the course of a day’s work, does that affect SU? Same issue with other objects, whether they’re part of the SU API or native to Ruby. I’m guessing not, but I don’t know…

Do not take that personally; at least, I did not use that term to address you personally.

Regarding your book, I know you wrote it, you have mentioned that before and I have read it.

What got me into answering here is your title, which states that we should stop loading all our code at once. I tried to give several examples showing that the mere fact of loading all code is perhaps not the real issue.

Does everything I write make sense or explain it correctly? Surely not.

@kengey

I didn’t take it as a slight, I just recalled that I used it. Re the book, I did not write it; I mentioned it because I bought it long before it was considered a ‘standard read’ about design patterns and OOP.

You mentioned a design pattern, and the MSFT article about it is not one of their better articles. It mentions a valid idea, but the idea can easily conflict with the idea of encapsulation.

Also, use of a ‘pattern’ is often touted as a solution to problem x, but problem x may be language dependent. The pattern may be being used, but it’s actually solving another issue.

An example would be multiple inheritance. If a Ruby module is included in a class, is it really multiple inheritance? If it’s directly included in several other classes, it is. If it’s only included in one, it’s simply adding methods that could be placed in the class…

Then I misread that.

If a Ruby module is included, then it tightly couples that module to that class. If it is only used by that class then why not write it directly in that class? If it is so that you can use it in other classes in the future, then again they are tightly coupled.

Agree. And many forget that a module is an instance of class Module.

It has been argued in the past, I think by Ken, that the difference is the data of state. Using a module, a coder often uses static data variables to hold state, whereas a class instance only holds the data whilst it exists.
It is up to the coder to ensure that they only hold state data when it is needed, either as an instance of some custom class, or an instance of a Ruby core class (Array, Hash, Struct, etc.)
If however, data is and needs to be kept during the life of the session for quick and repetitive access, then IMO there is no difference between wrapping it up in a custom singleton class instance or holding it in a submodule. (I’ve found it is easier just to use a hash and load and save them from/to JSON files.)
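A rough sketch of the "plain hash plus JSON file" approach, with the state held only while it is needed. `Settings`, its methods, and the temp-file location are all assumptions made for the example.

```ruby
require "json"
require "tmpdir"

# Hypothetical settings store: a plain Hash, read lazily from a JSON file.
module Settings
  class << self
    attr_accessor :path # location of the JSON file

    # Reads the file on first access; an empty Hash if it does not exist yet.
    def data
      @data ||= File.exist?(path) ? JSON.parse(File.read(path)) : {}
    end

    def save
      File.write(path, JSON.pretty_generate(data))
    end

    # Drops the in-memory copy so it can be garbage collected.
    def release
      @data = nil
    end
  end
end

Settings.path = File.join(Dir.mktmpdir, "settings.json")
Settings.data["units"] = "mm"
Settings.save
Settings.release
puts Settings.data["units"]  # read back from disk
```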

A good coder uses either (module or class) where appropriate.

The most interesting thing I found out today was that the index in $LOAD_PATH which is used to find the requested files has a huge impact. Reversing the global array gave me x10 better loading times loading the same set of files. 1.6s vs 0.16s or something. Imagine 20s of extensions all loading x files.

We may not want to reverse the whole thing. And it depends upon what the require call is loading. If from a path relative to the “Plugins” then yes the 4th path in $: we may want to check first. But if loading from the standard library, then we want to check the 1st path in $: first.
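One way to get that effect without reversing the whole array is to move a single directory to the front only while your own files are being required. This is a sketch under assumptions: `with_priority_path` is a hypothetical helper, and the temp directory stands in for an extension folder.

```ruby
require "tmpdir"

# Hypothetical helper: search `dir` first for the duration of the block,
# then restore the previous order. Assumes nothing inside the block adds
# its own entries to $LOAD_PATH (they would be dropped on restore).
def with_priority_path(dir)
  original = $LOAD_PATH.dup
  $LOAD_PATH.delete(dir)
  $LOAD_PATH.unshift(dir)
  yield
ensure
  $LOAD_PATH.replace(original)
end

dir = Dir.mktmpdir
File.write(File.join(dir, "lp_demo_feature.rb"), "LP_DEMO_LOADED = true\n")
$LOAD_PATH.push(dir) # simulate being the last directory searched

with_priority_path(dir) do
  require "lp_demo_feature" # resolved from the first directory searched
end

puts LP_DEMO_LOADED
puts $LOAD_PATH.last == dir # the original order is back
```

Standard library requires made outside the block still see the usual order, so they do not pay for the change.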

But, it is not just $LOAD_PATH that slows require down. It is also the check of the $LOADED_FEATURES array. Both of them hold path strings and string comparison is known to be slow in Ruby.

The require doc says foremost:

If the filename does not resolve to an absolute path, it will be searched for in the directories listed in $LOAD_PATH ( $: ).

So a coder can skip the $LOAD_PATH iteration if they use absolute path arguments.

The same if they use require_relative.

Secondly, if they write their code so as to allow multiple loads and use absolute paths, they can use load instead of require, and skip the check for both $: and $". (This only works for plain .rb files, as Sketchup::load is just an alias for the Sketchup::require method.)
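The difference between the two calls can be seen in a small sketch. The temp file and the `$abs_demo_loads` counter are assumptions made for the demo; in an extension the path would more likely be built from `__dir__`.

```ruby
require "tmpdir"

# A throwaway file that counts how many times it has been evaluated.
file = File.join(Dir.mktmpdir, "abs_demo.rb")
File.write(file, "$abs_demo_loads = ($abs_demo_loads || 0) + 1\n")

first  = require(file) # absolute path: no $LOAD_PATH search
second = require(file) # already in $LOADED_FEATURES, so nothing is read
load(file)             # always evaluates the file again, no lookups at all

puts first            # true
puts second           # false
puts $abs_demo_loads  # 2 (one require plus one load)
```

Because `load` evaluates the file every time, the file has to tolerate being run more than once, which is the "allow multiple loads" condition above.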

I think I mentioned that case…

If it is so that you can use it in other classes in the future, then again they are tightly coupled.

I suppose that applies to Enumerable? You’re misapplying the concept of ‘tightly coupled’…

Agreed. And importantly, the code only needs to load the data when needed, which may be on load or later…

?

If class A is including module B, then A is tightly coupled to B

Next:
Traits and Mixins Are Not OOP

Now, we are getting somewhat off topic, aren’t we?

@kengey

The top voted answer in the Stack Overflow link shows an example that is only applicable to strongly typed languages. The ‘bad’ example is compiling to a class def, the ‘good’ example is compiling to an interface def. Common practice when required.

The second article is ok, I read another by the same author. The main issue I have with his writing is that it assumes one wants to code in a very strict OOP style, without really proving why that’s best. He also discusses mixins violating the principle of encapsulation, but to reach that conclusion he again makes several assumptions. He also makes statements like:

“Such a tight coupling between mixins and object private structure leads to nothing but unmaintainable and difficult to understand code.”

He seems to be saying that mixins must use an object’s private structure. They don’t have to. Note that every interface that Enumerable requires is part of Array’s public interface. Regardless, I guess this somehow leads to “unmaintainable and difficult to understand”…
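The Enumerable point can be shown with a small sketch. `LayerList` is a hypothetical class; the mixin only ever calls its public `each` and never touches `@names` directly.

```ruby
# Hypothetical collection. Enumerable needs nothing but a public #each.
class LayerList
  include Enumerable

  def initialize(*names)
    @names = names
  end

  def each(&block)
    return enum_for(:each) unless block

    @names.each(&block)
    self
  end
end

layers = LayerList.new("Walls", "Roof", "Site")
puts layers.map(&:upcase).inspect              # ["WALLS", "ROOF", "SITE"]
puts layers.select { |n| n.size > 4 }.inspect  # ["Walls"]
```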

Anyway, if code doesn’t require polymorphic objects or runtime selection, why is it then considered improperly ‘tightly coupled’? Also, a lot of discussions of coupling totally ignore that lightweight objects normally want a bare minimum of data stored within them. As I see it, any time one ‘loosens coupling’, one often adds data to at least one of the objects…

In that sense, you are misapplying the ‘degree of coupling’ concept.

I think I’ve said something similar earlier, but I want to make clear that implementing a pattern without identifying a clear and distinct problem that it solves in your application, based on your app’s specs or requirements, now or in the future, is a waste of time…

Hi Greg,

The way one codes is of course very subjective. And I am aware that there simply is no correct answer. You say the author is a very strict OOP coder and I try to be that strict as well. Being that strict requires one to mostly ignore the beauty of Ruby and what it allows you to do. I try to stick with SOLID (5 SOLID Principles of Object-oriented Design) regardless of all the other beautiful wizardry Ruby allows you to do. I guess I am still an architect who was once taught that Less is More.

Regarding the article that is only applicable to strongly typed languages… yes and no. It uses an interface which we do not have in ruby but the essence of the article is that if one literally types the name of another class (or module) in the code of another, that tightly couples one with the other. This has not so much to do with what I stated in the beginning of this thread, where I was a bit short and said modules are bad coding. It would have been better to nuance for what reasons I think they are bad:
A lot of the code I have seen uses modules as business objects and has heavy module variables that get created upon requiring the code. I think that is bad because reading code into memory should be a separate action from executing code and initialising those module variables. This can be solved by the author, I know.
The second reason is the reason we are talking about right now: In strict OOP, it defeats SOLID (see link above)

Thus, we do not need to have Interfaces in Ruby to use the same concept. If 2 classes have the same public methods, they have the same interface.
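A short sketch of that idea: two classes that share a public method can be swapped without the consumer naming either of them. All class and method names here are hypothetical.

```ruby
# Two hypothetical repositories with the same public method: #fetch_all.
class FileRepo
  def fetch_all
    ["from file"]
  end
end

class MemoryRepo
  def fetch_all
    ["from memory"]
  end
end

# The report never names a concrete repository class. It is handed one
# through its constructor and relies only on #fetch_all.
class Report
  def initialize(repo)
    @repo = repo
  end

  def lines
    @repo.fetch_all.map(&:upcase)
  end
end

puts Report.new(FileRepo.new).lines.inspect    # ["FROM FILE"]
puts Report.new(MemoryRepo.new).lines.inspect  # ["FROM MEMORY"]
```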

For me it is not really that which makes modules bad; it is merely the fact that when class A includes module B you are separating the code of at least 1 class into 2 locations. That is more difficult to maintain than having a class in 1 spot and having a direct overview of what that class has and does.

Correct. But, for that reason I use a DI container that carries Proxies instead of the real object. So, imagine 1 object having a dependency on 3 others. Those 3 will have dependencies on their own I assume and so on. By wrapping everything in a proxy, the dependency will only be initialised when the holder will actually call it for a first time, saving a lot of object creations (except for the proxies of course). Here is an example of a DI container in Ruby, but, without proxies: Dependency Injection Containers vs Hard-coded Constants
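To make the proxy idea concrete, here is a minimal sketch of a container that hands out lazy proxies. It is not the container from the linked article or the one used in my extensions; `LazyProxy`, `Container`, `AreaService`, `SummaryService` and `BUILT` are all names made up for the example.

```ruby
# Hypothetical proxy: builds the real object on the first method call.
class LazyProxy < BasicObject
  def initialize(&factory)
    @factory = factory
    @target = nil
  end

  def method_missing(name, *args, &block)
    @target ||= @factory.call
    @target.__send__(name, *args, &block)
  end
end

# Hypothetical container: stores a proxy per key, never the real object.
class Container
  def initialize
    @services = {}
  end

  def register(key, &factory)
    @services[key] = LazyProxy.new(&factory)
  end

  def [](key)
    @services.fetch(key)
  end
end

BUILT = [] # records the order in which real objects are constructed

class AreaService
  def initialize
    BUILT << :area
  end

  def area
    42
  end
end

class SummaryService
  def initialize(area_service)
    BUILT << :summary
    @area_service = area_service
  end

  def text
    "area=#{@area_service.area}"
  end
end

container = Container.new
container.register(:area)    { AreaService.new }
container.register(:summary) { SummaryService.new(container[:area]) }

puts BUILT.inspect            # []: only proxies exist so far
puts container[:summary].text # area=42
puts BUILT.inspect            # [:summary, :area]
```

Note that `AreaService` is only constructed when `SummaryService` first calls it, even though it was injected through the constructor.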

Are you by any chance present at Devcamp in Leeds? I’d love to meet you in person, share a beer and have some more discussions about this, or maybe we already did at one of the basecamps I attended (in Vail or Steamboat Springs) or one of the other devcamps (Portugal, Italy, Greece or Leeds 2 years ago)? One thing written language does wrong is that it lets the reader do the intonation instead of the writer.

There might be some async nature to this which throws off measurements.

@kengey

I have yet to attend any SU conferences. I’ve attended plenty of trade shows and a handful of technical conferences. It’s always nice to meet people face to face over dinner or otherwise (beer, drinks, etc).

So, not this year, but maybe in the future…


It may make things a bit more indeterminate, but could one use AppObserver#onNewModel or AppObserver#onOpenModel for measurement? Haven’t tried…

I think that the order of things loaded differs a wee bit with the 2 platforms (at least it seemed to in the past.)

On Windows, I believe that just after all the extensions are loaded (and their menus and toolbars built,) the SU core calls any AppObserver#expectsStartupModelNotifications callbacks to find out if it should then call the appropriate other “model” callback. So this one can be used as a signal that the extension load cycle has ended, and that there is a valid model object ready.
Your test script would need to be the very first loaded in the “Plugins” folder.
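A rough, untested sketch of such a measurement script follows. `LoadTimer` and its members are hypothetical names. The fallback to `Object` is only there so the file also loads in plain Ruby, and whether the callbacks fire at the moment described above is an assumption that would need checking on both platforms.

```ruby
module LoadTimer
  # Taken when this file is loaded, so the file should load first.
  T0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  # Outside SketchUp there is no AppObserver, so fall back to Object.
  PARENT = defined?(Sketchup::AppObserver) ? Sketchup::AppObserver : Object

  class Observer < PARENT
    attr_reader :elapsed

    # Ask SketchUp to send the model callbacks for the startup model too.
    def expectsStartupModelNotifications
      true
    end

    def onNewModel(_model)
      record
    end

    def onOpenModel(_model)
      record
    end

    private

    # Only the first notification is of interest for load timing.
    def record
      return if @elapsed

      @elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - T0
      puts format("Model ready ~%.2fs after this file loaded", @elapsed)
    end
  end

  OBSERVER = Observer.new
  Sketchup.add_observer(OBSERVER) if defined?(Sketchup) && Sketchup.respond_to?(:add_observer)
end
```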