Spirix_shapefile_importer Large Import Hangs

I have installed spirix_shapefile_importer version 4.4a (beta) - January 12, 2018 and I’m trying to import some very large files, e.g. an assessor’s parcels file with 100,000+ parcels. I modified the Ruby script by adding a debug statement at line 273:

      # keep track of large loads
      if (i % 500 == 0)
        puts "At " + @@timestamp.to_s + " " + i.to_s
      end

The problem I encounter is that SketchUp hangs after about 4,000 records. In the Windows Task Manager I can see SketchUp’s memory use slowly increasing. I’m wondering if I’m caught in an expensive loop that grows memory for each new addition, and whether memory management is what is killing this attempt.

What may I do to further debug the loading problem?

I’m not very familiar with Ruby, but I am very familiar with Perl and use Python when needed.

Here’s a screenshot of the dialog; I used the default settings.

Here’s the Ruby Console:

Can’t really say without seeing the script and the file you are trying to process.

If the file IO method is trying to read the whole file into memory all at once, then you could be running out of memory.

This strategy is supposed to read a CSV file one line at a time and discard each line as you go:

File.open("big.csv") do |csv|
  csv.each_line do |line|
    values = line.chomp.split(",")
    # process the values; each line can be discarded after this
  end
end

Try removing or commenting out the two lines at the beginning:

    @@model.start_operation("Import Shapefile",true)
    @@model.commit_operation

I tried the suggestion that jimhami42, the script’s developer, made several minutes ago (it is also in the comments at the top of the Ruby script). I commented out the two lines below with a hash mark:

    spirix_shapefile_import_init()
    #@@model.start_operation("Import Shapefile",true)
    spirix_shapefile_import_get_input()
    puts "Start Time: " + @@timestamp.to_s
    puts "Processing: " + @@imax.to_s + " records"
    spirix_shapefile_import_create_polygons()
    puts "Skipped:    " + @@error.to_s
    puts "Created:    " + @@poly_count.to_s
    #@@model.commit_operation
    puts "End Time:   " + Time.now.to_s

It made no difference; in fact, the Ruby Console hung after printing 3500. The memory just keeps increasing in 512 KB blocks.

For posterity, here’s the site I obtained the code from:

https://sites.google.com/site/spirixcode/code

and here’s a direct link to the code:

https://sites.google.com/site/spirixcode/code/spirix_shapefile_importer.rbz?attredirects=0&d=1

The file I’m trying to load is publicly available (after agreeing to their terms) at:

http://gis.co.marion.or.us/GISDownload/gisdownload.aspx

then select:

Parcels Taxlot data used for assessment purposes.
Note: Parcel ownership info. is no longer included in download. If ownership info. is needed, please submit a request to GIS@co.marion.or.us for consideration. 1/21/2018

I have a 3Dconnexion device and read elsewhere on this forum that it might contribute to problems, so I unplugged it.

I also disabled all my other Ruby extensions so that spirix_shapefile_importer is the only one enabled.

Again, it hangs after 4,000. Note that it processes about 1,000 records every 2 seconds or so, so I reach the hang point very quickly.

Many lengthy operations will make SketchUp appear frozen after a few moments. I started an import about 15 minutes ago and it’s using about 1.7 GB of memory so far (this is normal). The important thing is how much headroom is available:

You can shorten the import time by setting the “Number of PolyItems” value to something like 100, and see partial results much more quickly.

don’t forget a restart is needed; SU can’t ‘unload’ those…

also ‘close’ all ‘Inspector Windows’, and for a large file close the ‘Ruby Console’, as puts output is stored in memory and will slow everything down…

you could redirect ‘puts’ to a tmp file, but it takes a bit of find and replacing…
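If the find-and-replace is the hassle, another option might be to point Ruby’s global `$stdout` at a log file while the import runs, so the script’s existing `puts` calls (`Kernel#puts` writes to `$stdout`) land in the file unchanged. A minimal sketch, not tested inside SketchUp; `with_stdout_to_file` is a hypothetical helper and the log path is your choice:

```ruby
# Hypothetical helper: temporarily send everything written with Kernel#puts
# (which writes to $stdout) to a log file, then restore the original stream.
def with_stdout_to_file(path)
  original = $stdout
  File.open(path, 'a') do |log|
    log.sync = true   # write through immediately so the file stays current
    $stdout = log
    yield
  end
ensure
  $stdout = original  # always restore, even if the block raises
end
```

Usage would be something like `with_stdout_to_file('C:/Temp/import.log') { spirix_shapefile_import_create_polygons() }`, where the path is an assumption.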

john

Thank you, Jim. Yes, I had previously tried various limits, e.g. 100 and 3,000, and they worked fine. So I then ran the importer without a limit and ran into this problem. I let the import run overnight and it remained hung.

I think my output statement every 500 objects in the loop would show signs of life, unless the Ruby Console is also frozen. Here’s a snippet showing my “puts” statement:

# CREATE POLYGONS -----------------------------------------------------------------------
  def spirix_shapefile_import_create_polygons()
    if(@@display_bb == 'Yes')
      group = @@model.entities.add_group
      group.entities.add_edges(Geom::Point3d.new(0,0,0),Geom::Point3d.new((@@bbox[2] - @@bbox[0]) * scale_x(@@bbox[1]),0,0),Geom::Point3d.new((@@bbox[2] - @@bbox[0]) * scale_x(@@bbox[1]),(@@bbox[3] - @@bbox[1]) * scale_y(@@bbox[3]),0),Geom::Point3d.new(0,(@@bbox[3] - @@bbox[1]) * scale_y(@@bbox[3]),0),Geom::Point3d.new(0,0,0))
    end
    for i in 0...@@imax
      # keep track of large loads    
      if (i % 500 == 0)
        puts "At " + @@timestamp.to_s + " " + i.to_s
      end
      if(@@use_data == 1)
        @@height = @@data[@@hdr_length + i * @@rec_length + @@offset,@@data_len].to_f * @@scale
      end

Side question: when SketchUp hangs, is the Ruby Console hung as well?

Another commenter asked about the file read. It looks like the entire file is read in one operation, starting around line 346:

# GET SHAPEFILE DATA --------------------------------------------------------------------
  def spirix_shapefile_import_get_data()
    file = @@basename + '.shx'
    input = File.open(file,'rb')
    @@index = input.read(File.size?(file))
    input.close
    @@type = @@index[32,4].unpack('L').first
    if(@@index[0,4].unpack('N').first != 9994)
      puts "*.shx is not a valid shapefile! Header record has " + @@index[0,4].unpack('N').first.to_s + "."
      exit
    elsif(@@type != 5 && @@type != 15 && @@type != 3)
      puts "*.shx does not contain polygons or polylines! Header record has " + @@type.to_s + "."
      exit
    end
    file = @@basename + '.shp'
    input = File.open(file,'rb')
    @@geom = input.read(File.size?(file))
    input.close 

It looks like the entire contents of the shapefile are read into the variable @@geom, and this happens before the processing loop where I inserted the “puts” debugging line.

Thank you, john_drivenupthewall. I did restart SketchUp.

I’ll try the temp file approach for debug monitoring. As the screenshot in my earlier post shows, though, the Ruby Console is not even full; there are still empty lines below the last output. Consoles that receive lots of output often have to manage their own scrollback memory, but in this case my output fills only a handful of lines when the application starts to hang.

I use this in a few of my extensions…

you will need your own namespace, path to tmp, etc…

# module
module JcB
  OSX = Sketchup.platform == :platform_osx unless defined? JcB::OSX

  # record for debugging...
  module Report
    extend self
    def echo(msg)
      model = Sketchup.active_model
      tmp_dir = __dir__.sub('/lib', '/tmp')
      # create the tmp folder on first use, otherwise File.open will fail
      Dir.mkdir(tmp_dir) unless Dir.exist?(tmp_dir)
      # an unsaved model has an empty title
      title = model.title.empty? ? 'untitled' : model.title
      logfile = File.join(tmp_dir, title + '.log')
      SKETCHUP_CONSOLE.write("#{msg}\n") if SKETCHUP_CONSOLE.visible?
      File.open(logfile, 'a') { |l| l.write("#{msg}\n") }
    end
  end # Report
end # JcB

and then replace p and puts with JcB::Report.echo

e.g. JcB::Report.echo "culling #{to_cull.length}"

john

Yes.

Yes, the file is read into a buffer, but I don’t believe this causes the problem. While the script is running, SketchUp is creating polygons in memory, and that is what is growing. I believe the data (only 56 MB) sits static in memory in @@geom. The index file (@@index, 860 KB) is used to index into the data file, so the records can’t easily be read one line at a time (it’s more like processing two linked XML files than a single CSV file).
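If holding the whole .shp in @@geom ever did become a concern, the .shx index also allows fetching one record at a time with seek, rather than reading line by line. A minimal sketch, assuming the standard ESRI shapefile layout (100-byte headers; each .shx entry holds a big-endian offset and content length measured in 16-bit words); `ShapeRecordReader` is a hypothetical class, not part of spirix_shapefile_importer:

```ruby
# Hypothetical reader (not part of spirix_shapefile_importer): fetches one
# record at a time from disk using the .shx index instead of holding the
# whole .shp file in a string. Assumes the standard ESRI shapefile layout.
class ShapeRecordReader
  HEADER_BYTES = 100      # both .shp and .shx start with a 100-byte header
  INDEX_ENTRY_BYTES = 8   # each .shx entry: offset + content length

  def initialize(basename)
    @shx = File.open(basename + '.shx', 'rb')
    @shp = File.open(basename + '.shp', 'rb')
  end

  # Number of records, derived from the .shx file size.
  def count
    (@shx.size - HEADER_BYTES) / INDEX_ENTRY_BYTES
  end

  # Returns [shape_type, content] for the zero-based record i.
  def record(i)
    @shx.seek(HEADER_BYTES + i * INDEX_ENTRY_BYTES)
    # Offset and length are big-endian and measured in 16-bit words.
    offset_words, length_words = @shx.read(INDEX_ENTRY_BYTES).unpack('NN')
    # Skip the 8-byte .shp record header (record number + content length).
    @shp.seek(offset_words * 2 + 8)
    content = @shp.read(length_words * 2)
    # The shape type at the start of the record content is little-endian.
    [content[0, 4].unpack('V').first, content]
  end

  def close
    @shx.close
    @shp.close
  end
end
```

Only the two file handles stay open; each call reads just the bytes for one record, so memory stays flat regardless of file size.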

There may be a way to force SketchUp to “dump” geometry as it goes along, instead of at the end of the run. Maybe @DanRathbun can shed more light on this.
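One thing that may be worth experimenting with is committing the work in batches: wrap every N records in its own start_operation/commit_operation pair instead of one operation for the whole run or none at all (with the operation lines commented out, the undo log posted later in this thread shows every Group, Create Face and Rename committed as a separate step). A sketch under assumptions: `each_in_batches` and the batch size are hypothetical, `model` only needs to respond to start_operation, commit_operation and abort_operation as Sketchup::Model does, and nothing here shows that batching lowers peak memory.

```ruby
# Hypothetical helper: run the per-record block inside undo operations that
# are committed every batch_size records. `model` only needs to respond to
# start_operation(name, disable_ui), commit_operation and abort_operation,
# as Sketchup::Model does.
def each_in_batches(model, total, batch_size, name = 'Import Shapefile')
  (0...total).each_slice(batch_size) do |batch|
    model.start_operation(name, true)
    begin
      batch.each { |i| yield i }
      model.commit_operation
    rescue StandardError
      model.abort_operation  # discard the partial batch, then re-raise
      raise
    end
  end
end
```

In the importer, the `for i in 0...@@imax` loop would presumably become `each_in_batches(@@model, @@imax, 1000) { |i| ... }` with the existing loop body inside the block.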

Progress…
I tried writing to a file that I opened before the main loop, and nothing appeared in it. After some quick searching about file flushing, I presumed my output was sitting in an output buffer and not making it to the file. So I took the more expensive route of opening and closing the file in “append” mode inside my modulus “if” statement:

    # use a file for monitoring rather than the console
    # (forward slashes: in a double-quoted Ruby string "\s" is a space)
    #my_debug_file = File.new("C:/sketchup_debug.txt", "a")
    for i in 0...@@imax
      # keep track of large loads
      if (i % 500 == 0)
        # the block form closes (and so flushes) the file after each write
        File.open("C:/sketchup_debug.txt", "a") do |my_debug_file|
          my_debug_file.puts "At " + Time.now.to_s + " " + i.to_s
        end
      end

With this method, I’m sailing past the 3500/4000 barrier:

At Thu Jan 25 06:45:10 -0800 2018 0
At Thu Jan 25 06:45:10 -0800 2018 500
At Thu Jan 25 06:45:11 -0800 2018 1000
At Thu Jan 25 06:45:11 -0800 2018 1500
At Thu Jan 25 06:45:12 -0800 2018 2000
At Thu Jan 25 06:45:13 -0800 2018 2500
At Thu Jan 25 06:45:14 -0800 2018 3000
At Thu Jan 25 06:45:15 -0800 2018 3500
At Thu Jan 25 06:45:16 -0800 2018 4000
At Thu Jan 25 06:45:17 -0800 2018 4500
At Thu Jan 25 06:45:18 -0800 2018 5000
At Thu Jan 25 06:45:20 -0800 2018 5500
At Thu Jan 25 06:45:21 -0800 2018 6000
At Thu Jan 25 06:45:23 -0800 2018 6500
At Thu Jan 25 06:45:25 -0800 2018 7000
At Thu Jan 25 06:45:27 -0800 2018 7500
At Thu Jan 25 06:45:29 -0800 2018 8000
At Thu Jan 25 06:45:31 -0800 2018 8500
At Thu Jan 25 06:45:33 -0800 2018 9000
At Thu Jan 25 06:45:35 -0800 2018 9500
At Thu Jan 25 06:45:38 -0800 2018 10000
At Thu Jan 25 06:45:41 -0800 2018 10500
At Thu Jan 25 06:45:44 -0800 2018 11000
At Thu Jan 25 06:45:46 -0800 2018 11500
At Thu Jan 25 06:45:49 -0800 2018 12000
At Thu Jan 25 06:45:52 -0800 2018 12500
At Thu Jan 25 06:45:55 -0800 2018 13000
At Thu Jan 25 06:45:59 -0800 2018 13500
…[and running]

I’m starting to think that output to the Ruby Console was part of the problem, since the only difference is that I have not opened the Ruby Console and am monitoring through file output instead.
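About the earlier symptom of nothing showing up in a file opened once before the loop: that fits Ruby buffering writes in memory until the file is flushed or closed. Rather than reopening the file every 500 records, the file can be opened once and made write-through with IO#sync=. A minimal sketch; `log_progress` is a hypothetical wrapper, and writing to the system temp directory (instead of the root of C:, which may need admin rights) is an assumption:

```ruby
require 'tmpdir'

# Assumed log location: the system temp directory.
DEBUG_LOG = File.join(Dir.tmpdir, 'sketchup_debug.txt')

# Hypothetical wrapper: open the log once, make it write-through, and record
# a progress line every `every` records while yielding each index to the block.
def log_progress(total, every = 500)
  File.open(DEBUG_LOG, 'a') do |log|
    log.sync = true  # skip Ruby's internal buffer; each line goes to the OS at once
    total.times do |i|
      log.puts "At #{Time.now} #{i}" if (i % every).zero?
      yield i
    end
  end
end
```

The loop body would go inside the block, e.g. `log_progress(@@imax) { |i| ... }`; calling `log.flush` after each `puts` would be an equivalent alternative to setting `sync`.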

BTW, I modeled New York City with over a million buildings. I had to break the results into 73 files to keep each one under the 50 MB 3D Warehouse limit:

I don’t know your purpose for this, but I will point out an additional feature of the importer. I ran it for only the first 110 shapes as an example using these settings:

While this feature was intended to add height to the polygons, your dataset is two-dimensional. However, these settings create “zero height” buildings:

The PROPID value is used as the name of each group; you can easily display it with the Text Label tool:

These correspond to the Account Number on the Marion County Assessor’s Property Records site. For example, the R10102 group displayed gives this:

While this is a public record, I’ve blacked out everything except the address.

Note that the “Bounding Box” option produces a handy reference frame for locating a potential subset of the data. You can use it, together with the dataset’s Max/Min values, to narrow the max/min settings to a smaller area.
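The dataset’s overall Max/Min values can also be read directly from the shapefile header. A minimal sketch, assuming the standard ESRI layout (file code 9994 big-endian at byte 0, then Xmin, Ymin, Xmax, Ymax as little-endian doubles starting at byte 36), which appears to match the [xmin, ymin, xmax, ymax] order of the importer’s @@bbox; `shapefile_bbox` is a hypothetical helper:

```ruby
# Hypothetical helper: read the overall bounding box from a .shp (or .shx)
# header. Assumes the standard ESRI layout: file code 9994 (big-endian) at
# byte 0, then Xmin, Ymin, Xmax, Ymax as little-endian doubles at byte 36.
def shapefile_bbox(path)
  header = File.binread(path, 100)
  unless header && header.bytesize == 100 && header[0, 4].unpack('N').first == 9994
    raise ArgumentError, "#{path} does not have a valid shapefile header"
  end
  header[36, 32].unpack('E4')  # [xmin, ymin, xmax, ymax]
end
```

Comparing this against a smaller window is a quick way to sanity-check the Max/Min values before starting a long import.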

I suspect that the max/min values relate to the overall shape of the county:

A Bugsplat occurred at 95,000 records. Here is the end of the undo log, at line 584,671:

Start(Group)Commit(292352)
~UndoOp(292252)
Start(Create Face)Commit(292353)
~UndoOp(292253)
Start(Rename)Commit(292354)
~UndoOp(292254)
Start(Group)Commit(292355)
~UndoOp(292255)
Start(Create Face)

What’s interesting is that the Bug Reporter referenced a file, ED053OE2.xml, which did not exist. The other two files do exist.

I’m going to try to create a subset containing only the parcels I’m concerned with, which are less than 1% of the file.

[Edit: revised screenshot of Temp directory; it now has overlaid markers]

Success using a smaller dataset.

Using OpenJump, I created a bounding box for the area I am interested in, just a handful of city blocks.

I then placed my cursor near the top-left box and captured the coordinates:

I did the same for the bottom-right box:

I then had to map my coordinates as follows:

top left:
x 7548309.111777061 => X [min]
y 473682.17746206274 => Y [max]

bottom right:
x 7549996.106540887 => X [max]
y 472497.60486760817 => Y [min]

thusly:

It almost instantly created what I wanted:

Thank you very much for your timely help.
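The corner-to-min/max mapping described above can be captured in a small helper that works whichever corner is captured first, plus an overlap test for deciding whether a parcel’s box falls in the window. A sketch; `window_from_corners` and `bbox_overlaps?` are hypothetical names, and boxes are assumed to be [xmin, ymin, xmax, ymax]:

```ruby
# Hypothetical helper: turn any two opposite corners into
# [xmin, ymin, xmax, ymax], regardless of which corner came first.
def window_from_corners((x1, y1), (x2, y2))
  [[x1, x2].min, [y1, y2].min, [x1, x2].max, [y1, y2].max]
end

# True when two [xmin, ymin, xmax, ymax] boxes overlap (touching counts).
def bbox_overlaps?(a, b)
  a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3]
end
```

With the OpenJump corners above, `window_from_corners([7548309.111777061, 473682.17746206274], [7549996.106540887, 472497.60486760817])` yields the same X min/max and Y min/max assignments as the manual mapping. In the standard shapefile layout, a polygon record’s own box follows its 4-byte shape type, so `content[4, 32].unpack('E4')` on a record’s content bytes would give the second argument for `bbox_overlaps?`.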

I’m willing to try anything to import the entire file if someone has further suggestions. What I have is a works-for-me solution, yet I still do not know why the import failed at the ~80% mark.
