Discussion: [Scons-users] Out of memory writing back the database
Hill, Steve (FP COM)
2017-11-28 15:17:36 UTC
All,

We are using 32-bit Python 2.7.12 and SCons 2.5.1 on Windows (and Linux). We have now reached the point where the database is in excess of 500MB. During the build, the memory usage bubbles along at just over 1GB but, at the end of the run, we sometimes get a spike in memory usage that causes an "Out of memory" exception and the build fails.

Looking at the code, it appears that the database file is produced by pickling into memory before writing to the file, hence the memory usage increases by over 0.5GB as it is being written back - and, depending on the database size, this can blow the memory space of a 32-bit process.
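(To illustrate the pattern - this is not the actual SCons code, just the shape of the problem - serializing a large dict with pickle.dumps() materializes the whole pickle as a second in-memory copy, whereas pickle.dump() streams to the file with only a small buffer overhead:)

import pickle

big = {i: "x" * 100 for i in range(1000000)}  # stand-in for a large sconsign dict

# Pickling into memory first: the whole serialized database exists as a
# bytes object alongside the live dict, roughly doubling peak memory.
with open("db_in_memory.pkl", "wb") as f:
    data = pickle.dumps(big, 2)  # full serialized copy held in memory...
    f.write(data)                # ...before any of it reaches disk

# Streaming instead: the pickler writes out as it serializes, so the peak
# overhead is a small buffer rather than the whole blob.
with open("db_streamed.pkl", "wb") as f:
    pickle.dump(big, f, 2)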

Firstly, I do have a medium-term plan to move us to 64-bit Python along with the move to Python 3, but we still need to stick with 32-bit Python for the moment (for example, we have to load 32-bit DLLs), so I need to fix this for 32-bit Python in the short term.

Secondly, I've had a look at the old (?) .sconsign per directory approach, which would seem to address this problem. I assume that this is still supported and should be as robust as the single file approach?

The only issue with this is that some of the .sconsigns appear in the directories beside the source code, which breaches the contract for our build system: no file may be produced within the directories containing source code. Where we use VariantDirs (for the actual implementation files) there is no problem but, for header files, the .sconsign ends up in the include or interface directory. It looks reasonably easy to monkey-patch the DirFile class in SConsign.py to override the directory in which the .sconsign resides but, before I do this, I wanted to check that there is no easier way to achieve the builds without a memory issue.

Thanks,

Steve.
Bill Deegan
2017-12-01 01:36:38 UTC
Steve,

How many files are you processing? 500MB is very large for a sconsign.

Could you try mv .sconsign .sconsign.save, let your build run and see how
big the new file is?

-Bill
Hill, Steve (FP COM)
2017-12-01 08:48:48 UTC
Hi Bill,

Our code base is not that big – about 20,000 files – but most files get built multiple times (for different CPUs) and we build it in several different ways, each with many of its own VariantDirs. When the 4 most common builds are done, the .sconsign (having been deleted first) is just under 500MB; add a bit of history or some of the less common builds and it gets big enough to blow the memory space of Python.

I've split the .sconsign up so that largely independent builds on the same codebase are stored in different .sconsign files (initially done because of the time it took to write the .sconsign back), though I've had to write some custom Deciders to make this totally reliable. All in all, the .sconsign-per-directory approach (though with the .sconsign location outside the main source tree) would seem the best solution for me (and might perform better too, since not all files need writing back on incremental builds), unless there is another option that I'm not aware of.
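(For illustration, the splitting itself needs no patching - something like this in the SConstruct, where the paths and the 'flavour' build argument are made up:)

# One signature database per largely-independent build, kept outside the
# source tree ('flavour' and the paths here are hypothetical).
flavour = ARGUMENTS.get('flavour', 'common')
SConsignFile('tmp_files/sconsign_' + flavour)

# Passing None instead selects the per-directory .sconsign behaviour
# mentioned above, which is what scatters files beside the sources:
# SConsignFile(None)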




Thanks,

S.
Hill, Steve (FP COM)
2017-12-01 13:21:55 UTC
Hi Bill,

Just to follow up on this: I've monkey-patched the DirFile class in SConsign.py so that it writes each sconsign into a single temporary directory, with each sconsign named with an adler32 hash of its directory path.

After doing the 4 basic builds, the directory contains 10,745 sconsigns totalling 674MB. Using --debug=memory, the peak memory usage of the worst-case build was 925MB, compared to ~1.4GB for the same build with the monolithic sconsign (so the difference is basically the cost of pickling the sconsign into memory first).

Cheers,

S.
Bill Deegan
2017-12-01 16:19:44 UTC
Would you consider contributing that logic as a pull request, and/or pointing to the commit(s) in your repo?

Currently the plan is to transition the .sconsign to another format, possibly json or sqlite, with an eye on speed and/or incrementality.
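For what it's worth, a minimal sketch of what an incremental sqlite-backed store could look like (the schema and class here are illustrative, not an actual SCons design). Each directory's entry becomes its own row, so an incremental build rewrites only the rows that changed instead of re-serializing the whole database:

import pickle
import sqlite3

class SQLiteSConsign(object):
    """Illustrative sqlite-backed signature store."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS sconsign "
                          "(dir TEXT PRIMARY KEY, entries BLOB)")

    def set_entry(self, dirname, entries):
        # Upsert a single directory's entries; no other row is rewritten.
        blob = sqlite3.Binary(pickle.dumps(entries, 2))
        self.conn.execute("INSERT OR REPLACE INTO sconsign (dir, entries) "
                          "VALUES (?, ?)", (dirname, blob))

    def get_entry(self, dirname):
        row = self.conn.execute("SELECT entries FROM sconsign WHERE dir = ?",
                                (dirname,)).fetchone()
        return pickle.loads(bytes(row[0])) if row else None

    def sync(self):
        self.conn.commit()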

-Bill
Eric Fahlgren
2017-12-01 16:59:18 UTC
I got curious while reading this thread and looked at our .sconsign files. Most of them are around 300K, but the one in our production directory was 2.5M (pretty tiny compared to Steve's 500M, but interesting in that it's 10x bigger even though it's building the same source). All of the small ones are just test builds -- one or two builds in them and then they get deleted. The one in production is many years old, with one build per day.

    python -c 'from SCons import dblite; x = dblite.open(".sconsign.dblite"); print("\n".join("%7s %s" % (len(x[s]), s) for s in sorted(x.keys())))'

The list of keys is the same for all of those that I examined, but the data blob seems to grow with use. (I trimmed the dumps down to three of the most interesting entries; there were a couple hundred entries in each dump.)

660 win32\build
23546 win32\install
35249 win32\install\bin

In production:

133860 win32\build
2293371 win32\install
83235 win32\install\bin

The growth of the install directory blob itself is pretty easy to explain: we create a number of "release versioned" files there, each with a file name containing the day's build number, so we have something like "WIN32/install/installer1234.exe" with a different number each day.
Bill Deegan
2017-12-01 17:00:44 UTC
There is the sconsign utility to inspect the .sconsign files...
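For example, run from the directory containing the database:

    sconsign .sconsign.dblite

which dumps the stored entries (file names, signatures and dependency info) for each directory recorded in the database.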
Bill Deegan
2017-12-01 17:01:42 UTC
You're not building production from a clean sandbox? (and/or via CI)
-Bill
Eric Fahlgren
2017-12-01 17:19:06 UTC
No, we just update the repository and do the build. The WIN32/install
directory is basically our installer archive, so it just sits there on one
of our servers accumulating gigabytes of garbage year after year...

Thanks for the tip on the sconsign utility; I just ran it on the old database and it shows files dated as long ago as 2015-02-19, confirming that it's at least a year and a half old.

It also crashes on that old database, but I don't think it's worth looking into; do you care? (It works fine on all the newer ones.)

  File "t:/Python27/Scripts/sconsign.py", line 277, in map_bkids
    result.append(nodeinfo_string(bkids[i], bkidsigs[i], " "))
  IndexError: list index out of range
Hill, Steve (FP COM)
2017-12-01 17:31:41 UTC
The quick hack was as follows (not committed anywhere as it is very much a hack!):

import os

import SCons.SConsign
import SCons.Warnings

import hash_cache  # our in-house hashing helper (see below)

_DirFile = SCons.SConsign.DirFile

hashDir = os.path.join("tmp_files", "sconsigns")
hashFile = os.path.join(hashDir, ".hashes")
sconsignHashes = hash_cache.HashManager(hashFile)


class NewDirFile(_DirFile):
    def __init__(self, dir):
        """
        dir - the directory for the file
        """
        self.dir = dir

        # changed: name the sconsign after a hash of the directory path and
        # keep it in hashDir, rather than as .sconsign in the directory itself
        internalPath = dir.get_internal_path()
        pathHash = sconsignHashes.get_hash(internalPath)
        self.sconsign = os.path.join(hashDir, pathHash + '.sconsign')

        try:
            fp = open(self.sconsign, 'rb')
        except IOError:
            fp = None

        try:
            SCons.SConsign.Dir.__init__(self, fp, dir)
        except KeyboardInterrupt:
            raise
        except:
            SCons.Warnings.warn(SCons.Warnings.CorruptSConsignWarning,
                                "Ignoring corrupt .sconsign file: %s" % self.sconsign)

        SCons.SConsign.sig_files.append(self)


SCons.SConsign.DirFile = NewDirFile

# Switch to per-directory sconsigns (now redirected by the patch above).
SConsignFile(None)

Note that the lines marked "changed" are the only changes from the DirFile constructor in SConsign.py.



HashManager is a pre-existing class that I've reused here. It handles creating the (8-character hex string) hash values, including resolving hash collisions, and caching them - plus storing them in a file for the next run.
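In outline it looks something like this (a simplified sketch, not the real class - the collision probing and the cache file format here are illustrative):

import os
import zlib

class HashManager(object):
    """Maps directory paths to stable 8-character hex names."""

    def __init__(self, cache_file):
        self.cache_file = cache_file
        self.hashes = {}   # path -> hex name
        self.used = set()  # names already handed out
        if os.path.exists(cache_file):
            with open(cache_file) as f:
                for line in f:
                    name, path = line.rstrip("\n").split(" ", 1)
                    self.hashes[path] = name
                    self.used.add(name)

    def get_hash(self, path):
        if path in self.hashes:
            return self.hashes[path]
        h = zlib.adler32(path.encode("utf-8")) & 0xffffffff
        name = "%08x" % h
        while name in self.used:  # resolve collisions by linear probing
            h = (h + 1) & 0xffffffff
            name = "%08x" % h
        self.hashes[path] = name
        self.used.add(name)
        return name

    def save(self):
        # Persist the mapping so names stay stable across runs.
        d = os.path.dirname(self.cache_file)
        if d and not os.path.isdir(d):
            os.makedirs(d)
        with open(self.cache_file, "w") as f:
            for path, name in sorted(self.hashes.items()):
                f.write("%s %s\n" % (name, path))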



HTH,

S.
Mats Wichmann
2017-12-01 18:50:38 UTC
Transition: json won't help with that. Sqlite might.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Bill Deegan
2017-12-01 21:22:26 UTC
Mats,

JSON may help memory, but not incrementality if it's a single file.

Thoughts?

-Bill