Discussion:
[Scons-users] scons cache in a CI scenario
Mats Wichmann
2018-10-29 17:04:56 UTC
Permalink
Recent discussions about the scons cache, which I've never made use of
in the project that brought me over here to scons, got me to thinking...

A continuous integration setup could potentially benefit from the cache
- if there was a way to share it appropriately.

As far as I can see, a cache is a local entity - you give it a directory
name, which could potentially be shared (NFS, etc.), but still depends
on having a standard pathname.
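For reference, enabling the cache today really is just a directory path in the SConstruct, which is why sharing it comes down to sharing a mount. A minimal configuration sketch (CacheDir() is SCons's actual API; the mount point below is a made-up example):

```python
# SConstruct (configuration fragment)
# The cache location is an ordinary directory path - it could be a
# local disk or an NFS/EFS mount, but SCons just sees a pathname.
CacheDir('/mnt/shared/scons-cache')

env = Environment()
env.Program('hello', ['hello.c'])
```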

For builds in a CI system, builder instances are constantly spun up,
used, and discarded, so you don't have persistence - in fact, lack of
persistence is usually considered a benefit, much like SCons's own
reproducible-build principle of ignoring the shell environment.

Our builders - there are a bunch, for several different target systems -
do very similar things over and over, typically with only very small
changes: you submit a patch to our Gerrit instance, it kicks off a
dozen builds, something goes wrong, you find the bug in one file and
update the patch, and the process starts again, beginning with
re-provisioning the builders. But the similarity of most builds to
those that have gone before seems like a profitable candidate for
fetching unaffected build results from a cache.

Any thoughts on how one could implement that, assuming I've not
misunderstood what already exists? "Repository" is something different,
right? It seems more oriented to sources, but the doc says it covers
derived files as well. And it looks like it also takes a "directory"
rather than a more universal "URL".
Andrew C. Morrow
2018-10-29 17:19:37 UTC
Permalink
My company has done this by using Amazon EFS <https://aws.amazon.com/efs/>
to back the SCons cache for the parts of our CI system that run in AWS, and
the results have been excellent.

We create a new cache directory on EFS for each new build image version
that we run, so that changes to the build image that may not be visible to
SCons are not a concern. We discussed at one point making it so that the
SCons cache could be backed by something other than a filesystem, but EFS
served well enough for us that we never pursued that further. It is also
important to consider how you will prune the cache, or identify old caches
that are no longer in active use, so that the EFS footprint doesn't grow
without bound. We wrote scripts for that which evaluate the most recent
access time of the files (and we had to make a small modification to
SCons to get the timestamps right) and retain only the most recently
accessed files in the cache.
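The atime-based pruning Andrew describes can be sketched roughly like this (a generic script under assumed conventions - not the actual scripts his team wrote):

```python
import os
import time

def prune_cache(cache_root, max_age_days=30):
    """Remove cache entries whose last access is older than the cutoff.

    Walks the cache tree, compares each file's atime against the age
    budget, deletes stale entries, and returns the paths removed.
    Assumes the filesystem records access times (no 'noatime' mount).
    """
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_atime < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

In practice you would also want to clean up emptied subdirectories and guard against races with concurrent builds pushing into the cache.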

Overall, it is definitely a project worth pursuing to reduce CI build
times: it isn't trivial to set up, but the build-time savings were very
real for us.

Thanks,
Andrew
_______________________________________________
Scons-users mailing list
https://pairlist4.pair.net/mailman/listinfo/scons-users
Mats Wichmann
2018-10-31 15:35:30 UTC
Permalink
I didn't respond and meant to: thanks for this info! I believe in our
case (I checked with the host of our infra) it's possible to set
something up in OpenStack that will appear as part of the filesystem of
the instances.

I got some surprising pushback when bringing up this possibility in the
project - I brought it up because frankly I'm tired of spending two
hours or more waiting for an answer from the CI system... something that
isn't really important to the end product, but is important to the
developer experience. I'll just paste the relevant bits instead of
paraphrasing:

=========
a cached build is not a good idea.

Number one reason for not using a cached build: "Reliability".
- Case 1: A problem is solved just by doing a clean build.
- Case 2: A problem is hidden because of a left-over build product
among the cached build items.

A false failure like Case 1 where the build fails but can be fixed with
a clean build is annoying but not a critical problem as long as there is
a way to force a clean build. The big problem is Case 2 where it passes
because of picking up something from the cached build. The error will,
likely, not be caught till it is merged. Both problems I agree are
unusual but they still happen.

I am all for speeding up the build, but I suspect adding cached builds
would cause more problems than it's worth.
=========

I still don't fully grok how the various signatures are generated, which
I presume is the answer to the reliability question. Am I right that it
would be really hard to get a "wrong answer" in the SCons scenario?
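The reason a "wrong answer" is hard to get is that the cache key is derived from content, not timestamps. A simplified sketch of the idea (illustrative only - SCons's real build signatures also fold in dependency signatures, and newer versions use hashes other than MD5):

```python
import hashlib

def build_signature(command, source_contents):
    """Content-based build signature sketch.

    Any change to the command line or to any source's bytes produces a
    different key, so a cached artifact can only be fetched when the
    exact same inputs would have been built the exact same way.
    """
    h = hashlib.md5()
    h.update(command.encode())
    for content in source_contents:
        h.update(hashlib.md5(content).hexdigest().encode())
    return h.hexdigest()
```

The flip side is that anything the build reads but does not declare (an undeclared header, an environment variable smuggled into a tool) is invisible to the signature - which is exactly where cached-build bugs come from.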

I have also worried about what Andrew mentioned - not as a "stopper" but
just something that needs doing - how do you keep the cache fresh and
not just growing over time with eventually unneeded artifacts, etc.

Other (cough) build systems seem to have paid a lot of attention to
sharing build artifacts through caching. In particular:

https://docs.bazel.build/versions/master/remote-caching.html

SCons does not seem to have any concept of a networked cache - it works
if the network storage is mounted into the local filesystem, as in the
schemes already mentioned in this thread, but from SCons's viewpoint
that's all local. Does thinking about a network cache make any sense?
I'd worry that the time spent deciding whether to use a cached result,
followed by the time actually fetching it, may often exceed just
building locally, especially if you follow a programming model that
leaves you with relatively small source files. I'm happy to stop
thinking about it if we know it's not a particularly profitable model :)

"Problem solved by doing a clean build" - I presume that would be giving
the --cache-disable flag, since otherwise the files "come back" after a
clean, right?
Andrew C. Morrow
2018-10-31 18:52:11 UTC
Permalink
Thanks for writing back. A few quick things to consider that may help sway
the argument:

- The whole point of using SCons is that its builds are supposed to be 100%
reproducible. If you have a non-reproducible build, then you have a flaw in
your SCons setup (or there is an SCons bug). The most common sort of flaw
is failing to declare dependencies, which is especially common in uses of
Command.
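To make the Command pitfall concrete, a configuration sketch (the script and file names are made up): a Command whose action reads a file not listed among its sources will not rebuild, and can serve a stale cached result, when that file changes.

```python
# SConstruct fragment - the classic undeclared-dependency flaw.
env = Environment()

# BAD: suppose gen.py also reads 'template.txt'. It is not a listed
# source, so the target's signature ignores it, and a stale cached
# out.txt can be fetched after template.txt changes.
env.Command('out.txt', 'gen.py', 'python $SOURCE > $TARGET')

# GOOD: declare everything the action reads as a source.
env.Command('out2.txt', ['gen.py', 'template.txt'],
            'python ${SOURCES[0]} > $TARGET')
```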

- However, the concerns about incorrect builds are fair. Bugs, whether in
SCons or your deployment of it, are a fact of life. One mitigation is to
only use caching for builds that will not go out the door. Our release
builds do not interact with the cache at all. Only developer "patch" builds
and regular commits do. In practice, we have not encountered any adverse
consequences of using the cache, but release bits are simply too precious
to chance it.

- Along with cache pruning work, we have implemented a global kill switch
for the cache, so that if something does go badly wrong, we can quickly
disable it everywhere it is used. We have yet to use it.

- Direct network support for CacheDir in SCons is probably a good idea,
especially if there is skepticism about the semantics of NFS or CIFS. The
first step would be to make the cache dir implementation pluggable and
abstract away the FS operations. Then write a different storage backend for
it (S3?).
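To sketch what "pluggable" might mean here (this interface is hypothetical, not SCons's actual CacheDir internals; the two-level directory fan-out mimics SCons's on-disk cache layout):

```python
import os
import shutil
from abc import ABC, abstractmethod

class CacheBackend(ABC):
    """Hypothetical storage interface a pluggable CacheDir could target.

    An S3 or HTTP backend would implement the same two operations.
    """
    @abstractmethod
    def fetch(self, key, dest):
        """Copy the cached file for key to dest; return True on a hit."""

    @abstractmethod
    def push(self, key, src):
        """Store src in the cache under key."""

class FileSystemBackend(CacheBackend):
    def __init__(self, root):
        self.root = root

    def _path(self, key):
        # Two-level fan-out by key prefix, like SCons's cache directory.
        return os.path.join(self.root, key[:2].upper(), key)

    def fetch(self, key, dest):
        path = self._path(key)
        if not os.path.exists(path):
            return False
        shutil.copy2(path, dest)
        return True

    def push(self, key, src):
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        shutil.copy2(src, path)
```

The appeal of this split is that the signature machinery never changes - only where the bytes live.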

- Pruning the cache correctly does take some thought. One way to avoid
bloating the cache is to avoid putting statically linked artifacts into it,
as these tend to be large. Dynamic libraries or things linked to dynamic
libraries should be fine. It is easy to add an optional emitter to disable
the cache for certain types of targets
<https://github.com/mongodb/mongo/blob/master/SConstruct#L1271-L1290>.
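In the spirit of that linked SConstruct, the emitter itself can be tiny (NoCache() is a real SCons environment method; the builder wiring is illustrative):

```python
def no_cache_emitter(target, source, env):
    """Emitter sketch: mark every target of this builder as not
    cacheable, so large artifacts never enter the shared cache."""
    env.NoCache(target)
    return target, source

# In an SConstruct you would append it to a builder's emitter list,
# e.g. env.Append(PROGEMITTER=[no_cache_emitter]) for Program targets.
```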

- Performance is of course a concern. You need to do your own research and
experiments to determine if it is a win in your build.

Overall, you need to treat moving to a networked/shared cache as a project,
not just something to turn on. Do a proper design, build consensus, have an
ops plan, etc.

Here is a pretty picture of what happened to our CI compile times after
deploying the EFS backed cache:

[image: chart of CI compile times, not preserved in the archive]

Hope that helps.

Thanks,
Andrew
Bill Deegan
2018-11-01 02:15:20 UTC
Permalink
Copying from the cache vs. building is hard to answer in general, as you
say. It depends on the speed of the filesystem (SSD, spinning disk, or
network), and also on the speed of the compiles.

In theory you should never get a bad build where something that should
rebuild doesn't. At this point I'm not aware of any outstanding bugs
which lead to this. There are outstanding bugs where something rebuilds
when not necessary.

SCons's philosophy is to never allow incorrect builds. Secondarily, avoid
rebuilding when not necessary.

In general, using the cache is a win for most builds, especially where
you have a CI build doing nightlies that your developer builds can pull
from when rebuilding. (A common use model.)

Also, most don't use cached builds for production, but mainly for developer
builds.

Yes, you can control use of the cache via command-line flags:
--cache-disable turns it off for a run, --cache-force repopulates the
cache from existing derived files, and --cache-show prints the build
line even when a target is fetched from the cache.

-Bill